Description of problem:
Kubernetes is not spreading pods out evenly across nodes. Instead it favors putting pods on the same nodes until those nodes become overloaded and unresponsive.

Version-Release number of selected component (if applicable):
atomic-openshift-3.6.173.0.5-1.git.0.f30b99e.el7.x86_64

How reproducible:
Sometimes, on a cluster with 265 nodes

Steps to Reproduce:
1. Create a large cluster
2. Create a large, diverse load on the cluster

Actual results:
Pods are fed to the same nodes until those nodes become overloaded and die.

Expected results:
Pods should be spread out across nodes unless affinity or other similar factors say otherwise.

Additional info:
Pods created from different clients, but at roughly the same time, were all scheduled to the same node. That node became unresponsive as a result of the load.

[sturpin@starter-us-east-1-master-25064 ~]$ sudo oc get pods -n ops-health-monitoring -o wide
[sudo] password for sturpin:
NAME   READY   STATUS   RESTARTS   AGE   IP   NODE
pull-08251850z-34-1-deploy   0/1   ContainerCreating   0   13m   <none>   ip-172-31-55-238.ec2.internal
pull-08251850z-tl-1-deploy   0/1   ContainerCreating   0   29m   <none>   ip-172-31-55-238.ec2.internal
pull-08251850z-w3-1-deploy   0/1   ContainerCreating   0   13m   <none>   ip-172-31-55-238.ec2.internal

[sturpin@starter-us-east-1-master-25064 ~]$ sudo oc get pods -n ops-health-monitoring -o wide
NAME   READY   STATUS   RESTARTS   AGE   IP   NODE
build-08251900z-1x-1-deploy   0/1   Terminating   0   18m   <none>   ip-172-31-59-26.ec2.internal
build-08251901z-0d-1-deploy   0/1   Terminating   0   18m   <none>   ip-172-31-59-26.ec2.internal
build-08251901z-xi-1-build   0/1   Terminating   0   20m   <none>   ip-172-31-59-26.ec2.internal
pull-08251910z-oj-1-deploy   0/1   Terminating   0   11m   10.129.23.174   ip-172-31-59-26.ec2.internal
pull-08251920z-26-1-deploy   1/1   Running   0   1m   <none>   ip-172-31-54-226.ec2.internal

Odd distribution across the top 50 nodes:

[sturpin@starter-us-east-1-master-25064 ~]$ sudo oc get pods -o wide --all-namespaces | awk '{print $8}' | sort | uniq -c | sort -rn | head -50
198 ip-172-31-50-178.ec2.internal
 87 ip-172-31-49-48.ec2.internal
 86 ip-172-31-57-154.ec2.internal
 75 ip-172-31-53-214.ec2.internal
 75 ip-172-31-51-213.ec2.internal
 73 ip-172-31-60-39.ec2.internal
 67 ip-172-31-56-61.ec2.internal
 65 ip-172-31-61-142.ec2.internal
 56 ip-172-31-59-1.ec2.internal
 55 ip-172-31-59-57.ec2.internal
 55 ip-172-31-58-211.ec2.internal
 55 ip-172-31-57-98.ec2.internal
 55 ip-172-31-57-139.ec2.internal
 53 ip-172-31-59-91.ec2.internal
 51 ip-172-31-60-66.ec2.internal
 49 ip-172-31-60-135.ec2.internal
 47 ip-172-31-61-150.ec2.internal
 47 ip-172-31-60-189.ec2.internal
 47 ip-172-31-52-92.ec2.internal
 46 ip-172-31-60-13.ec2.internal
 46 ip-172-31-57-24.ec2.internal
 46 ip-172-31-56-114.ec2.internal
 46 ip-172-31-50-89.ec2.internal
 46 ip-172-31-49-133.ec2.internal
 45 ip-172-31-58-246.ec2.internal
 45 ip-172-31-57-102.ec2.internal
 45 ip-172-31-55-29.ec2.internal
 45 ip-172-31-54-211.ec2.internal
 45 ip-172-31-52-181.ec2.internal
 44 ip-172-31-62-163.ec2.internal
 44 ip-172-31-56-206.ec2.internal
 44 ip-172-31-55-228.ec2.internal
 43 ip-172-31-53-87.ec2.internal
 43 ip-172-31-48-176.ec2.internal
 42 ip-172-31-59-69.ec2.internal
 42 ip-172-31-51-3.ec2.internal
 42 ip-172-31-49-165.ec2.internal
 41 ip-172-31-56-207.ec2.internal
 41 ip-172-31-55-219.ec2.internal
 41 ip-172-31-49-42.ec2.internal
 41 ip-172-31-49-220.ec2.internal
 40 ip-172-31-61-46.ec2.internal
 40 ip-172-31-57-220.ec2.internal
 40 ip-172-31-54-26.ec2.internal
 40 ip-172-31-53-187.ec2.internal
 39 ip-172-31-54-98.ec2.internal
 38 ip-172-31-61-209.ec2.internal
 38 ip-172-31-61-162.ec2.internal
 38 ip-172-31-59-113.ec2.internal
 38 ip-172-31-57-141.ec2.internal

[sturpin@starter-us-east-1-master-25064 ~]$ sudo oc get pods -o wide --all-namespaces | awk '{print $8}' | sort | uniq -c | sort -rn | tail -10
 31 ip-172-31-60-103.ec2.internal
 31 ip-172-31-53-133.ec2.internal
 30 ip-172-31-54-226.ec2.internal
 14 ip-172-31-59-26.ec2.internal
  5 ip-172-31-60-14.ec2.internal
  4 ip-172-31-56-38.ec2.internal
  4 ip-172-31-50-116.ec2.internal
  4 ip-172-31-48-214.ec2.internal
  1 NODE
  1 ip-172-31-51-95.ec2.internal
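Note that the "1 NODE" entry in the tail output is just the column header being counted along with the data. A variant of the same pipeline that skips the header (using oc's --no-headers flag) would be:

sudo oc get pods -o wide --all-namespaces --no-headers | awk '{print $8}' | sort | uniq -c | sort -rn | head -50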
Another interesting point: we run end-to-end tests that create apps from images and do STI builds at the top and middle of the hour. Here's our top-of-the-hour run. Note how many of the deploy pods are on the same node:

[sturpin@starter-us-east-1-master-25064 ~]$ sudo oc get pods -n ops-health-monitoring -o wide --watch
[sudo] password for sturpin:
NAME   READY   STATUS   RESTARTS   AGE   IP   NODE
pull-08252050z-u7-1-1tv4x   1/1   Running   0   2m   10.129.22.48   ip-172-31-59-26.ec2.internal
pull-08252050z-u7-1-1tv4x   1/1   Terminating   0   2m   10.129.22.48   ip-172-31-59-26.ec2.internal
pull-08252050z-u7-1-1tv4x   0/1   Terminating   0   3m   <none>   ip-172-31-59-26.ec2.internal
pull-08252050z-u7-1-1tv4x   0/1   Terminating   0   3m   <none>   ip-172-31-59-26.ec2.internal
pull-08252050z-u7-1-1tv4x   0/1   Terminating   0   3m   <none>   ip-172-31-59-26.ec2.internal
pull-08252100z-o7-1-deploy   0/1   Pending   0   0s   <none>
pull-08252100z-o7-1-deploy   0/1   Pending   0   0s   <none>   ip-172-31-59-26.ec2.internal
pull-08252100z-o7-1-deploy   0/1   ContainerCreating   0   0s   <none>   ip-172-31-59-26.ec2.internal
pull-08252100z-01-1-deploy   0/1   Pending   0   0s   <none>
pull-08252100z-01-1-deploy   0/1   Pending   0   1s   <none>   ip-172-31-59-26.ec2.internal
pull-08252100z-01-1-deploy   0/1   ContainerCreating   0   1s   <none>   ip-172-31-59-26.ec2.internal
pull-08252100z-sp-1-deploy   0/1   Pending   0   0s   <none>
pull-08252100z-sp-1-deploy   0/1   Pending   0   0s   <none>   ip-172-31-59-26.ec2.internal
pull-08252100z-sp-1-deploy   0/1   ContainerCreating   0   0s   <none>   ip-172-31-59-26.ec2.internal
build-08252100z-gr-1-build   0/1   Pending   0   0s   <none>
build-08252100z-gr-1-build   0/1   Pending   0   1s   <none>   ip-172-31-55-238.ec2.internal
build-08252100z-gr-1-build   0/1   ContainerCreating   0   1s   <none>   ip-172-31-55-238.ec2.internal
pull-08252100z-o7-1-k1jj1   0/1   Pending   0   0s   <none>
pull-08252100z-o7-1-k1jj1   0/1   Pending   0   0s   <none>   ip-172-31-59-26.ec2.internal
pull-08252100z-o7-1-k1jj1   0/1   ContainerCreating   0   0s   <none>   ip-172-31-59-26.ec2.internal
build-08252100z-gr-1-build   1/1   Running   0   24s   10.131.89.27   ip-172-31-55-238.ec2.internal
build-08252101z-7b-1-build   0/1   Pending   0   0s   <none>
build-08252101z-7b-1-build   0/1   Pending   0   0s   <none>   ip-172-31-59-69.ec2.internal
build-08252101z-7b-1-build   0/1   ContainerCreating   0   1s   <none>   ip-172-31-59-69.ec2.internal
pull-08252100z-01-1-0m7gx   0/1   Pending   0   0s   <none>
pull-08252100z-01-1-0m7gx   0/1   Pending   0   0s   <none>   ip-172-31-59-26.ec2.internal
pull-08252100z-01-1-0m7gx   0/1   ContainerCreating   0   0s   <none>   ip-172-31-59-26.ec2.internal
pull-08252100z-sp-1-lxktm   0/1   Pending   0   0s   <none>
pull-08252100z-sp-1-lxktm   0/1   Pending   0   0s   <none>   ip-172-31-57-103.ec2.internal
pull-08252100z-sp-1-lxktm   0/1   ContainerCreating   0   0s   <none>   ip-172-31-57-103.ec2.internal
pull-08252100z-sp-1-deploy   1/1   Running   0   1m   10.129.22.67   ip-172-31-59-26.ec2.internal
pull-08252100z-01-1-deploy   1/1   Running   0   1m   10.129.22.66   ip-172-31-59-26.ec2.internal
build-08252101z-6w-1-build   0/1   Pending   0   0s   <none>
build-08252101z-6w-1-build   0/1   Pending   0   0s   <none>   ip-172-31-51-213.ec2.internal
build-08252101z-6w-1-build   0/1   ContainerCreating   0   1s   <none>   ip-172-31-51-213.ec2.internal
pull-08252100z-sp-1-lxktm   1/1   Running   0   12s   10.128.67.200   ip-172-31-57-103.ec2.internal
build-08252100z-gr-1-deploy   0/1   Pending   0   0s   <none>
build-08252100z-gr-1-deploy   0/1   Pending   0   0s   <none>   ip-172-31-51-151.ec2.internal
build-08252100z-gr-1-deploy   0/1   ContainerCreating   0   0s   <none>   ip-172-31-51-151.ec2.internal
build-08252100z-gr-1-build   0/1   Completed   0   1m   10.131.89.27   ip-172-31-55-238.ec2.internal
pull-08252100z-sp-1-deploy   0/1   Completed   0   1m   10.129.22.67   ip-172-31-59-26.ec2.internal
pull-08252100z-sp-1-deploy   0/1   Terminating   0   1m   10.129.22.67   ip-172-31-59-26.ec2.internal
pull-08252100z-sp-1-deploy   0/1   Terminating   0   1m   10.129.22.67   ip-172-31-59-26.ec2.internal
pull-08252100z-o7-1-deploy   1/1   Running   0   2m   10.129.22.65   ip-172-31-59-26.ec2.internal
build-08252101z-6w-1-build   1/1   Running   0   49s   10.128.151.199   ip-172-31-51-213.ec2.internal
build-08252101z-7b-1-deploy   0/1   Pending   0   0s   <none>
build-08252101z-7b-1-deploy   0/1   Pending   0   0s   <none>   ip-172-31-55-238.ec2.internal
build-08252101z-7b-1-deploy   0/1   ContainerCreating   0   0s   <none>   ip-172-31-55-238.ec2.internal
build-08252101z-7b-1-deploy   1/1   Running   0   9s   10.131.89.30   ip-172-31-55-238.ec2.internal
build-08252101z-7b-1-l3l5c   0/1   Pending   0   0s   <none>
build-08252101z-7b-1-l3l5c   0/1   Pending   0   1s   <none>   ip-172-31-59-26.ec2.internal
build-08252101z-7b-1-l3l5c   0/1   ContainerCreating   0   1s   <none>   ip-172-31-59-26.ec2.internal
build-08252101z-6w-1-deploy   0/1   Pending   0   0s   <none>
build-08252101z-6w-1-deploy   0/1   Pending   0   0s   <none>   ip-172-31-57-24.ec2.internal
build-08252101z-6w-1-deploy   0/1   ContainerCreating   0   0s   <none>   ip-172-31-57-24.ec2.internal
build-08252101z-6w-1-build   0/1   Completed   0   1m   10.128.151.199   ip-172-31-51-213.ec2.internal
build-08252100z-gr-1-b1lhg   0/1   Pending   0   0s   <none>
build-08252100z-gr-1-deploy   1/1   Running   0   1m   10.130.104.192   ip-172-31-51-151.ec2.internal
build-08252101z-6w-1-deploy   1/1   Running   0   17s   10.131.133.144   ip-172-31-57-24.ec2.internal
build-08252100z-gr-1-b1lhg   0/1   Pending   0   0s   <none>   ip-172-31-57-139.ec2.internal
build-08252100z-gr-1-b1lhg   0/1   ContainerCreating   0   0s   <none>   ip-172-31-57-139.ec2.internal
build-08252101z-6w-1-7lxpb   0/1   Pending   0   0s   <none>
build-08252101z-6w-1-7lxpb   0/1   Pending   0   0s   <none>   ip-172-31-57-102.ec2.internal
build-08252101z-6w-1-7lxpb   0/1   ContainerCreating   0   0s   <none>   ip-172-31-57-102.ec2.internal
build-08252101z-7b-1-build   0/1   Completed   0   2m   10.128.60.6   ip-172-31-59-69.ec2.internal
pull-08252100z-o7-1-k1jj1   0/1   Terminating   0   2m   <none>   ip-172-31-59-26.ec2.internal
pull-08252100z-o7-1-k1jj1   0/1   Terminating   0   2m   <none>   ip-172-31-59-26.ec2.internal
pull-08252100z-o7-1-deploy   1/1   Terminating   0   3m   10.129.22.65   ip-172-31-59-26.ec2.internal
pull-08252100z-o7-1-k1jj1   0/1   Terminating   0   2m   <none>   ip-172-31-59-26.ec2.internal
pull-08252100z-o7-1-k1jj1   0/1   Terminating   0   2m   <none>   ip-172-31-59-26.ec2.internal
build-08252100z-gr-1-b1lhg   1/1   Running   0   16s   10.128.44.137   ip-172-31-57-139.ec2.internal
build-08252101z-6w-1-7lxpb   1/1   Running   0   15s   10.128.85.118   ip-172-31-57-102.ec2.internal
build-08252100z-gr-1-deploy   0/1   Completed   0   1m   10.130.104.192   ip-172-31-51-151.ec2.internal
build-08252100z-gr-1-deploy   0/1   Terminating   0   1m   10.130.104.192   ip-172-31-51-151.ec2.internal
build-08252100z-gr-1-deploy   0/1   Terminating   0   1m   10.130.104.192   ip-172-31-51-151.ec2.internal
build-08252101z-6w-1-deploy   0/1   Completed   0   37s   10.131.133.144   ip-172-31-57-24.ec2.internal
build-08252101z-6w-1-deploy   0/1   Terminating   0   37s   10.131.133.144   ip-172-31-57-24.ec2.internal
build-08252101z-6w-1-deploy   0/1   Terminating   0   37s   10.131.133.144   ip-172-31-57-24.ec2.internal
pull-08252100z-sp-1-lxktm   1/1   Terminating   0   2m   10.128.67.200   ip-172-31-57-103.ec2.internal
pull-08252100z-01-1-0m7gx   0/1   Terminating   0   2m   <none>   ip-172-31-59-26.ec2.internal
pull-08252100z-01-1-0m7gx   0/1   Terminating   0   2m   <none>   ip-172-31-59-26.ec2.internal
pull-08252100z-01-1-deploy   1/1   Terminating   0   3m   10.129.22.66   ip-172-31-59-26.ec2.internal
pull-08252100z-01-1-0m7gx   0/1   Terminating   0   2m   <none>   ip-172-31-59-26.ec2.internal
pull-08252100z-01-1-0m7gx   0/1   Terminating   0   2m   <none>   ip-172-31-59-26.ec2.internal
pull-08252100z-sp-1-lxktm   0/1   Terminating   0   2m   <none>   ip-172-31-57-103.ec2.internal
pull-08252100z-sp-1-lxktm   0/1   Terminating   0   2m   <none>   ip-172-31-57-103.ec2.internal
pull-08252100z-sp-1-lxktm   0/1   Terminating   0   2m   <none>   ip-172-31-57-103.ec2.internal
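To summarize a captured watch like the one above, a rough sketch of a tally over the node column (it counts every watch event, not unique pods, and <none> entries are pods not yet scheduled):

# capture the watch to a file, interrupting with Ctrl-C when the run finishes
sudo oc get pods -n ops-health-monitoring -o wide --watch | tee watch.log
# then tally the last field
awk '{print $NF}' watch.log | sort | uniq -c | sort -rn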
Do the pods you are scheduling specify resource requirements? Can you provide a prototypical pod you are using in the test scenario?
Reassigned to the Pod component, as they handle scheduling issues.
I am looking into it.
Hi Sten,

First, I would like to understand why there are 198 pods on the node ip-172-31-50-178.ec2.internal. For that, could you provide me the following information:

1. Information (oc describe) about all 198 pods on ip-172-31-50-178.ec2.internal.
2. An oc describe of all nodes.

I will do the data mining myself once you provide the above info, to avoid going back and forth.

Regarding your next comment, https://bugzilla.redhat.com/show_bug.cgi?id=1485464#c1 , I'd say I am not surprised by this behavior. For the data in that comment, I count:

33 ip-172-31-59-26.ec2.internal
15 <none>
 7 ip-172-31-57-103.ec2.internal
 7 ip-172-31-55-238.ec2.internal
 6 ip-172-31-57-24.ec2.internal
 6 ip-172-31-51-151.ec2.internal
 4 ip-172-31-51-213.ec2.internal
 3 ip-172-31-59-69.ec2.internal
 3 ip-172-31-57-139.ec2.internal
 3 ip-172-31-57-102.ec2.internal

Your original comment shows that ip-172-31-59-26 has only 14 pods, whereas other nodes are above 30, so having 33 placements land on ip-172-31-59-26 is not a surprise.

Anyway, once you provide the info I asked for, I will see what is going on. It could be any of:

1) an issue with the scheduler,
2) not enough build nodes in the cluster, if builds are controlled by node selectors,
3) incorrect labels on the nodes,
4) or something related to the pods' resource requirements.

But as I said above, I will start by investigating why there are 198 pods on the node ip-172-31-50-178.ec2.internal. Let me know if you have questions.
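If it helps, here is a sketch of shell commands that could gather the requested data in one pass (the node name is hard-coded and the output file names are just placeholders):

# describe every pod currently on the heavily loaded node
# ($1 = namespace, $2 = pod name, $8 = node in "oc get pods --all-namespaces -o wide")
sudo oc get pods --all-namespaces -o wide --no-headers \
  | awk '$8 == "ip-172-31-50-178.ec2.internal" {print $1, $2}' \
  | while read ns pod; do sudo oc describe pod "$pod" -n "$ns"; done > pods-ip-172-31-50-178.txt

# describe all nodes
sudo oc describe nodes > all-nodes.txt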
(In reply to Derek Carr from comment #2)
> Do the pods you are scheduling specify resource requirements?
>
> Can you provide a prototypical pod you are using in the test scenario?

We do not specify resource requirements during our e2e testing. The pod is a simple deploy of https://github.com/openshift/origin/tree/master/examples/hello-openshift
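For context, pods with no requests or limits are BestEffort, so the scheduler's resource-based spreading priorities get no signal from them. A minimal sketch of a hello-openshift pod with explicit requests added (the request values are illustrative, not something used in this report):

sudo oc create -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: hello-openshift
  labels:
    name: hello-openshift
spec:
  containers:
  - name: hello-openshift
    image: openshift/hello-openshift
    ports:
    - containerPort: 8080
    resources:
      requests:
        cpu: 100m      # illustrative values, not from the bug report
        memory: 64Mi
EOF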
Sten, are you still experiencing the node death due to overload that you originally reported? If so, and you still suspect that the scheduler is not placing pods properly, we will probably need verbose scheduler logs to observe the assignment logic during a seemingly improper placement. If you are not experiencing this anymore, please let me know so I can close this bug.
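For reference, one way to capture verbose scheduler logs on 3.6, sketched under the assumption that the masters run the split api/controllers services with the standard sysconfig layout (the loglevel value and exact paths may differ on these hosts):

# raise verbosity for the controllers process, which hosts the scheduler in 3.6
sudo sed -i 's/^OPTIONS=.*/OPTIONS=--loglevel=4/' /etc/sysconfig/atomic-openshift-master-controllers
sudo systemctl restart atomic-openshift-master-controllers
# follow the logs while reproducing a bad placement
sudo journalctl -u atomic-openshift-master-controllers -f | grep -i scheduler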
We're still seeing this on 3.6, but it looks like it does not occur on 3.7.