Description of problem:

During a performance benchmark we realized that pods are not being evenly spread across worker nodes. This issue was previously flagged in https://bugzilla.redhat.com/show_bug.cgi?id=2005076. However, after running more tests with a workload that deploys a number of pause pods with CPU and memory requests configured, we have seen that the pods are still not scheduled evenly. The workload described below deployed 446 pause pods in the same namespace (node-density-bbe06b64-991a-4a74-8d9d-75aa23f45415).

# Number of pods in the workload's namespace
rsevilla@wonderland ~ $ oc get pod -n node-density-bbe06b64-991a-4a74-8d9d-75aa23f45415 --no-headers | wc -l
446

# All of the deployed pods have resource requests configured
rsevilla@wonderland ~ $ oc get pod -n node-density-bbe06b64-991a-4a74-8d9d-75aa23f45415 node-density-1 -o jsonpath="{.spec.containers[*].resources}"
{"requests":{"cpu":"1m","memory":"10Mi"}}

# Worker nodes total
$ oc describe node -l node-role.kubernetes.io/worker | grep -E "(^Name:|^Non-terminated)"
Name:                 ip-10-0-147-142.eu-west-3.compute.internal
Non-terminated Pods:  (249 in total)
Name:                 ip-10-0-158-24.eu-west-3.compute.internal
Non-terminated Pods:  (249 in total)
Name:                 ip-10-0-187-55.eu-west-3.compute.internal
Non-terminated Pods:  (25 in total)
Name:                 ip-10-0-218-220.eu-west-3.compute.internal
Non-terminated Pods:  (31 in total)

# Number of pods per node in the workload's namespace
rsevilla@wonderland ~ $ oc get pod -n node-density-bbe06b64-991a-4a74-8d9d-75aa23f45415 -o wide --no-headers | awk '{node[$7]++ }END{ for (n in node) print n": "node[n]; }'
ip-10-0-147-142.eu-west-3.compute.internal: 218
ip-10-0-187-55.eu-west-3.compute.internal: 5
ip-10-0-158-24.eu-west-3.compute.internal: 223

As shown above, the pods landed on only 3 of the 4 worker nodes (one node didn't get any pods at all), and one of those 3 nodes received only 5 pods, while the other two were scheduled 218 and 223 pods respectively.

# Cluster version
rsevilla@wonderland ~ $ oc get clusterversion
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-rc.4   True        False         16d     Cluster version is 4.9.0-rc.4
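For completeness, a pod equivalent to the ones the workload creates can be reproduced with something like the snippet below. This is just a minimal sketch: the pod name, container name, and pause image are assumptions on my side, while the namespace and resource requests match the jsonpath output above.

cat <<EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: node-density-example          # assumed name for illustration
  namespace: node-density-bbe06b64-991a-4a74-8d9d-75aa23f45415
spec:
  containers:
  - name: pause                        # assumed container name
    image: k8s.gcr.io/pause:3.5        # assumed pause image
    resources:
      requests:
        cpu: 1m
        memory: 10Mi
EOF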
Hi Raul,

Is this reporting a different issue than the one identified in https://bugzilla.redhat.com/show_bug.cgi?id=2005076 and https://github.com/kubernetes/kubernetes/issues/105220? From the discussions we've had, this appears to be a duplicate of that, though shown through a different test. Could you please elaborate on the difference? Thanks!
Hey Mike,

No, I'm not reporting a different issue. I opened this BZ just to highlight that the problem still happens even after setting pod resource requests as you suggested. I can move the information from this one over to the old one if you consider it necessary.
Thanks Raul,

Yeah, let's keep all the information in one thread so nothing gets lost. If you copy this over to that bug, we can close this one as a duplicate. Thanks!

*** This bug has been marked as a duplicate of bug 2005076 ***