Created attachment 1788194 [details]
stack trace node 168

Description of problem:

Under load testing, kube-scheduler crashes repeatedly. It occurred during a stress test which involved spawning thousands of namespaces, volumes and pods in batches, so that the control plane does not get overloaded. It happens at a time when the dynamic creation of volumes does not seem to keep up with the demand:

I0527 16:40:54.352168 1 factory.go:445] "Unable to schedule pod; no fit; waiting" pod="xxxxxxxxxx/test-kbqqz" err="0/110 nodes are available: 110 pod has unbound immediate PersistentVolumeClaims."

One PVC also seemed to be missing at the time:

E0527 23:48:37.617023 1 scheduler.go:490] "Error selecting node for pod" err="persistentvolumeclaim \"pvc-tpl-opti-prod\" not found" pod="namespace/param-opti-tpl-param-opti-cluster-4-16-14-16-19-13-4-81-0-vzf75"

$ grep "pvc-tpl-opti-prod.*not found" openshift-kube-scheduler-xxxxxxxxxxxxxx.log | wc -l
4434

The first scheduler holding the lease crashes:

I0528 02:40:21.553112 1 factory.go:445] "Unable to schedule pod; no fit; waiting" pod="cl-na-tsa-1a6ef87a-bf5c-11eb-ba43-0a580a800336-1-4416/cl-na-tsa-1a6ef87a-bf5c-11eb-ba43-0a580a800336-1-4416-pbcdk" err="0/110 nodes are available: 1 node(s) had taint {node.kubernetes.io/unreachable: }, that the pod didn't tolerate, 20 Insufficient memory, 3 Insufficient cpu, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 83 node(s) exceed max volume count."

fatal error: concurrent map writes

goroutine 1386763 [running]:
runtime.throw(0x1cb9cc0, 0x15)
	/usr/lib/golang/src/runtime/panic.go:1116 +0x72 fp=0xc00947f6b8 sp=0xc00947f688 pc=0x4378d2
runtime.mapassign_faststr(0x1a206a0, 0xc02c384db0, 0xc01b188040, 0x1f, 0x4)
	/usr/lib/golang/src/runtime/map_faststr.go:211 +0x3f1 fp=0xc00947f720 sp=0xc00947f6b8 pc=0x414871
k8s.io/kubernetes/pkg/scheduler/framework/plugins/volumebinding.(*VolumeBinding).Filter(0xc0009868c0, 0x1f188a0, 0xc0064ff9c0, 0xc01e237d10, 0xc004061400, 0xc00eff41b0, 0x0)
	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/scheduler/framework/plugins/volumebinding/volume_binding.go:178 +0x365 fp=0xc00947f838 sp=0xc00947f720 pc=0x1842ba5

Then the other schedulers acquire the lease and crash immediately:

I0527 16:35:40.443390 1 leaderelection.go:253] successfully acquired lease openshift-kube-scheduler/kube-scheduler
I0527 16:35:40.463350 1 factory.go:445] "Unable to schedule pod; no fit; waiting" pod="openshift-ingress/router-default-64f7c6665b-n2x5v" err="0/110 nodes are available: 105 node(s) didn't match node selector, 5 node(s) didn't have free ports for the requested pod ports."
I0527 16:35:40.491987 1 scheduler.go:597] "Successfully bound pod to node" pod="cl-na-202105271627-3dbszyfwoy-asykbee7n-4344/cl-na-202105271627-3dbszyfwoy-asykbee7n-4344-srpz9" node="xxxxxx000006.xxxxxxx.org" evaluatedNodes=110 feasibleNodes=1

fatal error: concurrent map writes
fatal error: concurrent map writes

goroutine 766 [running]:
runtime.throw(0x1cb9cc0, 0x15)
	/usr/lib/golang/src/runtime/panic.go:1116 +0x72 fp=0xc012e417c8 sp=0xc012e41798 pc=0x4378d2
runtime.mapassign_faststr(0x1a206a0, 0xc008b0bb90, 0xc010c79b00, 0x1f, 0x4)
	/usr/lib/golang/src/runtime/map_faststr.go:291 +0x3d8 fp=0xc012e41830 sp=0xc012e417c8 pc=0x414858
k8s.io/kubernetes/pkg/scheduler/framework/plugins/volumebinding.(*VolumeBinding).Filter(0xc00088b3c0, 0x1f188a0, 0xc00dc46640, 0xc013f436e0, 0xc00d42c800, 0xc00d2a7050, 0x0)
	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/scheduler/framework/plugins/volumebinding/volume_binding.go:178 +0x365 fp=0xc012e41948 sp=0xc012e41830 pc=0x1842ba5

This looks similar to:
https://github.com/kubernetes/kubernetes/issues/96832
https://github.com/kubernetes/kubernetes/pull/96809

but I was not able to figure out whether that fix was picked up in OpenShift, and I did not see it mentioned in errata after 4.6.27.

Version-Release number of selected component (if applicable):
OpenShift 4.6.27

How reproducible:
Under the conditions described above.

Actual results:
Scheduler continuously crashing.

Expected results:
Scheduler does not crash.

Additional info:
The customer reported that deleting the pods that had already terminated seemed to stop the schedulers from crashing further.
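For context on the failure mode: the scheduler framework evaluates Filter plugins for many candidate nodes from parallel goroutines, so any map shared across those calls must be synchronized; an unsynchronized write is aborted by the Go runtime with exactly this "fatal error: concurrent map writes". The sketch below is a minimal, self-contained illustration of that race class only. The type and field names (volumeBindingSketch, podState, guarded) are invented for the example and are not the real volumebinding plugin code; see the upstream issue/PR linked above for the actual fix.

package main

import (
	"fmt"
	"sync"
)

// Illustrative stand-in for per-pod state kept on a plugin struct and
// shared across parallel Filter() calls. Not the real
// k8s.io/kubernetes/pkg/scheduler/framework/plugins/volumebinding code.
type volumeBindingSketch struct {
	mu       sync.Mutex        // the guard missing in the buggy variant
	podState map[string]string // hypothetical cache keyed by "namespace/name"
	guarded  bool
}

// Filter is invoked once per candidate node, from parallel goroutines.
// Writing to the shared map without holding the lock reproduces the
// "fatal error: concurrent map writes" crash seen in the stack traces.
func (vb *volumeBindingSketch) Filter(podKey, node string) {
	if vb.guarded {
		vb.mu.Lock()
		defer vb.mu.Unlock()
	}
	vb.podState[podKey] = node // unsynchronized write in the buggy variant
}

func main() {
	vb := &volumeBindingSketch{
		podState: map[string]string{},
		guarded:  true, // set to false to reproduce the fatal runtime error
	}

	var wg sync.WaitGroup
	for i := 0; i < 110; i++ { // one goroutine per candidate node, as in a 110-node cluster
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			vb.Filter("namespace/test-kbqqz", fmt.Sprintf("node-%d", i))
		}(i)
	}
	wg.Wait()
	fmt.Println("filter calls completed, entries:", len(vb.podState))
}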
Created attachment 1788195 [details] stack trace node 169
Created attachment 1788196 [details] stack trace node 170
Created attachment 1788197 [details] stack trace node 170
For what it's worth: on OCP 4.6.27 the kube-scheduler version is 1.19.4. That alone does not tell us which patches may have been cherry-picked.
I'm currently looking into possible bumps of the k8s dependencies for both OCP 4.7 and 4.6, but it will take a few more days to sort this out.
Thanks @maszulik. It is already good to know that you are looking into it. I will wait for an update from you in a few more days.
Mike, I'm sending this one your way; it will be covered by delivering the 4.6/1.19 k8s bump.
I am still working on the 4.6/1.19 k8s bump to address this, but I got occupied with other high-priority bugs. Apologies for the delay.
@Steve, sorry, the 4.6 rebase we were expecting to fix this did merge (https://github.com/openshift/kubernetes/pull/962), but it wasn't linked to this BZ, so it looks like this never got moved to QA. Setting to Modified so that QE can pick it up and verify that the rebase fixed the issue.
Hello Frederic,

Could you please let us know the steps to reproduce the issue?

Thanks,
kasturi
Hi Kasturi,

This is a pretty old bugzilla. I don't remember more than what is in the description:

"It occurred during a stress test, which involved spawning thousands of namespaces, volumes and pods in batches so that the control plane does not get overloaded. It happens at a time where the dynamic creation of volumes seems not to cope well with the demand:

I0527 16:40:54.352168 1 factory.go:445] "Unable to schedule pod; no fit; waiting" pod="xxxxxxxxxx/test-kbqqz" err="0/110 nodes are available: 110 pod has unbound immediate PersistentVolumeClaims.""

@snetting, who is now the contact person for the customer, may be able to get additional information if they still face the issue.

Regards,
Frédéric
I reran this test on a 4.6.49 cluster with 110 nodes and was able to create 1200 projects, each with 10 pods and 10 PVCs, with no issues. I am not sure how many namespaces or pods were created during the test where the issue first occurred. Going to rerun on 3 nodes and try to drive the number of pods per node up.
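For reference, the load was generated roughly along the lines of the client-go sketch below (written against the 1.19-era API). The namespace names, container image, PVC size, and batch pacing are placeholders, not the exact test tooling.

package main

import (
	"context"
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Use the local kubeconfig (~/.kube/config) to talk to the cluster.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	ctx := context.Background()

	const projects, perProject, batch = 1200, 10, 50

	for p := 0; p < projects; p++ {
		ns := fmt.Sprintf("scale-test-%04d", p)
		if _, err := client.CoreV1().Namespaces().Create(ctx,
			&corev1.Namespace{ObjectMeta: metav1.ObjectMeta{Name: ns}},
			metav1.CreateOptions{}); err != nil {
			panic(err)
		}

		for i := 0; i < perProject; i++ {
			pvcName := fmt.Sprintf("data-%d", i)
			pvc := &corev1.PersistentVolumeClaim{
				ObjectMeta: metav1.ObjectMeta{Name: pvcName, Namespace: ns},
				Spec: corev1.PersistentVolumeClaimSpec{
					AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
					Resources: corev1.ResourceRequirements{
						Requests: corev1.ResourceList{corev1.ResourceStorage: resource.MustParse("1Gi")},
					},
					// StorageClassName left nil: rely on the cluster's default
					// dynamically provisioned storage class.
				},
			}
			if _, err := client.CoreV1().PersistentVolumeClaims(ns).Create(ctx, pvc, metav1.CreateOptions{}); err != nil {
				panic(err)
			}

			pod := &corev1.Pod{
				ObjectMeta: metav1.ObjectMeta{Name: fmt.Sprintf("sleeper-%d", i), Namespace: ns},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Name:         "sleep",
						Image:        "registry.access.redhat.com/ubi8/ubi-minimal", // placeholder image
						Command:      []string{"sleep", "infinity"},
						VolumeMounts: []corev1.VolumeMount{{Name: "data", MountPath: "/data"}},
					}},
					Volumes: []corev1.Volume{{
						Name: "data",
						VolumeSource: corev1.VolumeSource{
							PersistentVolumeClaim: &corev1.PersistentVolumeClaimVolumeSource{ClaimName: pvcName},
						},
					}},
				},
			}
			if _, err := client.CoreV1().Pods(ns).Create(ctx, pod, metav1.CreateOptions{}); err != nil {
				panic(err)
			}
		}

		if (p+1)%batch == 0 {
			// Pace the batches so the control plane is kept busy without being flooded.
			time.Sleep(30 * time.Second)
		}
	}
}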
Reran again on a 4.6.49 cluster with only 3 nodes. I did not see any concurrent map write issues or other kube-scheduler errors in the logs, even when pods could not be scheduled because there was no more space on the nodes. I did hit one issue, but it was expected given the node size: "AttachVolume.Attach failed for volume "pvc-81afea20-501f-4fef-ae13-4d6046d77788" : too many EBS volumes attached to node ip-10-0-173-82.us-east-2.compute.internal". Everything looks good to me here; the functionality seems to be working as expected.
Since we're attributing this to the fixes from Bug 2008266, which included the Kube 1.19.14 bump, let's go ahead and mark this as a duplicate of that bug. If we let it go through the normal flow it would show up in the next errata, and someone might presume they need to update to that version rather than to 4.6.48, where the bump was actually delivered.

*** This bug has been marked as a duplicate of bug 2008266 ***