Bug 1965972 - kube-scheduler, fatal error: concurrent map writes
Summary: kube-scheduler, fatal error: concurrent map writes
Keywords:
Status: CLOSED DUPLICATE of bug 2008266
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-scheduler
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.6.z
Assignee: Maciej Szulik
QA Contact: RamaKasturi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-05-31 09:47 UTC by Frederic Giloux
Modified: 2022-01-17 14:57 UTC
CC List: 7 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-01-17 14:57:24 UTC
Target Upstream Version:
Embargoed:


Attachments
stack trace node 168 (264.50 KB, text/plain), 2021-05-31 09:47 UTC, Frederic Giloux
stack trace node 169 (274.45 KB, text/plain), 2021-05-31 09:48 UTC, Frederic Giloux
stack trace node 170 (258.62 KB, text/plain), 2021-05-31 09:49 UTC, Frederic Giloux
stack trace node 170 (258.62 KB, text/plain), 2021-05-31 09:55 UTC, Frederic Giloux

Description Frederic Giloux 2021-05-31 09:47:04 UTC
Created attachment 1788194 [details]
stack trace node 168

Description of problem:
Under load, the kube-scheduler crashes repeatedly. This occurred during a stress test that spawned thousands of namespaces, volumes, and pods in batches so that the control plane would not get overloaded.

It happens at a time when dynamic volume provisioning does not seem to keep up with demand:
I0527 16:40:54.352168       1 factory.go:445] "Unable to schedule pod; no fit; waiting" pod="xxxxxxxxxx/test-kbqqz" err="0/110 nodes are available: 110 pod has unbound immediate PersistentVolumeClaims."

Also one PVC seemed to be missing at the time:
E0527 23:48:37.617023       1 scheduler.go:490] "Error selecting node for pod" err="persistentvolumeclaim \"pvc-tpl-opti-prod\" not found" pod="namespace/param-opti-tpl-param-opti-cluster-4-16-14-16-19-13-4-81-0-vzf75"
$ grep "pvc-tpl-opti-prod.*not found" openshift-kube-scheduler-xxxxxxxxxxxxxx.log | wc -l
4434

From the first scheduler holding the lease:
I0528 02:40:21.553112       1 factory.go:445] "Unable to schedule pod; no fit; waiting" pod="cl-na-tsa-1a6ef87a-bf5c-11eb-ba43-0a580a800336-1-4416/cl-na-tsa-1a6ef87a-bf5c-11eb-ba43-0a580a800336-1-4416-pbcdk" err="0/110 nodes are available: 1 node(s) had taint {node.kubernetes.io/unreachable: }, that the pod didn't tolerate, 20 Insufficient memory, 3 Insufficient cpu, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 83 node(s) exceed max volume count."
fatal error: concurrent map writes

goroutine 1386763 [running]:
runtime.throw(0x1cb9cc0, 0x15)
	/usr/lib/golang/src/runtime/panic.go:1116 +0x72 fp=0xc00947f6b8 sp=0xc00947f688 pc=0x4378d2
runtime.mapassign_faststr(0x1a206a0, 0xc02c384db0, 0xc01b188040, 0x1f, 0x4)
	/usr/lib/golang/src/runtime/map_faststr.go:211 +0x3f1 fp=0xc00947f720 sp=0xc00947f6b8 pc=0x414871
k8s.io/kubernetes/pkg/scheduler/framework/plugins/volumebinding.(*VolumeBinding).Filter(0xc0009868c0, 0x1f188a0, 0xc0064ff9c0, 0xc01e237d10, 0xc004061400, 0xc00eff41b0, 0x0)
	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/scheduler/framework/plugins/volumebinding/volume_binding.go:178 +0x365 fp=0xc00947f838 sp=0xc00947f720 pc=0x1842ba5

The other schedulers then acquire the lease and crash immediately:

I0527 16:35:40.443390       1 leaderelection.go:253] successfully acquired lease openshift-kube-scheduler/kube-scheduler
I0527 16:35:40.463350       1 factory.go:445] "Unable to schedule pod; no fit; waiting" pod="openshift-ingress/router-default-64f7c6665b-n2x5v" err="0/110 nodes are available: 105 node(s) didn't match node selector, 5 node(s) didn't have free ports for the requested pod ports."
I0527 16:35:40.491987       1 scheduler.go:597] "Successfully bound pod to node" pod="cl-na-202105271627-3dbszyfwoy-asykbee7n-4344/cl-na-202105271627-3dbszyfwoy-asykbee7n-4344-srpz9" node="xxxxxx000006.xxxxxxx.org" evaluatedNodes=110 feasibleNodes=1
fatal error: concurrent map writes
fatal error: concurrent map writes

goroutine 766 [running]:
runtime.throw(0x1cb9cc0, 0x15)
	/usr/lib/golang/src/runtime/panic.go:1116 +0x72 fp=0xc012e417c8 sp=0xc012e41798 pc=0x4378d2
runtime.mapassign_faststr(0x1a206a0, 0xc008b0bb90, 0xc010c79b00, 0x1f, 0x4)
	/usr/lib/golang/src/runtime/map_faststr.go:291 +0x3d8 fp=0xc012e41830 sp=0xc012e417c8 pc=0x414858
k8s.io/kubernetes/pkg/scheduler/framework/plugins/volumebinding.(*VolumeBinding).Filter(0xc00088b3c0, 0x1f188a0, 0xc00dc46640, 0xc013f436e0, 0xc00d42c800, 0xc00d2a7050, 0x0)
	/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/scheduler/framework/plugins/volumebinding/volume_binding.go:178 +0x365 fp=0xc012e41948 sp=0xc012e41830 pc=0x1842ba5
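
For context: the Go runtime aborts the whole process with "fatal error: concurrent map writes" when two goroutines write to the same plain map without synchronization, and unlike a panic it cannot be recovered, which is why the scheduler dies outright. A minimal, generic Go sketch (not scheduler code) that typically triggers the same runtime message:

package main

// Two goroutines writing to the same unsynchronized map. Running this
// usually aborts with "fatal error: concurrent map writes", the same
// failure mode shown in the stack traces above.
func main() {
    m := map[string]string{}
    done := make(chan struct{})
    go func() {
        for i := 0; i < 1000000; i++ {
            m["node-a"] = "x"
        }
        close(done)
    }()
    for i := 0; i < 1000000; i++ {
        m["node-b"] = "x"
    }
    <-done
}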

This looks similar to:
https://github.com/kubernetes/kubernetes/issues/96832
https://github.com/kubernetes/kubernetes/pull/96809

but I was not able to figure out whether this fix was picked up in OpenShift, and I did not see it mentioned in any errata after 4.6.27.
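
The stack traces point at a map assignment inside the VolumeBinding plugin's Filter, which the scheduler runs concurrently for many nodes, so any map shared across those calls needs synchronization. Below is a hedged sketch of the usual locking pattern for that situation; the names stateData and podVolumesByNode are only illustrative and this is not the actual upstream fix:

package main

import "sync"

// Illustrative only: a per-cycle state object whose map is written from
// many concurrent Filter calls, guarded by an embedded mutex.
type stateData struct {
    sync.Mutex
    podVolumesByNode map[string]string // node name -> (placeholder for) volume decisions
}

// recordDecision is safe to call from concurrent goroutines because every
// write to the shared map happens under the lock.
func (d *stateData) recordDecision(nodeName, decision string) {
    d.Lock()
    defer d.Unlock()
    d.podVolumesByNode[nodeName] = decision
}

func main() {
    d := &stateData{podVolumesByNode: map[string]string{}}
    var wg sync.WaitGroup
    for _, node := range []string{"node-168", "node-169", "node-170"} {
        wg.Add(1)
        go func(n string) {
            defer wg.Done()
            d.recordDecision(n, "bound")
        }(node)
    }
    wg.Wait()
}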

Version-Release number of selected component (if applicable):

OpenShift 4.6.27

How reproducible:

Under the conditions described above.

Actual results:

scheduler continuously crashing

Expected results:

scheduler does not crash

Additional info:

The customer reported that deleting the pods that had already terminated seemed to stop the schedulers from crashing further.

Comment 1 Frederic Giloux 2021-05-31 09:48:31 UTC
Created attachment 1788195 [details]
stack trace node 169

Comment 2 Frederic Giloux 2021-05-31 09:49:27 UTC
Created attachment 1788196 [details]
stack trace node 170

Comment 3 Frederic Giloux 2021-05-31 09:55:57 UTC
Created attachment 1788197 [details]
stack trace node 170

Comment 4 Frederic Giloux 2021-05-31 13:18:39 UTC
For what it's worth: on OCP 4.6.27 the kube-scheduler version is 1.19.4. That does not tell us which patches may have been cherry-picked on top of it.

Comment 6 Maciej Szulik 2021-06-08 11:52:32 UTC
I'm currently looking into possible bumps of the k8s dependencies for both OCP 4.7 and 4.6, but it will take a few more days to sort this out.

Comment 7 Frederic Giloux 2021-06-08 12:59:06 UTC
Thanks @maszulik. It is good to know that you are looking at it. I will wait for an update from you in a few days.

Comment 16 Maciej Szulik 2021-08-19 13:54:53 UTC
Mike, I'm sending this one your way; this will be covered by delivering the 4.6/1.19 k8s bump.

Comment 17 Mike Dame 2021-09-03 15:56:02 UTC
I am still working on the 4.6/1.19 k8s bump to address this, but I got occupied with other high-priority bugs. Apologies for the delay.

Comment 21 Mike Dame 2021-10-26 14:20:34 UTC
@Steve, sorry, the 4.6 rebase we expected to fix this did merge (https://github.com/openshift/kubernetes/pull/962), but it wasn't linked to this BZ, so it looks like this bug never got moved to QA.

Setting this to MODIFIED so that QE can pick it up and verify that the rebase fixed the issue.

Comment 22 RamaKasturi 2021-11-12 15:48:08 UTC
Hello Frederic,

   Could you please let us know the steps to reproduce the issue?

Thanks,
Kasturi

Comment 23 Frederic Giloux 2021-11-12 15:59:25 UTC
Hi Kasturi,

This is a pretty old bugzilla. I don't remember more than what is in the description:
"It occurred during a stress test, which involved spawning thousands of namespaces, volumes and pods in batches so that the control plane does not get overloaded.
It happens at a time where the dynamic creation of volumes seems not to cope well with the demand:
I0527 16:40:54.352168       1 factory.go:445] "Unable to schedule pod; no fit; waiting" pod="xxxxxxxxxx/test-kbqqz" err="0/110 nodes are available: 110 pod has unbound immediate PersistentVolumeClaims."

@snetting, who is now the contact person for the customer, may be able to get additional information if they still face the issue.

Regards,

Frédéric

Comment 24 Paige Rubendall 2021-11-15 21:32:29 UTC
I reran this test on a 4.6.49 cluster with 110 nodes and was able to create 1200 projects, each with 10 pods and 10 PVCs, with no issues. I am not sure how many namespaces or pods were created during the test where the issue first occurred. I am going to rerun on 3 nodes and try to drive the number of pods per node up.
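
For anyone who needs to rerun a similar load later, here is a rough client-go sketch of this kind of batch creation. It is illustrative only: the names, counts, pause image, and reliance on a default storage class are assumptions, and the actual test tooling used here is not part of this bug.

package main

import (
    "context"
    "fmt"

    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/api/resource"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

// Illustrative load generator: creates many namespaces, each with a handful of
// PVCs and pods mounting them. Field types match the 1.19-era core/v1 API
// referenced in this bug; adjust counts and sizes to the cluster under test.
func main() {
    cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    if err != nil {
        panic(err)
    }
    client, err := kubernetes.NewForConfig(cfg)
    if err != nil {
        panic(err)
    }
    ctx := context.Background()

    for i := 0; i < 1200; i++ {
        ns := fmt.Sprintf("stress-%04d", i)
        if _, err := client.CoreV1().Namespaces().Create(ctx,
            &corev1.Namespace{ObjectMeta: metav1.ObjectMeta{Name: ns}},
            metav1.CreateOptions{}); err != nil {
            panic(err)
        }
        for j := 0; j < 10; j++ {
            pvcName := fmt.Sprintf("data-%d", j)
            pvc := &corev1.PersistentVolumeClaim{
                ObjectMeta: metav1.ObjectMeta{Name: pvcName},
                Spec: corev1.PersistentVolumeClaimSpec{
                    AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
                    Resources: corev1.ResourceRequirements{
                        Requests: corev1.ResourceList{corev1.ResourceStorage: resource.MustParse("1Gi")},
                    },
                },
            }
            if _, err := client.CoreV1().PersistentVolumeClaims(ns).Create(ctx, pvc, metav1.CreateOptions{}); err != nil {
                panic(err)
            }
            pod := &corev1.Pod{
                ObjectMeta: metav1.ObjectMeta{Name: fmt.Sprintf("pause-%d", j)},
                Spec: corev1.PodSpec{
                    Containers: []corev1.Container{{
                        Name:         "pause",
                        Image:        "registry.k8s.io/pause:3.9", // placeholder image
                        VolumeMounts: []corev1.VolumeMount{{Name: "data", MountPath: "/data"}},
                    }},
                    Volumes: []corev1.Volume{{
                        Name: "data",
                        VolumeSource: corev1.VolumeSource{
                            PersistentVolumeClaim: &corev1.PersistentVolumeClaimVolumeSource{ClaimName: pvcName},
                        },
                    }},
                },
            }
            if _, err := client.CoreV1().Pods(ns).Create(ctx, pod, metav1.CreateOptions{}); err != nil {
                panic(err)
            }
        }
    }
}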

Comment 25 Paige Rubendall 2021-11-16 16:37:26 UTC
I reran again on a 4.6.49 cluster with only 3 nodes. I did not see any concurrent map write issues or other kube-scheduler errors in the logs, even when pods could not be scheduled because there was no more space on the nodes.

I did hit the following issue, but it was expected given the node size:
"AttachVolume.Attach failed for volume "pvc-81afea20-501f-4fef-ae13-4d6046d77788" : too many EBS volumes attached to node ip-10-0-173-82.us-east-2.compute.internal"

Everything looks good to me here; the functionality seems to be working as expected.

Comment 27 Scott Dodson 2022-01-17 14:57:24 UTC
Since we're attributing this to the fixes from Bug 2008266, which included the Kube 1.19.14 bump, let's go ahead and mark this as a duplicate of that bug. If we let this go through the normal flow, it would show up in the next errata, and someone might presume they need to update to that version rather than to 4.6.48, where the bump was delivered.

*** This bug has been marked as a duplicate of bug 2008266 ***

