Created attachment 1537090 [details]
Node logs where build pods are hung

Description of problem:
Scenario is repeated builds with 2 parallel builds/node. After a while 2 of the builds were hung with the build pod stuck in Init:0/2 status with timeouts waiting for volumes to attach. Node logs show the following sequence of messages repeating every minute. Full node logs from 2 nodes with stuck builds attached.

Feb 21 13:15:36 ip-10-0-135-244 hyperkube[4143]: E0221 13:15:36.583189 4143 kubelet.go:1662] Unable to mount volumes for pod "cakephp-mysql-example-11-build_svt-91(0302c8ce-357a-11e9-a8c3-0af843294408)": timeout expired waiting for volumes to attach or mount for pod "svt-91"/"cakephp-mysql-example-11-build". list of unmounted volumes=[build-system-configs build-ca-bundles]. list of unattached volumes=[buildcachedir buildworkdir builder-dockercfg-fzst2-push builder-dockercfg-fzst2-pull build-system-configs build-ca-bundles container-storage-root build-blob-cache builder-token-jmvp9]; skipping pod

Feb 21 13:15:36 ip-10-0-135-244 hyperkube[4143]: E0221 13:15:36.583263 4143 pod_workers.go:186] Error syncing pod 0302c8ce-357a-11e9-a8c3-0af843294408 ("cakephp-mysql-example-11-build_svt-91(0302c8ce-357a-11e9-a8c3-0af843294408)"), skipping: timeout expired waiting for volumes to attach or mount for pod "svt-91"/"cakephp-mysql-example-11-build". list of unmounted volumes=[build-system-configs build-ca-bundles]. list of unattached volumes=[buildcachedir buildworkdir builder-dockercfg-fzst2-push builder-dockercfg-fzst2-pull build-system-configs build-ca-bundles container-storage-root build-blob-cache builder-token-jmvp9]

Feb 21 13:16:41 ip-10-0-135-244 hyperkube[4143]: E0221 13:16:41.821100 4143 nestedpendingoperations.go:267] Operation for "\"kubernetes.io/configmap/0302c8ce-357a-11e9-a8c3-0af843294408-build-ca-bundles\" (\"0302c8ce-357a-11e9-a8c3-0af843294408\")" failed. No retries permitted until 2019-02-21 13:18:43.821058136 +0000 UTC m=+65702.994792873 (durationBeforeRetry 2m2s). Error: "MountVolume.SetUp failed for volume \"build-ca-bundles\" (UniqueName: \"kubernetes.io/configmap/0302c8ce-357a-11e9-a8c3-0af843294408-build-ca-bundles\") pod \"cakephp-mysql-example-11-build\" (UID: \"0302c8ce-357a-11e9-a8c3-0af843294408\") : configmap \"cakephp-mysql-example-11-ca\" not found"

Feb 21 13:16:42 ip-10-0-135-244 hyperkube[4143]: E0221 13:16:42.121930 4143 nestedpendingoperations.go:267] Operation for "\"kubernetes.io/configmap/0302c8ce-357a-11e9-a8c3-0af843294408-build-system-configs\" (\"0302c8ce-357a-11e9-a8c3-0af843294408\")" failed. No retries permitted until 2019-02-21 13:18:44.121888287 +0000 UTC m=+65703.295623116 (durationBeforeRetry 2m2s). Error: "MountVolume.SetUp failed for volume \"build-system-configs\" (UniqueName: \"kubernetes.io/configmap/0302c8ce-357a-11e9-a8c3-0af843294408-build-system-configs\") pod \"cakephp-mysql-example-11-build\" (UID: \"0302c8ce-357a-11e9-a8c3-0af843294408\") : configmap \"cakephp-mysql-example-11-sys-config\" not found"

Version-Release number of selected component (if applicable):
4.0.0-0.nightly-2019-02-19-195128

How reproducible:
Unknown. Will try to reproduce if nodes are not needed for debug.

Steps to Reproduce:
0. 4.0 cluster with 3 masters and 10 worker nodes
1. Create 100 projects
2. Create 1 build config in each project (cakephp-mysql-example quickstart)
3. Run 20 builds at a time repeatedly (2 per node)

Actual results:
After 4 iterations, 2 build pods hung in Init:0/2 status with the messages above repeating in the node logs

Expected results:
All builds complete successfully

Additional info:
Node logs attached. The hung build pods on each node are both named cakephp-mysql-example-11-build
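
For anyone trying to reproduce, the batching in the Steps to Reproduce can be sketched roughly as follows. This is only an illustration, not the script used for the original run: the project names (svt-0 through svt-99) are a guess based on the single namespace "svt-91" seen in the logs, the helper names are hypothetical, and the sketch only prints `oc start-build` commands rather than running them or waiting for a batch to finish.

```python
# Assumption: projects are named svt-0 .. svt-99. Only "svt-91" appears in
# the report, so the naming scheme is a guess.
BUILD_CONFIG = "cakephp-mysql-example"


def build_batches(projects, batch_size):
    """Split the project list into consecutive batches of batch_size."""
    return [projects[i:i + batch_size]
            for i in range(0, len(projects), batch_size)]


def start_build_commands(projects, batch_size=20):
    """Return one list of `oc start-build` commands per batch.

    Each batch is meant to be started together; a real driver would wait
    for the batch to complete before starting the next one.
    """
    return [
        [f"oc start-build {BUILD_CONFIG} -n {project}" for project in batch]
        for batch in build_batches(projects, batch_size)
    ]


if __name__ == "__main__":
    projects = [f"svt-{n}" for n in range(100)]
    for number, commands in enumerate(start_build_commands(projects), start=1):
        print(f"# batch {number}")
        for command in commands:
            print(command)
```

With 10 worker nodes, a batch of 20 concurrent builds works out to roughly 2 builds per node, matching the scenario described.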
Ryan, can you take a look? Mike, do the configmaps the kubelet reports as "not found" exist or not when directly queried?
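
One way to answer that is to pull the namespace and configmap names out of the node logs and query each one directly. The sketch below is a rough helper under stated assumptions, not an existing tool: the function names are hypothetical, and the regular expressions are written only against the message formats quoted in the description. It joins on the pod UID, because several projects contain a pod with the same name.

```python
import re

# Matches the pod identifier "name_namespace(uid)" from the kubelet
# timeout messages quoted in the report.
POD_ID = re.compile(
    r'"(?P<pod>[^"_]+)_(?P<namespace>[^"(]+)\((?P<uid>[0-9a-f-]+)\)"'
)

# Matches the "MountVolume.SetUp failed ... configmap ... not found" message.
# Inner quotes are backslash-escaped in the journal, so \\?" accepts both forms.
MOUNT_FAILURE = re.compile(
    r'MountVolume\.SetUp failed for volume \\?"(?P<volume>[^"\\]+)\\?"'
    r'.*?\(UID: \\?"(?P<uid>[0-9a-f-]+)\\?"\)'
    r'.*?configmap \\?"(?P<configmap>[^"\\]+)\\?" not found'
)


def missing_configmaps(log_lines):
    """Return sorted (namespace, configmap) pairs reported as not found."""
    namespace_by_uid = {}
    failures = set()
    for line in log_lines:
        pod_id = POD_ID.search(line)
        if pod_id:
            namespace_by_uid[pod_id["uid"]] = pod_id["namespace"]
        failure = MOUNT_FAILURE.search(line)
        if failure:
            failures.add((failure["uid"], failure["configmap"]))
    # Resolve the namespace after the loop so line order does not matter.
    return sorted(
        (namespace_by_uid.get(uid, "UNKNOWN"), configmap)
        for uid, configmap in failures
    )


def query_commands(log_lines):
    """Build one `oc get configmap` command per missing configmap."""
    return [
        f"oc get configmap {configmap} -n {namespace}"
        for namespace, configmap in missing_configmaps(log_lines)
    ]
```

Feeding it the lines from the attached node logs should yield one `oc get configmap` command per configmap the kubelet could not find; a namespace of UNKNOWN means no matching timeout message was seen for that pod UID.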
I'll take a look!
Robert found a revert (https://github.com/kubernetes/kubernetes/pull/74755) to change back to the cache behavior. I'll close this issue once there is a pick merged.
Upstream Issue: https://github.com/kubernetes/kubernetes/issues/74412 Jordan's Recommendation: https://github.com/kubernetes/kubernetes/issues/74412#issuecomment-468437599
PR against 4.0: https://github.com/openshift/machine-config-operator/pull/523
Duplicate of #1677120. *** This bug has been marked as a duplicate of bug 1677120 ***