Bug 1679585
Summary: | Build pods stuck in Init:0/2 status forever with volume mount timeouts | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Mike Fiedler <mifiedle>
Component: | Node | Assignee: | Ryan Phillips <rphillips>
Status: | CLOSED DUPLICATE | QA Contact: | Mike Fiedler <mifiedle>
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | 4.1.0 | CC: | aos-bugs, jokerman, mifiedle, mmccomas
Target Milestone: | --- | |
Target Release: | 4.1.0 | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2019-03-08 22:24:49 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |

Ryan, can you take a look? Mike, do the configmaps the kubelet reports as "not found" exist or not when directly queried?

I'll take a look!

Robert found a revert (https://github.com/kubernetes/kubernetes/pull/74755) to change back to the cache behavior. I'll close this issue once there is a pick merged.

Upstream Issue: https://github.com/kubernetes/kubernetes/issues/74412
Jordan's Recommendation: https://github.com/kubernetes/kubernetes/issues/74412#issuecomment-468437599
PR against 4.0: https://github.com/openshift/machine-config-operator/pull/523

Duplicate of #1677120.

*** This bug has been marked as a duplicate of bug 1677120 ***

Created attachment 1537090
Node logs where build pods are hung

Description of problem:
Scenario is repeated builds with 2 parallel builds/node. After a while 2 of the builds were hung with the build pod stuck in Init:0/2 status with timeouts waiting for volumes to attach. Node logs show the following sequence of messages repeating every minute. Full node logs from 2 nodes with stuck builds attached.

Feb 21 13:15:36 ip-10-0-135-244 hyperkube[4143]: E0221 13:15:36.583189 4143 kubelet.go:1662] Unable to mount volumes for pod "cakephp-mysql-example-11-build_svt-91(0302c8ce-357a-11e9-a8c3-0af843294408)": timeout expired waiting for volumes to attach or mount for pod "svt-91"/"cakephp-mysql-example-11-build". list of unmounted volumes=[build-system-configs build-ca-bundles]. list of unattached volumes=[buildcachedir buildworkdir builder-dockercfg-fzst2-push builder-dockercfg-fzst2-pull build-system-configs build-ca-bundles container-storage-root build-blob-cache builder-token-jmvp9]; skipping pod

Feb 21 13:15:36 ip-10-0-135-244 hyperkube[4143]: E0221 13:15:36.583263 4143 pod_workers.go:186] Error syncing pod 0302c8ce-357a-11e9-a8c3-0af843294408 ("cakephp-mysql-example-11-build_svt-91(0302c8ce-357a-11e9-a8c3-0af843294408)"), skipping: timeout expired waiting for volumes to attach or mount for pod "svt-91"/"cakephp-mysql-example-11-build". list of unmounted volumes=[build-system-configs build-ca-bundles]. list of unattached volumes=[buildcachedir buildworkdir builder-dockercfg-fzst2-push builder-dockercfg-fzst2-pull build-system-configs build-ca-bundles container-storage-root build-blob-cache builder-token-jmvp9]

Feb 21 13:16:41 ip-10-0-135-244 hyperkube[4143]: E0221 13:16:41.821100 4143 nestedpendingoperations.go:267] Operation for "\"kubernetes.io/configmap/0302c8ce-357a-11e9-a8c3-0af843294408-build-ca-bundles\" (\"0302c8ce-357a-11e9-a8c3-0af843294408\")" failed. No retries permitted until 2019-02-21 13:18:43.821058136 +0000 UTC m=+65702.994792873 (durationBeforeRetry 2m2s). Error: "MountVolume.SetUp failed for volume \"build-ca-bundles\" (UniqueName: \"kubernetes.io/configmap/0302c8ce-357a-11e9-a8c3-0af843294408-build-ca-bundles\") pod \"cakephp-mysql-example-11-build\" (UID: \"0302c8ce-357a-11e9-a8c3-0af843294408\") : configmap \"cakephp-mysql-example-11-ca\" not found"

Feb 21 13:16:42 ip-10-0-135-244 hyperkube[4143]: E0221 13:16:42.121930 4143 nestedpendingoperations.go:267] Operation for "\"kubernetes.io/configmap/0302c8ce-357a-11e9-a8c3-0af843294408-build-system-configs\" (\"0302c8ce-357a-11e9-a8c3-0af843294408\")" failed. No retries permitted until 2019-02-21 13:18:44.121888287 +0000 UTC m=+65703.295623116 (durationBeforeRetry 2m2s). Error: "MountVolume.SetUp failed for volume \"build-system-configs\" (UniqueName: \"kubernetes.io/configmap/0302c8ce-357a-11e9-a8c3-0af843294408-build-system-configs\") pod \"cakephp-mysql-example-11-build\" (UID: \"0302c8ce-357a-11e9-a8c3-0af843294408\") : configmap \"cakephp-mysql-example-11-sys-config\" not found"

Version-Release number of selected component (if applicable):
4.0.0-0.nightly-2019-02-19-195128

How reproducible:
Unknown. Will try to reproduce if nodes are not needed for debug.

Steps to Reproduce:
0. 4.0 cluster with 3 masters and 10 worker nodes
1. Create 100 projects
2. Create 1 build config in each project (cakephp-mysql-example quickstart)
3. Run 20 builds at a time repeatedly (2 per node)

Actual results:
After 4 iterations, 2 build pods hung in Init:0/2 status with the messages above repeating in the node logs

Expected results:
All builds complete successfully

Additional info:
Node logs attached. The hung build pods on each node are both named cakephp-mysql-example-11-build
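
When triaging node logs like the ones in this report, it can help to pull out just the configmaps that the kubelet reports as "not found", so each one can be checked directly against the API. The following is a minimal sketch, not part of the original report: the function name `missing_configmaps`, the `MOUNT_FAILURE` pattern, and the `SAMPLE` constant are hypothetical, and the pattern assumes the escaped-quote layout seen in the `nestedpendingoperations.go` messages quoted in this bug. `SAMPLE` holds only the `Error:` portion of those two messages.

```python
import re

# Assumption: the error text looks like the journal lines quoted in this bug,
# where the inner quotes are backslash-escaped (\"name\").
MOUNT_FAILURE = re.compile(
    r'MountVolume\.SetUp failed for volume \\"(?P<volume>[^"\\]+)\\"'
    r'.*? pod \\"(?P<pod>[^"\\]+)\\"'
    r'.*?configmap \\"(?P<configmap>[^"\\]+)\\" not found'
)


def missing_configmaps(log_text):
    """Return sorted, de-duplicated (pod, volume, configmap) tuples."""
    found = {
        (m.group("pod"), m.group("volume"), m.group("configmap"))
        for m in MOUNT_FAILURE.finditer(log_text)
    }
    return sorted(found)


# Error fields copied from the two nestedpendingoperations messages above.
SAMPLE = r'''
Error: "MountVolume.SetUp failed for volume \"build-ca-bundles\" (UniqueName: \"kubernetes.io/configmap/0302c8ce-357a-11e9-a8c3-0af843294408-build-ca-bundles\") pod \"cakephp-mysql-example-11-build\" (UID: \"0302c8ce-357a-11e9-a8c3-0af843294408\") : configmap \"cakephp-mysql-example-11-ca\" not found"
Error: "MountVolume.SetUp failed for volume \"build-system-configs\" (UniqueName: \"kubernetes.io/configmap/0302c8ce-357a-11e9-a8c3-0af843294408-build-system-configs\") pod \"cakephp-mysql-example-11-build\" (UID: \"0302c8ce-357a-11e9-a8c3-0af843294408\") : configmap \"cakephp-mysql-example-11-sys-config\" not found"
'''

if __name__ == "__main__":
    for pod, volume, configmap in missing_configmaps(SAMPLE):
        print(f"pod={pod} volume={volume} configmap={configmap}")
```

Each configmap name it returns can then be queried by hand, for example with `oc get configmap <name> -n svt-91` for the pod in this report, to see whether the object exists even though the kubelet says it does not.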