Bug 1679585 - Build pods stuck in Init:0/2 status forever with volume mount timeouts
Summary: Build pods stuck in Init:0/2 status forever with volume mount timeouts
Keywords:
Status: CLOSED DUPLICATE of bug 1677120
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.1.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.1.0
Assignee: Ryan Phillips
QA Contact: Mike Fiedler
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-02-21 13:28 UTC by Mike Fiedler
Modified: 2019-10-30 17:00 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-03-08 22:24:49 UTC
Target Upstream Version:


Attachments
Node logs where build pods are hung (1.51 MB, application/gzip)
2019-02-21 13:28 UTC, Mike Fiedler

Description Mike Fiedler 2019-02-21 13:28:27 UTC
Created attachment 1537090
Node logs where build pods are hung

Description of problem:

Scenario: repeated builds, with 2 builds running in parallel on each node. After a while, 2 of the builds hung: their build pods were stuck in Init:0/2 status, timing out while waiting for volumes to attach or mount. The node logs show the following sequence of messages repeating every minute. Full node logs from the 2 nodes with stuck builds are attached.

Feb 21 13:15:36 ip-10-0-135-244 hyperkube[4143]: E0221 13:15:36.583189    4143 kubelet.go:1662] Unable to mount volumes for pod "cakephp-mysql-example-11-build_svt-91(0302c8ce-357a-11e9-a8c3-0af843294408)": timeout expired waiting for volumes to attach or mount for pod "svt-91"/"cakephp-mysql-example-11-build". list of unmounted volumes=[build-system-configs build-ca-bundles]. list of unattached volumes=[buildcachedir buildworkdir builder-dockercfg-fzst2-push builder-dockercfg-fzst2-pull build-system-configs build-ca-bundles container-storage-root build-blob-cache builder-token-jmvp9]; skipping pod
Feb 21 13:15:36 ip-10-0-135-244 hyperkube[4143]: E0221 13:15:36.583263    4143 pod_workers.go:186] Error syncing pod 0302c8ce-357a-11e9-a8c3-0af843294408 ("cakephp-mysql-example-11-build_svt-91(0302c8ce-357a-11e9-a8c3-0af843294408)"), skipping: timeout expired waiting for volumes to attach or mount for pod "svt-91"/"cakephp-mysql-example-11-build". list of unmounted volumes=[build-system-configs build-ca-bundles]. list of unattached volumes=[buildcachedir buildworkdir builder-dockercfg-fzst2-push builder-dockercfg-fzst2-pull build-system-configs build-ca-bundles container-storage-root build-blob-cache builder-token-jmvp9]
Feb 21 13:16:41 ip-10-0-135-244 hyperkube[4143]: E0221 13:16:41.821100    4143 nestedpendingoperations.go:267] Operation for "\"kubernetes.io/configmap/0302c8ce-357a-11e9-a8c3-0af843294408-build-ca-bundles\" (\"0302c8ce-357a-11e9-a8c3-0af843294408\")" failed. No retries permitted until 2019-02-21 13:18:43.821058136 +0000 UTC m=+65702.994792873 (durationBeforeRetry 2m2s). Error: "MountVolume.SetUp failed for volume \"build-ca-bundles\" (UniqueName: \"kubernetes.io/configmap/0302c8ce-357a-11e9-a8c3-0af843294408-build-ca-bundles\") pod \"cakephp-mysql-example-11-build\" (UID: \"0302c8ce-357a-11e9-a8c3-0af843294408\") : configmap \"cakephp-mysql-example-11-ca\" not found"
Feb 21 13:16:42 ip-10-0-135-244 hyperkube[4143]: E0221 13:16:42.121930    4143 nestedpendingoperations.go:267] Operation for "\"kubernetes.io/configmap/0302c8ce-357a-11e9-a8c3-0af843294408-build-system-configs\" (\"0302c8ce-357a-11e9-a8c3-0af843294408\")" failed. No retries permitted until 2019-02-21 13:18:44.121888287 +0000 UTC m=+65703.295623116 (durationBeforeRetry 2m2s). Error: "MountVolume.SetUp failed for volume \"build-system-configs\" (UniqueName: \"kubernetes.io/configmap/0302c8ce-357a-11e9-a8c3-0af843294408-build-system-configs\") pod \"cakephp-mysql-example-11-build\" (UID: \"0302c8ce-357a-11e9-a8c3-0af843294408\") : configmap \"cakephp-mysql-example-11-sys-config\" not found"
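
For triage, the "not found" configmap names can be pulled out of the attached node logs mechanically. The following is a minimal Python sketch, assuming the log lines have the same shape as the nestedpendingoperations.go messages quoted above; the pattern and the function name missing_configmaps are illustrative and not part of any kubelet tooling.

```python
import re

# Matches the kubelet "MountVolume.SetUp failed" lines quoted above.
# The log escapes inner quotes as \" so each quote accepts an optional backslash.
MOUNT_FAILURE = re.compile(
    r'durationBeforeRetry (?P<backoff>[^)]+)\)'
    r'.*?MountVolume\.SetUp failed for volume \\?"(?P<volume>[^"\\]+)\\?"'
    r'.*?pod \\?"(?P<pod>[^"\\]+)\\?"'
    r'.*?configmap \\?"(?P<configmap>[^"\\]+)\\?" not found'
)


def missing_configmaps(lines):
    """Return one dict per log line that reports a configmap as not found."""
    return [m.groupdict() for m in map(MOUNT_FAILURE.search, lines) if m]


# Copied from the log above, with the leading journal prefix trimmed.
SAMPLE = (
    r'E0221 13:16:41.821100    4143 nestedpendingoperations.go:267] Operation for '
    r'"\"kubernetes.io/configmap/0302c8ce-357a-11e9-a8c3-0af843294408-build-ca-bundles\" '
    r'(\"0302c8ce-357a-11e9-a8c3-0af843294408\")" failed. No retries permitted until '
    r'2019-02-21 13:18:43.821058136 +0000 UTC m=+65702.994792873 (durationBeforeRetry 2m2s). '
    r'Error: "MountVolume.SetUp failed for volume \"build-ca-bundles\" '
    r'(UniqueName: \"kubernetes.io/configmap/0302c8ce-357a-11e9-a8c3-0af843294408-build-ca-bundles\") '
    r'pod \"cakephp-mysql-example-11-build\" (UID: \"0302c8ce-357a-11e9-a8c3-0af843294408\") '
    r': configmap \"cakephp-mysql-example-11-ca\" not found"'
)

if __name__ == "__main__":
    for hit in missing_configmaps([SAMPLE]):
        print(hit["pod"], hit["volume"], hit["configmap"], hit["backoff"])
```

Each extracted name can then be checked directly with oc get configmap <name> -n <project> to see whether the configmap exists from the API server's point of view.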


Version-Release number of selected component (if applicable): 4.0.0-0.nightly-2019-02-19-195128


How reproducible: Unknown. I will try to reproduce if the nodes are not needed for debugging.


Steps to Reproduce:
0.  4.0 cluster with 3 masters and 10 worker nodes
1.  Create 100 projects
2.  Create 1 build config in each project (cakephp-mysql-example quickstart)
3.  Run 20 builds at a time repeatedly (2 per node)
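
One possible reading of step 3, sketched in Python: split the 100 projects into batches of 20 and build the oc start-build command for each project in a batch. The project names svt-0 through svt-99 are an assumption (the logs only show a project named svt-91), as is the build config name cakephp-mysql-example, inferred from the pod name. The sketch only generates the commands; it does not run them, wait for builds to finish, or control which node the scheduler places each build on.

```python
from itertools import islice

# Assumptions (not stated in the report): projects are named svt-0 .. svt-99,
# and the quickstart's build config is named cakephp-mysql-example.
PROJECT_PREFIX = "svt-"
BUILD_CONFIG = "cakephp-mysql-example"


def batches(items, size):
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def start_build_commands(project_count=100, batch_size=20):
    """Return one list of `oc start-build` argv lists per batch."""
    projects = [f"{PROJECT_PREFIX}{i}" for i in range(project_count)]
    return [
        [["oc", "start-build", BUILD_CONFIG, "-n", project] for project in batch]
        for batch in batches(projects, batch_size)
    ]


if __name__ == "__main__":
    for number, batch in enumerate(start_build_commands(), start=1):
        print(f"batch {number}: {len(batch)} builds")
        for argv in batch:
            print("  " + " ".join(argv))
```

Each argv list could be handed to subprocess.run to actually start the builds, with a wait for the batch to complete before starting the next one.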

Actual results:  After 4 iterations, 2 build pods hung in Init:0/2 status, with the messages above repeating in the node logs.


Expected results:  All builds complete successfully


Additional info: 

Node logs attached.   The hung build pods on each node are both named cakephp-mysql-example-11-build

Comment 1 Seth Jennings 2019-02-26 15:23:17 UTC
Ryan, can you take a look?

Mike, do the configmaps the kubelet reports as "not found" exist or not when directly queried?

Comment 2 Ryan Phillips 2019-02-26 15:24:42 UTC
I'll take a look!

Comment 3 Ryan Phillips 2019-03-04 15:10:42 UTC
Robert found a revert (https://github.com/kubernetes/kubernetes/pull/74755) that changes back to the cache behavior. I'll close this issue once a cherry-pick of it is merged.

Comment 5 Ryan Phillips 2019-03-04 19:44:22 UTC
PR against 4.0: https://github.com/openshift/machine-config-operator/pull/523

Comment 6 Ryan Phillips 2019-03-08 22:24:49 UTC
Duplicate of #1677120.

*** This bug has been marked as a duplicate of bug 1677120 ***

