Bug 1627456
| Summary: | Pod failed due to timeout expired waiting for volumes to attach/mount | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | jmselmi <jmselmi> |
| Component: | Storage | Assignee: | Matthew Wong <mawong> |
| Status: | CLOSED WORKSFORME | QA Contact: | Liang Xia <lxia> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.7.0 | CC: | aos-bugs, aos-storage-staff, bchilds, hekumar, jmselmi, jokerman, mmccomas |
| Target Milestone: | --- | | |
| Target Release: | 3.7.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-01-22 17:46:19 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description (jmselmi, 2018-09-10 15:45:26 UTC)
Please provide "oc get pod -o yaml" for the pod in question and the "journalctl -u atomic-openshift-node" logs from the node where the pod is trying to start.

Created attachment 1482223 [details]: atomic-openshift-node.log

Created attachment 1482224 [details]: oc get pods -o yaml

Here are the requested files. The pod seems to be running:
```yaml
containerStatuses:
- containerID: docker://685fbb1d8195a4185c62f5455c95b136f7b92adb2211385f10de87b7a0f7e352
  image: docker.io/strimzi/kafka:0.6.0
  ..
  restartCount: 79
  state:
    running:
      startedAt: 2018-09-10T14:57:46Z
- containerID: docker://6f97bdc446dbb748f117b2f8fb89a65c289ea5ae4e329743f596bb956704c577
  image: docker.io/strimzi/kafka-stunnel:0.6.0
  ...
  restartCount: 5
  state:
    running:
      startedAt: 2018-09-10T14:45:48Z
hostIP: 10.44.90.9
phase: Running
```
The restart count is high, likely due to the liveness probe failing.
Something is wrong with the attached node logs.
Please also include "oc describe pod" for the pod in question.
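One way to check the liveness-probe hypothesis behind the high restart count is to filter the pod's events for probe failures. The sketch below is illustrative, not taken from this bug: in practice you would pipe `oc describe pod <pod>` into the filter, and the sample event lines here are made up.

```shell
# Filter event lines for liveness-probe failures. On a live cluster:
#   oc describe pod kafka-0 | grep -iE 'unhealthy|liveness probe failed'
# Here the filter is a small function fed hypothetical sample lines so
# the sketch is self-contained.
filter_probe_failures() {
    grep -iE 'unhealthy|liveness probe failed'
}

printf '%s\n' \
  'Warning  Unhealthy  Liveness probe failed: connection refused' \
  'Normal   Pulled     Container image already present on machine' \
  | filter_probe_failures
# Only the Warning/Unhealthy line passes the filter.
```

If the filtered events show repeated `Unhealthy` warnings, the restarts come from the probe rather than from the container crashing on its own.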
Hello Seth, do you need any additional information? Cheers.

Maybe you can't read private comments on bz. See comment 5. Just made it public. Thanks.

They are in a loop of running and creating containers. Attached the describe files.

Created attachment 1482690 [details]: oc describe pod zk (zk-2)

Created attachment 1482691 [details]: oc describe pod kafka
Sending to Storage to figure out why the volume manager in the kubelet is not detecting that the GCE PD is attached and the volume is mounted. This is blocking pod startup.

This sounds the same as https://bugzilla.redhat.com/show_bug.cgi?id=1627547. Can we get the output of `tree /dev/disk/by-id` when this happens? Does the attached device show up there at all? Typically, when a new device is attached to a node, the udev rules on the node create the appropriate symlinks in `/dev/disk/by-id`, which the GCE-PD driver uses. There is a known race/bug in udev that can cause some iSCSI events to be missed, but that was supposed to be rare, and additionally the GCE-PD driver calls udev trigger to make sure we process them. That has a bug too: https://github.com/kubernetes/kubernetes/issues/68528

```
$ tree /dev/disk/by-id
/dev/disk/by-id
├── dm-name-docker-8:16-33575008-074947cd333b9e6dbb160cb74ce7691c69cc148df2d1b40c62ec754035bbae6a -> ../../dm-5
├── dm-name-docker-8:16-33575008-1c4dde946843a8a45ecaacf590dbdc8112e9140c062ad5c1463e3c20a8aeabf5 -> ../../dm-13
├── dm-name-docker-8:16-33575008-2d910a71093935aaf326c7d5497d307c70f80abbad35a50542c3d420f09dbce7 -> ../../dm-21
├── dm-name-docker-8:16-33575008-48341b2a16fafa4db61f456ba128d89f626fcba597cea75e0d7a5a5f222fb18d -> ../../dm-18
├── dm-name-docker-8:16-33575008-4b41f1996a76c48a08f35752ced114cab6c5bd3df7dacc5a401362e57c320bee -> ../../dm-14
├── dm-name-docker-8:16-33575008-5ebd827825b1b7f550c43258aaf21500566a8bdb23a55bfc8151ee445c85b081 -> ../../dm-12
├── dm-name-docker-8:16-33575008-66316eed0560126eb38896f4953648f8e302df205cac78d803b1902c52344a84 -> ../../dm-20
├── dm-name-docker-8:16-33575008-6e00caaee46c2f66358d5863c8a27cc8703241345dc751272b24a688076e22e9 -> ../../dm-22
├── dm-name-docker-8:16-33575008-77ce86217fafdfdc5e565d56beff6d8656866ae89bc1f13c0fe67ae84e8e5b1a -> ../../dm-16
├── dm-name-docker-8:16-33575008-7f0b6b8892d352fcb84e3d2e66c3db82963995d4846dd7352788f4431262dada -> ../../dm-3
├── dm-name-docker-8:16-33575008-a18ebe17dc1e76952a293f5ff496fddd665de6861d77c31c1c0cc811bac6e905 -> ../../dm-11
├── dm-name-docker-8:16-33575008-a458c12f3fe02643d57a8357d07a7b4d07ac14e7cf0554431c9213bf7d7e2264 -> ../../dm-4
├── dm-name-docker-8:16-33575008-ad3a2360687fe77ab265d9f8a99475f65dbd6947705a9cde5e37ba09db3c0f96 -> ../../dm-8
├── dm-name-docker-8:16-33575008-afc2a35a1681150f54690449795b15ad40aa3b54b32cdeeb851e06e77d69fe16 -> ../../dm-15
├── dm-name-docker-8:16-33575008-beb625376915925f6d80a162d008706195397aef51b940f2e514b944c3e3a984 -> ../../dm-2
├── dm-name-docker-8:16-33575008-c93274a84844acc995691505cb65df0bd716f9d21e697e7044344e6fd6587c3c -> ../../dm-6
├── dm-name-docker-8:16-33575008-d0190aaccf7d1f1d30be8d5978c43338be850fe5b34263b2e36a8b2b00b70fb2 -> ../../dm-10
├── dm-name-docker-8:16-33575008-d7056ff3e05f3813213214f51278ea6cfb9a025b085dc17d294c519f5d29fb29 -> ../../dm-7
├── dm-name-docker-8:16-33575008-f7358b9d66dad8d7f7891bb9e192e83bd148ed6a021c3586275a6a92473b2a18 -> ../../dm-1
├── dm-name-docker-8:16-33575008-f766bf31a84b8332c9bf41ba17332943146bdc85d402811abee8dfe67dc4a0c6 -> ../../dm-9
├── dm-name-docker-8:16-33575008-f8c3f66ffb16935c6b8346cbae7eb3d4ac2354bb9c8f2bbe7c55a274f21e780c -> ../../dm-19
├── dm-name-docker-8:16-33575008-fc08547e88a4725d7b40aeec18113178a3ae382f1ad902cc7de9b6c95362e8e4 -> ../../dm-17
├── google-kubernetes-dynamic-pvc-00a22368-a7b5-11e8-bbde-42010a2c5a05 -> ../../sde
├── google-kubernetes-dynamic-pvc-6673b6cb-9bbe-11e8-8b2e-42010a2c5a03 -> ../../sdg
├── google-kubernetes-dynamic-pvc-8974f445-9bbe-11e8-8b2e-42010a2c5a03 -> ../../sdd
├── google-kubernetes-dynamic-pvc-bf8672a7-a7b7-11e8-b631-42010a2c5a03 -> ../../sdf
├── google-ocp-app-0 -> ../../sda
├── google-ocp-app-0-containers -> ../../sdb
├── google-ocp-app-0-local -> ../../sdc
├── google-ocp-app-0-part1 -> ../../sda1
├── scsi-0Google_PersistentDisk_kubernetes-dynamic-pvc-00a22368-a7b5-11e8-bbde-42010a2c5a05 -> ../../sde
├── scsi-0Google_PersistentDisk_kubernetes-dynamic-pvc-6673b6cb-9bbe-11e8-8b2e-42010a2c5a03 -> ../../sdg
├── scsi-0Google_PersistentDisk_kubernetes-dynamic-pvc-8974f445-9bbe-11e8-8b2e-42010a2c5a03 -> ../../sdd
├── scsi-0Google_PersistentDisk_kubernetes-dynamic-pvc-bf8672a7-a7b7-11e8-b631-42010a2c5a03 -> ../../sdf
├── scsi-0Google_PersistentDisk_ocp-app-0 -> ../../sda
├── scsi-0Google_PersistentDisk_ocp-app-0-containers -> ../../sdb
├── scsi-0Google_PersistentDisk_ocp-app-0-local -> ../../sdc
└── scsi-0Google_PersistentDisk_ocp-app-0-part1 -> ../../sda1
```

Yes, the disks are attached, and the volumes are mounted:

```
$ df -h | grep 1014
/dev/sdd  1014M  63M  952M  7%  /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/gce-pd/mounts/kubernetes-dynamic-pvc-8974f445-9bbe-11e8-8b2e-42010a2c5a03
/dev/sde  1014M  36M  979M  4%  /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/gce-pd/mounts/kubernetes-dynamic-pvc-00a22368-a7b5-11e8-bbde-42010a2c5a05
/dev/sdf  1014M  33M  982M  4%  /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/gce-pd/mounts/kubernetes-dynamic-pvc-bf8672a7-a7b7-11e8-b631-42010a2c5a03
/dev/sdg  1014M  88M  927M  9%  /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/gce-pd/mounts/kubernetes-dynamic-pvc-6673b6cb-9bbe-11e8-8b2e-42010a2c5a03
```

Cheers.

Okay, this appears to be a different issue. Can you please upload proper node logs? The log files you shared appear to be empty.

Created attachment 1482952 [details]: node logs.

Please find the file attached.
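For reference, the lookup the GCE-PD driver performs against those udev-created symlinks can be sketched as follows. This is an illustration of the mechanism discussed above, not the actual kubelet code; the `find_pd_device` helper and its second argument are invented here so the sketch can be exercised against a test directory rather than a live node's `/dev/disk/by-id`.

```shell
# Resolve a PD name to its local block device via the udev symlinks,
# the same information "tree /dev/disk/by-id" shows on the node.
# find_pd_device is a hypothetical helper for this sketch; on a real
# node the directory defaults to /dev/disk/by-id.
find_pd_device() {
    pd_name=$1
    by_id_dir=${2:-/dev/disk/by-id}
    for prefix in google- scsi-0Google_PersistentDisk_; do
        link="$by_id_dir/$prefix$pd_name"
        if [ -L "$link" ]; then
            # Symlink targets look like ../../sde; keep the device name.
            basename "$(readlink "$link")"
            return 0
        fi
    done
    # If udev missed the attach event, no symlink exists even though
    # the disk is attached: the race described in kubernetes#68528.
    echo "not-attached"
    return 1
}
```

If a disk is attached (visible in `df`) but this lookup reports `not-attached`, the udev symlink is missing, which would match the missed-event race; `udevadm trigger` normally forces the rules to re-run and recreate the links.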
The logs don't cover the period when the mount failures occurred, so if you have access to logs from when the issue occurred (Sep 10th?), or if you can reproduce the issue, please attach new logs. Thank you.

Created attachment 1486426 [details]: node.logs
I reproduced the case. Please find the logs attached.

Closing; please reopen if the issue reoccurs, and include node logs from the same time. Unfortunately, the attached logs don't show the mount error.