Bug 1943496
Summary: | [Manila CSI driver] could not mount volume in one node while other nodes work fine |  |  |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Wei Duan <wduan> |
Component: | Storage | Assignee: | Mike Fedosin <mfedosin> |
Storage sub component: | OpenStack CSI Drivers | QA Contact: | Jon Uriarte <juriarte> |
Status: | CLOSED WORKSFORME | Docs Contact: |  |
Severity: | high |  |  |
Priority: | medium | CC: | adduarte, aos-bugs, emacchi, mbooth, mfedosin, piqin, pprinett, tbarron |
Version: | 4.7 |  |  |
Target Milestone: | --- |  |  |
Target Release: | 4.8.0 |  |  |
Hardware: | Unspecified |  |  |
OS: | Unspecified |  |  |
Whiteboard: |  |  |  |
Fixed In Version: |  | Doc Type: | If docs needed, set a value |
Doc Text: |  | Story Points: | --- |
Clone Of: |  | Environment: |  |
Last Closed: | 2021-05-06 11:22:20 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: |  |
Verified Versions: |  | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |  |
Cloudforms Team: | --- | Target Upstream Version: |  |
Embargoed: |  |  |  |
Description Wei Duan 2021-03-26 09:07:31 UTC
Hit this issue in 4.6.0-0.nightly-2021-03-25-230637 too. One of the pods is stuck in "ContainerCreating" status.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2021-03-25-230637   True        False         75m     Cluster version is 4.6.0-0.nightly-2021-03-25-230637

$ oc get pod
NAME         READY   STATUS              RESTARTS   AGE
ds-5-4jrjf   1/1     Running             0          3m27s
ds-5-6qmd5   0/1     ContainerCreating   0          3m27s
ds-5-m74xf   1/1     Running             0          3m27s

$ oc describe pod ds-5-6qmd5
<skip>
Events:
  Type     Reason       Age        From               Message
  ----     ------       ----       ----               -------
  Normal   Scheduled    35s        default-scheduler  Successfully assigned default/ds-5-6qmd5 to piqin-0326-1-nbxx6-worker-0-st7m5
  Warning  FailedMount  <invalid>  kubelet            MountVolume.SetUp failed for volume "pvc-c32cdf6e-7803-45ce-bf46-a00eb013a5f2" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  FailedMount  <invalid>  kubelet            Unable to attach or mount volumes: unmounted volumes=[local], unattached volumes=[local default-token-slcrx]: timed out waiting for the condition

The first must-gather doesn't contain the Manila CSI driver logs, so it's hard to understand what happened there. The Manila operator didn't report any errors... The second must-gather, from Qin Ping, has the required logs, and there is only one error message:

2021-03-26T11:50:44.707506583Z Mounting command: mount
2021-03-26T11:50:44.707506583Z Mounting arguments: -t nfs 172.16.32.1:/volumes/_nogroup/93174795-380d-4331-9437-e18de9014c86 /var/lib/kubelet/pods/cb38e6bd-2f6e-4472-aac4-a87c1d5d9297/volumes/kubernetes.io~csi/pvc-c32cdf6e-7803-45ce-bf46-a00eb013a5f2/mount
2021-03-26T11:50:44.707506583Z Output: mount.nfs: Connection timed out

Could it be a network issue? Maybe. We ran the same "mount -t nfs" command on the problematic worker node and it returned the same error. We checked the network config of the problematic worker node and it looks fine. Maybe it is an issue with the PSI cluster?

Additional info:

1. Network config for the piqin-0326-txhx4-worker-0-8n2t5 node

sh-4.4# ip addr | grep 172
    inet 172.16.34.116/20 brd 172.16.47.255 scope global dynamic noprefixroute ens4
172: veth65d6dca8@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default

sh-4.4# route
Kernel IP routing table
Destination      Gateway          Genmask          Flags  Metric  Ref  Use  Iface
default          host-192-168-0-  0.0.0.0          UG     100     0    0    ens3
10.128.0.0       0.0.0.0          255.252.0.0      U      0       0    0    tun0
169.254.169.254  host-192-168-0-  255.255.255.255  UGH    100     0    0    ens3
169.254.169.254  172.16.34.1      255.255.255.255  UGH    101     0    0    ens4
172.16.32.0      0.0.0.0          255.255.240.0    U      101     0    0    ens4
172.30.0.0       0.0.0.0          255.255.0.0      U      0       0    0    tun0
192.168.0.0      0.0.0.0          255.255.192.0    U      100     0    0    ens3

I tried to reproduce it several times on PSI but I couldn't, so I think this issue was caused by an unstable environment. I'm going to close this bz now. Please reopen if the issue happens again.
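For reference, a minimal sketch of the manual check described above, i.e. mounting the Manila NFS share directly from the affected worker. The node name and export path are copied from the logs in this bug; the temporary mount point /tmp/manila-test is an arbitrary choice and not part of the original report.

$ oc debug node/piqin-0326-1-nbxx6-worker-0-st7m5    (open a debug shell on the affected worker)
sh-4.4# chroot /host                                 (use the host's own mount utilities)
sh-4.4# mkdir -p /tmp/manila-test
sh-4.4# mount -t nfs 172.16.32.1:/volumes/_nogroup/93174795-380d-4331-9437-e18de9014c86 /tmp/manila-test
sh-4.4# ls /tmp/manila-test                          (verify the share is readable)
sh-4.4# umount /tmp/manila-test                      (clean up)

If this mount also times out, as it did here, the failure is between the node and the NFS backend rather than in the CSI driver itself.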