Description of problem:
I created 18 Pods on a single node, each of them using its own Cinder volume (=18 volumes attached to the node). Randomly, some of these pods can't start:
Warning FailedMount 19s (x16 over 25m) kubelet MountVolume.MountDevice failed for volume "pvc-9a5f1a70-2266-4876-8e64-d0fea7ef20da" : rpc error: code = Internal desc = Unable to find Device path for volume
The reason seems to be udev - it did not create /dev/disk/by-id symlinks for the attached volume.
$ udevadm info /dev/vdi
E: DEVLINKS=/dev/disk/by-path/virtio-pci-0000:00:0e.0 /dev/disk/by-path/pci-0000:00:0e.0
Another volume that was mounted correctly has more DEVLINKS:
E: DEVLINKS=/dev/disk/by-id/virtio-396d709b-f498-439a-a /dev/disk/by-path/pci-0000:00:0f.0 /dev/disk/by-uuid/dcfaa60a-7896-4c61-bce1-f1841d9acbe0 /dev/disk/by-path/virtio-pci-0000:00:0f.0
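The difference between the two outputs above can be checked mechanically. The following is a minimal Python sketch, not code from the CSI driver: the function names are hypothetical, and the 20-character truncation of the volume ID in the by-id name is an assumption inferred from the working sample above (it matches the virtio-blk serial length).

```python
def devlinks(udevadm_output: str) -> list:
    """Return the symlinks listed on the 'E: DEVLINKS=' line, or [] if absent."""
    prefix = "E: DEVLINKS="
    for line in udevadm_output.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return line[len(prefix):].split()
    return []


def has_by_id_link(udevadm_output: str) -> bool:
    """True if udev created at least one /dev/disk/by-id symlink for the device."""
    return any(link.startswith("/dev/disk/by-id/") for link in devlinks(udevadm_output))


def expected_virtio_link(volume_id: str) -> str:
    """Assumed by-id name: 'virtio-' plus the first 20 characters of the volume ID."""
    return "/dev/disk/by-id/virtio-" + volume_id[:20]
```

Feeding the output of `udevadm info /dev/vdi` to `has_by_id_link` would return False for the broken device and True for the correctly mounted one.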
The CSI driver tries to mitigate this by calling "udevadm trigger" on each NodeStage; however, udevadm is not installed in the CSI driver container.
From CSI driver logs:
I0218 14:00:56.664894 1 mount.go:178] Failed to find device for the volumeID: "2b0b000e-a660-48c4-a3e4-9f5153d9722b" by serial ID
I0218 14:00:57.015228 1 mount.go:113] error running udevadm trigger executable file not found in $PATH
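The failure mode in the log above (the mitigation silently not running because the binary is missing) can be illustrated as follows. This is a hedged Python sketch only; the real driver is written in Go, and the `trigger_udev` helper and its injectable `which`/`run` parameters are hypothetical names used for illustration.

```python
import shutil
import subprocess


def trigger_udev(which=shutil.which, run=subprocess.run) -> bool:
    """Run `udevadm trigger` if the binary is on PATH; return True on success."""
    path = which("udevadm")
    if path is None:
        # Same situation as "executable file not found in $PATH":
        # the retry cannot help because udevadm is not in the container image.
        return False
    result = run([path, "trigger"], check=False)
    return result.returncode == 0
```

A caller that gets False back knows the by-id symlinks will not be regenerated and can report the missing binary instead of retrying the device lookup.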
Steps to Reproduce:
1. Create ~20 volumes + 20 pods that use them *on the same node*
2. (delete + repeat until it's reproduced)
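The reproduction steps could be scripted roughly as below. This is a sketch under assumptions, not the reporter's actual setup: the `repro-N` names, the 1Gi size, the pause image, and the use of the default storage class are all illustrative choices, and it assumes the node's `kubernetes.io/hostname` label matches the name passed in.

```python
import json


def manifests(count: int, node: str, size: str = "1Gi") -> list:
    """Build one PVC and one Pod per index, all pods pinned to the same node."""
    items = []
    for i in range(count):
        name = f"repro-{i}"  # hypothetical naming scheme
        items.append({
            "apiVersion": "v1",
            "kind": "PersistentVolumeClaim",
            "metadata": {"name": name},
            "spec": {
                "accessModes": ["ReadWriteOnce"],
                "resources": {"requests": {"storage": size}},
            },
        })
        items.append({
            "apiVersion": "v1",
            "kind": "Pod",
            "metadata": {"name": name},
            "spec": {
                # Pin every pod to one node so all volumes attach there.
                "nodeSelector": {"kubernetes.io/hostname": node},
                "containers": [{
                    "name": "main",
                    "image": "registry.k8s.io/pause:3.9",  # assumed image
                    "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                }],
                "volumes": [{
                    "name": "data",
                    "persistentVolumeClaim": {"claimName": name},
                }],
            },
        })
    return items


def as_list(items: list) -> str:
    """Wrap the objects in a v1 List and serialize to JSON."""
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2)
```

Printing `as_list(manifests(20, node_name))` and piping the result to `kubectl apply -f -` would create the objects; deleting them and repeating corresponds to step 2.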
Actual results:
Some pods are ContainerCreating for a long time.
Expected results:
All pods Running.
Should be the same Upshift...
Yes, I checked the CSI driver logs on the node: all mounts are successful and there is no such error.
Changed status to Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.