Description of problem:

I created 18 Pods on a single node, each of them using its own Cinder volume (= 18 volumes attached to the node). Randomly, some of these pods can't start:

  Warning  FailedMount  19s (x16 over 25m)  kubelet  MountVolume.MountDevice failed for volume "pvc-9a5f1a70-2266-4876-8e64-d0fea7ef20da" : rpc error: code = Internal desc = Unable to find Device path for volume

The reason seems to be udev: it did not create the /dev/disk/by-id symlinks for the attached volume.

$ udevadm info /dev/vdi
P: /devices/pci0000:00/0000:00:0e.0/virtio11/block/vdi
N: vdi
S: disk/by-path/pci-0000:00:0e.0
S: disk/by-path/virtio-pci-0000:00:0e.0
E: DEVLINKS=/dev/disk/by-path/virtio-pci-0000:00:0e.0 /dev/disk/by-path/pci-0000:00:0e.0
E: DEVNAME=/dev/vdi
E: DEVPATH=/devices/pci0000:00/0000:00:0e.0/virtio11/block/vdi
E: DEVTYPE=disk
E: ID_PATH=pci-0000:00:0e.0
E: ID_PATH_TAG=pci-0000_00_0e_0
E: MAJOR=252
E: MINOR=128
E: SUBSYSTEM=block
E: TAGS=:systemd:
E: USEC_INITIALIZED=5647957949

Another volume that was mounted correctly has more DEVLINKS:

E: DEVLINKS=/dev/disk/by-id/virtio-396d709b-f498-439a-a /dev/disk/by-path/pci-0000:00:0f.0 /dev/disk/by-uuid/dcfaa60a-7896-4c61-bce1-f1841d9acbe0 /dev/disk/by-path/virtio-pci-0000:00:0f.0

The CSI driver tries to mitigate this by calling "udevadm trigger" on each NodeStage [1]; however, udevadm is not installed in the CSI driver container.

1: https://github.com/kubernetes/cloud-provider-openstack/blob/7b5efc481ea6b151300928c0976c336abee3b7e3/pkg/util/mount/mount.go#L110

From the CSI driver logs:

I0218 14:00:56.664894 1 mount.go:178] Failed to find device for the volumeID: "2b0b000e-a660-48c4-a3e4-9f5153d9722b" by serial ID
I0218 14:00:57.015228 1 mount.go:113] error running udevadm trigger executable file not found in $PATH

How reproducible:
~50%

Steps to Reproduce:
1. Create ~20 volumes + 20 pods that use them *on the same node* (see the reproduction sketch at the end of this report).
2. Delete and repeat until the problem is reproduced.

Actual results:
Some pods stay in ContainerCreating for a long time.

Expected results:
All pods Running.
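A manual demonstration of the mitigation described above: running "udevadm trigger" on the node itself (not in the CSI driver container, where the binary is missing) makes udev re-process the block devices and create the missing symlinks. This is a sketch only; the grep pattern uses the volume ID from the driver log above and relies on the virtio serial being the volume ID truncated to 20 characters, as seen in the working DEVLINKS example:

  # Run on the affected node:
  $ udevadm trigger --subsystem-match=block   # re-emit uevents for block devices
  $ udevadm settle                            # wait for udev to finish processing
  $ ls -l /dev/disk/by-id/ | grep virtio-2b0b000e

After this, the /dev/disk/by-id/virtio-<serial> link for the stuck volume should exist, and NodeStage can find the device path.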
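Reproduction sketch for the steps above (an illustration, not from the original report: the StorageClass name "standard-csi", the image, and the object names are assumptions; NODE must be set to an existing worker node):

  #!/usr/bin/env bash
  # Create 20 PVCs and 20 pods pinned to one node, so ~20 Cinder
  # volumes get attached to that single node.
  NODE="${NODE:?set NODE to the target node name}"
  for i in $(seq 1 20); do
  kubectl apply -f - <<EOF
  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: cinder-test-$i
  spec:
    accessModes: ["ReadWriteOnce"]
    resources:
      requests:
        storage: 1Gi
    storageClassName: standard-csi
  ---
  apiVersion: v1
  kind: Pod
  metadata:
    name: cinder-test-$i
  spec:
    nodeName: $NODE            # pin every pod to the same node
    containers:
    - name: sleep
      image: registry.access.redhat.com/ubi8/ubi-minimal
      command: ["sleep", "infinity"]
      volumeMounts:
      - name: data
        mountPath: /data
    volumes:
    - name: data
      persistentVolumeClaim:
        claimName: cinder-test-$i
  EOF
  done
  # To retry: delete the pods and PVCs, then re-run this script.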
Should be the same Upshift... Yes, I checked the CSI driver logs on the node; all mounts are successful and there is no such error. Changed status to Verified.
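For reference, one way to scan the node CSI driver logs for this error (a sketch; the namespace, pod label, and container name below are assumptions for this OpenShift release and may differ):

  $ oc -n openshift-cluster-csi-drivers get pods -l app=openstack-cinder-csi-driver-node -o name \
      | xargs -I{} oc -n openshift-cluster-csi-drivers logs {} -c csi-driver --timestamps \
      | grep -Ei 'udevadm|Failed to find device'

An empty result across all node pods means every NodeStage call found its device path.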
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438