Bug 1930220

Summary: Cinder CSI driver is unable to mount volumes under heavy load
Product: OpenShift Container Platform
Reporter: Jan Safranek <jsafrane>
Component: Storage
Assignee: Jan Safranek <jsafrane>
Storage sub component: OpenStack CSI Drivers
QA Contact: Wei Duan <wduan>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: unspecified
CC: aos-bugs, piqin
Version: 4.7
Target Milestone: ---
Target Release: 4.8.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-07-27 22:45:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1933659

Description Jan Safranek 2021-02-18 14:20:24 UTC
Description of problem:
I created 18 pods on a single node, each using its own Cinder volume (i.e. 18 volumes attached to the node). Randomly, some of these pods can't start:

Warning  FailedMount             19s (x16 over 25m)   kubelet                  MountVolume.MountDevice failed for volume "pvc-9a5f1a70-2266-4876-8e64-d0fea7ef20da" : rpc error: code = Internal desc = Unable to find Device path for volume

The root cause appears to be udev: it did not create the /dev/disk/by-id symlinks for the attached volume.

$ udevadm info /dev/vdi
P: /devices/pci0000:00/0000:00:0e.0/virtio11/block/vdi
N: vdi
S: disk/by-path/pci-0000:00:0e.0
S: disk/by-path/virtio-pci-0000:00:0e.0
E: DEVLINKS=/dev/disk/by-path/virtio-pci-0000:00:0e.0 /dev/disk/by-path/pci-0000:00:0e.0
E: DEVNAME=/dev/vdi
E: DEVPATH=/devices/pci0000:00/0000:00:0e.0/virtio11/block/vdi
E: DEVTYPE=disk
E: ID_PATH=pci-0000:00:0e.0
E: ID_PATH_TAG=pci-0000_00_0e_0
E: MAJOR=252
E: MINOR=128
E: SUBSYSTEM=block
E: TAGS=:systemd:
E: USEC_INITIALIZED=5647957949

Another volume that was mounted correctly has more DEVLINKS:
E: DEVLINKS=/dev/disk/by-id/virtio-396d709b-f498-439a-a /dev/disk/by-path/pci-0000:00:0f.0 /dev/disk/by-uuid/dcfaa60a-7896-4c61-bce1-f1841d9acbe0 /dev/disk/by-path/virtio-pci-0000:00:0f.0
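
For reference, here is a minimal Go sketch of the kind of by-serial lookup the driver performs on NodeStage; the function name findDevicePathBySerial and the candidate link prefixes are illustrative assumptions, not the exact upstream API. Note that virtio caps the disk serial at 20 bytes, which is why the by-id symlink above carries only a truncated volume ID:

// Illustrative sketch, not the upstream cloud-provider-openstack code.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// findDevicePathBySerial resolves the udev-created /dev/disk/by-id symlink
// that encodes the (truncated) Cinder volume ID. When udev has not processed
// the device, the symlink is missing and the lookup fails, which is what
// this bug is about.
func findDevicePathBySerial(volumeID string) (string, error) {
	serial := volumeID
	if len(serial) > 20 {
		serial = serial[:20] // virtio disk serials are capped at 20 bytes
	}
	candidates := []string{
		filepath.Join("/dev/disk/by-id", "virtio-"+serial),
		filepath.Join("/dev/disk/by-id", "scsi-0QEMU_QEMU_HARDDISK_"+serial),
	}
	for _, link := range candidates {
		if resolved, err := filepath.EvalSymlinks(link); err == nil {
			return resolved, nil
		}
	}
	return "", fmt.Errorf("unable to find device path for volume %q", volumeID)
}

func main() {
	// Volume ID taken from the driver log quoted below.
	path, err := findDevicePathBySerial("2b0b000e-a660-48c4-a3e4-9f5153d9722b")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(path)
}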

The CSI driver tries to mitigate this by calling "udevadm trigger" on each NodeStage [1]; however, udevadm is not installed in the CSI driver container:

1: https://github.com/kubernetes/cloud-provider-openstack/blob/7b5efc481ea6b151300928c0976c336abee3b7e3/pkg/util/mount/mount.go#L110

From the CSI driver logs:

I0218 14:00:56.664894       1 mount.go:178] Failed to find device for the volumeID: "2b0b000e-a660-48c4-a3e4-9f5153d9722b" by serial ID
I0218 14:00:57.015228       1 mount.go:113] error running udevadm trigger executable file not found in $PATH
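
The failing exec is easy to sketch. Assuming the retry helper looks roughly like this (name and error handling are illustrative, not the exact mount.go code):

package main

import (
	"fmt"
	"os"
	"os/exec"
)

// triggerUdev asks udev to re-run its rules so that missing
// /dev/disk/by-id symlinks get created for already-attached devices.
// In this bug the call can never succeed: the driver container image
// does not ship udevadm, so exec fails with
// "executable file not found in $PATH", matching the log above.
func triggerUdev() error {
	out, err := exec.Command("udevadm", "trigger").CombinedOutput()
	if err != nil {
		return fmt.Errorf("error running udevadm trigger: %v (output: %q)", err, out)
	}
	return nil
}

func main() {
	if err := triggerUdev(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}

Presumably the fix needs to install udevadm into the driver image so that this fallback can actually run.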

How reproducible:
~50%

Steps to Reproduce:
1. Create ~20 volumes + 20 pods that use them, *all on the same node*
2. Delete everything and repeat until the issue reproduces

Actual results:
Some pods are stuck in ContainerCreating for a long time.

Expected results:
All pods Running.

Comment 4 Wei Duan 2021-02-24 05:18:09 UTC
Should be the same Upshift...
Yes, I checked the CSI driver logs on the node; all mounts are successful, and there is no such error.
Changed status to Verified.

Comment 7 errata-xmlrpc 2021-07-27 22:45:14 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438