Bug 1930220

Summary: Cinder CSI driver is unable to mount volumes under heavy load
Product: OpenShift Container Platform
Reporter: Jan Safranek <jsafrane>
Component: Storage
Assignee: Jan Safranek <jsafrane>
Storage sub component: OpenStack CSI Drivers
QA Contact: Wei Duan <wduan>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: unspecified
CC: aos-bugs, piqin
Version: 4.7
Target Milestone: ---
Target Release: 4.8.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-07-27 22:45:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1933659

Description Jan Safranek 2021-02-18 14:20:24 UTC
Description of problem:
I created 18 pods on a single node, each using its own Cinder volume (i.e. 18 volumes attached to the node). Randomly, some of these pods can't start:

Warning  FailedMount             19s (x16 over 25m)   kubelet                  MountVolume.MountDevice failed for volume "pvc-9a5f1a70-2266-4876-8e64-d0fea7ef20da" : rpc error: code = Internal desc = Unable to find Device path for volume

The root cause appears to be udev: it did not create the /dev/disk/by-id symlinks for the attached volume.

$ udevadm info /dev/vdi
P: /devices/pci0000:00/0000:00:0e.0/virtio11/block/vdi
N: vdi
S: disk/by-path/pci-0000:00:0e.0
S: disk/by-path/virtio-pci-0000:00:0e.0
E: DEVLINKS=/dev/disk/by-path/virtio-pci-0000:00:0e.0 /dev/disk/by-path/pci-0000:00:0e.0
E: DEVNAME=/dev/vdi
E: DEVPATH=/devices/pci0000:00/0000:00:0e.0/virtio11/block/vdi
E: DEVTYPE=disk
E: ID_PATH=pci-0000:00:0e.0
E: ID_PATH_TAG=pci-0000_00_0e_0
E: MAJOR=252
E: MINOR=128
E: SUBSYSTEM=block
E: TAGS=:systemd:
E: USEC_INITIALIZED=5647957949

Another volume that was mounted correctly has more DEVLINKS:
E: DEVLINKS=/dev/disk/by-id/virtio-396d709b-f498-439a-a /dev/disk/by-path/pci-0000:00:0f.0 /dev/disk/by-uuid/dcfaa60a-7896-4c61-bce1-f1841d9acbe0 /dev/disk/by-path/virtio-pci-0000:00:0f.0
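
For reference, here is a minimal Go sketch of the kind of by-serial lookup the driver performs on NodeStage; the function name findDevicePathBySerial and the candidate link prefixes are illustrative assumptions, not the exact upstream API. Note that virtio caps the disk serial at 20 bytes, which is why the by-id symlink above carries only a truncated volume ID:

// Illustrative sketch, not the upstream cloud-provider-openstack code.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// findDevicePathBySerial resolves the udev-created /dev/disk/by-id symlink
// that encodes the (truncated) Cinder volume ID. When udev has not processed
// the device, the symlink is missing and the lookup fails, which is what
// this bug is about.
func findDevicePathBySerial(volumeID string) (string, error) {
	serial := volumeID
	if len(serial) > 20 {
		serial = serial[:20] // virtio disk serials are capped at 20 bytes
	}
	candidates := []string{
		filepath.Join("/dev/disk/by-id", "virtio-"+serial),
		filepath.Join("/dev/disk/by-id", "scsi-0QEMU_QEMU_HARDDISK_"+serial),
	}
	for _, link := range candidates {
		if resolved, err := filepath.EvalSymlinks(link); err == nil {
			return resolved, nil
		}
	}
	return "", fmt.Errorf("unable to find device path for volume %q", volumeID)
}

func main() {
	// Volume ID taken from the driver log quoted below.
	path, err := findDevicePathBySerial("2b0b000e-a660-48c4-a3e4-9f5153d9722b")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(path)
}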

The CSI driver tries to mitigate this by calling "udevadm trigger" on each NodeStage [1]; however, udevadm is not installed in the CSI driver container:

1: https://github.com/kubernetes/cloud-provider-openstack/blob/7b5efc481ea6b151300928c0976c336abee3b7e3/pkg/util/mount/mount.go#L110

From the CSI driver logs:

I0218 14:00:56.664894       1 mount.go:178] Failed to find device for the volumeID: "2b0b000e-a660-48c4-a3e4-9f5153d9722b" by serial ID
I0218 14:00:57.015228       1 mount.go:113] error running udevadm trigger executable file not found in $PATH
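
The failing exec is easy to sketch. Assuming the retry helper looks roughly like this (name and error handling are illustrative, not the exact mount.go code):

package main

import (
	"fmt"
	"os"
	"os/exec"
)

// triggerUdev asks udev to re-run its rules so that missing
// /dev/disk/by-id symlinks get created for already-attached devices.
// In this bug the call can never succeed: the driver container image
// does not ship udevadm, so exec fails with
// "executable file not found in $PATH", matching the log above.
func triggerUdev() error {
	out, err := exec.Command("udevadm", "trigger").CombinedOutput()
	if err != nil {
		return fmt.Errorf("error running udevadm trigger: %v (output: %q)", err, out)
	}
	return nil
}

func main() {
	if err := triggerUdev(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}

Presumably the fix needs to install udevadm into the driver image so that this fallback can actually run.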

How reproducible:
~50%

Steps to Reproduce:
1. Create ~20 volumes + 20 pods that use them, *all on the same node*
2. Delete everything and repeat until the issue reproduces

Actual results:
Some pods are stuck in ContainerCreating for a long time.

Expected results:
All pods Running.

Comment 4 Wei Duan 2021-02-24 05:18:09 UTC
Should be the same Upshift...
Yes, I checked the CSI driver logs on the node; all mounts are successful, and there is no such error.
Changed status to Verified.

Comment 7 errata-xmlrpc 2021-07-27 22:45:14 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438