1930220 – Cinder CSI driver is not able to mount volumes under heavier load

Summary: Cinder CSI driver is not able to mount volumes under heavier load

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Storage
Sub Component:
Version:	4.7
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	4.8.0
Assignee:	Jan Safranek
QA Contact:	Wei Duan
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1933659

Reported:	2021-02-18 14:20 UTC by Jan Safranek
Modified:	2021-07-27 22:47 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-07-27 22:45:14 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cloud-provider-openstack pull 45	0	None	open	Bug 1930220: Add udev to the driver image	2021-02-19 08:26:18 UTC
Red Hat Product Errata	RHSA-2021:2438	0	None	None	None	2021-07-27 22:47:27 UTC

Description Jan Safranek 2021-02-18 14:20:24 UTC

Description of problem:
I created 18 Pods on a single node, each of them using its own Cinder volume (18 volumes attached to the node). Randomly, some of these pods can't start:

Warning  FailedMount             19s (x16 over 25m)   kubelet                  MountVolume.MountDevice failed for volume "pvc-9a5f1a70-2266-4876-8e64-d0fea7ef20da" : rpc error: code = Internal desc = Unable to find Device path for volume

The reason seems to be udev - it did not create /dev/disk/by-id symlinks for the attached volume.

$ udevadm info /dev/vdi
P: /devices/pci0000:00/0000:00:0e.0/virtio11/block/vdi
N: vdi
S: disk/by-path/pci-0000:00:0e.0
S: disk/by-path/virtio-pci-0000:00:0e.0
E: DEVLINKS=/dev/disk/by-path/virtio-pci-0000:00:0e.0 /dev/disk/by-path/pci-0000:00:0e.0
E: DEVNAME=/dev/vdi
E: DEVPATH=/devices/pci0000:00/0000:00:0e.0/virtio11/block/vdi
E: DEVTYPE=disk
E: ID_PATH=pci-0000:00:0e.0
E: ID_PATH_TAG=pci-0000_00_0e_0
E: MAJOR=252
E: MINOR=128
E: SUBSYSTEM=block
E: TAGS=:systemd:
E: USEC_INITIALIZED=5647957949

Another volume that was mounted correctly has more DEVLINKS:
E: DEVLINKS=/dev/disk/by-id/virtio-396d709b-f498-439a-a /dev/disk/by-path/pci-0000:00:0f.0 /dev/disk/by-uuid/dcfaa60a-7896-4c61-bce1-f1841d9acbe0 /dev/disk/by-path/virtio-pci-0000:00:0f.0
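
To check a node for affected devices quickly, the DEVLINKS line of the `udevadm info` output can be scanned for a /dev/disk/by-id entry. This is only a minimal sketch for diagnosing the symptom shown above, not code from the CSI driver; the function name byIDLinks is hypothetical and the input is assumed to be the text printed by `udevadm info <device>`.

```go
package main

import (
	"fmt"
	"strings"
)

// byIDLinks is a hypothetical helper (not part of the CSI driver). It returns
// the /dev/disk/by-id entries found on the "E: DEVLINKS=" line of
// `udevadm info` output. An empty result matches the broken state described
// in this report.
func byIDLinks(udevadmInfo string) []string {
	const prefix = "E: DEVLINKS="
	var links []string
	for _, line := range strings.Split(udevadmInfo, "\n") {
		line = strings.TrimSpace(line)
		if !strings.HasPrefix(line, prefix) {
			continue
		}
		// DEVLINKS is a space-separated list of symlink paths.
		for _, link := range strings.Fields(strings.TrimPrefix(line, prefix)) {
			if strings.HasPrefix(link, "/dev/disk/by-id/") {
				links = append(links, link)
			}
		}
	}
	return links
}

func main() {
	// Both lines are copied from the udevadm output in this report.
	broken := "E: DEVLINKS=/dev/disk/by-path/virtio-pci-0000:00:0e.0 /dev/disk/by-path/pci-0000:00:0e.0"
	healthy := "E: DEVLINKS=/dev/disk/by-id/virtio-396d709b-f498-439a-a /dev/disk/by-path/pci-0000:00:0f.0 /dev/disk/by-uuid/dcfaa60a-7896-4c61-bce1-f1841d9acbe0 /dev/disk/by-path/virtio-pci-0000:00:0f.0"

	fmt.Println("broken device by-id links:", byIDLinks(broken))
	fmt.Println("healthy device by-id links:", byIDLinks(healthy))
}
```

A device with no by-id entry is one the driver cannot look up by its serial ID, which is consistent with the "Unable to find Device path for volume" error.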

The CSI driver tries to mitigate this by calling "udevadm trigger" on each NodeStage [1]. However, udevadm is not installed in the CSI driver container:

1: https://github.com/kubernetes/cloud-provider-openstack/blob/7b5efc481ea6b151300928c0976c336abee3b7e3/pkg/util/mount/mount.go#L110

From CSI driver logs:

I0218 14:00:56.664894       1 mount.go:178] Failed to find device for the volumeID: "2b0b000e-a660-48c4-a3e4-9f5153d9722b" by serial ID
I0218 14:00:57.015228       1 mount.go:113] error running udevadm trigger executable file not found in $PATH
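
The "executable file not found in $PATH" message is the error Go's os/exec package returns when a binary cannot be resolved. A missing tool like this could be surfaced once at startup instead of on every mount attempt. The following is a sketch of such a check under that assumption, not the driver's actual code; checkBinary is a hypothetical name.

```go
package main

import (
	"fmt"
	"os/exec"
)

// checkBinary is a hypothetical startup check (not part of the CSI driver).
// It returns an error when the named executable cannot be found in $PATH,
// which is the condition behind the "error running udevadm trigger" log line.
func checkBinary(name string) error {
	if _, err := exec.LookPath(name); err != nil {
		return fmt.Errorf("%s is not available in this container image: %w", name, err)
	}
	return nil
}

func main() {
	if err := checkBinary("udevadm"); err != nil {
		fmt.Println("startup check failed:", err)
		return
	}
	fmt.Println("udevadm found")
}
```

Run inside the driver container, a check like this would report the missing udevadm before any volume is staged.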

How reproducible:
~50%

Steps to Reproduce:
1. Create ~20 volumes + 20 pods that use them *on the same node*
2. Delete them and repeat until the failure reproduces.

Actual results:
Some pods stay in ContainerCreating for a long time.

Expected results:
All pods are Running.

Comment 4 Wei Duan 2021-02-24 05:18:09 UTC

Should be the same Upshift...
Yes, I checked the CSI driver logs on the node: all mounts are successful and there is no such error.
Changed status to Verified.

Comment 7 errata-xmlrpc 2021-07-27 22:45:14 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
