1676466 – LVM in the glusterfs container should not try to use udev

Summary: LVM in the glusterfs container should not try to use udev

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	rhgs-server-container
Sub Component:
Version:	ocs-3.11
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Target Release:	OCS 3.11.z Batch Update 3
Assignee:	Saravanakumar
QA Contact:	RamaKasturi
Docs Contact:
URL:
Whiteboard:
Duplicates (2):	1678446 1698736
Depends On:	1676612 1684133 1688316
Blocks:	1674485 1698736

Reported:	2019-02-12 11:42 UTC by Niels de Vos
Modified:	2022-03-13 16:58 UTC
CC List:	15 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Previously, udev events in containers were not reliable. When Logical Volume Manager (LVM) activated a device, the LVM commands waited for udev to create the device nodes under /dev. Because the device nodes were not created, LVM was delayed or failed while activating the devices. With this fix, all interaction with udev is disabled for LVM commands executed within the Red Hat Gluster Storage server container. LVM commands no longer wait for udev to create the device nodes under /dev; instead, LVM creates the device nodes itself.
Clone Of:
Environment:
Last Closed:	2019-06-13 19:18:59 UTC
Embargoed:
Dependent Products:

Attachments
Attaching lvm conf file (92.80 KB, text/plain) 2019-04-03 13:46 UTC, RamaKasturi

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	gluster gluster-containers pull 126	0	'None'	closed	Prevent LVM commands from using udev	2020-02-28 15:19:32 UTC
Red Hat Product Errata	RHBA-2019:1406	0	None	None	None	2019-06-13 19:19:09 UTC

Internal Links: 1676612 1688316

Description Niels de Vos 2019-02-12 11:42:57 UTC

Description of problem:

udev in containers is not expected to work correctly or reliably. There should be only one udev handler on a system, and it should normally run on the host, not in a container.

When LVM tries to use udev in a container, it will most likely fail and can cause long delays. While deploying heketi through openshift-ansible, these delays can cause the deployment to fail with messages like these:

[kubeexec] DEBUG 2019/02/11 11:09:02 heketi/pkg/remoteexec/kube/exec.go:81:kube.ExecCommands: Ran command [pvcreate -qq --metadatasize=128M --dataalignment=256K '/dev/xvdf'] on [pod:glusterfs-storage-g56jg c:glusterfs ns:app-storage (from host:ip-172-16-43-224.ap-south-1.compute.internal selector:glusterfs-node)]: Stdout []: Stderr [ WARNING: Device /dev/xvdf not initialized in udev database even after waiting 10000000 microseconds.
WARNING: Device /dev/dockervg/dockerlv not initialized in udev database even after waiting 10000000 microseconds.
WARNING: Device /dev/xvda2 not initialized in udev database even after waiting 10000000 microseconds.
WARNING: Device /dev/xvdb1 not initialized in udev database even after waiting 10000000 microseconds.
WARNING: Device /dev/xvdc not initialized in udev database even after waiting 10000000 microseconds.
WARNING: Device /dev/xvdd not initialized in udev database even after waiting 10000000 microseconds.
WARNING: Device /dev/xvdf not initialized in udev database even after waiting 10000000 microseconds.
]
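
To see which devices are affected, the warnings can be filtered out of a captured log. The following is only a sketch: the function name udev_timeout_devices is hypothetical, and it assumes the warning text has exactly the wording shown above.

```shell
#!/bin/sh
# Read LVM stderr (or a heketi log) on stdin and print each device that
# appears in a "not initialized in udev database" warning, once.
# Assumption: the warning wording matches the messages quoted in this report.
udev_timeout_devices() {
    sed -n 's/.*WARNING: Device \([^ ]*\) not initialized in udev database.*/\1/p' | sort -u
}
```

For example, "udev_timeout_devices < heketi.log" would list the device paths, where heketi.log is a placeholder for wherever the output was saved.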

The /etc/lvm/lvm.conf in the container image has 'obtain_device_list_from_udev = 1', which should be disabled.
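
One way to turn the setting off is sketched below. The report names only obtain_device_list_from_udev; udev_sync and udev_rules are other lvm.conf options that are included here on the assumption that they should be disabled as well. The function name disable_lvm_udev is hypothetical, and this is not necessarily how the container image applies the change.

```shell
#!/bin/sh
# Set udev-related options in an lvm.conf to 0, keeping a backup copy
# of the original file with a ".save" suffix.
# Assumption: udev_sync and udev_rules are included in addition to the
# option named in this report.
disable_lvm_udev() {
    conf="${1:-/etc/lvm/lvm.conf}"
    # Only uncommented "name = 1" lines are rewritten; commented
    # defaults (lines starting with '#') are left alone.
    sed -i.save -E \
        's/^([[:space:]]*)(obtain_device_list_from_udev|udev_sync|udev_rules)[[:space:]]*=[[:space:]]*1[[:space:]]*$/\1\2 = 0/' \
        "$conf"
}
```

Calling disable_lvm_udev without an argument edits /etc/lvm/lvm.conf; passing a path makes it easy to try the change on a copy first.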

Version-Release number of selected component (if applicable):
ocs-3.11.1

How reproducible:
100%

Steps to Reproduce:
1. check /etc/lvm/lvm.conf in the container
2. verify that obtain_device_list_from_udev is set
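
The two steps above can be scripted roughly as follows when run inside the container. This is a sketch: the function name udev_settings_enabled is hypothetical, and checking udev_sync and udev_rules in addition to obtain_device_list_from_udev is an assumption that goes beyond what this report names.

```shell
#!/bin/sh
# Print every udev-related lvm.conf setting that is still enabled (= 1).
# Exit status follows grep: 0 if at least one enabled setting was found,
# 1 if none was found.
# Assumption: udev_sync and udev_rules are checked as well; the report
# itself only names obtain_device_list_from_udev.
udev_settings_enabled() {
    conf="${1:-/etc/lvm/lvm.conf}"
    # Commented-out defaults start with '#' and are not matched.
    grep -E '^[[:space:]]*(obtain_device_list_from_udev|udev_sync|udev_rules)[[:space:]]*=[[:space:]]*1([[:space:]]|$)' "$conf"
}
```

If the function prints a line for obtain_device_list_from_udev, the problem described here is present in that configuration file.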

Additional info:
Note that an update for lvm2 is also required. Details are in https://bugzilla.redhat.com/show_bug.cgi?id=1674485#c8

Comment 1 RamaKasturi 2019-03-28 14:10:31 UTC

Acking this for 3.11.3

Comment 3 RamaKasturi 2019-04-03 13:46:04 UTC

Created attachment 1551394
Attaching lvm conf file

Comment 5 Niels de Vos 2019-04-23 15:25:18 UTC

https://access.redhat.com/errata/RHBA-2019:0814 has been released and addresses bug 1688316. With that, the downgrade of lvm2 (and its dependencies) is no longer required. Can that be done through this BZ, or should "use the standard lvm2 version" be its own BZ?

Comment 8 Raghavendra Talur 2019-05-07 11:45:14 UTC

*** Bug 1678446 has been marked as a duplicate of this bug. ***

Comment 9 Prashant Dhange 2019-05-08 00:35:30 UTC

*** Bug 1698736 has been marked as a duplicate of this bug. ***

Comment 10 RamaKasturi 2019-05-14 17:57:23 UTC

Moving the bug to verified state, as the following tests passed and I did not observe any issues.

1) The lvm package included in the container is the same as the one in comment 6.

[root@ip-172-16-45-176 /]# rpm -qa | grep lvm    
lvm2-libs-2.02.180-10.el7_6.7.x86_64
lvm2-2.02.180-10.el7_6.7.x86_64

2) Installed a fresh setup on VMware and did not hit any issues during the installation.

3) Upgraded the setup on AWS from 3.11.2 to 3.11.3 and did not hit any issues.

4) Did not see any issues with the pvs, pvscan, lvs and vgs commands.

[root@ip-172-16-45-176 /]# sudo pvscan
  PV /dev/xvdf    VG vg_9aa5d10bb7d969c127d9df28c6e7a88c   lvm2 [1.95 TiB / <942.33 GiB free]
  PV /dev/xvdb1   VG dockervg                              lvm2 [<100.00 GiB / 0    free]
  PV /dev/xvdg    VG vg_6c969e2f8f69881531e55340dd9323da   lvm2 [999.87 GiB / 999.87 GiB free]
  Total: 3 [<3.03 TiB] / in use: 3 [<3.03 TiB] / in no VG: 0 [0   ]
[root@ip-172-16-45-176 /]# sudo vgscan
  Reading volume groups from cache.
  Found volume group "vg_9aa5d10bb7d969c127d9df28c6e7a88c" using metadata type lvm2
  Found volume group "dockervg" using metadata type lvm2
  Found volume group "vg_6c969e2f8f69881531e55340dd9323da" using metadata type lvm2
[root@ip-172-16-45-176 /]# sudo vgs
  VG                                  #PV #LV  #SN Attr   VSize    VFree   
  dockervg                              1    1   0 wz--n- <100.00g       0 
  vg_6c969e2f8f69881531e55340dd9323da   1    0   0 wz--n-  999.87g  999.87g
  vg_9aa5d10bb7d969c127d9df28c6e7a88c   1 1286   0 wz--n-    1.95t <942.33g


5) Ran heketi-cli server state examine gluster and did not see any issues there either.

6) Rebooted the node hosting gluster pod and ran pvscan, vgs & lvs and did not see any issues.

7) Created new file and block volume and they were successful.

8) Added a device and it was successful.

9) Rebooted & added a device to the node and it was successful too.

Performed the above steps on AWS and VMware environments and did not see any issues.

Comment 12 Anjana KD 2019-06-03 12:47:01 UTC

Have updated the doc text. Kindly review it for technical accuracy.

Comment 15 errata-xmlrpc 2019-06-13 19:18:59 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1406
