Bug 1676466 - LVM in the glusterfs container should not try to use udev
Summary: LVM in the glusterfs container should not try to use udev
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: rhgs-server-container
Version: ocs-3.11
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: OCS 3.11.z Batch Update 3
Assignee: Saravanakumar
QA Contact: RamaKasturi
URL:
Whiteboard:
Duplicates: 1678446 1698736
Depends On: 1676612 1684133 1688316
Blocks: 1674485 1698736
 
Reported: 2019-02-12 11:42 UTC by Niels de Vos
Modified: 2019-06-13 19:19 UTC
CC: 15 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, udev events in containers were not reliable. When the Logical Volume Manager (LVM) activated a device, the LVM commands waited for udev to create the device nodes under /dev. Because the device nodes were not created, LVM was delayed or failed while activating the devices. With this fix, all interaction with udev is disabled for LVM commands executed within the Red Hat Gluster Storage server container. LVM commands therefore no longer wait for udev to create the device nodes under /dev; instead, LVM creates the device nodes itself.
Clone Of:
Environment:
Last Closed: 2019-06-13 19:18:59 UTC
Target Upstream Version:


Attachments
Attaching lvm conf file (92.80 KB, text/plain)
2019-04-03 13:46 UTC, RamaKasturi


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:1406 None None None 2019-06-13 19:19:09 UTC
Github gluster gluster-containers pull 126 'None' 'closed' 'Prevent LVM commands from using udev' 2019-11-16 23:11:24 UTC

Internal Links: 1676612 1688316

Description Niels de Vos 2019-02-12 11:42:57 UTC
Description of problem:

udev in containers is not expected to work correctly or stably. There should be only one udev daemon on a system, and it should normally run on the host, not in a container.

When LVM tries to use udev in a container, it will most likely fail and can cause (long) delays. While deploying heketi through openshift-ansible, these delays can cause the deployment to fail with messages like these:

[kubeexec] DEBUG 2019/02/11 11:09:02 heketi/pkg/remoteexec/kube/exec.go:81:kube.ExecCommands: Ran command [pvcreate -qq --metadatasize=128M --dataalignment=256K '/dev/xvdf'] on [pod:glusterfs-storage-g56jg c:glusterfs ns:app-storage (from host:ip-172-16-43-224.ap-south-1.compute.internal selector:glusterfs-node)]: Stdout []: Stderr [  WARNING: Device /dev/xvdf not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/dockervg/dockerlv not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/xvda2 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/xvdb1 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/xvdc not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/xvdd not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/xvdf not initialized in udev database even after waiting 10000000 microseconds.
]

The /etc/lvm/lvm.conf in the container image has 'obtain_device_list_from_udev = 1', which should be disabled.
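As a sketch of the direction the fix takes (the exact set of options changed in the container image is not quoted in this report), these are the standard lvm.conf knobs that control LVM's interaction with udev:

```
# /etc/lvm/lvm.conf fragment (illustrative) -- settings that stop LVM
# from consulting or waiting on udev inside the container
devices {
    # do not build the device list from the udev database
    obtain_device_list_from_udev = 0
}
activation {
    # do not wait for udev to process events during activation
    udev_sync = 0
    # let LVM create the /dev nodes itself instead of relying on udev rules
    udev_rules = 0
}
```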

Version-Release number of selected component (if applicable):
ocs-3.11.1

How reproducible:
100%

Steps to Reproduce:
1. check /etc/lvm/lvm.conf in the container
2. verify that obtain_device_list_from_udev is set to 1
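The check in step 2 can be sketched as follows; the pod name is taken from the log output above and is shown only as an illustration, and the grep is simulated locally against a sample file:

```shell
# Inside the cluster, the check would run in the glusterfs container:
#   oc exec glusterfs-storage-g56jg -c glusterfs -- \
#       grep obtain_device_list_from_udev /etc/lvm/lvm.conf
# Local simulation: write the buggy container setting to a sample file
printf 'devices {\n    obtain_device_list_from_udev = 1\n}\n' > lvm.conf.sample
# A match on value 1 means LVM consults udev and the bug is present
grep -E 'obtain_device_list_from_udev *= *1' lvm.conf.sample
```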

Additional info:
Note that an update for lvm2 is also required. Details are in https://bugzilla.redhat.com/show_bug.cgi?id=1674485#c8

Comment 1 RamaKasturi 2019-03-28 14:10:31 UTC
Acking this for 3.11.3

Comment 3 RamaKasturi 2019-04-03 13:46:04 UTC
Created attachment 1551394 [details]
Attaching lvm conf file

Comment 5 Niels de Vos 2019-04-23 15:25:18 UTC
https://access.redhat.com/errata/RHBA-2019:0814 has been released and addresses bug 1688316. With that, the downgrade of lvm2 (and its dependencies) is no longer required. Can that be done through this BZ, or should "use standard lvm2 version" be its own BZ?

Comment 8 Raghavendra Talur 2019-05-07 11:45:14 UTC
*** Bug 1678446 has been marked as a duplicate of this bug. ***

Comment 9 Prashant Dhange 2019-05-08 00:35:30 UTC
*** Bug 1698736 has been marked as a duplicate of this bug. ***

Comment 10 RamaKasturi 2019-05-14 17:57:23 UTC
Moving the bug to the verified state as the following tests passed and I did not observe any issues.

1) The lvm2 packages included in the container are the same as those noted in comment 6.

[root@ip-172-16-45-176 /]# rpm -qa | grep lvm    
lvm2-libs-2.02.180-10.el7_6.7.x86_64
lvm2-2.02.180-10.el7_6.7.x86_64

2) Installed a fresh setup on VMware and did not hit any issues during the installation.

3) Upgraded the setup on AWS from 3.11.2 to 3.11.3 and did not hit any issues.

4) Did not see any issues with the pvs, pvscan, lvs & vgs commands.

[root@ip-172-16-45-176 /]# sudo pvscan
  PV /dev/xvdf    VG vg_9aa5d10bb7d969c127d9df28c6e7a88c   lvm2 [1.95 TiB / <942.33 GiB free]
  PV /dev/xvdb1   VG dockervg                              lvm2 [<100.00 GiB / 0    free]
  PV /dev/xvdg    VG vg_6c969e2f8f69881531e55340dd9323da   lvm2 [999.87 GiB / 999.87 GiB free]
  Total: 3 [<3.03 TiB] / in use: 3 [<3.03 TiB] / in no VG: 0 [0   ]
[root@ip-172-16-45-176 /]# sudo vgscan
  Reading volume groups from cache.
  Found volume group "vg_9aa5d10bb7d969c127d9df28c6e7a88c" using metadata type lvm2
  Found volume group "dockervg" using metadata type lvm2
  Found volume group "vg_6c969e2f8f69881531e55340dd9323da" using metadata type lvm2
[root@ip-172-16-45-176 /]# sudo vgs
  VG                                  #PV #LV  #SN Attr   VSize    VFree   
  dockervg                              1    1   0 wz--n- <100.00g       0 
  vg_6c969e2f8f69881531e55340dd9323da   1    0   0 wz--n-  999.87g  999.87g
  vg_9aa5d10bb7d969c127d9df28c6e7a88c   1 1286   0 wz--n-    1.95t <942.33g


5) Ran heketi-cli server state examine gluster and did not see any issues there either.

6) Rebooted the node hosting the gluster pod, ran pvscan, vgs & lvs, and did not see any issues.

7) Created a new file volume and a new block volume; both were successful.

8) Added a device and it was successful.

9) Rebooted & added a device to the node and it was successful too.

Performed the above steps on AWS and VMware environments and did not see any issues.

Comment 12 Anjana KD 2019-06-03 12:47:01 UTC
Have updated the doc text. Kindly review it for technical accuracy.

Comment 15 errata-xmlrpc 2019-06-13 19:18:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1406

