Bug 1676466

Summary: LVM in the glusterfs container should not try to use udev
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Niels de Vos <ndevos>
Component: rhgs-server-container
Assignee: Saravanakumar <sarumuga>
Status: CLOSED ERRATA
QA Contact: RamaKasturi <knarra>
Severity: urgent
Docs Contact:
Priority: urgent
Version: ocs-3.11
CC: akrishna, hchiramm, jmulligan, knarra, kramdoss, madam, pasik, psony, puebele, rcyriac, rhs-bugs, rtalur, sankarshan, sarumuga, xmorano
Target Milestone: ---
Target Release: OCS 3.11.z Batch Update 3
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, udev events in containers were not reliable. When the Logical Volume Manager (LVM) activated a device, the LVM commands waited for udev to create the device nodes under /dev. Because the device nodes did not get created, LVM was delayed or failed while activating the devices. As a fix, all interaction with udev is disabled for LVM commands executed within the Red Hat Gluster Storage server container. Hence, LVM commands do not wait for udev to create the device nodes under /dev; instead, they create the device nodes themselves.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-06-13 19:18:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1676612, 1684133, 1688316    
Bug Blocks: 1674485, 1698736    
Attachments:
  Attaching lvm conf file (flags: none)

Description Niels de Vos 2019-02-12 11:42:57 UTC
Description of problem:

udev in containers is not expected to work correctly or reliably. There should only be one udev daemon handling events on a system, and this should normally run on the host, not in a container.

When LVM tries to use udev in a container, it will most likely fail and can cause (long) delays. While deploying heketi through openshift-ansible, these delays can cause the deployment to fail with messages like these:

[kubeexec] DEBUG 2019/02/11 11:09:02 heketi/pkg/remoteexec/kube/exec.go:81:kube.ExecCommands: Ran command [pvcreate -qq --metadatasize=128M --dataalignment=256K '/dev/xvdf'] on [pod:glusterfs-storage-g56jg c:glusterfs ns:app-storage (from host:ip-172-16-43-224.ap-south-1.compute.internal selector:glusterfs-node)]: Stdout []: Stderr [  WARNING: Device /dev/xvdf not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/dockervg/dockerlv not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/xvda2 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/xvdb1 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/xvdc not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/xvdd not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/xvdf not initialized in udev database even after waiting 10000000 microseconds.
]
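
(For reference only: the udev waits can also be bypassed per command via LVM's --config override. The following is a sketch of a manual workaround, not the shipped fix; the device path is simply the one from the log above.)

  # manual workaround sketch, run inside the glusterfs container (not the shipped fix)
  pvcreate --config 'devices { obtain_device_list_from_udev=0 } activation { udev_sync=0 udev_rules=0 }' /dev/xvdf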

The /etc/lvm/lvm.conf in the container image has 'obtain_device_list_from_udev = 1', which should be disabled.
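
For clarity, this is roughly what the relevant lvm.conf settings look like with udev interaction disabled (a sketch; the exact set of options in the fixed container image may differ):

  # /etc/lvm/lvm.conf excerpt (sketch, not the exact shipped configuration)
  devices {
      # scan /dev directly instead of consulting the udev database
      obtain_device_list_from_udev = 0
  }
  activation {
      # do not wait for udev to create /dev nodes; LVM creates them itself
      udev_sync = 0
      udev_rules = 0
  }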

Version-Release number of selected component (if applicable):
ocs-3.11.1

How reproducible:
100%

Steps to Reproduce:
1. check /etc/lvm/lvm.conf in the container
2. verify that obtain_device_list_from_udev is set
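
For example, the setting can be checked from the OpenShift host with something along these lines (pod name and namespace taken from the log above; adjust for the actual deployment):

  oc exec glusterfs-storage-g56jg -c glusterfs -n app-storage -- \
      grep obtain_device_list_from_udev /etc/lvm/lvm.conf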

Additional info:
Note that an update for lvm2 is also required. Details are in https://bugzilla.redhat.com/show_bug.cgi?id=1674485#c8

Comment 1 RamaKasturi 2019-03-28 14:10:31 UTC
Acking this for 3.11.3

Comment 3 RamaKasturi 2019-04-03 13:46:04 UTC
Created attachment 1551394 [details]
Attaching lvm conf file

Comment 5 Niels de Vos 2019-04-23 15:25:18 UTC
https://access.redhat.com/errata/RHBA-2019:0814 has been released and addresses bug 1688316. With that, the downgrade of lvm2 (and dependencies) is not required anymore. Can that be handled through this BZ, or should the "use standard lvm2 version" change be its own BZ?

Comment 8 Raghavendra Talur 2019-05-07 11:45:14 UTC
*** Bug 1678446 has been marked as a duplicate of this bug. ***

Comment 9 Prashant Dhange 2019-05-08 00:35:30 UTC
*** Bug 1698736 has been marked as a duplicate of this bug. ***

Comment 10 RamaKasturi 2019-05-14 17:57:23 UTC
Moving the bug to verified state, as the following tests have passed and no issues were observed.

1) The lvm package included in the container is the same as the one noted in comment 6.

[root@ip-172-16-45-176 /]# rpm -qa | grep lvm    
lvm2-libs-2.02.180-10.el7_6.7.x86_64
lvm2-2.02.180-10.el7_6.7.x86_64

2) Installed a fresh setup on VMware and did not hit any issues during the installation.

3) Upgraded the setup on AWS from 3.11.2 to 3.11.3 and did not hit any issues.

4) Did not see any issues with the pvs, pvscan, lvs & vgs commands.

[root@ip-172-16-45-176 /]# sudo pvscan
  PV /dev/xvdf    VG vg_9aa5d10bb7d969c127d9df28c6e7a88c   lvm2 [1.95 TiB / <942.33 GiB free]
  PV /dev/xvdb1   VG dockervg                              lvm2 [<100.00 GiB / 0    free]
  PV /dev/xvdg    VG vg_6c969e2f8f69881531e55340dd9323da   lvm2 [999.87 GiB / 999.87 GiB free]
  Total: 3 [<3.03 TiB] / in use: 3 [<3.03 TiB] / in no VG: 0 [0   ]
[root@ip-172-16-45-176 /]# sudo vgscan
  Reading volume groups from cache.
  Found volume group "vg_9aa5d10bb7d969c127d9df28c6e7a88c" using metadata type lvm2
  Found volume group "dockervg" using metadata type lvm2
  Found volume group "vg_6c969e2f8f69881531e55340dd9323da" using metadata type lvm2
[root@ip-172-16-45-176 /]# sudo vgs
  VG                                  #PV #LV  #SN Attr   VSize    VFree   
  dockervg                              1    1   0 wz--n- <100.00g       0 
  vg_6c969e2f8f69881531e55340dd9323da   1    0   0 wz--n-  999.87g  999.87g
  vg_9aa5d10bb7d969c127d9df28c6e7a88c   1 1286   0 wz--n-    1.95t <942.33g


5) Ran 'heketi-cli server state examine gluster' and did not see any issues there either.

6) Rebooted the node hosting the gluster pod, ran pvscan, vgs & lvs, and did not see any issues.

7) Created new file and block volumes, and they were successful.

8) Added a device and it was successful.

9) Rebooted the node and then added a device to it; this was successful too.

Performed the above steps on AWS and VMware environments and did not see any issues.

Comment 12 Anjana KD 2019-06-03 12:47:01 UTC
Have updated the doc text. Kindly review it for technical accuracy.

Comment 15 errata-xmlrpc 2019-06-13 19:18:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1406