Description of problem:
udev in containers is not expected to work correctly or stably. Only one udev instance should handle devices on a system, and it should normally run on the host, not in a container. When LVM tries to use udev inside a container, this will most likely fail and can cause long delays: in the log below, LVM waits 10000000 microseconds (10 seconds) for each device. While deploying heketi through openshift-ansible, these delays can cause the deployment to fail with messages like these:

[kubeexec] DEBUG 2019/02/11 11:09:02 heketi/pkg/remoteexec/kube/exec.go:81:kube.ExecCommands: Ran command [pvcreate -qq --metadatasize=128M --dataalignment=256K '/dev/xvdf'] on [pod:glusterfs-storage-g56jg c:glusterfs ns:app-storage (from host:ip-172-16-43-224.ap-south-1.compute.internal selector:glusterfs-node)]: Stdout []: Stderr [
  WARNING: Device /dev/xvdf not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/dockervg/dockerlv not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/xvda2 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/xvdb1 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/xvdc not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/xvdd not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/xvdf not initialized in udev database even after waiting 10000000 microseconds.
]

The /etc/lvm/lvm.conf in the container image has 'obtain_device_list_from_udev = 1', which should be disabled.

Version-Release number of selected component (if applicable):
ocs-3.11.1

How reproducible:
100%

Steps to Reproduce:
1. Check /etc/lvm/lvm.conf in the container.
2. Verify that obtain_device_list_from_udev is set to 1.

Additional info:
Note that an update for lvm2 is also required. Details are in https://bugzilla.redhat.com/show_bug.cgi?id=1674485#c8
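The reproduction steps amount to reading one setting from /etc/lvm/lvm.conf, and the fix is to set it to 0 in the `devices` section. The sketch below shows one way to automate both the check and the edit. It assumes the stock one-setting-per-line layout and is not a full lvm.conf parser. The helper names `udev_device_list_enabled` and `disable_udev_device_list` are hypothetical and are not part of lvm2, heketi, or OCS.

```python
import re
from typing import Optional

# Hypothetical helpers for this sketch; not part of lvm2, heketi, or OCS.
# Assumes one setting per line, as in the stock lvm.conf layout; this is not
# a full lvm.conf parser (no section tracking, no quoted strings).
_SETTING = re.compile(r"^\s*obtain_device_list_from_udev\s*=\s*(\d+)\s*$")
_ASSIGN = re.compile(
    r"^(\s*obtain_device_list_from_udev\s*=\s*)\d+", re.MULTILINE
)


def udev_device_list_enabled(lvm_conf_text: str) -> Optional[bool]:
    """Report whether obtain_device_list_from_udev is enabled.

    Uses the last uncommented assignment in the text, or returns None when
    the setting does not appear (the compiled-in lvm2 default then applies).
    """
    enabled: Optional[bool] = None
    for line in lvm_conf_text.splitlines():
        code = line.split("#", 1)[0]  # ignore comments, including trailing ones
        match = _SETTING.match(code)
        if match:
            enabled = match.group(1) != "0"
    return enabled


def disable_udev_device_list(lvm_conf_text: str) -> str:
    """Set every uncommented assignment to 0, keeping indentation and comments."""
    return _ASSIGN.sub(r"\g<1>0", lvm_conf_text)
```

Commented-out lines such as `# obtain_device_list_from_udev = 0` are ignored by both helpers. As a result, the check reports what LVM would actually read rather than what a comment suggests. A `None` result means the setting is absent, so the lvm2 build default should be checked in lvm.conf(5).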
Acking this for 3.11.3
Created attachment 1551394
Attaching lvm.conf file
https://access.redhat.com/errata/RHBA-2019:0814 has been released, and it addresses bug 1688316. With that, the downgrade of lvm2 (and its dependencies) is no longer required. Can that be done through this BZ, or should "use standard lvm2 version" be its own BZ?
*** Bug 1678446 has been marked as a duplicate of this bug. ***
*** Bug 1698736 has been marked as a duplicate of this bug. ***
Moving the bug to VERIFIED state, as the following tests passed and I did not observe any issues:

1) The lvm packages included in the container are the same as those in comment 6:

[root@ip-172-16-45-176 /]# rpm -qa | grep lvm
lvm2-libs-2.02.180-10.el7_6.7.x86_64
lvm2-2.02.180-10.el7_6.7.x86_64

2) Installed a fresh setup on VMware and did not hit any issues during the installation.
3) Upgraded the setup on AWS from 3.11.2 to 3.11.3 and did not hit any issues.
4) Did not see any issues with the pvs, pvscan, lvs, and vgs commands:

[root@ip-172-16-45-176 /]# sudo pvscan
  PV /dev/xvdf    VG vg_9aa5d10bb7d969c127d9df28c6e7a88c   lvm2 [1.95 TiB / <942.33 GiB free]
  PV /dev/xvdb1   VG dockervg                              lvm2 [<100.00 GiB / 0    free]
  PV /dev/xvdg    VG vg_6c969e2f8f69881531e55340dd9323da   lvm2 [999.87 GiB / 999.87 GiB free]
  Total: 3 [<3.03 TiB] / in use: 3 [<3.03 TiB] / in no VG: 0 [0   ]
[root@ip-172-16-45-176 /]# sudo vgscan
  Reading volume groups from cache.
  Found volume group "vg_9aa5d10bb7d969c127d9df28c6e7a88c" using metadata type lvm2
  Found volume group "dockervg" using metadata type lvm2
  Found volume group "vg_6c969e2f8f69881531e55340dd9323da" using metadata type lvm2
[root@ip-172-16-45-176 /]# sudo vgs
  VG                                  #PV  #LV #SN Attr   VSize    VFree
  dockervg                              1    1   0 wz--n- <100.00g       0
  vg_6c969e2f8f69881531e55340dd9323da   1    0   0 wz--n-  999.87g 999.87g
  vg_9aa5d10bb7d969c127d9df28c6e7a88c   1 1286   0 wz--n-    1.95t <942.33g

5) Ran "heketi-cli server state examine gluster" and did not see any issues there either.
6) Rebooted the node hosting the gluster pod, ran pvscan, vgs, and lvs, and did not see any issues.
7) Created a new file volume and a new block volume; both succeeded.
8) Added a device; this succeeded.
9) Rebooted the node and added a device to it; this succeeded too.

Performed the above steps on AWS and VMware environments and did not see any issues.
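Step 4 confirms the fix by observing that the LVM commands no longer print the udev wait warnings quoted in the bug description. The sketch below shows a way to turn that into an automated check. It assumes the caller has already captured the stderr of commands such as pvs, pvscan, vgs, or pvcreate inside the gluster pod (for example with `subprocess.run(..., capture_output=True, text=True)`). The function names are hypothetical and the regex matches only the exact warning text from the original log.

```python
import re
from typing import List, Tuple

# Matches the LVM warning quoted in the bug description, e.g.
# "WARNING: Device /dev/xvdf not initialized in udev database even after
# waiting 10000000 microseconds."
_UDEV_WARNING = re.compile(
    r"WARNING: Device (\S+) not initialized in udev database"
    r" even after waiting (\d+) microseconds\."
)


def udev_wait_warnings(stderr: str) -> List[Tuple[str, int]]:
    """Return (device, waited_microseconds) for each udev wait warning."""
    return [(dev, int(us)) for dev, us in _UDEV_WARNING.findall(stderr)]


def total_wait_seconds(stderr: str) -> float:
    """Sum the reported waits to estimate the delay added to one command."""
    return sum(us for _, us in udev_wait_warnings(stderr)) / 1_000_000
```

Applied to the pvcreate stderr in the bug description, this would find seven warnings of 10 seconds each, about 70 seconds added to a single command. On a fixed container, the expected result is an empty list.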
I have updated the doc text. Please review it for technical accuracy.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:1406