Bug 2032993
Summary: | 69-dm-lvm-metad.rules is missing from the initrd | |
---|---|---|---
Product: | Red Hat Enterprise Linux 8 | Reporter: | David Teigland <teigland>
Component: | lvm2 | Assignee: | David Teigland <teigland>
lvm2 sub component: | Udev | QA Contact: | cluster-qe <cluster-qe>
Status: | CLOSED ERRATA | Docs Contact: |
Severity: | urgent | |
Priority: | urgent | CC: | agk, ajb, cmarthal, dracut-maint-list, dtardon, enrico.tagliavini, farrotin, germano.massullo, heinzm, jbrassow, jhughes, jrd-rhbz, mcsontos, msnitzer, pasik, pasteur, phil, prajnoha, toracat, zkabelac
Version: | 8.0 | Keywords: | Triaged
Target Milestone: | rc | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | lvm2-2.03.14-2.el8 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2022-05-10 15:22:14 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
David Teigland
2021-12-15 16:29:05 UTC
This issue was debugged in bug 2002640, but that bug was mistakenly used for a systemd bug that did not fix the original problem.

As the initial bug was reported against Stream 9, wondering if dracut and RHEL8 is the right "product"?

OMG! You scared me with that issue flagged for RHEL8, as it impacts almost all of the centos.org infra fleet (using md/raid 1 devices). I reinstalled one node with 8-stream via kickstart and hit the same issue as for 9-stream (the other bug): dropped to the emergency shell. Do you want me to open another bug for 8-stream? It's worth knowing that the added lines in /lib/dracut/modules.d/90lvm/64-lvm.rules don't fix the issue on 8-stream:

dracut-049-191.git20210920.el8.x86_64
systemd-239-51.el8.x86_64

The only way for me to boot the machine (until that's resolved) was to add "rd.lvm.lv=<vg_name>/home" to the boot cmdline. This is becoming quite crucial for the whole CentOS infra now (including the mirror.stream.centos.org pool, running on top of 8-stream), as we already have one machine in such a state and we can't even reboot the rest of the infra.

(In reply to farrotin from comment #4)
> Do you want me to open again another bug for 8-stream ?

You're welcome to open one, but we've never known what to do with CentOS Stream bzs (in terms of release processes.)

> as it's worth knowing that the added lines in /lib/dracut/modules.d/90lvm/64-lvm.rules
> don't fix the issue on 8-stream

Are you saying there is still a problem booting after adding the new line to 64-lvm.rules?

> dracut-049-191.git20210920.el8.x86_64
> systemd-239-51.el8.x86_64

What version of the lvm2 package are you using?
http://mirror.centos.org/centos/8-stream/BaseOS/x86_64/os/Packages/lvm2-2.03.14-1.el8.x86_64.rpm
is the current version for Stream 8.

If you suspect lvm2, it's worth knowing that I gave it a try with an 8.5 deploy (so like RHEL 8.5) and it's working fine there (but that will disappear at the end of this year):
http://mirror.centos.org/centos/8/BaseOS/x86_64/os/Packages/lvm2-2.03.12-10.el8.x86_64.rpm

The lvm2-2.03.14-1.el8 build was a problem: it included a new lvm udev rule (meant for RHEL 9) which fundamentally changes the lvm autoactivation method. That has been a severe disruption, and there should be a new lvm build reverting that change. However, a new lvm build will not fix the lvm udev rule in the initrd, which comes from the dracut package. It seems the bad lvm package somehow exposed an old bug in the dracut udev rule, and the connection is not yet clear. Hopefully the good lvm build will go back to hiding the dracut udev rule bug.

First, a discussion about RHEL8. I've been trying to sort out how the initrd/md/lvm/udev issue seemed to be related to the bad rhel8 lvm package. I probably do not have all the details correct yet, but here's the rough theory.

In RHEL8, dracut includes both 64-lvm.rules and 69-dm-lvm-metad.rules in the initrd:

. 64-lvm.rules is the primary rule, specific to the initrd, which has the key job of activating the root LV.
. 69-dm-lvm-metad.rules is the rule belonging to the root fs, where its primary job is starting the lvm2-pvscan service.

The inclusion and running of the 69 rule in the initrd is puzzling, since lvm2-pvscan (and the pvscan command) are not used in the initrd. However, there are some secondary effects of the 69 rule, including setting LVM_MD_PV_ACTIVATED=1 in the udev db. Again, this rule is running in the initrd where it has no primary function, but the udev state it has created is transferred to the root fs.
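
As a rough illustration (a sketch with assumed names, not commands taken from this bug): the lvm rules present in the current initrd, and the flag the 69 rule leaves in the udev db, can be checked with something like the following, where /dev/md0 is an assumed example device name.

# sketch (assumed device name /dev/md0): check which lvm udev rules the
# current initrd carries, and whether the initrd-time run of the 69 rule
# left LVM_MD_PV_ACTIVATED behind in the udev db
lsinitrd | grep -E '64-lvm.rules|69-dm-lvm-metad.rules'
udevadm info --query=property --name=/dev/md0 | grep LVM_MD_PV_ACTIVATED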
After switching to root, udevadm trigger runs, a uevent is generated for the md device, and the authentic 69-dm-lvm-metad.rules runs. At this point it sees that LVM_MD_PV_ACTIVATED is already 1, having been copied from the initrd, so it continues and starts lvm2-pvscan for the md device. This leads to the autoactivation of LVs from the md device.

So, there is an initial "fake" incarnation of the 69 rule run in the initrd, only for the effect it has on udev db variables. Then, after switching to root, the "real" instance of the 69 rule runs, sees state from the fake instance, and continues with its proper job. dracut leaves a clue about this by *editing* the 69 rule it copies into the initrd to insert a comment: "# No LVM pvscan in dracut - lvmetad is not running yet". (Suffice it to say that my opinion of this design is less than positive.)

That is all background for explaining how I think this broke. The lvm2-2.03.14-1.el8 build mistakenly included a new primary root udev rule for lvm, called 69-dm-lvm.rules, and removed 69-dm-lvm-metad.rules. After installing this bad build, if the initrd was rebuilt, 69-dm-lvm-metad.rules would (I believe) disappear from the initrd and be replaced with nothing. This means that the "fake" incarnation of 69-dm-lvm-metad.rules in the initrd would no longer exist, and we would miss the effect of setting LVM_MD_PV_ACTIVATED=1 in the initrd. This means that the lvm udev rule running in root (either the old or the new rule) would no longer find LVM_MD_PV_ACTIVATED to be set, and would not start the lvm2-pvscan service for the md device. Without lvm2-pvscan, LVs would not be autoactivated, leading to a boot timeout (assuming the root VG contained an LV such as home that was not activated directly by the initrd.)

How this can be fixed: when a new, correct RHEL8 lvm2 build is available (it seems to have been delayed), it will bring back 69-dm-lvm-metad.rules (and drop the unwanted 69-dm-lvm.rules). However, the initrd will also need to be recreated to restore that original 69 rule back in the initrd.
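
Condensed into commands, that RHEL8 recovery might look roughly like this (a sketch, assuming the corrected lvm2 build is delivered as an ordinary package update; its exact version is not known at this point):

# sketch of the RHEL8 recovery; assumes the fixed lvm2 build is available
# as an ordinary update (its exact version is not known at this point)
dnf update lvm2                                    # restores 69-dm-lvm-metad.rules on the root fs
ls /usr/lib/udev/rules.d/69-dm-lvm-metad.rules     # the rule file should exist again
dracut -f                                          # rebuild the initrd so the rule is copied back in
lsinitrd | grep 69-dm-lvm-metad.rules              # confirm the initrd carries it again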
Next, a discussion about RHEL9. In RHEL9 we are replacing 69-dm-lvm-metad.rules with 69-dm-lvm.rules. They perform different styles of lvm autoactivation [1]. Because we did not understand that the old 69-dm-lvm-metad.rules had a subtle role in the life of root-on-lvm-on-md in the initrd, it has disappeared from the initrd in RHEL9 and is replaced with nothing. So root-on-lvm-on-md is currently broken in RHEL9 also. In RHEL9 we need a new solution for setting LVM_MD_PV_ACTIVATED in the initrd so that root-on-lvm-on-md can be autoactivated by 69-dm-lvm.rules in the root fs. The solution could be the line added to 64-lvm.rules shown in comment 0, which has been shown to work. Or, the solution may be to eliminate LVM_MD_PV_ACTIVATED altogether, as suggested here: https://bugzilla.redhat.com/show_bug.cgi?id=2002640#c62. It would be nice to eliminate this complexity, but a complete solution for that is not yet known.

[1] This new man page has a description of both autoactivation methods:
https://sourceware.org/git/?p=lvm2.git;a=blob;f=man/lvmautoactivation.7_main

Changing the component and subject for this RHEL8 bug, since I believe this should be fixed by a new lvm2 build (and a subsequent rebuild of the initrd). I'm going to use this bug for a new lvm build that restores the 69-dm-lvm-metad.rules file in rhel8, which was missing in lvm2-2.03.14-1.el8.

To test this:

1. Install the updated lvm package and verify that /lib/udev/rules.d/69-dm-lvm-metad.rules exists.
(In the bad build, this file will not exist.)

2. Reboot and verify that lvm2-pvscan services exist for each PV attached to the system.
# systemctl status lvm2-pvscan*
(In the bad build, these services will not exist.)

3. Rebuild the initrd and verify that 69-dm-lvm-metad.rules is included in the initrd.
# lsinitrd | grep 69-dm-lvm-metad.rules
-r--r--r--   1 root     root         5837 Sep 20 02:54 usr/lib/udev/rules.d/69-dm-lvm-metad.rules
(With the bad build, this will not exist if the initrd was rebuilt after installing the bad package.)

We could also verify that the effects of the missing udev rule are resolved, e.g. installing root on lvm on md, including a home LV that requires autoactivation; a sketch of that check follows.
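
A minimal sketch of that last check, assuming a VG named rhel and an LV named home (example names, not taken from this bug):

# sketch: after rebooting on the fixed packages, a home LV that the initrd does
# not activate should come up via autoactivation (VG "rhel" and LV "home" are
# assumed example names)
lvs -o lv_name,lv_attr rhel        # an 'a' in the fifth attr character means the LV is active
findmnt /home                      # and it should be mounted if /home is listed in fstab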
Hi David, thanks a lot for the detailed status update, really appreciated :) Once you have even just a test build for lvm2, I can give it a try on a machine, just rebuild the initrd with dracut, and I'll report feedback here. (Same in the other bug for Stream 9, btw, but a different pkg normally.)

I created bug 2033737 to fix this in RHEL9, where the fix needs to be made in dracut.

I believe we were able to reproduce this issue and verify with the latest rpms on one of our virt nodes with thinp root volumes.

  File "/usr/lib64/python3.6/site-packages/pyanaconda/threading.py", line 280, in run
    threading.Thread.run(self)
dasbus.error.DBusError: 'LVMVolumeGroupDevice' object has no attribute 'vg'

# Console
[root@host-085 ~]# systemctl status lvm2-pvscan*
[root@host-085 ~]# lsinitrd | grep 69-dm-lvm-metad.rules
[root@host-085 ~]# rpm -qa | grep lvm2
lvm2-libs-2.03.14-1.el8.x86_64
lvm2-2.03.14-1.el8.x86_64
lvm2-lockd-2.03.14-1.el8.x86_64

# Upgrade to latest
Verifying...                          ################################# [100%]
Preparing...                          ################################# [100%]
Updating / installing...
   1:lvm2-debuginfo-8:2.03.14-2.el8   ################################# [  4%]
   2:device-mapper-libs-8:1.02.181-2.e################################# [  8%]
   3:device-mapper-8:1.02.181-2.el8   ################################# [ 13%]
   4:device-mapper-event-libs-8:1.02.1################################# [ 17%]
   5:device-mapper-event-8:1.02.181-2.################################# [ 21%]
   6:lvm2-libs-8:2.03.14-2.el8        ################################# [ 25%]
   7:lvm2-8:2.03.14-2.el8             ################################# [ 29%]
   8:device-mapper-event-devel-8:1.02.################################# [ 33%]
   9:device-mapper-devel-8:1.02.181-2.################################# [ 38%]
  10:lvm2-devel-8:2.03.14-2.el8       ################################# [ 42%]
  11:lvm2-lockd-8:2.03.14-2.el8       ################################# [ 46%]
  12:device-mapper-debuginfo-8:1.02.18################################# [ 50%]
  13:device-mapper-event-debuginfo-8:1################################# [ 54%]
  14:device-mapper-event-libs-debuginf################################# [ 58%]
  15:device-mapper-libs-debuginfo-8:1.################################# [ 63%]
  16:lvm2-libs-debuginfo-8:2.03.14-2.e################################# [ 67%]
  17:lvm2-lockd-debuginfo-8:2.03.14-2.################################# [ 71%]

[root@host-085 ~]# dracut -f
[ 2905.160490] restraintd[2552]: *** Current Time: Mon Dec 20 18:20:42 2021  Localwatchdog at: * Disabled! *

[root@host-085 ~]# lsinitrd | grep 69-dm-lvm-metad.rules
-r--r--r--   1 root     root         5837 Sep 20 01:54 usr/lib/udev/rules.d/69-dm-lvm-metad.rules

# REBOOT
[root@host-085 ~]# systemctl status lvm2-pvscan*
● lvm2-pvscan@252:2.service - LVM event activation on device 252:2
   Loaded: loaded (/usr/lib/systemd/system/lvm2-pvscan@.service; static; vendor>
   Active: active (exited) since Mon 2021-12-20 18:23:29 CST; 1min 30s ago
     Docs: man:pvscan(8)
  Process: 744 ExecStart=/usr/sbin/lvm pvscan --cache --activate ay 252:2 (code>
 Main PID: 744 (code=exited, status=0/SUCCESS)

Dec 20 18:23:28 localhost.localdomain systemd[1]: Starting LVM event activation>
Dec 20 18:23:28 localhost.localdomain lvm[744]:   pvscan[744] PV /dev/vda2 onli>
Dec 20 18:23:28 localhost.localdomain lvm[744]:   pvscan[744] VG rhel_host-085 >
Dec 20 18:23:28 localhost.localdomain lvm[744]:   3 logical volume(s) in volume>
Dec 20 18:23:29 localhost.localdomain systemd[1]: Started LVM event activation >

[root@host-085 ~]# lvs -a -o +devices
  LV              VG            Attr       LSize   Pool   Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
  [lvol0_pmspare] rhel_host-085 ewi-------  16.00m                                                        /dev/vda2(0)
  pool00          rhel_host-085 twi-aotz-- <13.15g               30.36  22.85                             pool00_tdata(0)
  [pool00_tdata]  rhel_host-085 Twi-ao---- <13.15g                                                        /dev/vda2(4)
  [pool00_tmeta]  rhel_host-085 ewi-ao----  16.00m                                                        /dev/vda2(3370)
  root            rhel_host-085 Vwi-aotz-- <13.15g pool00        30.36
  swap            rhel_host-085 -wi-ao----   2.00g                                                        /dev/vda2(3374)

Marking VERIFIED since comment #14 was run on the latest rpms.

*** Bug 2026854 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (lvm2 bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2038