Bug 2032993 - 69-dm-lvm-metad.rules is missing from the initrd
Summary: 69-dm-lvm-metad.rules is missing from the initrd
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: lvm2
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: David Teigland
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Duplicates: 2026854
Depends On:
Blocks:
 
Reported: 2021-12-15 16:29 UTC by David Teigland
Modified: 2022-05-10 16:38 UTC (History)
CC List: 20 users

Fixed In Version: lvm2-2.03.14-2.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-10 15:22:14 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker CLUSTERQE-5280 0 None None None 2022-01-19 17:02:06 UTC
Red Hat Issue Tracker CLUSTERQE-5513 0 None None None 2022-03-16 13:01:37 UTC
Red Hat Issue Tracker RHELPLAN-105924 0 None None None 2021-12-15 16:33:00 UTC
Red Hat Product Errata RHBA-2022:2038 0 None None None 2022-05-10 15:22:30 UTC

Description David Teigland 2021-12-15 16:29:05 UTC
Description of problem:

System boot can time out because LVs are not activated when the root VG is on an md device.
This is because the VG is not autoactivated after switch root.
This is because systemd-udev-trigger does not generate a uevent for the md device after switch root.
This is because SYSTEMD_READY=0 in the udev db for the root md device.
This is because 64-lvm.rules in the initrd is not setting udev variables for the md device.


This line needs to be added to 64-lvm.rules in dracut:

KERNEL=="md[0-9]*", ACTION=="change", ENV{ID_FS_TYPE}=="LVM2_member", ENV{LVM_MD_PV_ACTIVATED}!="1", TEST=="md/array_state", ENV{LVM_MD_PV_ACTIVATED}="1"


If LVM_MD_PV_ACTIVATED is not set, then the lvm rule in the root fs sets SYSTEMD_READY=0, which is why no uevent is generated.
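The matching logic of the proposed rule line can be sketched as a small shell function. This is a toy re-implementation under simplified assumptions about udev semantics (real udev rule evaluation is richer), meant only to show which conditions must hold before LVM_MD_PV_ACTIVATED gets set:

```shell
# Toy model of the proposed 64-lvm.rules line (simplified; not real udev).
# Echoes the resulting LVM_MD_PV_ACTIVATED value for a given event.
simulate_rule() {
  kernel=$1 action=$2 fstype=$3 activated=$4 array_state_present=$5
  # KERNEL=="md[0-9]*": only md devices are considered
  case "$kernel" in md[0-9]*) ;; *) echo "$activated"; return ;; esac
  # ACTION=="change" and ENV{ID_FS_TYPE}=="LVM2_member"
  [ "$action" = change ]      || { echo "$activated"; return; }
  [ "$fstype" = LVM2_member ] || { echo "$activated"; return; }
  # ENV{LVM_MD_PV_ACTIVATED}!="1" and TEST=="md/array_state" => set the flag
  [ "$activated" != 1 ] && [ "$array_state_present" = yes ] && { echo 1; return; }
  echo "$activated"
}

simulate_rule md0 change LVM2_member "" yes   # md PV on a change event: prints 1
simulate_rule sda change LVM2_member "" yes   # non-md device: flag stays unset
```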


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 David Teigland 2021-12-15 16:32:12 UTC
This issue was debugged in bug 2002640, but that bug was mistakenly used for a systemd bug that did not fix the original problem.

Comment 2 farrotin 2021-12-15 21:42:27 UTC
As the initial bug was reported against Stream 9, I'm wondering whether dracut and RHEL 8 are the right "product" here.

Comment 3 farrotin 2021-12-16 07:59:13 UTC
OMG! You scared me with this issue flagged for RHEL 8, since it impacts almost all of the centos.org infra fleet (which uses md/raid 1 devices).
I reinstalled one node with 8-stream via kickstart and hit the same issue as on 9-stream (other bug): dropped to the emergency shell.

Comment 4 farrotin 2021-12-16 08:17:03 UTC
Do you want me to open another bug for 8-stream? It's worth knowing that the added lines in /lib/dracut/modules.d/90lvm/64-lvm.rules don't fix the issue on 8-stream.

dracut-049-191.git20210920.el8.x86_64
systemd-239-51.el8.x86_64

The only way for me to boot the machine (until this is resolved) was to add "rd.lvm.lv=<vg_name>/home" to the boot cmdline ...
This is becoming crucial for the whole CentOS infra now (including the mirror.stream.centos.org pool, running on top of 8-stream), as we already have one machine in this state and can't even reboot the rest of the infra.
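For reference, the workaround above amounts to appending an rd.lvm.lv= argument to the kernel command line so the initrd activates the LV directly, bypassing the broken autoactivation path. A hedged sketch: the VG name "vg_name" and LV "home" are placeholders from the comment, and the grubby invocation is one common way to persist such a change, not something stated in this bug.

```shell
# Build the kernel argument that makes the initrd activate the LV directly.
# VG/LV names below are placeholders; adjust to your layout.
VG=vg_name
LV=home
ARG="rd.lvm.lv=${VG}/${LV}"
echo "$ARG"                      # prints: rd.lvm.lv=vg_name/home
# One common way to persist it for all installed kernels (run as root):
#   grubby --update-kernel=ALL --args="$ARG"
```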

Comment 5 David Teigland 2021-12-16 14:37:30 UTC
(In reply to farrotin from comment #4)
> Do you want me to open again another bug for 8-stream ? 

You're welcome to open one, but we've never known what to do with centos stream bzs (in terms of release processes.)

> as it's worth knowing that the added lines in /lib/dracut/modules.d/90lvm/64-lvm.rules
> don't fix the issue on 8-stream 

Are you saying there is still a problem booting after adding the new line to 64-lvm.rules?

> dracut-049-191.git20210920.el8.x86_64
> systemd-239-51.el8.x86_64

What version of the lvm2 package are you using?

Comment 6 farrotin 2021-12-16 14:42:15 UTC
http://mirror.centos.org/centos/8-stream/BaseOS/x86_64/os/Packages/lvm2-2.03.14-1.el8.x86_64.rpm is the current version for Stream 8 


If you suspect lvm2, it's worth knowing that I tried an 8.5 deploy (so like RHEL 8.5) and it works fine there (but it will disappear at the end of this year):
http://mirror.centos.org/centos/8/BaseOS/x86_64/os/Packages/lvm2-2.03.12-10.el8.x86_64.rpm

Comment 7 David Teigland 2021-12-16 15:08:19 UTC
The lvm2-2.03.14-1.el8 build was a problem: it included a new lvm udev rule (meant for RHEL 9) that fundamentally changes the lvm autoactivation method.  That has been severely disruptive, and there should be a new lvm build reverting that change.

However, a new lvm build will not fix the lvm udev rule in the initrd, which comes from the dracut package.  It seems the bad lvm package somehow exposed an old bug in the dracut udev rule; the connection is not yet clear.  Hopefully the good lvm build will go back to hiding the dracut udev rule bug.

Comment 8 David Teigland 2021-12-16 23:32:55 UTC
First a discussion about RHEL8.

I've been trying to sort out how the initrd/md/lvm/udev issue seemed to be related to the bad rhel8 lvm package.  I probably do not have all the details correct yet, but here's the rough theory:

In RHEL8, dracut includes both 64-lvm.rules and 69-dm-lvm-metad.rules in the initrd:

. 64-lvm.rules is the primary rule, specific to the initrd, which has the key job of activating the root LV.  

. 69-dm-lvm-metad.rules is the rule belonging to root where it has the primary job of starting the lvm2-pvscan service.

The inclusion and running of the 69 rule in the initrd is puzzling, since lvm2-pvscan (and the pvscan command) are not used in the initrd.  However, there are some secondary effects of the 69 rule, including setting LVM_MD_PV_ACTIVATED=1 in the udev db.  Again, this rule is running in the initrd where it has no primary function.  But, the udev state it has created is transferred to the root fs.

After switching to root, udevadm trigger runs, a uevent is generated for the md device, and the authentic 69-dm-lvm-metad.rules runs.  At this point it sees that LVM_MD_PV_ACTIVATED is already 1, having been copied from the initrd, so it continues and starts lvm2-pvscan for the md device.  This leads to the autoactivation of LVs from the md device.

So, there is an initial "fake" incarnation of the 69 rule run in the initrd, only for the effect it has on udev db variables.  Then, after switching to root, the "real" instance of the 69 rule runs, sees the state from the fake instance, and continues with its proper job.  dracut leaves a clue about this by *editing* the 69 rule it copies into the initrd to insert a comment: "# No LVM pvscan in dracut - lvmetad is not running yet".  (Suffice it to say that my opinion of this design is less than positive.)
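The two-incarnation handoff can be caricatured in a few lines of shell. This is a deliberately crude model (a single variable stands in for the udev db entry that survives switch-root; nothing here is real udev or systemd behavior):

```shell
# Crude model of the fake/real 69-rule handoff across switch-root.
UDEV_DB_LVM_MD_PV_ACTIVATED=""   # stands in for the udev db entry

initrd_69_rule() {   # "fake" incarnation: runs in the initrd, side effect only
  UDEV_DB_LVM_MD_PV_ACTIVATED=1
}

rootfs_69_rule() {   # "real" incarnation: starts pvscan only if the flag is set
  if [ "$UDEV_DB_LVM_MD_PV_ACTIVATED" = 1 ]; then
    echo "start lvm2-pvscan"     # -> LVs on the md device get autoactivated
  else
    echo "skip"                  # -> no autoactivation; boot can time out
  fi
}

rootfs_69_rule   # without the initrd pass: prints "skip" (the reported bug)
initrd_69_rule
rootfs_69_rule   # after the initrd pass: prints "start lvm2-pvscan"
```

With the bad lvm2 build, the 69 rule vanished from the initrd, so the first branch is exactly what happened on real systems.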

That is all background for explaining how I think this broke.

The lvm2-2.03.14-1.el8 build mistakenly included a new primary root udev rule for lvm, called 69-dm-lvm.rules, and removed 69-dm-lvm-metad.rules.  After installing this bad build, if the initrd was rebuilt, 69-dm-lvm-metad.rules would (I believe) disappear from the initrd and be replaced with nothing.  This means that the "fake" incarnation of 69-dm-lvm-metad.rules in the initrd would no longer exist, and we would miss the effect of setting LVM_MD_PV_ACTIVATED=1 in the initrd.  This means that the lvm udev rule running in root (either old or new rule) would no longer find LVM_MD_PV_ACTIVATED to be set, and would not start the lvm2-pvscan service for the md device.  Without lvm2-pvscan, LVs would not be autoactivated, leading to a boot timeout (assuming the root VG contained an LV such as home that was not activated directly by the initrd.)

How this can be fixed: when a new correct RHEL 8 lvm2 build is available (it seems to have been delayed), it will bring back 69-dm-lvm-metad.rules (and drop the unwanted 69-dm-lvm.rules).  However, the initrd will also need to be recreated to restore that original 69 rule back in the initrd.

Comment 9 David Teigland 2021-12-16 23:55:38 UTC
Next a discussion about RHEL9.

In RHEL9 we are replacing 69-dm-lvm-metad.rules with 69-dm-lvm.rules.  They perform different styles of lvm autoactivation [1].

Because we did not understand that the old 69-dm-lvm-metad.rules had a subtle role in the life of root-on-lvm-on-md in the initrd, it has disappeared from the initrd in RHEL9 and is replaced with nothing.  So root-on-lvm-on-md is currently broken in RHEL9 also.  In RHEL9 we need a new solution for setting LVM_MD_PV_ACTIVATED in the initrd so that root-on-lvm-on-md can be autoactivated by 69-dm-lvm.rules in the root fs.

The solution could be the line added to 64-lvm.rules shown in comment 0, which has been shown to work.  Or, the solution may be to eliminate LVM_MD_PV_ACTIVATED altogether as suggested here https://bugzilla.redhat.com/show_bug.cgi?id=2002640#c62.  It would be nice to eliminate this complexity, but a complete solution for that is not yet known.


[1] This new man page has a description of both autoactivation methods:
https://sourceware.org/git/?p=lvm2.git;a=blob;f=man/lvmautoactivation.7_main

Comment 10 David Teigland 2021-12-17 00:00:42 UTC
Changing the component and subject of this RHEL 8 bug, since I believe this should be fixed by a new lvm2 build (and a subsequent rebuild of the initrd).

Comment 11 David Teigland 2021-12-17 15:56:34 UTC
I'm going to use this bug for a new lvm build that restores the 69-dm-lvm-metad.rules file in rhel8 which was missing in lvm2-2.03.14-1.el8.

To test this,

1. install the updated lvm package and verify that /lib/udev/rules.d/69-dm-lvm-metad.rules exists.

(In the bad build, this file will not exist.)

2. reboot and verify that lvm2-pvscan services exist for each PV attached to the system.

# systemctl status lvm2-pvscan*

(In the bad build, these services will not exist.)

3. rebuild the initrd and verify that 69-dm-lvm-metad.rules is included in the initrd.

# lsinitrd | grep 69-dm-lvm-metad.rules 
-r--r--r--   1 root     root         5837 Sep 20 02:54 usr/lib/udev/rules.d/69-dm-lvm-metad.rules

(With the bad build, this will not exist if the initrd was rebuilt after installing the bad package.)


We could also verify that effects of the missing udev rule are also resolved, e.g. installing root on lvm on md, including a home LV that requires autoactivation.

Comment 12 farrotin 2021-12-17 17:20:08 UTC
Hi David .. thanks a lot for the detailed status update, really appreciated :)
Once you have even just a test build for lvm2, I can give it a try on a machine, rebuild the initrd with dracut, and report feedback here. (Same in the other bug for Stream 9, btw, but normally a different pkg.)

Comment 13 David Teigland 2021-12-17 18:01:38 UTC
I created bug 2033737 to fix this in RHEL9, where the fix needs to be made in dracut.

Comment 14 Corey Marthaler 2021-12-21 00:30:10 UTC
I believe we were able to reproduce this issue and verify it with the latest rpms on one of our virt nodes with thinp root volumes.


   File "/usr/lib64/python3.6/site-packages/pyanaconda/threading.py", line 280, in run 
     threading.Thread.run(self) 
 dasbus.error.DBusError: 'LVMVolumeGroupDevice' object has no attribute 'vg'     

# Console
[root@host-085 ~]# systemctl status lvm2-pvscan*
[root@host-085 ~]# lsinitrd | grep 69-dm-lvm-metad.rules 
[root@host-085 ~]# rpm -qa | grep lvm2
lvm2-libs-2.03.14-1.el8.x86_64
lvm2-2.03.14-1.el8.x86_64
lvm2-lockd-2.03.14-1.el8.x86_64


# Upgrade to latest
Verifying...                          ################################# [100%]
Preparing...                          ################################# [100%]
Updating / installing...
   1:lvm2-debuginfo-8:2.03.14-2.el8   ################################# [  4%]
   2:device-mapper-libs-8:1.02.181-2.e################################# [  8%]
   3:device-mapper-8:1.02.181-2.el8   ################################# [ 13%]
   4:device-mapper-event-libs-8:1.02.1################################# [ 17%]
   5:device-mapper-event-8:1.02.181-2.################################# [ 21%]
   6:lvm2-libs-8:2.03.14-2.el8        ################################# [ 25%]
   7:lvm2-8:2.03.14-2.el8             ################################# [ 29%]
   8:device-mapper-event-devel-8:1.02.################################# [ 33%]
   9:device-mapper-devel-8:1.02.181-2.################################# [ 38%]
  10:lvm2-devel-8:2.03.14-2.el8       ################################# [ 42%]
  11:lvm2-lockd-8:2.03.14-2.el8       ################################# [ 46%]
  12:device-mapper-debuginfo-8:1.02.18################################# [ 50%]
  13:device-mapper-event-debuginfo-8:1################################# [ 54%]
  14:device-mapper-event-libs-debuginf################################# [ 58%]
  15:device-mapper-libs-debuginfo-8:1.################################# [ 63%]
  16:lvm2-libs-debuginfo-8:2.03.14-2.e################################# [ 67%]
  17:lvm2-lockd-debuginfo-8:2.03.14-2.################################# [ 71%]


[root@host-085 ~]# dracut -f
[ 2905.160490] restraintd[2552]: *** Current Time: Mon Dec 20 18:20:42 2021 Localwatchdog at:  * Disabled! *
[root@host-085 ~]# 

[root@host-085 ~]# lsinitrd | grep 69-dm-lvm-metad.rules
-r--r--r--   1 root     root         5837 Sep 20 01:54 usr/lib/udev/rules.d/69-dm-lvm-metad.rules


# REBOOT

[root@host-085 ~]# systemctl status lvm2-pvscan*
● lvm2-pvscan@252:2.service - LVM event activation on device 252:2
   Loaded: loaded (/usr/lib/systemd/system/lvm2-pvscan@.service; static; vendor>
   Active: active (exited) since Mon 2021-12-20 18:23:29 CST; 1min 30s ago
     Docs: man:pvscan(8)
  Process: 744 ExecStart=/usr/sbin/lvm pvscan --cache --activate ay 252:2 (code>
 Main PID: 744 (code=exited, status=0/SUCCESS)

Dec 20 18:23:28 localhost.localdomain systemd[1]: Starting LVM event activation>
Dec 20 18:23:28 localhost.localdomain lvm[744]:   pvscan[744] PV /dev/vda2 onli>
Dec 20 18:23:28 localhost.localdomain lvm[744]:   pvscan[744] VG rhel_host-085 >
Dec 20 18:23:28 localhost.localdomain lvm[744]:   3 logical volume(s) in volume>
Dec 20 18:23:29 localhost.localdomain systemd[1]: Started LVM event activation >


[root@host-085 ~]# lvs -a -o +devices
  LV              VG            Attr       LSize   Pool   Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices        
  [lvol0_pmspare] rhel_host-085 ewi-------  16.00m                                                       /dev/vda2(0)   
  pool00          rhel_host-085 twi-aotz-- <13.15g               30.36  22.85                            pool00_tdata(0)
  [pool00_tdata]  rhel_host-085 Twi-ao---- <13.15g                                                       /dev/vda2(4)   
  [pool00_tmeta]  rhel_host-085 ewi-ao----  16.00m                                                       /dev/vda2(3370)
  root            rhel_host-085 Vwi-aotz-- <13.15g pool00        30.36                                                  
  swap            rhel_host-085 -wi-ao----   2.00g                                                       /dev/vda2(3374)

Comment 17 Corey Marthaler 2021-12-22 03:39:40 UTC
Marking VERIFIED since comment #14 was run on the latest rpms.

Comment 18 David Teigland 2022-01-11 19:24:44 UTC
*** Bug 2026854 has been marked as a duplicate of this bug. ***

Comment 20 errata-xmlrpc 2022-05-10 15:22:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (lvm2 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2038

