Bug 2026854
| Field | Value |
|---|---|
| Summary | [regression] CentOS 8 Stream doesn't boot any longer after update, LVM volumes are not activated at boot (only root and swap volumes are) |
| Product | Red Hat Enterprise Linux 8 |
| Component | lvm2 |
| lvm2 sub component | Activating existing Logical Volumes |
| Status | CLOSED DUPLICATE |
| Severity | urgent |
| Priority | urgent |
| Version | CentOS Stream |
| Keywords | Triaged |
| Target Milestone | rc |
| Target Release | --- |
| Hardware | Unspecified |
| OS | Unspecified |
| Reporter | Enrico Tagliavini <enrico.tagliavini> |
| Assignee | David Teigland <teigland> |
| QA Contact | cluster-qe <cluster-qe> |
| CC | agk, bstinson, heinzm, jbrassow, jwboyer, kdreyer, martin, mcsontos, msnitzer, ncroxon, prajnoha, teigland, xni, zkabelac |
| Flags | pm-rhel: mirror+ |
| Doc Type | If docs needed, set a value |
| Type | Bug |
| Last Closed | 2022-01-11 19:24:44 UTC |
It would appear this problem might happen only if the LVM volumes in question are on a software RAID. I've got another computer with only one disk but the same software configuration, and it doesn't show the problem.

An incorrect lvm build got into CentOS 8 Stream that used the autoactivation code meant for RHEL 9. I'd like to make sure we have an automated test that reflects your usage. I think we could probably get what we need to replicate it if you could include the output of "vgs", "lvs -a", and "pvs -o+devices". Thanks.

Created attachment 1844169 [details]: lvm config as requested
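For reference, the requested diagnostics are all read-only commands; this is only a minimal sketch of how to collect them (the /proc/mdstat check is the extra item the reporter added for the RAID layout):

```sh
# Capture the LVM and MD layout for the bug report (all read-only)
vgs
lvs -a
pvs -o+devices
cat /proc/mdstat
```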
Sure thing, here it is. I also added /proc/mdstat just in case the RAID is important to reproduce the issue.
Let me know if you need more information.
Thank you.
Thanks, I'm working on setting up a system like that. Could you also check for any messages related to lvm or md from the failed startup, if that's still available? e.g. messages from journalctl (or /var/log/messages), or systemctl status lvm-activate-*

I can tell you that, looking at the boot process, I could see systemd was waiting for the respective devices to become available right after the switch root. The devices would never come online and eventually timed out. The emergency shell of the initramfs kicked in, I put the root password in, and I could see that only the root and swap LVs were active. After running vgchange -aay everything came online immediately. It simply looks like the rd.lvm behavior is now similar to the rd.luks behavior: if a volume is not listed on the kernel command line, it's not going to be activated. This is rather not ideal, as it makes fstab, crypttab and so on useless and creates confusion. The journal of those boots was not saved, since /var was not mounted, so I cannot check it; ditto for /var/log/messages. I cannot restart the machine right now as it's in use, but if I have a chance I'll do it and have a look. I don't expect any error, though; it simply looks like activation was not even attempted. Thank you.

Makes sense, thanks, I'll try to reproduce it to see what's happening. When rd.lvm LVs are specified, the initrd only activates those named LVs directly. The other LVs (which led to the timeout) are supposed to be activated when the entire VG is autoactivated following switch root. For some reason the VG was not autoactivated. That's supposed to be triggered by coldplug uevents, but I suspect that a coldplug event was not generated for the MD device for some reason (I've heard of this happening once before but didn't track it down). All that said, these are new autoactivation issues that are really RHEL 9 material, and with a correct RHEL 8 build this should go away.

I created a vm with a similar setup and it seemed to boot fine once, but on another reboot I got a timeout. The problem seems to be that /dev/md0 was not assembled, although the /dev/md0 device node existed. There are error messages about mdadm -I /dev/sda1 and mdadm -I /dev/sdb1 (which are for md0), which probably explains why md0 wasn't assembled. Perhaps mdadm -I fails due to some interference with other programs opening the device. Some discussion in bug 1991596 sounds like it could be related. I'll attach the rdsosreport.txt from the failed boot. Some of the info I collected:

    [  OK  ] Reached target Basic System.
    [  OK  ] Found device /dev/mapper/vg00-root.
    [  OK  ] Reached target Initrd Root Device.
    [ 135.722441] dracut-initqueue[612]: Warning: dracut-initqueue timeout - starting timeout scripts
    [ 135.743543] dracut-initqueue[612]: mdadm: /dev/md0 does not appear to be an md device
    (a lot of dracut-initqueue timeout messages)
    Starting Dracut Emergency Shell...
    Warning: /dev/disk/by-id/md-uuid-5be05f0d:a4720975:d25142a1:91ebac08 does not exist

    dracut:/# cat /proc/mdstat
    Personalities : [raid1]
    md1 : active raid1 sdb2[1] sda2[0]
          8379392 blocks super 1.2 [2/2] [UU]
          bitmap: 0/1 pages [0KB], 65536KB chunk

    md2 : active raid1 sdb3[1] sda3[0]
          1046528 blocks super 1.2 [2/2] [UU]

    unused devices: <none>

    dracut:/# ls -l /dev/disk/by-id/md-*
    lrwxrwxrwx 1 root root 9 Dec  3 18:17 /dev/disk/by-id/md-name-localhost.localdomain:1 -> ../../md1
    lrwxrwxrwx 1 root root 9 Dec  3 18:17 /dev/disk/by-id/md-name-localhost.localdomain:2 -> ../../md2
    lrwxrwxrwx 1 root root 9 Dec  3 18:17 /dev/disk/by-id/md-uuid-0f6f14be:eef09ab9:6667ef2f:da7fc59c -> ../../md2
    lrwxrwxrwx 1 root root 9 Dec  3 18:17 /dev/disk/by-id/md-uuid-1f1ae97e:91e31d11:b51d73a1:71ef886e -> ../../md1

    dracut:/# lvm pvs -a
      PV        VG   Fmt  Attr PSize  PFree
      /dev/md1  vg00 lvm2 a--  <7.94g 320.00m
      /dev/md2            ---       0       0

    dracut:/# lvm vgs
      VG   #PV #LV #SN Attr   VSize  VFree
      vg00   1   6   0 wz--n- <7.94g 320.00m

    dracut:/# lvm lvs -a
      LV              VG   Attr       LSize   Pool  Origin Data%  Meta%  Move Log Cpy%Sync Convert
      home            vg00 Vwi---tz-- 512.00m tpool
      [lvol0_pmspare] vg00 ewi-------  64.00m
      root            vg00 Vwi-a-tz--   4.00g tpool        15.54
      tmp             vg00 Vwi---tz-- 512.00m tpool
      tpool           vg00 twi---tz--   7.50g              38.94  12.21
      [tpool_tdata]   vg00 Twi-ao----   7.50g
      [tpool_tmeta]   vg00 ewi-ao----  64.00m
      usr             vg00 Vwi-a-tz--   2.00g tpool        92.94
      var             vg00 Vwi---tz-- 512.00m tpool

    [ 2.973029] vm3 systemd-udevd[617]: Process '/sbin/mdadm -I /dev/sda1' failed with exit code 2.
    [ 3.018272] vm3 systemd-udevd[608]: Process '/sbin/mdadm -I /dev/sdb1' failed with exit code 2.

    dracut:/# blkid /dev/sda1
    /dev/sda1: UUID="5be05f0d-a472-0975-d251-42a191ebac08" UUID_SUB="02776d33-e43e-aedd-110e-02b18e2516a1" LABEL="localhost.localdomain:0" TYPE="linux_raid_member" PARTUUID="e3ef0b21-01"
    dracut:/# blkid /dev/sda2
    /dev/sda2: UUID="1f1ae97e-91e3-1d11-b51d-73a171ef886e" UUID_SUB="45865d07-70e0-9ee8-437c-cede90a748ff" LABEL="localhost.localdomain:1" TYPE="linux_raid_member" PARTUUID="e3ef0b21-02"

    dracut:/# ls -l /dev/md*
    brw-rw---- 1 root disk 9, 0 Dec  3 18:17 /dev/md0
    brw-rw---- 1 root disk 9, 1 Dec  3 18:17 /dev/md1
    brw-rw---- 1 root disk 9, 2 Dec  3 18:17 /dev/md2

    /dev/md:
    total 0
    lrwxrwxrwx 1 root root 6 Dec  3 18:17 1 -> ../md1
    lrwxrwxrwx 1 root root 6 Dec  3 18:17 2 -> ../md2

    dracut:/# rm /dev/md0
    dracut:/# mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1
    mdadm: Fail to c[ 937.935626] md: md0 stopped.
    reate md0 when using /sys/module/md_mod/parameters/new_array, fallback to creation via node
    mdadm: Unable to initialize sysfs
    dracut:/# [ 937.987263] md/raid1:md0: active with 2 out of 2 mirrors
    [ 938.002304] md0: detected capacity change from 0 to 535822336

    dracut:/# cat /proc/mdstat
    Personalities : [raid1]
    md0 : active raid1 sdb1[1] sda1[0]
          523264 blocks super 1.2 [2/2] [UU]
          bitmap: 0/1 pages [0KB], 65536KB chunk

    md1 : active raid1 sdb2[1] sda2[0]
          8379392 blocks super 1.2 [2/2] [UU]
          bitmap: 0/1 pages [0KB], 65536KB chunk

    md2 : active raid1 sdb3[1] sda3[0]
          1046528 blocks super 1.2 [2/2] [UU]

    unused devices: <none>

The initial setup prior to the reboot failure:

    # cat /proc/mdstat
    Personalities : [raid1]
    md0 : active raid1 sdb1[1] sda1[0]
          523264 blocks super 1.2 [2/2] [UU]
          bitmap: 0/1 pages [0KB], 65536KB chunk

    md1 : active raid1 sdb2[1] sda2[0]
          8379392 blocks super 1.2 [2/2] [UU]
          bitmap: 1/1 pages [4KB], 65536KB chunk

    md2 : active raid1 sdb3[1] sda3[0]
          1046528 blocks super 1.2 [2/2] [UU]

    unused devices: <none>

    # pvs -a
      PV        VG   Fmt  Attr PSize  PFree
      /dev/md0            ---       0       0
      /dev/md1  vg00 lvm2 a--  <7.94g 320.00m
      /dev/md2            ---       0       0

    # lvs -a
      LV              VG   Attr       LSize   Pool  Origin Data%  Meta%  Move Log Cpy%Sync Convert
      home            vg00 Vwi-aotz-- 512.00m tpool         4.79
      [lvol0_pmspare] vg00 ewi-------  64.00m
      root            vg00 Vwi-aotz--   4.00g tpool        15.54
      tmp             vg00 Vwi-aotz-- 512.00m tpool         4.83
      tpool           vg00 twi-aotz--   7.50g              38.94  12.21
      [tpool_tdata]   vg00 Twi-ao----   7.50g
      [tpool_tmeta]   vg00 ewi-ao----  64.00m
      usr             vg00 Vwi-aotz--   2.00g tpool        92.94
      var             vg00 Vwi-aotz-- 512.00m tpool        78.45

    # grubby --info=DEFAULT
    index=0
    kernel="/boot/vmlinuz-4.18.0-348.el8.x86_64"
    args="ro crashkernel=auto resume=UUID=f43b70c5-9365-4e95-b84e-36952e94b862 rd.md.uuid=1f1ae97e:91e31d11:b51d73a1:71ef886e rd.lvm.lv=vg00/root rd.md.uuid=5be05f0d:a4720975:d25142a1:91ebac08 rd.md.uuid=0f6f14be:eef09ab9:6667ef2f:da7fc59c rd.lvm.lv=vg00/usr console=ttyS0 $tuned_params"
    root="/dev/mapper/vg00-root"
    initrd="/boot/initramfs-4.18.0-348.el8.x86_64.img $tuned_initrd"
    title="Red Hat Enterprise Linux (4.18.0-348.el8.x86_64) 8.5 (Ootpa)"
    id="302e92ae52b3421b9506bdbe0d9f2a49-4.18.0-348.el8.x86_64"

    # cat /proc/cmdline
    BOOT_IMAGE=(mduuid/5be05f0da4720975d25142a191ebac08)/vmlinuz-4.18.0-348.el8.x86_64 root=/dev/mapper/vg00-root ro crashkernel=auto resume=UUID=f43b70c5-9365-4e95-b84e-36952e94b862 rd.md.uuid=1f1ae97e:91e31d11:b51d73a1:71ef886e rd.lvm.lv=vg00/root rd.md.uuid=5be05f0d:a4720975:d25142a1:91ebac08 rd.md.uuid=0f6f14be:eef09ab9:6667ef2f:da7fc59c rd.lvm.lv=vg00/usr console=ttyS0

Created attachment 1844632 [details]: rdsosreport
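For completeness, the manual recovery shown above boils down to two commands from the dracut emergency shell. This is only a sketch using the device and VG names from this report (they will differ on other machines); exiting the shell normally lets dracut retry the boot:

```sh
# From the dracut emergency shell (names taken from this report, not universal)
mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1   # assemble the array that was not auto-assembled
lvm vgchange -aay                               # activate the remaining LVs (plain "vgchange -aay" on an installed system)
```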
Adding some mdadm experts to look at the mdadm -I errors. In comment 2 I suggested that the original report was an effect of the lvm 2.03.14-1 package, but I haven't seen an actual connection between that lvm version and the details of the failure.

(In reply to David Teigland from comment #8)
> [ 2.973029] vm3 systemd-udevd[617]: Process '/sbin/mdadm -I /dev/sda1' failed
> with exit code 2.
> [ 3.018272] vm3 systemd-udevd[608]: Process '/sbin/mdadm -I /dev/sdb1' failed
> with exit code 2.

There are many places in Incremental.c which return 2.

> dracut:/# blkid /dev/sda1
> /dev/sda1: UUID="5be05f0d-a472-0975-d251-42a191ebac08" UUID_SUB="02776d33-e43e-aedd-110e-02b18e2516a1" LABEL="localhost.localdomain:0" TYPE="linux_raid_member" PARTUUID="e3ef0b21-01"
> dracut:/# blkid /dev/sda2
> /dev/sda2: UUID="1f1ae97e-91e3-1d11-b51d-73a171ef886e" UUID_SUB="45865d07-70e0-9ee8-437c-cede90a748ff" LABEL="localhost.localdomain:1" TYPE="linux_raid_member" PARTUUID="e3ef0b21-02"
>
> dracut:/# ls -l /dev/md*
> brw-rw---- 1 root disk 9, 0 Dec  3 18:17 /dev/md0
> brw-rw---- 1 root disk 9, 1 Dec  3 18:17 /dev/md1
> brw-rw---- 1 root disk 9, 2 Dec  3 18:17 /dev/md2
>
> /dev/md:
> total 0
> lrwxrwxrwx 1 root root 6 Dec  3 18:17 1 -> ../md1
> lrwxrwxrwx 1 root root 6 Dec  3 18:17 2 -> ../md2
>
> dracut:/# rm /dev/md0
> dracut:/# mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1
> mdadm: Fail to c[ 937.935626] md: md0 stopped.
> reate md0 when using /sys/module/md_mod/parameters/new_array, fallback to
> creation via node
> mdadm: Unable to initialize sysfs

This is a place that returns 2.

> dracut:/# [ 937.987263] md/raid1:md0: active with 2 out of 2 mirrors
> [ 938.002304] md0: detected capacity change from 0 to 535822336

But this makes me confused. If it fails already (return 2), why is md0 created successfully here?

> dracut:/# cat /proc/mdstat
> Personalities : [raid1]
> md0 : active raid1 sdb1[1] sda1[0]
>       523264 blocks super 1.2 [2/2] [UU]
>       bitmap: 0/1 pages [0KB], 65536KB chunk
>
> md1 : active raid1 sdb2[1] sda2[0]
>       8379392 blocks super 1.2 [2/2] [UU]
>       bitmap: 0/1 pages [0KB], 65536KB chunk
>
> md2 : active raid1 sdb3[1] sda3[0]
>       1046528 blocks super 1.2 [2/2] [UU]
>
> unused devices: <none>
>
> The initial setup prior to the reboot failure:
>
> # cat /proc/mdstat
> Personalities : [raid1]
> md0 : active raid1 sdb1[1] sda1[0]
>       523264 blocks super 1.2 [2/2] [UU]
>       bitmap: 0/1 pages [0KB], 65536KB chunk
>
> md1 : active raid1 sdb2[1] sda2[0]
>       8379392 blocks super 1.2 [2/2] [UU]
>       bitmap: 1/1 pages [4KB], 65536KB chunk
>
> md2 : active raid1 sdb3[1] sda3[0]
>       1046528 blocks super 1.2 [2/2] [UU]
>
> unused devices: <none>
>
> # pvs -a
>   PV        VG   Fmt  Attr PSize  PFree
>   /dev/md0            ---       0       0
>   /dev/md1  vg00 lvm2 a--  <7.94g 320.00m
>   /dev/md2            ---       0       0

There is no vg on /dev/md0?

> # grubby --info=DEFAULT
> index=0
> kernel="/boot/vmlinuz-4.18.0-348.el8.x86_64"
> args="ro crashkernel=auto resume=UUID=f43b70c5-9365-4e95-b84e-36952e94b862
> rd.md.uuid=1f1ae97e:91e31d11:b51d73a1:71ef886e rd.lvm.lv=vg00/root
> rd.md.uuid=5be05f0d:a4720975:d25142a1:91ebac08
> rd.md.uuid=0f6f14be:eef09ab9:6667ef2f:da7fc59c rd.lvm.lv=vg00/usr
> console=ttyS0 $tuned_params"
> root="/dev/mapper/vg00-root"
> initrd="/boot/initramfs-4.18.0-348.el8.x86_64.img $tuned_initrd"
> title="Red Hat Enterprise Linux (4.18.0-348.el8.x86_64) 8.5 (Ootpa)"
> id="302e92ae52b3421b9506bdbe0d9f2a49-4.18.0-348.el8.x86_64"
>
> # cat /proc/cmdline
> BOOT_IMAGE=(mduuid/5be05f0da4720975d25142a191ebac08)/vmlinuz-4.18.0-348.el8.
> x86_64 root=/dev/mapper/vg00-root ro crashkernel=auto
> resume=UUID=f43b70c5-9365-4e95-b84e-36952e94b862
> rd.md.uuid=1f1ae97e:91e31d11:b51d73a1:71ef886e rd.lvm.lv=vg00/root
> rd.md.uuid=5be05f0d:a4720975:d25142a1:91ebac08
> rd.md.uuid=0f6f14be:eef09ab9:6667ef2f:da7fc59c rd.lvm.lv=vg00/usr
> console=ttyS0

Could you explain these 3 lines about rd.md and rd.lvm. It means the lv depends on the md device specified here, right? The second line doesn't specify rd.lvm. It means the boot process must assemble the md device, right? But in the dracut emergency mode we can see lvm volumes are already activated. Why does it still fail to boot?

Thanks
Xiao

Hello, I'm the original reporter. I can say that I didn't see any MD device errors. In my case the MD devices were assembled correctly as far as I can tell. My root and swap LVM volumes are both on the MD device and were activated when I got dropped into the dracut shell. However, all other LVM volumes were not, and the boot failed. The issue David and I saw seems to be related: there seem to be issues with the activation of LVM volumes on top of MD devices, although it doesn't look like we got into exactly the same scenario. Maybe this is due to hardware differences. Thank you for your help. Kind regards.

> > dracut:/# mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1
> > mdadm: Fail to c[ 937.935626] md: md0 stopped.
> > reate md0 when using /sys/module/md_mod/parameters/new_array, fallback to
> > creation via node
> > mdadm: Unable to initialize sysfs
>
> This is a place that returns 2.
>
> > dracut:/# [ 937.987263] md/raid1:md0: active with 2 out of 2 mirrors
> > [ 938.002304] md0: detected capacity change from 0 to 535822336
>
> But this makes me confused. If it fails already (return 2), why is md0
> created successfully here?

Interesting questions, and I wonder what's causing md0 to be stopped prior to it being started again. It seems like there are automated actions that are interfering with the mdadm command.

> > # pvs -a
> >   PV        VG   Fmt  Attr PSize  PFree
> >   /dev/md0            ---       0       0
> >   /dev/md1  vg00 lvm2 a--  <7.94g 320.00m
> >   /dev/md2            ---       0       0
>
> There is no vg on /dev/md0?

md0 is used for /boot. Here's the ks config:

    # md0 raid1 for boot on sda1 and sdb1
    part raid.01 --size 512 --asprimary --ondrive=sda
    part raid.02 --size 512 --asprimary --ondrive=sdb
    # md1 raid1 for PV on sda2 sdb2
    part raid.03 --size 8192 --ondrive=sda
    part raid.04 --size 8192 --ondrive=sdb
    # md2 raid1 for swap on sda3 sdb3
    part raid.05 --size 1024 --ondrive=sda
    part raid.06 --size 1024 --ondrive=sdb
    # Raid device creation
    raid /boot --fstype ext4 --device=md0 --level=RAID1 raid.01 raid.02
    raid pv.01 --device md1 --level=RAID1 raid.03 raid.04
    raid swap --fstype swap --device=md2 --level=RAID1 raid.05 raid.06
    # Volume group and logical volume creation
    volgroup vg00 --pesize=65536 pv.01
    logvol none --thinpool --vgname vg00 --size=7168 --name=tpool
    logvol / --thin --poolname=tpool --fstype ext4 --vgname vg00 --size=4096 --name=root
    logvol /var --thin --poolname=tpool --fstype ext4 --vgname vg00 --size=512 --name=var
    logvol /tmp --thin --poolname=tpool --fstype ext4 --vgname vg00 --size=512 --name=tmp
    logvol /home --thin --poolname=tpool --fstype ext4 --vgname vg00 --size=512 --name=home
    logvol /usr --thin --poolname=tpool --fstype ext4 --vgname vg00 --size=2048 --name=usr

> > # cat /proc/cmdline
> > BOOT_IMAGE=(mduuid/5be05f0da4720975d25142a191ebac08)/vmlinuz-4.18.0-348.el8.
> > x86_64 root=/dev/mapper/vg00-root ro crashkernel=auto
> > resume=UUID=f43b70c5-9365-4e95-b84e-36952e94b862
> > rd.md.uuid=1f1ae97e:91e31d11:b51d73a1:71ef886e rd.lvm.lv=vg00/root
> > rd.md.uuid=5be05f0d:a4720975:d25142a1:91ebac08
> > rd.md.uuid=0f6f14be:eef09ab9:6667ef2f:da7fc59c rd.lvm.lv=vg00/usr
> > console=ttyS0
>
> Could you explain these 3 lines about rd.md and rd.lvm. It means the lv
> depends on the md device specified here, right? The second line doesn't
> specify rd.lvm. It means the boot process must assemble the md device,
> right?

I don't know how dracut is generating this command line. I don't think it's trying to express an explicit dependency between md devs and LVs, but rather just listing any md devs that it believes are needed to boot.

It's looking like /boot on md0 might be the problem. I don't have a clear understanding of how this is supposed to work between grub and the initrd. I expect md0 needs to be assembled very early for /boot (by grub?), so that the initrd image can be read and run. Then, in the running initrd, md1 and md2 should be assembled for / and swap. If md0 is already active from grub (for /boot), then dracut should not need to assemble it again. Perhaps the problem is that dracut attempts to assemble md0 when it is already assembled? Sorry, there's quite a bit of guesswork there; I need to learn again how /boot is found and mounted.

> But in the dracut emergency mode we can see lvm volumes are already
> activated. Why does it still fail to boot?

I'm not sure, I'd guess that dracut is waiting for all rd.md.uuid devs to exist.

(In reply to Enrico Tagliavini from comment #14)
> Hello,
>
> I'm the original reporter. I can say that I didn't see any MD device errors.
> In my case the MD devices were assembled correctly as far as I can tell. My
> root and swap LVM volumes are both on the MD device and were activated when
> I got dropped into the dracut shell. However all other LVM volumes were not
> and the boot failed.
>
> The issue David and I saw seems to be related: there seem to be issues with
> the activation of LVM volumes on top of MD devices, although it doesn't look
> like we got into exactly the same scenario. Maybe this is due to hardware
> differences.

They are probably different issues. I don't think you had /boot on md, and the errors I saw are pointing toward that.

> this is supposed to work between grub and the initrd. I expect md0 needs to
> be assembled very early for /boot (by grub?), so that the initrd image can
> be read and run.

From a little searching it sounds like grub is not actually going through the md device, but just using one of the md images. So, including rd.md.uuid for md0 (/boot) looks unnecessary, and my vm boots fine when I remove it from the command line. It may still be worthwhile to understand why the mdadm error occurred, in case it might occur for the md dev that's actually needed for boot. I haven't been able to reproduce the failure.
(In reply to David Teigland from comment #16)
> (In reply to Enrico Tagliavini from comment #14)
> They are probably different issues. I don't think you had /boot on md and
> the errors I saw are pointing toward that.

I do have /boot on an MD device :)

    $ grep boot /proc/mounts
    /dev/md127 /boot xfs rw,seclabel,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
    /dev/md125 /boot/efi vfat rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=winnt,errors=remount-ro 0 0

As far as I know, GRUB can simply read Linux's RAID1 MD devices without issues. I'm not sure they even need to be assembled, since one disk is sufficient in this case.

    # mdadm -Q --detail /dev/md127
    /dev/md127:
               Version : 1.2
         Creation Time : Tue May 18 17:06:07 2021
            Raid Level : raid1
            Array Size : 1047552 (1023.00 MiB 1072.69 MB)
         Used Dev Size : 1047552 (1023.00 MiB 1072.69 MB)
          Raid Devices : 2
         Total Devices : 2
           Persistence : Superblock is persistent
         Intent Bitmap : Internal
           Update Time : Wed Dec  8 12:52:03 2021
                 State : clean
        Active Devices : 2
       Working Devices : 2
        Failed Devices : 0
         Spare Devices : 0
    Consistency Policy : bitmap
                  Name : f014l-060482.fmi.ch:boot
                  UUID : f61758af:de6bccd5:717720e7:1f915818
                Events : 142

        Number   Major   Minor   RaidDevice State
           0       8        1        0      active sync   /dev/sda1
           1       8       17        1      active sync   /dev/sdb1

/boot/efi is the ESP, and that uses a different metadata version for the MD device:

    # mdadm -Q --detail /dev/md125
    /dev/md125:
               Version : 1.0
         Creation Time : Tue May 18 17:06:01 2021
            Raid Level : raid1
            Array Size : 615360 (600.94 MiB 630.13 MB)
         Used Dev Size : 615360 (600.94 MiB 630.13 MB)
          Raid Devices : 2
         Total Devices : 2
           Persistence : Superblock is persistent
         Intent Bitmap : Internal
           Update Time : Wed Dec  8 12:51:52 2021
                 State : clean
        Active Devices : 2
       Working Devices : 2
        Failed Devices : 0
         Spare Devices : 0
    Consistency Policy : bitmap
                  Name : f014l-060482.fmi.ch:boot_efi
                  UUID : 1fcc1d6b:732ca1e0:40c5a04d:04c2e7f6
                Events : 148

        Number   Major   Minor   RaidDevice State
           0       8        2        0      active sync   /dev/sda2
           1       8       18        1      active sync   /dev/sdb2

I suspect this issue is the same as was fixed here: https://bugzilla.redhat.com/show_bug.cgi?id=2002640#c65

The dracut lvm udev rule needs an additional line related to md devices. (There are other unrelated issues and fixes discussed in that bug.)

(In reply to David Teigland from comment #21)
> I suspect this issue is the same as was fixed here:
> https://bugzilla.redhat.com/show_bug.cgi?id=2002640#c65
>
> The dracut lvm udev rule needs an additional line related to md devices.
> (There are other unrelated issues and fixes discussed in that bug.)

Mhm, I'm skeptical. The affected file in that bug is /lib/dracut/modules.d/90lvm/64-lvm.rules, which belongs to dracut, but dracut was not updated in the week before the machine broke. The last update I can find for dracut was on the 11th of October; how can it be that the problem showed up only 4-5 weeks later? There were plenty of reboots and new kernels (so also new initramfs images) generated in between. Unless this is a bug that only shows up after the additional updates I've listed in the bug description. What do you think? Thank you. Kind regards.

I still think there's a good chance this is the same issue. I don't understand why this suddenly appeared, since the bug has existed forever. I believe it's somehow related to the lvm udev rule change on the root fs, and it continues to be a problem even after reverting the udev rule change on root. I'll try to find some time to go back and analyze how this managed to work correctly in the past.
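Since the discussion keeps coming back to which LVM udev rules end up inside the initramfs, one hedged way to inspect that is with lsinitrd, which ships with dracut (the image path depends on the installed kernel; the rule files to look for, 64-lvm.rules and 69-dm-lvm-metad.rules, are the ones named in this thread):

```sh
# List the udev rules packed into the initramfs of the running kernel
lsinitrd /boot/initramfs-$(uname -r).img | grep rules.d
```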
I noticed this bug this week on my home server as well. I'm running LVM on software RAID. Booting into 4.18.0-348.el8 works fine and all mounts work. Booting into 4.18.0-348.2.1.el8 does not work: systemd tries to mount for a while, gives up, and drops to an emergency shell. Running "vgchange -a y" brings all my LVs and mounts back. I can toggle between these two kernels and reproduce this problem. /proc/mdstat looks fine.

The different behavior between kernels can be explained by differences in the initrd, so it is either fallout of the lvm2 change or a problem in the kernel. Try lvm2-2.03.14-2.el8, which reverts the changes not intended for RHEL 8 (a fresher build, lvm2-2.03.14-3.el8, is coming soon). Make sure to run dracut -f to rebuild the initrd.

Please try the new build lvm2-2.03.14-2.el8 (including the initrd rebuild); if that fixes the issue, this will be closed as a duplicate of bug 2032993. Thanks.

On 4.18.0-348.el8, I updated all my packages (including lvm2-2.03.14-2.el8), then rebooted to try the new kernels. Booting to 4.18.0-348.7.1.el8_5 works without problems. Booting to 4.18.0-348.2.1.el8 fails, so I went to the emergency shell, ran "dracut -f", then rebooted to 4.18.0-348.2.1.el8 again. It worked that time. So I consider this to be resolved now. Enrico, can you verify this works for you also?

I can confirm that in all initramfs images generated before the latest LVM2 update, the file usr/lib/udev/rules.d/69-dm-lvm-metad.rules is missing. Now it's there. LVM2 was also one of the packages updated immediately before the system started showing the issue. If I can manage, I will also try to reboot the workstation and restore the previous boot parameters to double check that the issue is solved. It's a bit hard right now with COVID restrictions and so on. Thank you for your time and help with this. Kind regards.

Will reopen if the problem reappears.

*** This bug has been marked as a duplicate of bug 2032993 ***

I had the chance to do some testing on the affected system today. I can confirm the problem is solved. I added the arguments back to the kernel command line using grubby and confirmed they were there after the boot. Boot was successful, thank you very much for the help. Kind regards.
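To recap the resolution described above: update lvm2 and rebuild the initramfs so the restored udev rule is actually included. A hedged sketch of the steps (the package version is the one named in this bug; an affected system would simply pull in the current fixed build):

```sh
dnf update lvm2   # brings in lvm2-2.03.14-2.el8 or later, reverting the RHEL 9 autoactivation change
dracut -f         # rebuild the initramfs for the running kernel so it picks up the restored udev rule
reboot
```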
Description of problem:

Disclaimer: I'm not entirely sure which package is responsible for this issue. It could be lvm2 or one of the device-mapper packages. I think lvm2 is the most likely, so that's what I selected, but if it belongs to another package please move it as appropriate.

After updating one of my CentOS 8 Stream systems on Monday, I found that the boot process was failing: it hung waiting for the devices for /var, /tmp, /opt and others to appear, giving up after a few minutes. This system is updated every Monday morning, so the change must have happened in the week before, i.e. between the 15th and the 21st of November. I also have the list of the packages from the last update run (see below).

I found the following workaround to make the system boot again: remove all references to rd.lvm.lv=* from the kernel command line in grub. This made the system boot again.

Version-Release number of selected component (if applicable):
See below for the full list of packages involved in the update that broke the boot. I'm not sure which of the updated packages is responsible.

How reproducible:
It might not affect all setups, but on the affected one the boot process always fails.

Steps to Reproduce:
1. Install CentOS 8 Stream with some additional system folders on separate LVs (e.g. /var, /opt, /tmp).
2. Apply the update and restart the system; boot should fail.

Actual results:
After the OS update, boot fails.

Expected results:
After the OS update, boot should work as before.

Additional info:
It looks like after the update the system only activates the volumes listed on the kernel command line, ignoring fstab and the lvm config files. I left the LVM config file and the kernel command line as set by the installation; I never changed them. The installation adds the root and swap LVs to the kernel command line, but not all the others, even if they were created during the installation. If LVM devices are listed in fstab, I think those should also be activated by default, regardless of whether they were specified on the kernel command line. This was also the previous behavior, and this update broke every installation with additional LVM volumes.
This is the full list of packages updated in the last update run before the issue appeared, as reported by DNF automatic:

    Installing:
     kernel                                 x86_64  4.18.0-348.2.1.el8_5     baseos     7.0 M
     kernel-core                            x86_64  4.18.0-348.2.1.el8_5     baseos      38 M
     kernel-devel                           x86_64  4.18.0-348.2.1.el8_5     baseos      20 M
     kernel-modules                         x86_64  4.18.0-348.2.1.el8_5     baseos      30 M
    Upgrading:
     bpftool                                x86_64  4.18.0-348.2.1.el8_5     baseos     7.7 M
     clamav                                 x86_64  0.103.4-1.el8            epel       2.7 M
     clamav-filesystem                      noarch  0.103.4-1.el8            epel        46 k
     clamav-lib                             x86_64  0.103.4-1.el8            epel       862 k
     clamav-update                          x86_64  0.103.4-1.el8            epel       129 k
     clamd                                  x86_64  0.103.4-1.el8            epel       124 k
     device-mapper                          x86_64  8:1.02.181-1.el8         baseos     377 k
     device-mapper-event                    x86_64  8:1.02.181-1.el8         baseos     271 k
     device-mapper-event-libs               x86_64  8:1.02.181-1.el8         baseos     270 k
     device-mapper-libs                     x86_64  8:1.02.181-1.el8         baseos     409 k
     device-mapper-multipath                x86_64  0.8.4-19.el8             baseos     198 k
     device-mapper-multipath-libs           x86_64  0.8.4-19.el8             baseos     323 k
     fedpkg                                 noarch  1.41-2.el8               epel       113 k
     java-1.8.0-openjdk-headless-slowdebug  x86_64  1:1.8.0.312.b07-2.el8_5  appstream   36 M
     java-1.8.0-openjdk-slowdebug           x86_64  1:1.8.0.312.b07-2.el8_5  appstream  345 k
     kernel-headers                         x86_64  4.18.0-348.2.1.el8_5     baseos     8.3 M
     kernel-tools                           x86_64  4.18.0-348.2.1.el8_5     baseos     7.2 M
     kernel-tools-libs                      x86_64  4.18.0-348.2.1.el8_5     baseos     7.0 M
     kpartx                                 x86_64  0.8.4-19.el8             baseos     113 k
     libstoragemgmt                         x86_64  1.9.1-3.el8              baseos     246 k
     lvm2                                   x86_64  8:2.03.14-1.el8          baseos     1.7 M
     lvm2-libs                              x86_64  8:2.03.14-1.el8          baseos     1.2 M
     python3-libstoragemgmt                 x86_64  1.9.1-3.el8              baseos     176 k
     python3-perf                           x86_64  4.18.0-348.2.1.el8_5     baseos     7.1 M
     rsyslog                                x86_64  8.2102.0-6.el8           appstream  752 k
     rsyslog-gnutls                         x86_64  8.2102.0-6.el8           appstream   32 k
     rsyslog-gssapi                         x86_64  8.2102.0-6.el8           appstream   34 k
     rsyslog-relp                           x86_64  8.2102.0-6.el8           appstream   33 k
    Removing:
     kernel                                 x86_64  4.18.0-331.el8           @baseos      0
     kernel-core                            x86_64  4.18.0-331.el8           @baseos     68 M
     kernel-devel                           x86_64  4.18.0-331.el8           @baseos     49 M
     kernel-modules                         x86_64  4.18.0-331.el8           @baseos     22 M

    Transaction Summary
    ================================================================================
    Install   4 Packages
    Upgrade  28 Packages
    Remove    4 Packages

Original kernel command line set by the installation:

    # grubby --info=DEFAULT
    index=0
    kernel="/boot/vmlinuz-4.18.0-348.2.1.el8_5.x86_64"
    args="ro crashkernel=auto resume=/dev/mapper/vg_libws02-swap rd.md.uuid=4ff4c6fb:b996a59d:2adfe0c0:7896834e rd.lvm.lv=vg_libws02/root rd.md.uuid=f61758af:de6bccd5:717720e7:1f915818 rd.lvm.lv=vg_libws02/swap rhgb quiet rd.driver.blacklist=nouveau $tuned_params"
    root="/dev/mapper/vg_libws02-root"
    initrd="/boot/initramfs-4.18.0-348.2.1.el8_5.x86_64.img $tuned_initrd"
    title="CentOS Linux (4.18.0-348.2.1.el8_5.x86_64) 8"
    id="e0e3606e20034d3d916e4230cd2cc353-4.18.0-348.2.1.el8_5.x86_64"

With the above, the boot fails. The following command line allows a successful boot:

    # cat /proc/cmdline
    BOOT_IMAGE=(mduuid/f61758afde6bccd5717720e71f915818)/vmlinuz-4.18.0-348.2.1.el8_5.x86_64 root=/dev/mapper/vg_libws02-root ro crashkernel=auto rd.md.uuid=4ff4c6fb:b996a59d:2adfe0c0:7896834e rd.md.uuid=f61758af:de6bccd5:717720e7:1f915818 rhgb quiet rd.driver.blacklist=nouveau
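As a concrete illustration of the workaround described in the report (dropping the rd.lvm.lv entries so that only the rd.md.uuid arguments remain, as in the working command line above), it could be done roughly like this with grubby. This is a sketch only: the VG and LV names are the ones from this system, and the resulting arguments should be verified with "grubby --info=DEFAULT" afterwards:

```sh
# Remove the rd.lvm.lv entries from every boot entry (names taken from this report)
grubby --update-kernel=ALL --remove-args="rd.lvm.lv=vg_libws02/root rd.lvm.lv=vg_libws02/swap"
grubby --info=DEFAULT   # confirm the remaining kernel arguments
```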