Bug 2039091
| Field | Value |
|---|---|
| Summary | Boot fails when /var is an LV |
| Product | Red Hat Enterprise Linux 9 |
| Component | lvm2 |
| lvm2 sub component | Activating existing Logical Volumes |
| Version | CentOS Stream |
| Status | CLOSED DUPLICATE |
| Severity | unspecified |
| Priority | unspecified |
| Hardware | Unspecified |
| OS | Unspecified |
| Target Milestone | rc |
| Target Release | --- |
| Reporter | Gordon Messmer <gordon.messmer> |
| Assignee | LVM and device-mapper development team <lvm-team> |
| QA Contact | cluster-qe <cluster-qe> |
| CC | agk, bstinson, heinzm, jbrassow, jstodola, jwboyer, msnitzer, prajnoha, teigland, zkabelac |
| Type | Bug |
| Last Closed | 2022-01-12 14:52:33 UTC |
Description
Gordon Messmer 2022-01-10 22:42:27 UTC
Could you please provide the content of /etc/lvm/devices/system.devices on the installed system? Does the system boot successfully if you (re)move the file? This could be the same problem discussed in bug 2037905.

Created attachment 1850151 [details]
system.devices and lsblk -o +UUID
I'm attaching a file that contains both system.devices and the output of "lsblk -o +UUID" from a new VM on which I've replicated the problem (with manual partitioning instead of a kickstart file).
The file looks correct. It's identical (other than the time) to a new file generated by "vgimportdevices cs". Removing the file does not allow the system to boot, even if I regenerate the initrd with "dracut -f".
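For readers unfamiliar with the devices file: a typical /etc/lvm/devices/system.devices for a PV on an md array looks roughly like the following. This is a sketch with illustrative identifiers, not the actual contents of the attached file:

```
# LVM uses devices listed in this file.
# Created by LVM command vgimportdevices pid 1234 at Mon Jan 10 14:27:01 2022
VERSION=1.1.1
IDTYPE=md_uuid IDNAME=3d7f1a2b:5c6e8d90:a1b2c3d4:e5f60718 DEVNAME=/dev/md127 PVID=AbC123dEf456GhI789jKl012MnO345pQr678StU
```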
If I understand correctly, system.devices identifies the devices that are intended to be used as PVs. If system.devices were incorrect, I'd expect *no* VGs (and by extension, no LVs) to be activated. But that's not the problem. The problem is that dracut activates only the LVs named in the kernel args with rd.lvm.lv=<LV>, and that list is incomplete.
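To make that mismatch concrete, here is a hedged sketch. The VG/LV names follow this reproducer; the exact kernel command line and fstab entry are assumptions, not quotes from the attachments:

```sh
# Kernel arguments as written by the installer (assumed):
#   rd.lvm.lv=cs/root rd.lvm.lv=cs/swap
# dracut activates only cs/root and cs/swap in the initramfs.

# /etc/fstab nevertheless needs cs/var (illustrative entry):
#   /dev/mapper/cs-var  /var  xfs  defaults  0 0
# If nothing activates cs/var after the pivot, the /var mount
# unit times out and boot drops to the rescue environment.
```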
I'm unable to see bug 2024100, so I can't verify that this is a duplicate. I've updated the VM with packages from https://kojihub.stream.centos.org/koji/buildinfo?buildID=16070, rebuilt the initrd, and verified that the problem still exists. I don't see any changes from -2 to -3 that would address the problem.

There doesn't seem to be anything wrong with lvm2; the system is doing exactly what the documentation says it will. The man page for dracut.cmdline says that if rd.lvm.lv= is provided, then only those LVs will be activated. That is what is causing the boot failure: some LVs exist and are required for mounts defined in /etc/fstab, but they aren't being activated because Anaconda has told dracut not to activate them.

I believe the practice of specifying rd.lvm.lv= is, in itself, a bug. Even if Anaconda is fixed so that it specifies all of the LVs needed for /etc/fstab, anyone who creates new LVs in the default volume group after initial installation is going to struggle to figure out why they are missing when the system is rebooted. Specifying rd.lvm.vg= instead should be a more reliable option (a sketch follows below).

In CS8, additional LVs are also not named on the kernel command line. On that release, early-boot activation of LVs that aren't given to rd.lvm.lv= is handled by /usr/lib/systemd/system/lvm2-pvscan@.service. However, on CS9, that unit doesn't exist. On CS9, early-boot activation looks like it's intended to be handled by a rule in /usr/lib/udev/rules.d/69-dm-lvm.rules, but that appears to happen only for PVs that weren't handled during dracut init, so it doesn't fire for VGs that dracut partially activated.
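A minimal sketch of the workaround suggested above, assuming the default VG is named cs and the installed arguments are exactly rd.lvm.lv=cs/root and rd.lvm.lv=cs/swap (both assumptions; check your own command line with `cat /proc/cmdline` first):

```sh
# Swap per-LV activation for whole-VG activation (a workaround
# sketch, not a fix endorsed in this bug; adjust the names to
# match your own /proc/cmdline):
grubby --update-kernel=ALL \
       --remove-args="rd.lvm.lv=cs/root rd.lvm.lv=cs/swap" \
       --args="rd.lvm.vg=cs"
```

With rd.lvm.vg=cs in place, LVs created in that VG later would be activated without further command-line edits, which is the reliability argument made above.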
Created attachment 1850205 [details]
journal from failed boot, with udev debug logs
I added a device to the VM that does not boot and created a new VG on that device. (The new PV is /dev/vdc1, and the VG is BackupVG). Then, I booted the VM with "udev.log-priority=debug" as a boot parameter.
The VM boots to a rescue environment after the /var mount times out. At that point, /dev/cs/root, /dev/cs/swap, and /dev/BackupVG/lv01 exist, but /dev/cs/var does not. The transient systemd unit lvm-activate-BackupVG.service exists, which suggests that the rule in 69-dm-lvm.rules is being triggered for /dev/vdc1. And, indeed, the log includes the command from line 82 of that rules file.
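One way to enumerate those transient units from the rescue shell, as a diagnostic sketch (the unit names follow the lvm-activate-<VG> pattern observed above):

```sh
# List any transient activation units udev has created:
systemctl list-units --all 'lvm-activate-*'
# Inspect the one that did fire, for the BackupVG PV:
journalctl -b -u lvm-activate-BackupVG.service
```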
However, there is no "lvm-activate-cs.service" unit. And, while there are udev events for vdc1 both before and after the pivot, there are no events for md127 (the PV backing the cs VG) after the pivot.
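For anyone stuck at the same rescue prompt, a hedged recovery sketch (it assumes, as in this reproducer, that the missing VG is cs and its backing PV is /dev/md127):

```sh
# Activate the partially-activated VG by hand:
vgchange -ay cs

# Or replay the udev "add" event for the backing PV so the
# autoactivation rule gets another chance to run:
udevadm trigger --action=add /dev/md127
```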
rd.lvm.lv= arguments provided on the kernel command line should be just the LVs used by dracut/initramfs to mount the root filesystem. The other LVs, which are not needed in the initramfs, are activated later in the boot process. So the kernel command line arguments created by the installer look OK. I'm reassigning this bug to lvm2 for further review.

Root on LVM on md is broken until 64-lvm.rules in dracut is fixed.

As for the other, broader suggestion: dracut activating all LVs in the root VG is reasonable in many cases, but where there are thousands of LVs (of varying complex types) in the root VG, it would interfere with and delay the main job of the initrd, which is activating the root-fs LV.

*** This bug has been marked as a duplicate of bug 2033737 ***