Bug 2039091 - Boot fails when /var is an LV
Summary: Boot fails when /var is an LV
Keywords:
Status: CLOSED DUPLICATE of bug 2033737
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: lvm2
Version: CentOS Stream
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: LVM and device-mapper development team
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-01-10 22:42 UTC by Gordon Messmer
Modified: 2022-01-12 14:52 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-01-12 14:52:33 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
system.devices and lsblk -o +UUID (1.81 KB, text/plain), 2022-01-11 18:17 UTC, Gordon Messmer
journal from failed boot, with udev debug logs (2.26 MB, text/plain), 2022-01-11 23:48 UTC, Gordon Messmer


Links
Red Hat Issue Tracker RHELPLAN-107342, last updated 2022-01-10 22:46:20 UTC

Description Gordon Messmer 2022-01-10 22:42:27 UTC
Description of problem:

I've installed a CentOS Stream 9 system from a kickstart file that specified (among other things) several logical volumes:

logvol / --fstype="ext4" --size=10240 --name=lv_root --vgname=VolGroup
logvol /var --fstype="ext4" --size=4096 --name=lv_var --vgname=VolGroup
logvol swap --fstype="swap" --size=2048 --name=lv_swap --vgname=VolGroup

When that system rebooted, the kernel args did specify "rd.lvm.lv=VolGroup/lv_root rd.lvm.lv=VolGroup/lv_swap", but did not specify "rd.lvm.lv=VolGroup/lv_var", so boot failed because the filesystem required for /var couldn't be found.

I'd like to suggest that Anaconda be simplified to specify "rd.lvm.vg=VolGroup" rather than enumerating individual LVs.  As far as I know, the LVs inside VolGroup can't be activated unless that VG is complete, and if it is complete, I can see no good reason for Anaconda to add individual LVs to the kernel command line rather than the whole VG.
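
For illustration, the difference between the two approaches on the kernel command line looks like this (only the LVM-related arguments are shown; everything else on the line varies per system):

# As generated here: lv_var is never listed, so dracut does not
# activate it, and mounting /var fails.
rd.lvm.lv=VolGroup/lv_root rd.lvm.lv=VolGroup/lv_swap

# Suggested alternative: have dracut activate the whole VG.
rd.lvm.vg=VolGroup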


Version-Release number of selected component (if applicable):
34.25.0.23-1.el9

How reproducible:

I assume always, but I have only done the one installation.  I also don't know yet whether the same problem occurs when filesystems are created manually.

Steps to Reproduce:
1. Create a kickstart file specifying multiple LVs.
2. Install a new system using that kickstart config.
3. Reboot.

Actual results:

Boot fails, unable to mount /var because the LV is missing.

Expected results:

First boot after a clean install should succeed.

Additional info:

Comment 1 Jan Stodola 2022-01-11 08:54:20 UTC
Could you please provide the content of /etc/lvm/devices/system.devices on the installed system? Does the system boot successfully if you (re)move the file?
This could be the same problem discussed in bug 2037905.
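
For reference, the requested information can be gathered on the installed system with (a sketch):

cat /etc/lvm/devices/system.devices
lsblk -o +UUID

# To test booting without the file, set it aside rather than deleting it, then reboot:
mv /etc/lvm/devices/system.devices /root/system.devices.bak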

Comment 2 Gordon Messmer 2022-01-11 18:17:47 UTC
Created attachment 1850151 [details]
system.devices and lsblk -o +UUID

I'm attaching a file that contains both system.devices and the output of "lsblk -o +UUID" from a new VM on which I've replicated the problem (with manual partitioning instead of a kickstart file).

The file looks correct.  It's identical (other than the time) to a new file generated by "vgimportdevices cs".  Removing the file does not allow the system to boot, even if I regenerate the initrd with "dracut -f".

If I understand correctly, system.devices identifies the devices that are intended to be used as PVs.  If system.devices were incorrect, then I'd expect *no* VGs (and by extension, no LVs) to be activated.  But that's not the problem.  The problem is that dracut is only activating LVs named in the kernel args with rd.lvm.lv=<LV>, and that list is incomplete.
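
For readers unfamiliar with the file, a hypothetical system.devices of the shape involved here (all field values below are invented placeholders, not taken from the attachment; the PV sits on an md array, so the md_uuid ID type applies):

# LVM restricts itself to the devices listed here as PVs.
VERSION=1.1.1
IDTYPE=md_uuid IDNAME=<md-array-uuid> DEVNAME=/dev/md127 PVID=<32-character-pvid>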

Comment 3 Gordon Messmer 2022-01-11 18:47:13 UTC
I'm unable to see bug 2024100, so I can't verify that this is a duplicate.

I've updated the VM with packages from https://kojihub.stream.centos.org/koji/buildinfo?buildID=16070, rebuilt the initrd, and verified that the problem still exists.  I don't see any changes from -2 to -3 that would address the problem.  There doesn't seem to be anything wrong with lvm2; the system is doing exactly what the documentation says it will.  The man page for dracut.cmdline says that if rd.lvm.lv= is provided, then only those LVs will be activated.  That is what is causing the boot failure: some LVs exist, and they're required for mounts defined in /etc/fstab, but they aren't being activated because Anaconda has told dracut not to activate them.
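
As a stopgap on an affected machine, the missing LV can be added to the kernel arguments; a minimal sketch, assuming the VG is named cs and the missing LV is var (as in the attached lsblk output):

grubby --update-kernel=ALL --args="rd.lvm.lv=cs/var"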

I believe the practice of specifying rd.lvm.lv= is, itself, a bug.  Even if Anaconda is fixed so that it specifies all of the LVs needed for /etc/fstab, anyone who creates new LVs in the default volume group after initial installation is going to struggle to figure out why those LVs are missing when the system is rebooted.  Specifying rd.lvm.vg= instead should be a more reliable option.

Comment 4 Gordon Messmer 2022-01-11 19:23:41 UTC
In CS8, additional LVs are also not named on the kernel command line.  On that release, early boot activation of LVs that aren't given to rd.lvm.lv= is handled by /usr/lib/systemd/system/lvm2-pvscan@.service.  However, on CS9, that unit doesn't exist.  On CS9, early boot activation appears to be handled by a rule in /usr/lib/udev/rules.d/69-dm-lvm.rules, but that rule appears to fire only for PVs that weren't handled during dracut init, so it doesn't fire for VGs that dracut partially activated.
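
A quick way to confirm that difference on each release (a sketch; paths as named above):

# CS8: the templated pvscan unit is present
systemctl cat lvm2-pvscan@.service
# CS9: the unit is gone; see what the udev rule schedules instead
grep -n SYSTEMD /usr/lib/udev/rules.d/69-dm-lvm.rules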

Comment 5 Gordon Messmer 2022-01-11 23:48:14 UTC
Created attachment 1850205 [details]
journal from failed boot, with udev debug logs

I added a device to the VM that does not boot and created a new VG on that device.  (The new PV is /dev/vdc1, and the VG is BackupVG).  Then, I booted the VM with "udev.log-priority=debug" as a boot parameter.

The VM will boot to a rescue environment after the /var mount times out.  At that point, /dev/cs/root, /dev/cs/swap, and /dev/BackupVG/lv01 exist, but /dev/cs/var does not.  The transient systemd unit, lvm-activate-BackupVG.service, exists, which suggests that the rule in 69-dm-lvm.rules is being triggered for /dev/vdc1.  And, indeed, the log includes the command from line 82.

However, there is no "lvm-activate-cs.service" unit.  And, while there are udev events for vdc1 both before and after the pivot, there are no events for md127 (the PV backing the cs VG) after the pivot.
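
The relevant events can be extracted from the attached journal with something like (a sketch; -b -1 selects the previous, failed boot when the journal is persistent):

journalctl -b -1 -o short-monotonic | grep -E 'vdc1|md127'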

Comment 6 Jan Stodola 2022-01-12 09:31:13 UTC
rd.lvm.lv= arguments provided on the kernel command line should be just the LVs used by dracut/initramfs to mount the root filesystem. The other LVs, which are not needed in the initramfs, are activated later in the boot process. So, the kernel command line arguments created by the installer look OK.
I'm reassigning this bug to lvm2 for further review.
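
On a boot where that later activation works, it is visible as the transient units mentioned in comment 5; a sketch:

systemctl list-units --all 'lvm-activate-*'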

Comment 7 David Teigland 2022-01-12 14:52:33 UTC
root on lvm on md is broken until 64-lvm.rules in dracut is fixed.

As for the broader suggestion, dracut activating all LVs in the root VG is reasonable in many cases, but where the root VG contains thousands of LVs (and of varying complex types), doing so would interfere with and delay the main job of the initrd, which is activating the root fs LV.

*** This bug has been marked as a duplicate of bug 2033737 ***

