Bug 2058386

Summary: Duplicate PV errors prevent system from booting after installation when mdadm is on direct disks
Product: Red Hat Enterprise Linux 8
Reporter: Lark Gordon <lagordon>
Component: lvm2
Sub component: Activating existing Logical Volumes
Assignee: LVM and device-mapper development team <lvm-team>
QA Contact: cluster-qe <cluster-qe>
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
CC: agk, heinzm, jbrassow, jcastran, jkonecny, ldigby, msnitzer, nweddle, prajnoha, rmetrich, sbarcomb, zkabelac
Version: 8.5
Target Milestone: rc
Hardware: x86_64
OS: Linux
Type: Bug
Last Closed: 2022-06-13 10:52:01 UTC
Attachments:
Kickstart to reproduce

Description Lark Gordon 2022-02-24 18:54:29 UTC
Description of problem:
New RHEL 8 installations on systems that have mdadm RAID on direct disks fail to boot due to duplicate PV errors. An LVM filter has to be added manually after installation; the expectation is that the installer should do this automatically.

Version-Release number of selected component (if applicable):
RHEL 8.5

How reproducible:
Manual installation in this situation always results in an unbootable system.

Steps to Reproduce:
1. Install RHEL 8 via the graphical installer on a system with BIOS RAID 1 of two equal SSDs
2. Reboot after installation
3. Boot fails with duplicate PV errors like: 

dracut-initqueue[610]: Scanning devices nvme0n1p3 sda1 sdb1  for LVM logical volumes vg01/root vg01/swap
localhost.localdomain dracut-initqueue[610]: WARNING: Not using device /dev/sdb1 for PV XXXX
dracut-initqueue[610]: WARNING: PV XXXX prefers device /dev/sda1 because device was seen first.

Actual results:
System fails to boot after installation due to duplicate PV errors

Expected results:
The installer should automatically create an LVM filter to prevent this.

Additional info:
This is the filter added after installation which allows the system to boot: 
  
    filter = [ "a|/dev/md*|", "a|/dev/nvme*|", "r|.*|" ]
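
For context, this filter belongs in the devices { } section of /etc/lvm/lvm.conf: the first two patterns accept MD and NVMe device nodes, and the final "r|.*|" rejects everything else, so the raw IMSM members /dev/sda1 and /dev/sdb1 are never scanned. Since the duplicate-PV scan also happens in early boot, the initramfs has to be rebuilt for the filter to take effect there. A minimal sketch, assuming the stock RHEL 8 lvm2/dracut setup:
--------------------------------------------------------
# /etc/lvm/lvm.conf (devices section)
devices {
    filter = [ "a|/dev/md*|", "a|/dev/nvme*|", "r|.*|" ]
}

# Rebuild the initramfs so the filter is also applied in early boot
# dracut -f
--------------------------------------------------------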

This partition scheme fails to boot: 
--------------------------------------------------------
part pv.2521 --fstype="lvmpv" --ondisk=nvme0n1 --size=486761
part pv.3885 --fstype="lvmpv" --ondisk=Volume0_0 --size=927916
raid  --device=imsm --level=CONTAINER --noformat --spares=1 --useexisting
--------------------------------------------------------

But this works (here the PV is layered on the assembled RAID device instead of directly on the member disk, so LVM sees /dev/md* rather than the raw partitions):
--------------------------------------------------------
part pv.2521 --fstype="lvmpv" --ondisk=nvme0n1 --size=486761
part raid.3885 --ondisk=Volume0_0 --size=927916
raid  pv.3885 --device=imsm --level=CONTAINER --noformat --spares=1 --useexisting
--------------------------------------------------------
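
A quick way to check which layout an installation ended up with is to look at where the PV actually sits; a sketch, assuming the device names from this report:
--------------------------------------------------------
# pvs       # the second PV should sit on /dev/md*, not on /dev/sdX1
# lsblk     # shows the md device stacked on top of sda1/sdb1
--------------------------------------------------------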

Comment 1 Lance Digby 2022-03-03 21:47:57 UTC
Note: in RHEL 9 the LVM filter will not be used by default; the system.devices file will be used instead. See man lvmdevices and man vgimportdevices.
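
For anyone landing here from RHEL 9, a minimal sketch of the devices-file workflow described in those man pages (the device path is an example, not taken from this report):
--------------------------------------------------------
# Import all visible VGs into /etc/lvm/devices/system.devices
vgimportdevices -a

# Or add a single device explicitly
lvmdevices --adddev /dev/md127
--------------------------------------------------------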

Comment 2 Jiri Konecny 2022-03-11 09:57:44 UTC
Could you please provide us with the logs? You can find them in /tmp/*log in the installation environment, or in /var/log/anaconda/*log after the installation.

Comment 4 Renaud Métrich 2022-06-13 09:14:44 UTC
I'm reassigning this issue to LVM because the issue is not with Anaconda but with LVM itself.

The customer facing the issue has an IMSM RAID configured in the BIOS on disks sda and sdb.
He **doesn't** use the IMSM RAID for the OS tree, but only for /home, which is the condition that makes the issue happen.
The OS tree is on another disk (an NVMe disk).

When installing the system and rebooting, the /home partition is NOT mounted, causing the system to enter emergency mode.
This happens because, since /home is not part of the critical OS tree, dracut **didn't** include the "mdraid" dracut module in the initramfs, which is **expected behaviour** (/home is not required to mount / on /sysroot).

Because of the IMSM RAID, LVM **hijacks** the partitions hosting /home, which are seen in the initramfs phase as /dev/sda1 and /dev/sdb1.
This triggers a "duplicate PVs" warning and prevents /home from being mounted after switch root.

The issue is clearly on the LVM side, which treats /dev/sda1 and /dev/sdb1 as "normal partitions" **even though** they are on an IMSM RAID (which cannot be assembled in the initramfs due to the absence of the "mdraid" dracut module).
LVM needs to be smarter and have built-in code to detect the IMSM RAID and ignore such partitions.
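
For reference, lvm2 already exposes knobs for this in lvm.conf; IMSM metadata lives at the end of the device, which is why full detection needs the more expensive check. A sketch of the relevant settings (availability and defaults may differ per release; this is not the 8.6 fix itself):
--------------------------------------------------------
# /etc/lvm/lvm.conf (devices section)
devices {
    # Skip devices detected as MD RAID components
    md_component_detection = 1

    # "auto" / "start" / "full": "full" also reads the end of the
    # device, where IMSM/DDF external metadata is stored
    md_component_checks = "full"
}
--------------------------------------------------------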


REPRODUCER:

You don't need real IMSM hardware here, just a QEMU/KVM guest with 3 disks:
- /dev/vda for OS tree
- /dev/sda and /dev/sdb configured as IMSM RAID and hosting "/home"

Reproducer kickstart in attachment (bz2058386.ks). Because there is no real IMSM hardware, a hack is used. If you have real hardware, remove the "export IMSM_NO_PLATFORM=1" line.
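
For the record, a sketch of how such a fake IMSM array can be built with mdadm on plain virtual disks (assumed commands; the authoritative version is in the attached kickstart):
--------------------------------------------------------
export IMSM_NO_PLATFORM=1

# Create the IMSM container on the two member disks
mdadm --create /dev/md/imsm0 --metadata=imsm --raid-devices=2 /dev/sda /dev/sdb

# Create the RAID1 volume inside the container
mdadm --create /dev/md/Volume0_0 --level=1 --raid-devices=2 /dev/md/imsm0
--------------------------------------------------------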

REPRODUCER STEPS:

1. Install the system with the kickstart
2. Upon reboot, STOP at Grub menu and add "rd.break"
3. Check that devices for "/home" were "hijacked" by LVM

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
switch_root:/# lvm lvs
  WARNING: Not using device /dev/sdb1 for PV 8tbqAB-ihqI-G4dZ-hyRt-hL32-mwqp-WrsbvS.
  WARNING: PV 8tbqAB-ihqI-G4dZ-hyRt-hL32-mwqp-WrsbvS prefers device /dev/sda1 because device was seen first.
  LV   VG   Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  home data -wi-------  1.99g                                                    
  root rhel -wi-ao---- 10.00g                                                    
  swap rhel -wi-a-----  2.00g                                                    
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

4. Continue the boot; the system will end up in emergency mode

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
[FAILED] Failed to start LVM event activation on device 8:1.
See 'systemctl status lvm2-pvscan@8:1.service' for details.
[FAILED] Failed to start LVM event activation on device 8:17.
See 'systemctl status lvm2-pvscan@8:17.service' for details.

[   ***] A start job is running for dev-mapp…ta\x2dhome.device (29s / 1min 30s)

...
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

5. On non-real IMSM hardware, you can assemble the IMSM RAID as shown below, but /home still won't be mountable, since LVM has hijacked the devices and sees **duplicated PVs**

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# export IMSM_NO_PLATFORM=1
# mdadm --assemble --scan
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Comment 5 Renaud Métrich 2022-06-13 09:15:26 UTC
Created attachment 1889348 [details]
Kickstart to reproduce

Comment 6 Renaud Métrich 2022-06-13 10:52:01 UTC
Works fine with 8.6