Bug 2042656

Summary: control node creation race can lead to inactive volume groups after autoactivation
Product: [Community] LVM and device-mapper
Component: lvm2 (sub component: Activating existing Logical Volumes)
Reporter: Ferenc Wágner <wferi>
Assignee: Zdenek Kabelac <zkabelac>
QA Contact: cluster-qe <cluster-qe>
Status: POST
Severity: unspecified
Priority: unspecified
Version: unspecified
CC: agk, heinzm, jbrassow, msnitzer, prajnoha, thornber, zkabelac
Flags: pm-rhel: lvm-technical-solution?, pm-rhel: lvm-test-coverage?
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Linux

Description Ferenc Wágner 2022-01-19 20:51:04 UTC
Description of problem:
My Debian bullseye systems occasionally fail to boot due to missing LVs: systemd times out waiting for devices to appear for various local filesystems and the emergency shell is invoked. However, an immediate `vgchange -ay` activates all LVs in all VGs and lets the boot continue to success.

Version-Release number of selected component (if applicable):
2.03.11-2.1

How reproducible:
happens approximately once out of every 20 boots

Steps to Reproduce:
1. have two VGs on two disks (each disk a PV)
2. use LVs from both to mount filesystems in fstab
3. keep rebooting until the boot fails into the emergency shell

Actual results:
Eventually the boot fails as described above.

Expected results:
Successful boots only.

Additional info:
This is a default initramfs-tools based boot, not Dracut. Running systemd-udevd with the `--debug` option in the initramfs and grepping for `pvscan` in the log gives:
```
vdb: /usr/lib/udev/rules.d/69-lvm-metad.rules:127 RUN '/sbin/lvm pvscan --cache --activate ay --major $major --minor $minor'
vdb: Running command "/sbin/lvm pvscan --cache --activate ay --major 254 --minor 16"
vdb: Starting '/sbin/lvm pvscan --cache --activate ay --major 254 --minor 16'
vdc: /usr/lib/udev/rules.d/69-lvm-metad.rules:127 RUN '/sbin/lvm pvscan --cache --activate ay --major $major --minor $minor'
vdc: Running command "/sbin/lvm pvscan --cache --activate ay --major 254 --minor 32"
vdc: Starting '/sbin/lvm pvscan --cache --activate ay --major 254 --minor 32'
vdb: '/sbin/lvm pvscan --cache --activate ay --major 254 --minor 16'(out) '  pvscan[147] PV /dev/vdb online, VG ivy is complete.'
vdb: '/sbin/lvm pvscan --cache --activate ay --major 254 --minor 16'(out) '  pvscan[147] VG ivy run autoactivation.'
vdc: '/sbin/lvm pvscan --cache --activate ay --major 254 --minor 32'(out) '  pvscan[148] PV /dev/vdc online, VG ldap_ivy is complete.'
vdc: '/sbin/lvm pvscan --cache --activate ay --major 254 --minor 32'(out) '  pvscan[148] VG ldap_ivy run autoactivation.'
vdc: '/sbin/lvm pvscan --cache --activate ay --major 254 --minor 32'(err) '  /dev/mapper/control: mknod failed: File exists'
vdc: '/sbin/lvm pvscan --cache --activate ay --major 254 --minor 32'(err) '  Failure to communicate with kernel device-mapper driver.'
vdc: '/sbin/lvm pvscan --cache --activate ay --major 254 --minor 32'(err) '  Check that device-mapper is available in the kernel.'
vdc: '/sbin/lvm pvscan --cache --activate ay --major 254 --minor 32'(err) '  Incompatible libdevmapper 1.02.175 (2021-01-08) and kernel driver (unknown version).'
vdc: '/sbin/lvm pvscan --cache --activate ay --major 254 --minor 32'(out) '  0 logical volume(s) in volume group "ldap_ivy" now active'
vdc: '/sbin/lvm pvscan --cache --activate ay --major 254 --minor 32'(err) '  ldap_ivy: autoactivation failed.'
vdb: '/sbin/lvm pvscan --cache --activate ay --major 254 --minor 16'(out) '  4 logical volume(s) in volume group "ivy" now active'
vdc: Process '/sbin/lvm pvscan --cache --activate ay --major 254 --minor 32' failed with exit code 5.
vdc: Command "/sbin/lvm pvscan --cache --activate ay --major 254 --minor 32" returned 5 (error), ignoring.
vdb: Process '/sbin/lvm pvscan --cache --activate ay --major 254 --minor 16' succeeded.
```
However, `/run/lvm/pvs_online/` at this point contains files indicating that both VGs (ldap_ivy and ivy) are online. The opposite result can happen as well, when the ldap_ivy VG gets activated and the ivy VG stays inactive. Most of the time the above error doesn't appear and both VGs activate for real. But in all cases `/run/lvm/online/` indicates full activation, even if only one VG was activated successfully.
Creating `/dev/mapper/control` with the right major and minor device numbers before starting systemd-udevd in the initramfs works around the problem and results in reliable booting.
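
For reference, here is roughly what that pre-creation amounts to, written as a tiny standalone C helper. This is my illustration only: the `/proc/misc` lookup and the misc major 10 are one way to find the right numbers, not the exact script I use in the initramfs.

```
/* Illustrative only: pre-create /dev/mapper/control before udevd starts,
 * mirroring the workaround described above.  The control node is a misc
 * character device (major 10); its minor is whatever the kernel registered
 * for "device-mapper" in /proc/misc (dm-mod must already be loaded). */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

int main(void)
{
	FILE *f = fopen("/proc/misc", "r");
	unsigned minor_nr;
	char name[64];

	if (!f)
		return 1;

	while (fscanf(f, "%u %63s", &minor_nr, name) == 2) {
		if (strcmp(name, "device-mapper"))
			continue;
		fclose(f);
		mkdir("/dev/mapper", 0755);
		/* EEXIST just means someone beat us to it, which is fine. */
		if (mknod("/dev/mapper/control", S_IFCHR | 0600,
			  makedev(10, minor_nr)) && errno != EEXIST)
			return 1;
		return 0;
	}

	fclose(f);
	return 1;
}
```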

I guess the message comes from `_create_control()` in `libdm-iface.c`; branching back to the `_control_exists()` check when `mknod()` fails with EEXIST would avoid the problem.
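
To make the suggestion concrete, here is a rough sketch of the pattern I mean. The helper names are invented for illustration; this is not the actual `libdm-iface.c` code nor a proposed patch.

```
/* Sketch of an EEXIST-tolerant control node creation.  The helpers only
 * mimic the shape of _control_exists()/_create_control() discussed above. */
#include <errno.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

int control_node_matches(const char *path, unsigned maj, unsigned min_nr)
{
	struct stat st;

	if (stat(path, &st))
		return 0;
	return S_ISCHR(st.st_mode) && st.st_rdev == makedev(maj, min_nr);
}

int create_control_node(const char *path, unsigned maj, unsigned min_nr)
{
	if (control_node_matches(path, maj, min_nr))
		return 1;

	if (!mknod(path, S_IFCHR | 0600, makedev(maj, min_nr)))
		return 1;

	/* A concurrent pvscan may have created the node between our check
	 * and the mknod().  Treat EEXIST as success if the node now looks
	 * right instead of bailing out. */
	if (errno == EEXIST)
		return control_node_matches(path, maj, min_nr);

	return 0;
}
```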

Thanks for your time,
Feri.

Comment 1 Zdenek Kabelac 2023-02-14 11:43:34 UTC
The lvmetad udev rule in Debian is likely a leftover relic from the past, since the 2.03 version of lvm2 no longer provides the lvmetad daemon (it was replaced with a different autoactivation mechanism).

However, there could indeed have been a race in the creation of the /dev/mapper/control device, as stated in the last line, so here is the upstream patch:

https://listman.redhat.com/archives/lvm-devel/2023-February/024597.html

Comment 2 Ferenc Wágner 2023-02-14 17:10:11 UTC
Great, thanks for the fix!
Looks like the lvm-metad udev rule is not present in the latest Debian package, so that's probably fixed already.
By the way, man/lvmautoactivation.7_main still references 69-dm-lvm-metad.rules; is that intended?

Comment 3 Zdenek Kabelac 2023-02-14 19:10:50 UTC
Ahh - thanks for noticing - this is going to be updated soon - there are some ongoing reworks...