Bug 2042656 - control node creation race can lead to inactive volume groups after autoactivation
Summary: control node creation race can lead to inactive volume groups after autoactivation
Keywords:
Status: POST
Alias: None
Product: LVM and device-mapper
Classification: Community
Component: lvm2
Version: unspecified
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Zdenek Kabelac
QA Contact: cluster-qe
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-01-19 20:51 UTC by Ferenc Wágner
Modified: 2023-08-10 15:41 UTC
CC: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:
pm-rhel: lvm-technical-solution?
pm-rhel: lvm-test-coverage?



Description Ferenc Wágner 2022-01-19 20:51:04 UTC
Description of problem:
My Debian bullseye systems occasionally fail to boot due to missing LVs: systemd times out waiting for devices to appear for various local filesystems and the emergency shell is invoked. However, an immediate `vgchange -ay` activates all LVs in all VGs and lets the boot continue to success.

Version-Release number of selected component (if applicable):
2.03.11-2.1

How reproducible:
Happens approximately once in every 20 boots.

Steps to Reproduce:
1. have two VGs on two disks (each disk a PV)
2. use LVs from both to mount filesystems in fstab
3. keep rebooting until the boot fails into the emergency shell

Actual results:
Eventually the boot fails as described above.

Expected results:
Successful boots only.

Additional info:
This is a default initramfs-tools based boot, not Dracut. Running systemd-udevd with the `--debug` option in the initramfs and grepping for `pvscan` in the log gives:
```
vdb: /usr/lib/udev/rules.d/69-lvm-metad.rules:127 RUN '/sbin/lvm pvscan --cache --activate ay --major $major --minor $minor'
vdb: Running command "/sbin/lvm pvscan --cache --activate ay --major 254 --minor 16"
vdb: Starting '/sbin/lvm pvscan --cache --activate ay --major 254 --minor 16'
vdc: /usr/lib/udev/rules.d/69-lvm-metad.rules:127 RUN '/sbin/lvm pvscan --cache --activate ay --major $major --minor $minor'
vdc: Running command "/sbin/lvm pvscan --cache --activate ay --major 254 --minor 32"
vdc: Starting '/sbin/lvm pvscan --cache --activate ay --major 254 --minor 32'
vdb: '/sbin/lvm pvscan --cache --activate ay --major 254 --minor 16'(out) '  pvscan[147] PV /dev/vdb online, VG ivy is complete.'
vdb: '/sbin/lvm pvscan --cache --activate ay --major 254 --minor 16'(out) '  pvscan[147] VG ivy run autoactivation.'
vdc: '/sbin/lvm pvscan --cache --activate ay --major 254 --minor 32'(out) '  pvscan[148] PV /dev/vdc online, VG ldap_ivy is complete.'
vdc: '/sbin/lvm pvscan --cache --activate ay --major 254 --minor 32'(out) '  pvscan[148] VG ldap_ivy run autoactivation.'
vdc: '/sbin/lvm pvscan --cache --activate ay --major 254 --minor 32'(err) '  /dev/mapper/control: mknod failed: File exists'
vdc: '/sbin/lvm pvscan --cache --activate ay --major 254 --minor 32'(err) '  Failure to communicate with kernel device-mapper driver.'
vdc: '/sbin/lvm pvscan --cache --activate ay --major 254 --minor 32'(err) '  Check that device-mapper is available in the kernel.'
vdc: '/sbin/lvm pvscan --cache --activate ay --major 254 --minor 32'(err) '  Incompatible libdevmapper 1.02.175 (2021-01-08) and kernel driver (unknown version).'
vdc: '/sbin/lvm pvscan --cache --activate ay --major 254 --minor 32'(out) '  0 logical volume(s) in volume group "ldap_ivy" now active'
vdc: '/sbin/lvm pvscan --cache --activate ay --major 254 --minor 32'(err) '  ldap_ivy: autoactivation failed.'
vdb: '/sbin/lvm pvscan --cache --activate ay --major 254 --minor 16'(out) '  4 logical volume(s) in volume group "ivy" now active'
vdc: Process '/sbin/lvm pvscan --cache --activate ay --major 254 --minor 32' failed with exit code 5.
vdc: Command "/sbin/lvm pvscan --cache --activate ay --major 254 --minor 32" returned 5 (error), ignoring.
vdb: Process '/sbin/lvm pvscan --cache --activate ay --major 254 --minor 16' succeeded.
```
However, `/run/lvm/pvs_online/` at this point contains files indicating that both VGs (ldap_ivy and ivy) are online. The opposite outcome can happen as well: the ldap_ivy VG gets activated and the ivy VG stays inactive. Most of the time the error above does not appear and both VGs activate properly. But in all cases `/run/lvm/online/` indicates full activation, even when only one VG was actually activated.
Creating `/dev/mapper/control` with the right major and minor device numbers before starting systemd-udevd in the initramfs works around the problem and results in reliable booting.

I guess the message comes from `_create_control()` in `libdm-iface.c`; branching back to the `_control_exists()` call when `mknod()` fails with EEXIST would avoid the problem.

Thanks for your time,
Feri.

Comment 1 Zdenek Kabelac 2023-02-14 11:43:34 UTC
The lvmetad Debian udev rule is likely a left-over relic from the past, since the 2.03 version of lvm2 no longer provides the lvmetad daemon (it was replaced with a different autoactivation mechanism).

However, there could indeed have been a race in the creation of the /dev/mapper/control device, as stated at the end of the report, so here comes the upstream patch:

https://listman.redhat.com/archives/lvm-devel/2023-February/024597.html

Comment 2 Ferenc Wágner 2023-02-14 17:10:11 UTC
Great, thanks for the fix!
Looks like the lvm-metad udev rule is not present in the latest Debian package, so that's probably fixed already.
By the way man/lvmautoactivation.7_main still references 69-dm-lvm-metad.rules, is that intended?

Comment 3 Zdenek Kabelac 2023-02-14 19:10:50 UTC
Ahh - thanks for noticing - this is going to be updated soon - there are some ongoing reworks...

