Bug 1932761
| Summary: | LVM devices automatically activated with event_activation turned off | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Vojtech Juranek <vjuranek> |
| Component: | lvm2 | Assignee: | Zdenek Kabelac <zkabelac> |
| lvm2 sub component: | Activating existing Logical Volumes | QA Contact: | cluster-qe <cluster-qe> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | agk, aperotti, bhull, cmackows, cmarthal, fgarciad, heinzm, jbrassow, ldigby, msnitzer, nsoffer, prajnoha, teigland, thornber, troels, zkabelac |
| Version: | 8.2 | Keywords: | Triaged |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | lvm2-2.03.12-1.el8 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-11-09 19:45:25 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Vojtech Juranek
2021-02-25 07:27:02 UTC
Do you have lvm filter on this host, allowing lvm to access only /dev/vda? Please attach /etc/lvm/lvm.conf.

Created attachment 1759210 [details]
lvm config

attaching lvm config
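For reference, a quick way to confirm what filter (if any) lvm is actually using is to query the merged configuration - a sketch, assuming a stock lvm2 install; an unset filter falls back to the built-in default:

# Show the effective filter settings (defaults included via --type full).
lvmconfig --type full devices/filter devices/global_filter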
(In reply to Nir Soffer from comment #1)
> Do you have lvm filter on this host, allowing lvm to access only /dev/vda?

There is no filter configured. However, vdsm says lvm is configured:

[root@localhost ~]# vdsm-tool is-configured
lvm is configured for vdsm
Managed volume database is already configured
abrt is already configured for vdsm
libvirt is already configured for vdsm
Current revision of multipath.conf detected, preserving
sanlock is configured for vdsm

On another host in the same deployment, I have the following filter:

filter = ["r|.*|"]

(In reply to Vojtech Juranek from comment #3)
> (In reply to Nir Soffer from comment #1)
> > Do you have lvm filter on this host, allowing lvm to access only /dev/vda?
>
> There is no filter configured. However, vdsm says lvm is configured:

Sorry, not related. I forgot to configure the lvm filter on this node. With the filter in place it works correctly (devices are filtered), so from the vdsm perspective we are safe, but the LVM bug is still present.

We have fixed this recently upstream with this small commit:
https://listman.redhat.com/archives/lvm-devel/2021-February/msg00040.html

The same fix in git:
https://sourceware.org/git/?p=lvm2.git;a=commit;h=2be585b79c71b8f70c0252af5f09dbd5e6103030

event_activation=0 works for generating the lvm2-activation* services during startup to activate VGs directly, but pvscan fails to check it, so the lvm2-pvscan services will also attempt to activate VGs, both during startup and later when devices are added. Activating LVs when they shouldn't be activated can cause big problems, and it's surprising that this hasn't been reported earlier. When I removed lvmetad (between rhel7 and rhel8) and rewrote pvscan activation to work without it, I failed to replace the old use_lvmetad=0 check (which had a second meaning of stopping event activation) with the new event_activation=0 check. So this has been broken since RHEL 8.0 was first released.

(Part of the issue is also the fact that even when event activation is disabled, systemd/udev still run the lvm2-pvscan services, which are supposed to recognize that they aren't supposed to run and turn themselves into no-ops. Those services really shouldn't be run in the first place, but the systemd/udev/lvm machinery is too limited to be able to do this.)

A few more comments. My observation of current systemd is that the autoactivation of LVs is kind of 'magic' on its own; there are many puzzling issues. The reason why it's not easy to make the services no-ops is the question of how systemd is supposed to react to a change of an lvm.conf setting. The current solution, where pvscan is started and evaluates lvm.conf, isn't perfect, but the user doesn't need to start or stop systemd services. I'd say we need to start solving the problem from its roots, beginning with the initial udev rule processing - a topic for several new BZs.
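For context, the event-driven path being discussed can be inspected roughly like this on a RHEL 8 host - a sketch; 8:16 is a placeholder major:minor, and the udev rule file name may differ between lvm2 versions:

# The setting pvscan is supposed to honor (ignored while the bug is present):
lvmconfig --type full global/event_activation
# The per-device service that udev queues whenever a PV appears; its ExecStart
# runs "pvscan --cache --activate ay", which is what autoactivates the VG:
systemctl cat lvm2-pvscan@8:16.service
# The udev rule that requests the service:
grep -n pvscan /usr/lib/udev/rules.d/69-dm-lvm-metad.rules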
(In reply to David Teigland from comment #6)
> (Part of the issue is also the fact that even when event activation is
> disabled, systemd/udev still run lvm2-pvscan services, which are supposed to
> recognize that they aren't supposed to run, and turn themselves into no-ops.
> Those services really shouldn't be run in the first place, but the
> systemd/udev/lvm machinery is too limited to be able to do this.)

(Note: actually, there would be a way - systemd supports the EnvironmentFile directive to load key=value pairs from a file, and we could then use ConditionEnvironment to check for the value and conditionalize the unit based on that. But that would require lvm2 to use another source of configuration besides its own, and in its own format. Usually, such keys are placed in /etc/default for system-wide settings.)

The more interesting point is how to 'apply' new settings - what the mechanism would be. At the moment, a change to lvm.conf (a new write) typically means the new setting should take effect from that moment. With systemd, however, there might be another 'common' mechanism that makes new settings take effect, i.e. a 'reload of the systemd daemon'... Should we have our own command, like we had 'lvmconf' to change clustering?

Shouldn't even just a reboot like the one here result in the LVs not being activated, given that event_activation is turned off?

[root@hayes-02 ~]# grep event_activation /etc/lvm/lvm.conf
# Configuration option global/event_activation.
# When event_activation is disabled, the system will generally run
	event_activation = 0
[root@hayes-02 ~]# lvs -a -o +devices
  LV   VG   Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
  lv1  test -wi------- 10.00g                                                     /dev/sdb(0)
  lv10 test -wi------- 10.00g                                                     /dev/sdb(23040)
  lv2  test -wi------- 10.00g                                                     /dev/sdb(2560)
  lv3  test -wi------- 10.00g                                                     /dev/sdb(5120)
  lv4  test -wi------- 10.00g                                                     /dev/sdb(7680)
  lv5  test -wi------- 10.00g                                                     /dev/sdb(10240)
  lv6  test -wi------- 10.00g                                                     /dev/sdb(12800)
  lv7  test -wi------- 10.00g                                                     /dev/sdb(15360)
  lv8  test -wi------- 10.00g                                                     /dev/sdb(17920)
  lv9  test -wi------- 10.00g                                                     /dev/sdb(20480)
[root@hayes-02 ~]# reboot -fin
Rebooting.
client_loop: send disconnect: Broken pipe
[cmarthal@localhost mrxvt]$
[cmarthal@localhost mrxvt]$ ssh root@hayes-02
root@hayes-02's password:
Activate the web console with: systemctl enable --now cockpit.socket
This system is not registered to Red Hat Insights. See https://cloud.redhat.com/
To register this system, run: insights-client --register
Last login: Fri Apr 16 12:06:35 2021
[root@hayes-02 ~]# lvs -a -o +devices
  LV   VG   Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
  lv1  test -wi-a----- 10.00g                                                     /dev/sdc(0)
  lv10 test -wi-a----- 10.00g                                                     /dev/sdc(23040)
  lv2  test -wi-a----- 10.00g                                                     /dev/sdc(2560)
  lv3  test -wi-a----- 10.00g                                                     /dev/sdc(5120)
  lv4  test -wi-a----- 10.00g                                                     /dev/sdc(7680)
  lv5  test -wi-a----- 10.00g                                                     /dev/sdc(10240)
  lv6  test -wi-a----- 10.00g                                                     /dev/sdc(12800)
  lv7  test -wi-a----- 10.00g                                                     /dev/sdc(15360)
  lv8  test -wi-a----- 10.00g                                                     /dev/sdc(17920)
  lv9  test -wi-a----- 10.00g                                                     /dev/sdc(20480)

(In reply to Corey Marthaler from comment #10)
> Shouldn't even just a reboot like the one here result in the LVs not being
> activated, given that event_activation is turned off?

Have you rebuilt your initramfs with the updated lvm.conf? (It keeps its own copy, made when the image is built.)
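If the initramfs does need refreshing, a sketch of the usual way to do it (assuming the stock dracut setup and the currently running kernel):

# lvm.conf is copied into the initramfs when the image is built, so regenerate
# the image after changing event_activation so early boot sees the same value:
dracut -f /boot/initramfs-$(uname -r).img $(uname -r)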
Activation in the initrd might be what's happening; I'm not sure of the best way to tell. In any case, event_activation won't apply in the initrd, since the initrd scripts just call lvchange -ay for every device that happens to appear while the initrd is running. What event_activation really controls is the systemd services lvm2-pvscan and lvm2-activation*, which are run by the main systemd startup.

So to test event_activation, you really just need to trigger uevents (udevadm trigger -c add) and check whether udev+systemd+lvm cause activation. When event_activation=0, the uevents should not lead to any activations.

An example of a test for this would be (VG "test" on PVs /dev/sda and /dev/sdb):

set event_activation=0
vgchange -an test
rm /run/lvm/pvs_online/*
rm /run/lvm/vgs_online/*
rm /run/lvm/pvs_lookup/*
udevadm trigger --settle -c add /sys/block/sda
udevadm trigger --settle -c add /sys/block/sdb

lvs in test should show no active LVs.

If you set event_activation=1 and repeat this, then you should see active LVs in test, and you can run systemctl status lvm2-pvscan@8:X.service and systemctl status lvm2-pvscan@8:Y.service to see which one caused the activation (8:X and 8:Y being the major:minor of sda and sdb).

This is what I'm using to test the new udev rule and systemd unit for rhel9 here:
https://sourceware.org/git/?p=lvm2.git;a=commitdiff;h=742129a9ea5ec6631a5f228b8a79c08b2558e818
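The same steps collected into a small script, for convenience - an illustrative sketch assuming the VG/PV names above; run as root, and note that some of the /run/lvm state files may not exist yet:

#!/bin/bash
# Verify that event_activation=0 prevents event-driven autoactivation of VG "test"
# when its PVs (re)appear via synthetic "add" uevents.
set -x
vgchange -an test
rm -f /run/lvm/pvs_online/* /run/lvm/vgs_online/* /run/lvm/pvs_lookup/*
udevadm trigger --settle -c add /sys/block/sda
udevadm trigger --settle -c add /sys/block/sdb
# With event_activation=0 the attr field should stay "-wi-------" (not active):
lvs -o lv_name,lv_attr test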
Peter explained why the udevadm test won't work for lvm2-pvscan in rhel8 and provided an alternative. The reason is that the SYSTEMD_READY variable in the udev db has to be reset from 1 back to 0 for each device. An alternative test to get around this problem is to run this for each device instead:

echo 1 > /sys/block/<devname>/device/delete
echo "- - -" > /sys/class/scsi_host/<host>/scan

[root@null-03 ~]# vgchange -an ee
  0 logical volume(s) in volume group "ee" now active
[root@null-03 ~]# pvs -o+uuid /dev/sde
  PV       VG Fmt  Attr PSize   PFree   PV UUID
  /dev/sde ee lvm2 a--  931.01g 931.00g 0TQj9A-BQbx-1JZc-WpWL-3GQH-ZszL-RW0UDI
[root@null-03 ~]# lvs ee
  LV    VG Attr       LSize
  lvol0 ee -wi------- 4.00m
[root@null-03 ~]# rm /run/lvm/pvs_online/0TQj9ABQbx1JZcWpWL3GQHZszLRW0UDI
[root@null-03 ~]# rm /run/lvm/vgs_online/ee
[root@null-03 ~]# systemctl stop lvm2-pvscan@8:64
[root@null-03 ~]# echo 1 > /sys/block/sde/device/delete
[root@null-03 ~]# echo "- - -" > /sys/class/scsi_host/host7/scan
[root@null-03 ~]# cat /run/lvm/pvs_online/0TQj9ABQbx1JZcWpWL3GQHZszLRW0UDI
8:64 vg:ee
[root@null-03 ~]# ls /run/lvm/vgs_online/ee
/run/lvm/vgs_online/ee
[root@null-03 ~]# lvs ee
  LV    VG Attr       LSize
  lvol0 ee -wi-a----- 4.00m
[root@null-03 ~]# systemctl status lvm2-pvscan@8:64
● lvm2-pvscan@8:64.service - LVM2 PV scan on device 8:64
   Loaded: loaded (/usr/lib/systemd/system/lvm2-pvscan@.service; static; vendor preset: disabled)
   Active: active (exited) since Mon 2021-04-19 03:54:56 CDT; 41s ago
     Docs: man:pvscan(8)
  Process: 21407 ExecStart=/usr/sbin/lvm pvscan --cache --activate ay %i (code=exited, status=0/SUCCESS)
 Main PID: 21407 (code=exited, status=0/SUCCESS)

Apr 19 03:54:56 null-03 systemd[1]: Starting LVM2 PV scan on device 8:64...
Apr 19 03:54:56 null-03 lvm[21407]: pvscan[21407] PV /dev/sde online, VG ee is complete.
Apr 19 03:54:56 null-03 lvm[21407]: pvscan[21407] VG ee run autoactivation.
Apr 19 03:54:56 null-03 lvm[21407]: 1 logical volume(s) in volume group "ee" now active
Apr 19 03:54:56 null-03 systemd[1]: Started LVM2 PV scan on device 8:64.

Thanks for the updated test case, QA can reproduce this now as well.

kernel-4.18.0-303.el8    BUILT: Wed Mar 31 00:51:07 CDT 2021
lvm2-2.03.11-5.el8    BUILT: Fri Mar 5 07:13:31 CST 2021

[root@hayes-02 ~]# lvs -a -o +devices
  LV  VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
  lv1 test -wi-a----- 100.00g                                                     /dev/sdb(0)
  lv2 test -wi-a----- 100.00g                                                     /dev/sdb(25600)

# Find sdb is on host0:
[root@hayes-02 block]# ls -lrt /sys/block | grep sdb
lrwxrwxrwx. 1 root root 0 Apr 19 11:09 sdb -> ../devices/pci0000:00/0000:00:02.0/0000:03:00.0/host0/target0:2:1/0:2:1:0/block/sdb
[root@hayes-02 host0]# udevadm info --query=path --name=sdb
/devices/pci0000:00/0000:00:02.0/0000:03:00.0/host0/target0:2:1/0:2:1:0/block/sdb
[root@hayes-02 block]# cd /sys/class/scsi_host/
[root@hayes-02 scsi_host]# ls -l host0
lrwxrwxrwx. 1 root root 0 Apr 19 11:09 host0 -> ../../devices/pci0000:00/0000:00:02.0/0000:03:00.0/host0/scsi_host/host0

[root@hayes-02 ~]# grep event_activation /etc/lvm/lvm.conf
# Configuration option global/event_activation.
# When event_activation is disabled, the system will generally run
	event_activation = 0
[root@hayes-02 ~]# lvs
  LV  VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lv1 test -wi-a----- 100.00g
  lv2 test -wi-a----- 100.00g
[root@hayes-02 ~]# vgchange -an test
  0 logical volume(s) in volume group "test" now active
[root@hayes-02 ~]# pvs -o+uuid /dev/sdb
  PV       VG   Fmt  Attr PSize  PFree PV UUID
  /dev/sdb test lvm2 a--  <1.82t 1.62t vKrtcB-S3ef-U672-DwMO-LEQ1-yLdI-LkxKGM
[root@hayes-02 ~]# lvs test
  LV  VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lv1 test -wi------- 100.00g
  lv2 test -wi------- 100.00g
[root@hayes-02 ~]# rm /run/lvm/pvs_online/vKrtcBS3efU672DwMOLEQ1yLdILkxKGM
rm: remove regular file '/run/lvm/pvs_online/vKrtcBS3efU672DwMOLEQ1yLdILkxKGM'? y
[root@hayes-02 ~]# rm /run/lvm/vgs_online/test
rm: remove regular empty file '/run/lvm/vgs_online/test'? y
[root@hayes-02 ~]# systemctl stop lvm2-pvscan@8:64
[root@hayes-02 ~]# echo 1 > /sys/block/sdb/device/delete
[root@hayes-02 ~]# ls -lrt /sys/block | grep sdb
[root@hayes-02 ~]# echo "- - -" > /sys/class/scsi_host/host0/scan
[root@hayes-02 ~]# cat /run/lvm/pvs_online/vKrtcBS3efU672DwMOLEQ1yLdILkxKGM
8:16 vg:test
[root@hayes-02 ~]# ls /run/lvm/vgs_online/test
/run/lvm/vgs_online/test
[root@hayes-02 ~]# lvs test
  LV  VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lv1 test -wi-a----- 100.00g
  lv2 test -wi-a----- 100.00g
[root@hayes-02 ~]# systemctl status lvm2-pvscan@8:16
● lvm2-pvscan@8:16.service - LVM event activation on device 8:16
   Loaded: loaded (/usr/lib/systemd/system/lvm2-pvscan@.service; static; vendor preset: disabled)
   Active: active (exited) since Mon 2021-04-19 12:03:24 CDT; 1min 2s ago
     Docs: man:pvscan(8)
  Process: 3465 ExecStart=/usr/sbin/lvm pvscan --cache --activate ay 8:16 (code=exited, status=0/SUCCESS)
 Main PID: 3465 (code=exited, status=0/SUCCESS)

Apr 19 12:03:24 hayes-02.lab.msp.redhat.com systemd[1]: Starting LVM event activation on device 8:16...
Apr 19 12:03:24 hayes-02.lab.msp.redhat.com lvm[3465]: pvscan[3465] PV /dev/sdb online, VG test is complete.
Apr 19 12:03:24 hayes-02.lab.msp.redhat.com lvm[3465]: pvscan[3465] VG test run autoactivation.
Apr 19 12:03:24 hayes-02.lab.msp.redhat.com lvm[3465]: 2 logical volume(s) in volume group "test" now active
Apr 19 12:03:24 hayes-02.lab.msp.redhat.com systemd[1]: Started LVM event activation on device 8:16.
[root@hayes-02 ~]# lvs
  LV  VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lv1 test -wi-a----- 100.00g
  lv2 test -wi-a----- 100.00g
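For anyone repeating the remove/rescan variant on another machine, the per-device steps boil down to the following sketch (sdb and host0 are examples; substitute your own device and SCSI host):

# Find which SCSI host the disk hangs off (prints e.g. "host0"):
readlink -f /sys/block/sdb | grep -o 'host[0-9]*' | head -1
# Remove the disk and rescan the host so udev sees a fresh "add" event:
echo 1 > /sys/block/sdb/device/delete
echo "- - -" > /sys/class/scsi_host/host0/scan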
Hello David,

Would you backport this fix to either of these versions of LVM? The customer is running RHEL 8.1, 8.2 and 8.3.

[root@miller64 /]# lvm version
  LVM version:     2.03.09(2)-RHEL8 (2020-05-28)
  Library version: 1.02.171-RHEL8 (2020-05-28)
  Driver version:  4.42.0

[root@kirin64 rules.d]# lvm version
  LVM version:     2.03.08(2)-RHEL8 (2020-02-11)
  Library version: 1.02.169-RHEL8 (2020-02-11)
  Driver version:  4.39.0

We will then be able to test this on their systems. It will also help us verify that this issue is not causing another issue they are having with activating a VG. This would be extremely helpful.

Best regards,
Brett

Created attachment 1776000 [details]
patch backported to rhel8.3
This is a backport to rhel8.3 of "pvscan: support disabled event_activation".
It also applies cleanly to rhel8.2.
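For anyone applying the backport by hand rather than waiting for a build, a sketch assuming the attachment is a plain -p1 diff (the file and directory names below are hypothetical):

cd lvm2-2.03.09        # unpacked lvm2 sources matching the rhel8.3 package
patch -p1 < pvscan-support-disabled-event_activation.patch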
Fix verified in the latest rpms.

kernel-4.18.0-310.el8    BUILT: Thu May 27 14:24:00 CDT 2021
lvm2-2.03.12-2.el8    BUILT: Tue Jun 1 06:55:37 CDT 2021
lvm2-libs-2.03.12-2.el8    BUILT: Tue Jun 1 06:55:37 CDT 2021

[root@hayes-02 ~]# lvs -a -o +devices
  LV  VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
  lv1 test -wi-a----- 100.00g                                                     /dev/sdb(0)
  lv2 test -wi-a----- 100.00g                                                     /dev/sdb(25600)
[root@hayes-02 ~]# ls -lrt /sys/block | grep sdb
lrwxrwxrwx. 1 root root 0 Jun 10 14:28 sdb -> ../devices/pci0000:00/0000:00:02.0/0000:03:00.0/host0/target0:2:1/0:2:1:0/block/sdb
[root@hayes-02 ~]# udevadm info --query=path --name=sdb
/devices/pci0000:00/0000:00:02.0/0000:03:00.0/host0/target0:2:1/0:2:1:0/block/sdb
[root@hayes-02 ~]# cd /sys/class/scsi_host/
[root@hayes-02 scsi_host]# ls -l host0
lrwxrwxrwx. 1 root root 0 Jun 10 14:28 host0 -> ../../devices/pci0000:00/0000:00:02.0/0000:03:00.0/host0/scsi_host/host0
[root@hayes-02 scsi_host]# grep event_activation /etc/lvm/lvm.conf
# Configuration option global/event_activation.
# When event_activation is disabled, the lvm2-activation services are
	event_activation = 0
[root@hayes-02 scsi_host]# cd
[root@hayes-02 ~]# vgchange -an test
  0 logical volume(s) in volume group "test" now active
[root@hayes-02 ~]# pvs -o+uuid /dev/sdb
  PV       VG   Fmt  Attr PSize  PFree PV UUID
  /dev/sdb test lvm2 a--  <1.82t 1.62t 97IB4F-Dfwt-cxgc-Wpqn-kSFf-4GRH-BpiELq
[root@hayes-02 ~]# lvs test
  LV  VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lv1 test -wi------- 100.00g
  lv2 test -wi------- 100.00g
[root@hayes-02 ~]# rm /run/lvm/pvs_online/97IB4FDfwtcxgcWpqnkSFf4GRHBpiELq
rm: remove regular file '/run/lvm/pvs_online/97IB4FDfwtcxgcWpqnkSFf4GRHBpiELq'? y
[root@hayes-02 ~]# rm /run/lvm/vgs_online/test
rm: remove regular empty file '/run/lvm/vgs_online/test'? y
[root@hayes-02 ~]# systemctl stop lvm2-pvscan@8:64
[root@hayes-02 ~]# echo 1 > /sys/block/sdb/device/delete
[root@hayes-02 ~]# ls -lrt /sys/block | grep sdb
[root@hayes-02 ~]# echo "- - -" > /sys/class/scsi_host/host0/scan
[root@hayes-02 ~]# ls /run/lvm/pvs_online/
[root@hayes-02 ~]# ls /run/lvm/vgs_online
[root@hayes-02 ~]# lvs test
  LV  VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lv1 test -wi------- 100.00g
  lv2 test -wi------- 100.00g
[root@hayes-02 ~]# systemctl status lvm2-pvscan@8:16
● lvm2-pvscan@8:16.service - LVM event activation on device 8:16
   Loaded: loaded (/usr/lib/systemd/system/lvm2-pvscan@.service; static; vendor preset: disabled)
   Active: active (exited) since Thu 2021-06-10 14:35:05 CDT; 39s ago
     Docs: man:pvscan(8)
  Process: 2809 ExecStart=/usr/sbin/lvm pvscan --cache --activate ay 8:16 (code=exited, status=0/SUCCESS)
 Main PID: 2809 (code=exited, status=0/SUCCESS)

Jun 10 14:35:05 hayes-02.lab.msp.redhat.com systemd[1]: Starting LVM event activation on device 8:16...
Jun 10 14:35:05 hayes-02.lab.msp.redhat.com systemd[1]: Started LVM event activation on device 8:16.
[root@hayes-02 ~]# lvs
  LV  VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lv1 test -wi------- 100.00g
  lv2 test -wi------- 100.00g

*** Bug 1985175 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (lvm2 bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4431