Bug 1932586

Summary:            RFE: add ability to configure forced shutdown a shared VG when sanlock locks are lost
Product:            Red Hat Enterprise Linux 8
Component:          lvm2
lvm2 sub component: LVM lock daemon / lvmlockd
Version:            8.3
Status:             CLOSED ERRATA
Severity:           low
Priority:           low
Keywords:           FutureFeature, Triaged
Target Milestone:   rc
Hardware:           Unspecified
OS:                 Unspecified
Reporter:           David Teigland <teigland>
Assignee:           David Teigland <teigland>
QA Contact:         cluster-qe <cluster-qe>
CC:                 agk, cmackows, cmarthal, heinzm, jbrassow, mcsontos, prajnoha, teigland, zkabelac
Fixed In Version:   lvm2-2.03.12-2.el8
Doc Type:           If docs needed, set a value
Type:               Bug
Last Closed:        2021-11-09 19:45:25 UTC
Description - David Teigland, 2021-02-24 18:46:14 UTC
pushed to main:
https://sourceware.org/git/?p=lvm2.git;a=commit;h=89a3440fc0179318954855aa251b0aae4f5c1a63

This doesn't change any behavior on its own, but it allows a user to create and configure their own script to shut down a sanlock VG.

The feature added by this bug/commit does not change the default behavior; it just adds a config option that can be used to set up automated recovery for a sanlock VG if storage access is lost.

The expected behavior of a sanlock VG remains the following for node1 and node2. node1 has the LV active exclusively (ex), and node2 would like to activate the LV (ex). lvmlockd/sanlock/wdmd should never allow both nodes to have the LV active (ex) at the same time. While node1 remains alive with the LV active, node2 can run lvchange -ay LV and should see:

# lvchange -ay vg/lvol0
  LV locked by other host: vg/lvol0
  Failed to lock logical volume vg/lvol0.

If node1 deactivates the LV, the LV lock is released by node1, and node2 can activate the LV. Or, if sanlock on node1 fails to renew the lock on the LV, then the LV lock will expire, and node2 can activate the LV.

The scenario of interest here is what happens when node1 fails to renew the lock. node1 will fail to renew the lock if sanlock on node1 fails to read/write the storage under the VG. After failing to renew for a period, sanlock on node1 will reset the machine using the watchdog before the lock expires. This means that by the time the LV lock expires, node1 has been reset, and it's safe for node2 to get the lock and activate the LV.

sanlock gives lvm the ability to avoid a watchdog reset on node1. If node1 can deactivate the LVs in the expiring VG within about 40 seconds, then node1 can avoid a reset. node2 will still likely need to wait for the full expiration period before it can activate.

By default, when renewals fail, lvmlockctl is run automatically and it leaves a message in syslog telling the user that they should manually deactivate LVs to avoid a reset:

lvmlockctl: lvmlockd lost access to locks in VG vg.
lvmlockctl: Immediately deactivate LVs in VG vg.
lvmlockctl: Once VG is unused, run lvmlockctl --drop vg.

If the user notices these messages, they can follow these steps to manually deactivate the LVs in the VG and then run lvmlockctl --drop vg. If they do this within about 40 seconds, node1 can avoid a watchdog reset.

The change in this bug allows a user to configure a script (set in lvm.conf as lvmlockctl_kill_command) to automate the process of deactivating LVs and running lvmlockctl --drop. We don't provide a script at this point (we may in the future), but the lvmlockd man page provides some suggestions for a script.

I tried testing this three different ways and was never able to get the messages mentioned in comment #3:

lvmlockctl: lvmlockd lost access to locks in VG vg.
lvmlockctl: Immediately deactivate LVs in VG vg.
lvmlockctl: Once VG is unused, run lvmlockctl --drop vg.

kernel-4.18.0-310.el8    BUILT: Thu May 27 14:24:00 CDT 2021
lvm2-2.03.12-5.el8    BUILT: Tue Jul 13 11:50:03 CDT 2021
lvm2-libs-2.03.12-5.el8    BUILT: Tue Jul 13 11:50:03 CDT 2021
sanlock-3.8.4-1.el8    BUILT: Tue Jun  1 16:16:52 CDT 2021
sanlock-lib-3.8.4-1.el8    BUILT: Tue Jun  1 16:16:52 CDT 2021

The first time I tried this, I just failed a device and watched what happened. As mentioned in comment #3, the node with the exclusively active volume and the failed device was eventually reset. The lvmlockd messages above never appeared. Please let me know how to trigger these messages.
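For reference, a minimal sketch of the configuration described above, based on the lvmlockd man page suggestion and on the script used in the verification later in this bug (/usr/sbin/my_vg_kill_script.sh); the exact script contents and path are up to the administrator. The verification log below ("Successful VG vg kill command /usr/sbin/my_vg_kill_script.sh vg") indicates that lvmlockctl appends the VG name as the script's first argument, so the sketch replaces the device-mapper tables of the VG's top-level LVs with the error target and reports success only when every table has been replaced:

# /etc/lvm/lvm.conf  (the option is global/lvmlockctl_kill_command)
global {
    lvmlockctl_kill_command = "/usr/sbin/my_vg_kill_script.sh"
}

# /usr/sbin/my_vg_kill_script.sh
#!/bin/bash
# Called by lvmlockctl with the VG name as the first argument.
VG=$1
# Replace the dm tables of the VG's top-level LVs with the error target,
# so further I/O to them fails instead of hanging on the lost storage.
dmsetup wipe_table -S "uuid=~LVM && vgname=$VG && lv_layer=\"\""
# Check that the error target is now in place for every top-level LV.
dmsetup table -c -S "uuid=~LVM && vgname=$VG && lv_layer=\"\"" | grep -vw error
if [[ $? -ne 0 ]] ; then
    # grep found no table without "error": all LVs were switched over.
    exit 0
fi
exit 1

When the command exits 0, the verification log below suggests the VG's lockspace is then dropped automatically (s2 all pids clear, rem_lockspace_san), which is what lets the failing node avoid the watchdog reset.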
# NODE 1
[root@host-161 ~]# systemctl status sanlock
● sanlock.service - Shared Storage Lease Manager
   Loaded: loaded (/usr/lib/systemd/system/sanlock.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2021-07-16 16:46:15 CDT; 8min ago
  Process: 1436 ExecStart=/usr/sbin/sanlock daemon (code=exited, status=0/SUCCESS)
 Main PID: 1442 (sanlock)
    Tasks: 7 (limit: 101097)
   Memory: 20.9M
   CGroup: /system.slice/sanlock.service
           ├─1442 /usr/sbin/sanlock daemon
           └─1443 /usr/sbin/sanlock daemon

Jul 16 16:46:15 host-161.virt.lab.msp.redhat.com systemd[1]: Starting Shared Storage Lease Manager...
Jul 16 16:46:15 host-161.virt.lab.msp.redhat.com systemd[1]: Started Shared Storage Lease Manager.

[root@host-161 ~]# systemctl status lvmlockd
● lvmlockd.service - LVM lock daemon
   Loaded: loaded (/usr/lib/systemd/system/lvmlockd.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2021-07-16 16:46:23 CDT; 8min ago
     Docs: man:lvmlockd(8)
 Main PID: 1460 (lvmlockd)
    Tasks: 4 (limit: 101097)
   Memory: 3.3M
   CGroup: /system.slice/lvmlockd.service
           └─1460 /usr/sbin/lvmlockd --foreground

Jul 16 16:46:23 host-161.virt.lab.msp.redhat.com systemd[1]: Starting LVM lock daemon...
Jul 16 16:46:23 host-161.virt.lab.msp.redhat.com lvmlockd[1460]: [D] creating /run/lvm/lvmlockd.socket
Jul 16 16:46:23 host-161.virt.lab.msp.redhat.com lvmlockd[1460]: 1626471983 lvmlockd started
Jul 16 16:46:23 host-161.virt.lab.msp.redhat.com systemd[1]: Started LVM lock daemon.

# NODE 2
[root@host-162 ~]# systemctl status sanlock
● sanlock.service - Shared Storage Lease Manager
   Loaded: loaded (/usr/lib/systemd/system/sanlock.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2021-07-16 16:46:15 CDT; 8min ago
  Process: 1436 ExecStart=/usr/sbin/sanlock daemon (code=exited, status=0/SUCCESS)
 Main PID: 1441 (sanlock)
    Tasks: 8 (limit: 101097)
   Memory: 24.9M
   CGroup: /system.slice/sanlock.service
           ├─1441 /usr/sbin/sanlock daemon
           └─1442 /usr/sbin/sanlock daemon

Jul 16 16:46:15 host-162.virt.lab.msp.redhat.com systemd[1]: Starting Shared Storage Lease Manager...
Jul 16 16:46:15 host-162.virt.lab.msp.redhat.com systemd[1]: Started Shared Storage Lease Manager.

[root@host-162 ~]# systemctl status lvmlockd
● lvmlockd.service - LVM lock daemon
   Loaded: loaded (/usr/lib/systemd/system/lvmlockd.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2021-07-16 16:46:23 CDT; 8min ago
     Docs: man:lvmlockd(8)
 Main PID: 1459 (lvmlockd)
    Tasks: 5 (limit: 101097)
   Memory: 3.4M
   CGroup: /system.slice/lvmlockd.service
           └─1459 /usr/sbin/lvmlockd --foreground

Jul 16 16:46:23 host-162.virt.lab.msp.redhat.com systemd[1]: Starting LVM lock daemon...
Jul 16 16:46:23 host-162.virt.lab.msp.redhat.com lvmlockd[1459]: [D] creating /run/lvm/lvmlockd.socket
Jul 16 16:46:23 host-162.virt.lab.msp.redhat.com lvmlockd[1459]: 1626471983 lvmlockd started
Jul 16 16:46:23 host-162.virt.lab.msp.redhat.com systemd[1]: Started LVM lock daemon.

[root@host-162 ~]# vgcreate --shared vg /dev/sda1 /dev/sdc1 /dev/sdd1 /dev/sdf1 /dev/sdg1
  Logical volume "lvmlock" created.
  Volume group "vg" successfully created
  VG vg starting sanlock lockspace
  Starting locking.  Waiting until locks are ready...

[root@host-161 ~]# vgchange --lock-start vg
  VG vg starting sanlock lockspace
  Starting locking.  Waiting for sanlock may take 20 sec to 3 min...

# Second Scenario, the device to be failed is NOT also the [lvmlock] device

# NODE 1
[root@host-161 ~]# lvcreate -aye -L 500M vg
  Logical volume "lvol0" created.
[root@host-161 ~]# lvcreate -aye -L 500M vg /dev/sdc1
  Logical volume "lvol1" created.

[root@host-161 ~]# lvs -a -o +devices
  LV        VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
  [lvmlock] global -wi-ao---- 256.00m                                                     /dev/sdb1(0)
  [lvmlock] vg     -wi-ao---- 256.00m                                                     /dev/sda1(0)
  lvol0     vg     -wi-a----- 500.00m                                                     /dev/sda1(64)
  lvol1     vg     -wi-a----- 500.00m                                                     /dev/sdc1(0)

# NODE 2
[root@host-162 ~]# lvchange -aye vg/lvol1
  LV locked by other host: vg/lvol1
  Failed to lock logical volume vg/lvol1.

# NODE 1
[root@host-161 ~]# echo "offline" > /sys/block/sdc/device/state

[root@host-161 ~]# pvscan
  Error reading device /dev/sdc1 at 0 length 4096.
  PV /dev/sdb1   VG global   lvm2 [<30.00 GiB / <29.75 GiB free]
  WARNING: Couldn't find device with uuid AARhu6-SWNn-F0HI-ndTb-wa0D-xB0k-cV7ZpZ.
  WARNING: VG vg is missing PV AARhu6-SWNn-F0HI-ndTb-wa0D-xB0k-cV7ZpZ (last written to /dev/sdc1).
  WARNING: Couldn't find all devices for LV vg/lvol1 while checking used and assumed devices.
  PV /dev/sda1   VG vg       lvm2 [<30.00 GiB / <29.26 GiB free]
  PV [unknown]   VG vg       lvm2 [<30.00 GiB / <29.51 GiB free]
  PV /dev/sdd1   VG vg       lvm2 [<30.00 GiB / <30.00 GiB free]
  PV /dev/sdf1   VG vg       lvm2 [<30.00 GiB / <30.00 GiB free]
  PV /dev/sdg1   VG vg       lvm2 [<30.00 GiB / <30.00 GiB free]
  Total: 6 [<179.98 GiB] / in use: 6 [<179.98 GiB] / in no VG: 0 [0   ]

Jul 16 17:02:49 host-161 kernel: sd 0:0:0:2: rejecting I/O to offline device
Jul 16 17:02:49 host-161 kernel: blk_update_request: I/O error, dev sdc, sector 40 op 0x0:(READ) flags 0x0 phys_seg 9 prio class 0
Jul 16 17:02:49 host-161 kernel: blk_update_request: I/O error, dev sdc, sector 40 op 0x0:(READ) flags 0x0 phys_seg 2 prio class 0
Jul 16 17:03:52 host-161 kernel: blk_update_request: I/O error, dev sdc, sector 40 op 0x0:(READ) flags 0x0 phys_seg 2 prio class 0

[root@host-161 ~]# lvchange -an vg
  WARNING: Couldn't find device with uuid AARhu6-SWNn-F0HI-ndTb-wa0D-xB0k-cV7ZpZ.
  WARNING: VG vg is missing PV AARhu6-SWNn-F0HI-ndTb-wa0D-xB0k-cV7ZpZ (last written to /dev/sdc1).
  WARNING: Couldn't find all devices for LV vg/lvol1 while checking used and assumed devices.

[root@host-161 ~]# lvs -a -o +devices
  WARNING: Couldn't find device with uuid AARhu6-SWNn-F0HI-ndTb-wa0D-xB0k-cV7ZpZ.
  WARNING: VG vg is missing PV AARhu6-SWNn-F0HI-ndTb-wa0D-xB0k-cV7ZpZ (last written to /dev/sdc1).
  LV        VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
  [lvmlock] global -wi-ao---- 256.00m                                                     /dev/sdb1(0)
  [lvmlock] vg     -wi-ao---- 256.00m                                                     /dev/sda1(0)
  lvol0     vg     -wi------- 500.00m                                                     /dev/sda1(64)
  lvol1     vg     -wi-----p- 500.00m                                                     [unknown](0)

# NODE 2 is able to activate even without a lockdrop on node 1
[root@host-162 ~]# lvchange -aye vg/lvol1
[root@host-162 ~]#

# Third scenario, the device to be failed IS also the [lvmlock] device
[root@host-161 ~]# lvchange -aye vg/lvol0
[root@host-161 ~]# lvs -a -o +devices
  LV        VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
  [lvmlock] global -wi-ao---- 256.00m                                                     /dev/sdb1(0)
  [lvmlock] vg     -wi-ao---- 256.00m                                                     /dev/sda1(0)
  lvol0     vg     -wi-a----- 500.00m                                                     /dev/sda1(64)
  lvol1     vg     -wi------- 500.00m                                                     /dev/sdc1(0)

[root@host-162 ~]# lvchange -aye vg/lvol0
  LV locked by other host: vg/lvol0
  Failed to lock logical volume vg/lvol0.
[root@host-161 ~]# echo "offline" > /sys/block/sda/device/state

Jul 16 17:24:58 host-161 kernel: blk_update_request: I/O error, dev sda, sector 2088 op 0x0:(READ) flags 0x0 phys_seg 48 prio class 0
Jul 16 17:24:58 host-161 sanlock[1426]: 2021-07-16 17:24:58 1116 [1471]: s2 renewal error -5 delta_length 0 last_success 1056
Jul 16 17:24:58 host-161 sanlock[1426]: 2021-07-16 17:24:58 1116 [1426]: s2 check_our_lease warning 60 last_success 1056
Jul 16 17:24:58 host-161 sanlock[1426]: 2021-07-16 17:24:58 1116 [1471]: s2 delta_renew read rv -5 offset 0 /dev/mapper/vg-lvmlock
Jul 16 17:24:58 host-161 kernel: blk_update_request: I/O error, dev sda, sector 2088 op 0x0:(READ) flags 0x0 phys_seg 48 prio class 0
Jul 16 17:24:58 host-161 sanlock[1426]: 2021-07-16 17:24:58 1116 [1471]: s2 renewal error -5 delta_length 0 last_success 1056
Jul 16 17:24:59 host-161 kernel: blk_update_request: I/O error, dev sda, sector 2088 op 0x0:(READ) flags 0x0 phys_seg 48 prio class 0
Jul 16 17:24:59 host-161 sanlock[1426]: 2021-07-16 17:24:59 1117 [1471]: s2 delta_renew read rv -5 offset 0 /dev/mapper/vg-lvmlock
Jul 16 17:24:59 host-161 sanlock[1426]: 2021-07-16 17:24:59 1117 [1471]: s2 renewal error -5 delta_length 0 last_success 1056
Jul 16 17:24:59 host-161 sanlock[1426]: 2021-07-16 17:24:59 1117 [1426]: s2 check_our_lease warning 61 last_success 1056
Jul 16 17:24:59 host-161 sanlock[1426]: 2021-07-16 17:24:59 1117 [1471]: s2 delta_renew read rv -5 offset 0 /dev/mapper/vg-lvmlock
Jul 16 17:24:59 host-161 kernel: blk_update_request: I/O error, dev sda, sector 2088 op 0x0:(READ) flags 0x0 phys_seg 48 prio class 0
Jul 16 17:24:59 host-161 sanlock[1426]: 2021-07-16 17:24:59 1117 [1471]: s2 renewal error -5 delta_length 0 last_success 1056
Jul 16 17:25:00 host-161 kernel: blk_update_request: I/O error, dev sda, sector 2088 op 0x0:(READ) flags 0x0 phys_seg 48 prio class 0
Jul 16 17:25:00 host-161 sanlock[1426]: 2021-07-16 17:25:00 1118 [1471]: s2 delta_renew read rv -5 offset 0 /dev/mapper/vg-lvmlock
Jul 16 17:25:00 host-161 sanlock[1426]: 2021-07-16 17:25:00 1118 [1471]: s2 renewal error -5 delta_length 0 last_success 1056
Jul 16 17:25:00 host-161 sanlock[1426]: 2021-07-16 17:25:00 1118 [1426]: s2 check_our_lease warning 62 last_success 1056

# Wait a few seconds...
[root@host-161 ~]# lvmlockctl --drop vg

Jul 16 17:25:00 host-161 lvmlockctl[1497]: Dropping locks for VG vg.

[root@host-161 ~]# lvchange -an vg/lvol0
  WARNING: Couldn't find device with uuid wgvdke-18Cc-fY1s-cvyp-RAsI-NU7R-N6Kyrv.
  WARNING: VG vg is missing PV wgvdke-18Cc-fY1s-cvyp-RAsI-NU7R-N6Kyrv (last written to /dev/sda1).
  WARNING: Couldn't find all devices for LV vg/lvol0 while checking used and assumed devices.
  WARNING: Couldn't find all devices for LV vg/lvmlock while checking used and assumed devices.
  Reading VG vg without a lock.
  LV vg/lvol0 lock failed: lockspace is inactive
  Failed to unlock logical volume vg/lvol0.

[root@host-161 ~]# lvs -a -o +devices
  WARNING: Couldn't find device with uuid wgvdke-18Cc-fY1s-cvyp-RAsI-NU7R-N6Kyrv.
  WARNING: VG vg is missing PV wgvdke-18Cc-fY1s-cvyp-RAsI-NU7R-N6Kyrv (last written to /dev/sda1).
  WARNING: Couldn't find all devices for LV vg/lvmlock while checking used and assumed devices.
  Reading VG vg without a lock.
  LV        VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
  [lvmlock] global -wi-ao---- 256.00m                                                     /dev/sdb1(0)
  [lvmlock] vg     -wi-a---p- 256.00m                                                     [unknown](0)
  lvol0     vg     -wi-----p- 500.00m                                                     [unknown](64)
  lvol1     vg     -wi------- 500.00m                                                     /dev/sdc1(0)

# NODE 2
# Wait awhile
[root@host-162 ~]# lvchange -aye vg/lvol0
  LV locked by other host: vg/lvol0
  Failed to lock logical volume vg/lvol0.

# Wait awhile longer, and the activation works
[root@host-162 ~]#
[root@host-162 ~]#
[root@host-162 ~]# lvchange -aye vg/lvol0

I think there are three requirements to get those messages; it's not clear if they were all true for any one of the scenarios above:

- the VG must have an active LV
- offline the PV that holds the hidden lvmlock LV
- don't run lvmlockctl --drop until after the messages appear

In real-world scenarios you don't know when a disk failure is going to happen, so when one happens, you first notice it by seeing the messages. In response, the user deactivates the LVs in the VG, and after that runs lvmlockctl --drop to tell lvmlockd that the LVs are safely shut down.

[root@null-03 ~]# lvs -a bbsan -o+devices
  LV        VG    Attr       LSize   Devices
  [lvmlock] bbsan -wi-ao---- 256.00m /dev/sdb(0)
  lvol0     bbsan -wi-a----- 4.00m   /dev/sdb(64)
  lvol1     bbsan -wi-a----- 4.00m   /dev/sdb(65)

[root@null-03 ~]# echo "offline" > /sys/block/sdb/device/state

[root@null-03 ~]# pvs
  Global lock failed: error -221.
[root@null-03 ~]#

Broadcast message from systemd-journald@null-03 (Mon 2021-07-19 04:52:46 CDT):

lvmlockctl[6506]: lvmlockd lost access to locks in VG bbsan.

Broadcast message from systemd-journald@null-03 (Mon 2021-07-19 04:52:46 CDT):

lvmlockctl[6506]: Immediately deactivate LVs in VG bbsan.

Broadcast message from systemd-journald@null-03 (Mon 2021-07-19 04:52:46 CDT):

lvmlockctl[6506]: Once VG is unused, run lvmlockctl --drop bbsan.

Message from syslogd@null-03 at Jul 19 04:52:46 ...
 lvmlockctl: lvmlockd lost access to locks in VG bbsan.

Message from syslogd@null-03 at Jul 19 04:52:46 ...
 lvmlockctl: Immediately deactivate LVs in VG bbsan.

Message from syslogd@null-03 at Jul 19 04:52:46 ...
 lvmlockctl: Once VG is unused, run lvmlockctl --drop bbsan.

[root@null-03 ~]# journalctl | grep lvmlockctl
Jul 19 04:52:46 null-03 lvmlockctl[6506]: lvmlockd lost access to locks in VG bbsan.
Jul 19 04:52:46 null-03 lvmlockctl[6506]: Immediately deactivate LVs in VG bbsan.
Jul 19 04:52:46 null-03 lvmlockctl[6506]: Once VG is unused, run lvmlockctl --drop bbsan.

We learned that the discrepancies we experienced were due to SELinux bug 1985000. With SELinux set to permissive, I am now able to verify the correct behavior.

kernel-4.18.0-310.el8    BUILT: Thu May 27 14:24:00 CDT 2021
lvm2-2.03.12-5.el8    BUILT: Tue Jul 13 11:50:03 CDT 2021
lvm2-libs-2.03.12-5.el8    BUILT: Tue Jul 13 11:50:03 CDT 2021
sanlock-3.8.4-1.el8    BUILT: Tue Jun  1 16:16:52 CDT 2021
sanlock-lib-3.8.4-1.el8    BUILT: Tue Jun  1 16:16:52 CDT 2021

[root@host-162 ~]# getenforce
Permissive

[root@host-162 ~]# echo "offline" > /sys/block/sdb/device/state

[root@host-162 ~]# pvs
  Error reading device /dev/sdb1 at 0 length 4096.
  VG vg lock skipped: error -221
  WARNING: Couldn't find device with uuid VzqX0V-SNRI-CzkW-VGU6-rozY-BWmN-InTaWH.
  WARNING: VG vg is missing PV VzqX0V-SNRI-CzkW-VGU6-rozY-BWmN-InTaWH (last written to /dev/sdb1).
  WARNING: Couldn't find all devices for LV vg/lvol0 while checking used and assumed devices.
  WARNING: Couldn't find all devices for LV vg/lvol1 while checking used and assumed devices.
  WARNING: Couldn't find all devices for LV vg/lvol2 while checking used and assumed devices.
  WARNING: Couldn't find all devices for LV vg/lvmlock while checking used and assumed devices.
  Reading VG vg without a lock.
  PV         VG     Fmt  Attr PSize   PFree
  /dev/sda1  vg     lvm2 a--  <15.00g <15.00g
  /dev/sda2         lvm2 ---  <15.00g <15.00g
  /dev/sdc1  vg     lvm2 a--  <15.00g <15.00g
  /dev/sdc2         lvm2 ---  <15.00g <15.00g
  /dev/sdd1         lvm2 ---  <15.00g <15.00g
  /dev/sdd2  global lvm2 a--  <15.00g <14.75g
  /dev/sde1  vg     lvm2 a--  <15.00g <15.00g
  /dev/sde2         lvm2 ---  <15.00g <15.00g
  /dev/sdf1  vg     lvm2 a--  <15.00g <15.00g
  /dev/sdg1  vg     lvm2 a--  <15.00g <15.00g
  /dev/sdh1  vg     lvm2 a--  <15.00g <15.00g
  /dev/sdh2         lvm2 ---  <15.00g <15.00g
  [unknown]  vg     lvm2 a-m  <15.00g  14.73g
[root@host-162 ~]#

Broadcast message from systemd-journald.lab.msp.redhat.com (Thu 2021-07-22 11:08:18 CDT):

lvmlockctl[1554]: lvmlockd lost access to locks in VG vg.

Broadcast message from systemd-journald.lab.msp.redhat.com (Thu 2021-07-22 11:08:18 CDT):

lvmlockctl[1554]: Immediately deactivate LVs in VG vg.

Broadcast message from systemd-journald.lab.msp.redhat.com (Thu 2021-07-22 11:08:18 CDT):

lvmlockctl[1554]: Once VG is unused, run lvmlockctl --drop vg.

Message from syslogd@host-162 at Jul 22 11:08:18 ...
 lvmlockctl[1554]:lvmlockd lost access to locks in VG vg.

Message from syslogd@host-162 at Jul 22 11:08:18 ...
 lvmlockctl[1554]:Immediately deactivate LVs in VG vg.

Message from syslogd@host-162 at Jul 22 11:08:18 ...
 lvmlockctl[1554]:Once VG is unused, run lvmlockctl --drop vg.

[root@host-162 ~]# lvchange -an vg
  VG vg lock skipped: storage failed for sanlock leases
  WARNING: Couldn't find device with uuid VzqX0V-SNRI-CzkW-VGU6-rozY-BWmN-InTaWH.
  WARNING: VG vg is missing PV VzqX0V-SNRI-CzkW-VGU6-rozY-BWmN-InTaWH (last written to /dev/sdb1).
  WARNING: Couldn't find all devices for LV vg/lvol0 while checking used and assumed devices.
  WARNING: Couldn't find all devices for LV vg/lvol1 while checking used and assumed devices.
  WARNING: Couldn't find all devices for LV vg/lvol2 while checking used and assumed devices.
  WARNING: Couldn't find all devices for LV vg/lvmlock while checking used and assumed devices.
  Reading VG vg without a lock.

[root@host-162 ~]# lvmlockctl --drop vg

# Other node
[root@host-161 ~]# lvchange -aye vg/lvol0
[root@host-161 ~]# lvs -a -o +devices
  LV        VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
  [lvmlock] global -wi-ao---- 256.00m                                                     /dev/sdd2(0)
  [lvmlock] vg     -wi-ao---- 256.00m                                                     /dev/sdb1(0)
  lvol0     vg     -wi-a----- 4.00m                                                       /dev/sdb1(64)
  lvol1     vg     -wi------- 4.00m                                                       /dev/sdb1(65)
  lvol2     vg     -wi------- 4.00m                                                       /dev/sdb1(66)

Marking this VERIFIED with the latest rpms, with the caveat that verification was done with SELinux in permissive mode, since SELinux bug 1985000 blocks this feature. The automated script was verified to drop the locks automatically so that the failing node avoids being shut down, and to allow the other nodes in the cluster to exclusively activate LVs in the shared VG.

kernel-4.18.0-310.el8    BUILT: Thu May 27 14:24:00 CDT 2021
lvm2-2.03.12-6.el8    BUILT: Tue Aug  3 07:23:05 CDT 2021
lvm2-libs-2.03.12-6.el8    BUILT: Tue Aug  3 07:23:05 CDT 2021
sanlock-3.8.4-1.el8    BUILT: Tue Jun  1 16:16:52 CDT 2021
sanlock-lib-3.8.4-1.el8    BUILT: Tue Jun  1 16:16:52 CDT 2021

[root@host-161 ~]# cat /usr/sbin/my_vg_kill_script.sh
#!/bin/bash
VG=$1

# replace dm table with the error target for top level LVs
dmsetup wipe_table -S "uuid=~LVM && vgname=$VG && lv_layer=\"\""

# check that the error target is in place
dmsetup table -c -S "uuid=~LVM && vgname=$VG && lv_layer=\"\"" |grep -vw error
if [[ $? -ne 0 ]] ; then
        exit 0
fi
exit 1

[root@host-161 ~]# grep lvmlockctl_kill_command /etc/lvm/lvm.conf
        # Configuration option global/lvmlockctl_kill_command.
        # lvmlockctl_kill_command = ""
        lvmlockctl_kill_command="/usr/sbin/my_vg_kill_script.sh"

[root@host-161 ~]# lvs -a -o +devices
  LV        VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
  [lvmlock] global -wi-ao---- 256.00m                                                     /dev/sdd2(0)
  [lvmlock] vg     -wi-ao---- 256.00m                                                     /dev/sdb1(0)
  lvol0     vg     -wi-a----- 4.00m                                                       /dev/sdb1(64)
  lvol1     vg     -wi-a----- 4.00m                                                       /dev/sdb1(65)
  lvol2     vg     -wi-a----- 4.00m                                                       /dev/sdb1(66)

[root@host-161 ~]# echo "offline" > /sys/block/sdb/device/state

[root@host-161 ~]# pvs
  Error reading device /dev/sdb1 at 0 length 4096.
  VG vg lock skipped: error -221
  WARNING: Couldn't find device with uuid VzqX0V-SNRI-CzkW-VGU6-rozY-BWmN-InTaWH.
  WARNING: VG vg is missing PV VzqX0V-SNRI-CzkW-VGU6-rozY-BWmN-InTaWH (last written to /dev/sdb1).
  WARNING: Couldn't find all devices for LV vg/lvol0 while checking used and assumed devices.
  WARNING: Couldn't find all devices for LV vg/lvol1 while checking used and assumed devices.
  WARNING: Couldn't find all devices for LV vg/lvol2 while checking used and assumed devices.
  WARNING: Couldn't find all devices for LV vg/lvmlock while checking used and assumed devices.
  Reading VG vg without a lock.
  PV         VG     Fmt  Attr PSize   PFree
  /dev/sda1  vg     lvm2 a--  <15.00g <15.00g
  /dev/sda2         lvm2 ---  <15.00g <15.00g
  /dev/sdc1  vg     lvm2 a--  <15.00g <15.00g
  /dev/sdc2         lvm2 ---  <15.00g <15.00g
  /dev/sdd1         lvm2 ---  <15.00g <15.00g
  /dev/sdd2  global lvm2 a--  <15.00g <14.75g
  /dev/sde1  vg     lvm2 a--  <15.00g <15.00g
  /dev/sde2         lvm2 ---  <15.00g <15.00g
  /dev/sdf1  vg     lvm2 a--  <15.00g <15.00g
  /dev/sdg1  vg     lvm2 a--  <15.00g <15.00g
  /dev/sdh1  vg     lvm2 a--  <15.00g <15.00g
  /dev/sdh2         lvm2 ---  <15.00g <15.00g
  [unknown]  vg     lvm2 a-m  <15.00g  14.73g
[root@host-161 ~]#

Broadcast message from systemd-journald.lab.msp.redhat.com (Thu 2021-08-05 13:40:51 CDT):

lvmlockctl[1604]: lvmlockd lost access to locks in VG vg.

Message from syslogd@host-161 at Aug  5 13:40:51 ...
 lvmlockctl[1604]:lvmlockd lost access to locks in VG vg.

Aug  5 13:40:51 host-161 sanlock[1424]: 2021-08-05 13:40:51 4034 [1424]: s2 check_our_lease failed 80
Aug  5 13:40:51 host-161 sanlock[1424]: 2021-08-05 13:40:51 4034 [1424]: s2 kill 1453 sig 100 count 1
Aug  5 13:40:51 host-161 lvmlockctl[1604]: lvmlockd lost access to locks in VG vg.
Aug  5 13:40:51 host-161 sanlock[1424]: 2021-08-05 13:40:51 4034 [1543]: s2 delta_renew read rv -5 offset 0 /dev/mapper/vg-lvmlock
Aug  5 13:40:51 host-161 kernel: blk_update_request: I/O error, dev sdb, sector 2088 op 0x0:(READ) flags 0x0 phys_seg 39 prio class 0
Aug  5 13:40:51 host-161 sanlock[1424]: 2021-08-05 13:40:51 4034 [1543]: s2 renewal error -5 delta_length 0 last_success 3954
Aug  5 13:40:51 host-161 kernel: Buffer I/O error on dev dm-4, logical block 1008, async page read
Aug  5 13:40:51 host-161 kernel: Buffer I/O error on dev dm-3, logical block 1008, async page read
Aug  5 13:40:51 host-161 kernel: Buffer I/O error on dev dm-2, logical block 1008, async page read
Aug  5 13:40:51 host-161 kernel: Buffer I/O error on dev dm-1, logical block 65520, async page read
Aug  5 13:40:52 host-161 lvmlockctl[1604]: Successful VG vg kill command /usr/sbin/my_vg_kill_script.sh vg
Aug  5 13:40:52 host-161 wdmd[1438]: /dev/watchdog0 reopen
Aug  5 13:40:52 host-161 sanlock[1424]: 2021-08-05 13:40:52 4035 [1424]: s2 all pids clear
Aug  5 13:40:52 host-161 lvmlockd[1453]: 1628188852 S lvm_vg rem_lockspace_san error -115
Aug  5 13:40:52 host-161 sanlock[1424]: 2021-08-05 13:40:52 4035 [1543]: s2 delta_renew read rv -5 offset 0 /dev/mapper/vg-lvmlock
Aug  5 13:40:52 host-161 sanlock[1424]: 2021-08-05 13:40:52 4035 [1543]: s2 renewal error -5 delta_length 0 last_success 3954
Aug  5 13:40:52 host-161 kernel: Buffer I/O error on dev dm-1, logical block 65520, async page read

# Other node
[root@host-162 ~]# lvchange -aye vg
[root@host-162 ~]# lvs -a -o +devices
  LV        VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
  [lvmlock] global -wi-ao---- 256.00m                                                     /dev/sdd2(0)
  [lvmlock] vg     -wi-ao---- 256.00m                                                     /dev/sdb1(0)
  lvol0     vg     -wi-a----- 4.00m                                                       /dev/sdb1(64)
  lvol1     vg     -wi-a----- 4.00m                                                       /dev/sdb1(65)
  lvol2     vg     -wi-a----- 4.00m                                                       /dev/sdb1(66)

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (lvm2 bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4431