Bug 1932586 - RFE: add ability to configure forced shutdown of a shared VG when sanlock locks are lost
Summary: RFE: add ability to configure forced shutdown of a shared VG when sanlock locks are lost
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: lvm2
Version: 8.3
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: David Teigland
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-02-24 18:46 UTC by David Teigland
Modified: 2021-11-10 08:52 UTC
CC List: 9 users

Fixed In Version: lvm2-2.03.12-2.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-11-09 19:45:25 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2021:4431 0 None None None 2021-11-09 19:45:43 UTC

Description David Teigland 2021-02-24 18:46:14 UTC
Description of problem:

When sanlock locks are lost, the shared VG must be shut down or the system will be reset by the watchdog.  The shutdown process is described in the lvmlockd man page under "sanlock lease storage failure", copied below.  It's currently a manual process, but as suggested in the man page it should be automated.

There was some upstream contribution to automate this that was never merged:
https://listman.redhat.com/archives/lvm-devel/2017-September/msg00011.html

Now there is another upstream request for this:
https://listman.redhat.com/archives/lvm-devel/2021-February/msg00077.html


sanlock lease storage failure

If the PV under a sanlock VG's lvmlock LV is disconnected, unresponsive
or too slow, sanlock cannot renew the lease for the VG's locks. After
some time, the lease will expire, and locks that the host owns in the
VG can be acquired by other hosts. The VG must be forcibly deactivated
on the host with the expiring lease before other hosts can acquire its
locks.

When the sanlock daemon detects that the lease storage is lost, it runs
the command lvmlockctl --kill <vgname>. This command emits a syslog
message stating that lease storage is lost for the VG, and LVs must be
immediately deactivated.

If no LVs are active in the VG, then the lockspace with an expiring
lease will be removed, and errors will be reported when trying to use
the VG. Use the lvmlockctl --drop command to clear the stale lockspace
from lvmlockd.

If the VG has active LVs when the lock storage is lost, the LVs must be
quickly deactivated before the lockspace lease expires. After all LVs
are deactivated, run lvmlockctl --drop <vgname> to clear the expiring
lockspace from lvmlockd. If all LVs in the VG are not deactivated
within about 40 seconds, sanlock uses wdmd and the local watchdog to
reset the host. The machine reset is effectively a severe form of
"deactivating" LVs before they can be activated on other hosts. The
reset is considered a better alternative than having LVs used by
multiple hosts at once, which could easily damage or destroy their
content.

In the future, the lvmlockctl kill command may automatically attempt to
forcibly deactivate LVs before the sanlock lease expires. Until then,
the user must notice the syslog message and manually deactivate the VG
before sanlock resets the machine.
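
For reference, the manual recovery the man page describes amounts to two commands on the host whose lease is expiring, run within the roughly 40-second window (using "vg" as an example VG name):

# lvchange -an vg
# lvmlockctl --drop vg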


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 David Teigland 2021-03-25 14:57:15 UTC
pushed to main:
https://sourceware.org/git/?p=lvm2.git;a=commit;h=89a3440fc0179318954855aa251b0aae4f5c1a63

This doesn't change any behavior on its own, but it allows a user to create and configure their own script to shut down a sanlock VG.

Comment 3 David Teigland 2021-04-14 20:15:54 UTC
The feature added by this bug/commit does not change default behavior; it just adds a config option that can be used to set up automated recovery for a sanlock VG if storage access is lost.

The expected behavior of a sanlock VG remains the following for node1 and node2.

node1 has LV active (ex), and node2 would like to activate LV (ex).  lvmlockd/sanlock/wdmd should never allow both nodes to have the LV active (ex) at the same time.  While node1 remains alive with LV active, node2 can run lvchange -ay LV and should see:

# lvchange -ay vg/lvol0
  LV locked by other host: vg/lvol0
  Failed to lock logical volume vg/lvol0.

If node1 deactivates the LV, the LV lock is released by node1, and node2 can activate LV.

Or, if sanlock on node1 fails to renew the lock on the LV, then the LV lock will expire, and node2 can activate the LV.

The scenario of interest here is what happens when node1 fails to renew the lock.  node1 will fail to renew the lock if sanlock on node1 fails to read/write the storage under the VG.  

After failing to renew for a period, sanlock on node1 will reset the machine using the watchdog prior to the lock expiring.  This means that by the time the LV lock expires, node1 will be reset, and it's safe for node2 to get the lock and activate the LV.

sanlock gives lvm the ability to avoid a watchdog reset on node1.  If node1 can deactivate LVs in the expiring VG within about 40 seconds, then node1 can avoid a reset.  node2 will still likely need to wait for the full expiration period before it can activate.

By default, when renewals fail, lvmlockctl is run automatically and it leaves a message in syslog telling the user that they should manually deactivate LVs to avoid a reset:

  lvmlockctl: lvmlockd lost access to locks in VG vg.
  lvmlockctl: Immediately deactivate LVs in VG vg.
  lvmlockctl: Once VG is unused, run lvmlockctl --drop vg.

If the user notices these messages, they can manually deactivate the LVs in the VG and then run lvmlockctl --drop vg.
If they do this within about 40 seconds, node1 can avoid a watchdog reset.

The change in this bug allows a user to configure a script (set in lvm.conf as global/lvmlockctl_kill_command) to automate the process of deactivating LVs and running lvmlockctl --drop. We don't provide a script at this point (we may in the future), but the lvmlockd man page offers a suggested approach for such a script.
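
As a minimal illustration (not something lvm2 ships), the lvm.conf hook would look roughly like the following; the script path is only an example and happens to match the one used in comment #18 below:

    # /etc/lvm/lvm.conf
    global {
        lvmlockctl_kill_command = "/usr/sbin/my_vg_kill_script.sh"
    }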

Comment 12 Corey Marthaler 2021-07-17 00:40:46 UTC
I tried testing this three different ways and was never able to get the messages mentioned in comment #3:

lvmlockctl: lvmlockd lost access to locks in VG vg.
lvmlockctl: Immediately deactivate LVs in VG vg.
lvmlockctl: Once VG is unused, run lvmlockctl --drop vg.

kernel-4.18.0-310.el8    BUILT: Thu May 27 14:24:00 CDT 2021
lvm2-2.03.12-5.el8    BUILT: Tue Jul 13 11:50:03 CDT 2021
lvm2-libs-2.03.12-5.el8    BUILT: Tue Jul 13 11:50:03 CDT 2021
sanlock-3.8.4-1.el8    BUILT: Tue Jun  1 16:16:52 CDT 2021
sanlock-lib-3.8.4-1.el8    BUILT: Tue Jun  1 16:16:52 CDT 2021


The first time I tried this, I just failed a device and watched what happened. As mentioned in comment #3, the node with the exclusively active volume and the failed device was eventually reset. The lvmlockd messages above never appeared. Please let me know how to trigger these messages.



# NODE 1
[root@host-161 ~]# systemctl status sanlock
● sanlock.service - Shared Storage Lease Manager
   Loaded: loaded (/usr/lib/systemd/system/sanlock.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2021-07-16 16:46:15 CDT; 8min ago
  Process: 1436 ExecStart=/usr/sbin/sanlock daemon (code=exited, status=0/SUCCESS)
 Main PID: 1442 (sanlock)
    Tasks: 7 (limit: 101097)
   Memory: 20.9M
   CGroup: /system.slice/sanlock.service
           ├─1442 /usr/sbin/sanlock daemon
           └─1443 /usr/sbin/sanlock daemon

Jul 16 16:46:15 host-161.virt.lab.msp.redhat.com systemd[1]: Starting Shared Storage Lease Manager...
Jul 16 16:46:15 host-161.virt.lab.msp.redhat.com systemd[1]: Started Shared Storage Lease Manager.
[root@host-161 ~]# systemctl status lvmlockd
● lvmlockd.service - LVM lock daemon
   Loaded: loaded (/usr/lib/systemd/system/lvmlockd.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2021-07-16 16:46:23 CDT; 8min ago
     Docs: man:lvmlockd(8)
 Main PID: 1460 (lvmlockd)
    Tasks: 4 (limit: 101097)
   Memory: 3.3M
   CGroup: /system.slice/lvmlockd.service
           └─1460 /usr/sbin/lvmlockd --foreground

Jul 16 16:46:23 host-161.virt.lab.msp.redhat.com systemd[1]: Starting LVM lock daemon...
Jul 16 16:46:23 host-161.virt.lab.msp.redhat.com lvmlockd[1460]: [D] creating /run/lvm/lvmlockd.socket
Jul 16 16:46:23 host-161.virt.lab.msp.redhat.com lvmlockd[1460]: 1626471983 lvmlockd started
Jul 16 16:46:23 host-161.virt.lab.msp.redhat.com systemd[1]: Started LVM lock daemon.



# NODE 2
[root@host-162 ~]# systemctl status sanlock
● sanlock.service - Shared Storage Lease Manager
   Loaded: loaded (/usr/lib/systemd/system/sanlock.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2021-07-16 16:46:15 CDT; 8min ago
  Process: 1436 ExecStart=/usr/sbin/sanlock daemon (code=exited, status=0/SUCCESS)
 Main PID: 1441 (sanlock)
    Tasks: 8 (limit: 101097)
   Memory: 24.9M
   CGroup: /system.slice/sanlock.service
           ├─1441 /usr/sbin/sanlock daemon
           └─1442 /usr/sbin/sanlock daemon

Jul 16 16:46:15 host-162.virt.lab.msp.redhat.com systemd[1]: Starting Shared Storage Lease Manager...
Jul 16 16:46:15 host-162.virt.lab.msp.redhat.com systemd[1]: Started Shared Storage Lease Manager.
[root@host-162 ~]# systemctl status lvmlockd
● lvmlockd.service - LVM lock daemon
   Loaded: loaded (/usr/lib/systemd/system/lvmlockd.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2021-07-16 16:46:23 CDT; 8min ago
     Docs: man:lvmlockd(8)
 Main PID: 1459 (lvmlockd)
    Tasks: 5 (limit: 101097)
   Memory: 3.4M
   CGroup: /system.slice/lvmlockd.service
           └─1459 /usr/sbin/lvmlockd --foreground

Jul 16 16:46:23 host-162.virt.lab.msp.redhat.com systemd[1]: Starting LVM lock daemon...
Jul 16 16:46:23 host-162.virt.lab.msp.redhat.com lvmlockd[1459]: [D] creating /run/lvm/lvmlockd.socket
Jul 16 16:46:23 host-162.virt.lab.msp.redhat.com lvmlockd[1459]: 1626471983 lvmlockd started
Jul 16 16:46:23 host-162.virt.lab.msp.redhat.com systemd[1]: Started LVM lock daemon.

[root@host-162 ~]# vgcreate --shared  vg /dev/sda1 /dev/sdc1 /dev/sdd1 /dev/sdf1 /dev/sdg1
  Logical volume "lvmlock" created.
  Volume group "vg" successfully created
  VG vg starting sanlock lockspace
  Starting locking.  Waiting until locks are ready...

[root@host-161 ~]# vgchange --lock-start vg
  VG vg starting sanlock lockspace
  Starting locking.  Waiting for sanlock may take 20 sec to 3 min...



# Second Scenario, the device to be failed is NOT also the [lvmlock] device

# NODE 1
[root@host-161 ~]# lvcreate -aye  -L 500M vg 
  Logical volume "lvol0" created.
[root@host-161 ~]# lvcreate -aye  -L 500M vg /dev/sdc1
  Logical volume "lvol1" created.
[root@host-161 ~]# lvs -a -o +devices
  LV        VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices      
  [lvmlock] global -wi-ao---- 256.00m                                                     /dev/sdb1(0) 
  [lvmlock] vg     -wi-ao---- 256.00m                                                     /dev/sda1(0) 
  lvol0     vg     -wi-a----- 500.00m                                                     /dev/sda1(64)
  lvol1     vg     -wi-a----- 500.00m                                                     /dev/sdc1(0) 

# NODE 2
[root@host-162 ~]# lvchange -aye vg/lvol1
  LV locked by other host: vg/lvol1
  Failed to lock logical volume vg/lvol1.


# NODE 1
[root@host-161 ~]#  echo "offline" > /sys/block/sdc/device/state

[root@host-161 ~]# pvscan
  Error reading device /dev/sdc1 at 0 length 4096.
  PV /dev/sdb1   VG global          lvm2 [<30.00 GiB / <29.75 GiB free]
  WARNING: Couldn't find device with uuid AARhu6-SWNn-F0HI-ndTb-wa0D-xB0k-cV7ZpZ.
  WARNING: VG vg is missing PV AARhu6-SWNn-F0HI-ndTb-wa0D-xB0k-cV7ZpZ (last written to /dev/sdc1).
  WARNING: Couldn't find all devices for LV vg/lvol1 while checking used and assumed devices.
  PV /dev/sda1   VG vg              lvm2 [<30.00 GiB / <29.26 GiB free]
  PV [unknown]   VG vg              lvm2 [<30.00 GiB / <29.51 GiB free]
  PV /dev/sdd1   VG vg              lvm2 [<30.00 GiB / <30.00 GiB free]
  PV /dev/sdf1   VG vg              lvm2 [<30.00 GiB / <30.00 GiB free]
  PV /dev/sdg1   VG vg              lvm2 [<30.00 GiB / <30.00 GiB free]
  Total: 6 [<179.98 GiB] / in use: 6 [<179.98 GiB] / in no VG: 0 [0   ]

Jul 16 17:02:49 host-161 kernel: sd 0:0:0:2: rejecting I/O to offline device
Jul 16 17:02:49 host-161 kernel: blk_update_request: I/O error, dev sdc, sector 40 op 0x0:(READ) flags 0x0 phys_seg 9 prio class 0
Jul 16 17:02:49 host-161 kernel: blk_update_request: I/O error, dev sdc, sector 40 op 0x0:(READ) flags 0x0 phys_seg 2 prio class 0
Jul 16 17:03:52 host-161 kernel: blk_update_request: I/O error, dev sdc, sector 40 op 0x0:(READ) flags 0x0 phys_seg 2 prio class 0

[root@host-161 ~]# lvchange -an vg
  WARNING: Couldn't find device with uuid AARhu6-SWNn-F0HI-ndTb-wa0D-xB0k-cV7ZpZ.
  WARNING: VG vg is missing PV AARhu6-SWNn-F0HI-ndTb-wa0D-xB0k-cV7ZpZ (last written to /dev/sdc1).
  WARNING: Couldn't find all devices for LV vg/lvol1 while checking used and assumed devices.
[root@host-161 ~]# lvs -a -o +devices
  WARNING: Couldn't find device with uuid AARhu6-SWNn-F0HI-ndTb-wa0D-xB0k-cV7ZpZ.
  WARNING: VG vg is missing PV AARhu6-SWNn-F0HI-ndTb-wa0D-xB0k-cV7ZpZ (last written to /dev/sdc1).
  LV        VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices      
  [lvmlock] global -wi-ao---- 256.00m                                                     /dev/sdb1(0) 
  [lvmlock] vg     -wi-ao---- 256.00m                                                     /dev/sda1(0) 
  lvol0     vg     -wi------- 500.00m                                                     /dev/sda1(64)
  lvol1     vg     -wi-----p- 500.00m                                                     [unknown](0)

# NODE 2 is able to activate even without a lockdrop on node 1
[root@host-162 ~]# lvchange -aye vg/lvol1
[root@host-162 ~]# 




# Third scenario, the device to be failed IS also the [lvmlock] device
[root@host-161 ~]# lvchange -aye vg/lvol0
[root@host-161 ~]# lvs -a -o +devices
  LV        VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices      
  [lvmlock] global -wi-ao---- 256.00m                                                     /dev/sdb1(0) 
  [lvmlock] vg     -wi-ao---- 256.00m                                                     /dev/sda1(0) 
  lvol0     vg     -wi-a----- 500.00m                                                     /dev/sda1(64)
  lvol1     vg     -wi------- 500.00m                                                     /dev/sdc1(0) 
[root@host-162 ~]#  lvchange -aye vg/lvol0
  LV locked by other host: vg/lvol0
  Failed to lock logical volume vg/lvol0.

[root@host-161 ~]# echo "offline" > /sys/block/sda/device/state

Jul 16 17:24:58 host-161 kernel: blk_update_request: I/O error, dev sda, sector 2088 op 0x0:(READ) flags 0x0 phys_seg 48 prio class 0
Jul 16 17:24:58 host-161 sanlock[1426]: 2021-07-16 17:24:58 1116 [1471]: s2 renewal error -5 delta_length 0 last_success 1056
Jul 16 17:24:58 host-161 sanlock[1426]: 2021-07-16 17:24:58 1116 [1426]: s2 check_our_lease warning 60 last_success 1056
Jul 16 17:24:58 host-161 sanlock[1426]: 2021-07-16 17:24:58 1116 [1471]: s2 delta_renew read rv -5 offset 0 /dev/mapper/vg-lvmlock
Jul 16 17:24:58 host-161 kernel: blk_update_request: I/O error, dev sda, sector 2088 op 0x0:(READ) flags 0x0 phys_seg 48 prio class 0
Jul 16 17:24:58 host-161 sanlock[1426]: 2021-07-16 17:24:58 1116 [1471]: s2 renewal error -5 delta_length 0 last_success 1056
Jul 16 17:24:59 host-161 kernel: blk_update_request: I/O error, dev sda, sector 2088 op 0x0:(READ) flags 0x0 phys_seg 48 prio class 0
Jul 16 17:24:59 host-161 sanlock[1426]: 2021-07-16 17:24:59 1117 [1471]: s2 delta_renew read rv -5 offset 0 /dev/mapper/vg-lvmlock
Jul 16 17:24:59 host-161 sanlock[1426]: 2021-07-16 17:24:59 1117 [1471]: s2 renewal error -5 delta_length 0 last_success 1056
Jul 16 17:24:59 host-161 sanlock[1426]: 2021-07-16 17:24:59 1117 [1426]: s2 check_our_lease warning 61 last_success 1056
Jul 16 17:24:59 host-161 sanlock[1426]: 2021-07-16 17:24:59 1117 [1471]: s2 delta_renew read rv -5 offset 0 /dev/mapper/vg-lvmlock
Jul 16 17:24:59 host-161 kernel: blk_update_request: I/O error, dev sda, sector 2088 op 0x0:(READ) flags 0x0 phys_seg 48 prio class 0
Jul 16 17:24:59 host-161 sanlock[1426]: 2021-07-16 17:24:59 1117 [1471]: s2 renewal error -5 delta_length 0 last_success 1056
Jul 16 17:25:00 host-161 kernel: blk_update_request: I/O error, dev sda, sector 2088 op 0x0:(READ) flags 0x0 phys_seg 48 prio class 0
Jul 16 17:25:00 host-161 sanlock[1426]: 2021-07-16 17:25:00 1118 [1471]: s2 delta_renew read rv -5 offset 0 /dev/mapper/vg-lvmlock
Jul 16 17:25:00 host-161 sanlock[1426]: 2021-07-16 17:25:00 1118 [1471]: s2 renewal error -5 delta_length 0 last_success 1056
Jul 16 17:25:00 host-161 sanlock[1426]: 2021-07-16 17:25:00 1118 [1426]: s2 check_our_lease warning 62 last_success 1056

# Wait a few seconds...
[root@host-161 ~]# lvmlockctl --drop vg

Jul 16 17:25:00 host-161 lvmlockctl[1497]: Dropping locks for VG vg.



[root@host-161 ~]# lvchange -an vg/lvol0
  WARNING: Couldn't find device with uuid wgvdke-18Cc-fY1s-cvyp-RAsI-NU7R-N6Kyrv.
  WARNING: VG vg is missing PV wgvdke-18Cc-fY1s-cvyp-RAsI-NU7R-N6Kyrv (last written to /dev/sda1).
  WARNING: Couldn't find all devices for LV vg/lvol0 while checking used and assumed devices.
  WARNING: Couldn't find all devices for LV vg/lvmlock while checking used and assumed devices.
  Reading VG vg without a lock.
  LV vg/lvol0 lock failed: lockspace is inactive
  Failed to unlock logical volume vg/lvol0.
[root@host-161 ~]# lvs -a -o +devices
  WARNING: Couldn't find device with uuid wgvdke-18Cc-fY1s-cvyp-RAsI-NU7R-N6Kyrv.
  WARNING: VG vg is missing PV wgvdke-18Cc-fY1s-cvyp-RAsI-NU7R-N6Kyrv (last written to /dev/sda1).
  WARNING: Couldn't find all devices for LV vg/lvmlock while checking used and assumed devices.
  Reading VG vg without a lock.
  LV        VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices      
  [lvmlock] global -wi-ao---- 256.00m                                                     /dev/sdb1(0) 
  [lvmlock] vg     -wi-a---p- 256.00m                                                     [unknown](0) 
  lvol0     vg     -wi-----p- 500.00m                                                     [unknown](64)
  lvol1     vg     -wi------- 500.00m                                                     /dev/sdc1(0) 

# NODE 2
# Wait awhile
[root@host-162 ~]#  lvchange -aye vg/lvol0
  LV locked by other host: vg/lvol0
  Failed to lock logical volume vg/lvol0.

# Wait awhile longer, and the activation works
[root@host-162 ~]# 
[root@host-162 ~]# 
[root@host-162 ~]#  lvchange -aye vg/lvol0

Comment 13 David Teigland 2021-07-19 16:26:23 UTC
I think there are three requirements to get that message; it's not clear if they were all true for one of the scenarios above.
- the VG must have an active LV
- offline the PV that holds the hidden lvmlock LV
- don't run lvmlockctl drop until after the messages appear

In real-world scenarios you don't know when a disk failure is going to happen, so when one does, you first notice it by seeing the messages. In response, the user deactivates the LVs in the VG, and then runs lvmlockctl --drop to tell lvmlockd that the LVs are safely shut down.
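
Condensed, a reproduction that satisfies all three requirements looks roughly like this (VG, LV, and device names are examples; sdb is assumed to be the PV holding the hidden lvmlock LV). The transcript below shows the same steps on null-03:

# lvchange -aye vg/lvol0
# echo "offline" > /sys/block/sdb/device/state
# journalctl -f | grep lvmlockctl

Only after the messages appear should lvmlockctl --drop vg be run.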


[root@null-03 ~]# lvs -a bbsan -o+devices
  LV        VG    Attr       LSize   Devices     
  [lvmlock] bbsan -wi-ao---- 256.00m /dev/sdb(0) 
  lvol0     bbsan -wi-a-----   4.00m /dev/sdb(64)
  lvol1     bbsan -wi-a-----   4.00m /dev/sdb(65)
[root@null-03 ~]# echo "offline" > /sys/block/sdb/device/state
[root@null-03 ~]# pvs
  Global lock failed: error -221.
[root@null-03 ~]# 
Broadcast message from systemd-journald@null-03 (Mon 2021-07-19 04:52:46 CDT):

lvmlockctl[6506]: lvmlockd lost access to locks in VG bbsan.

Broadcast message from systemd-journald@null-03 (Mon 2021-07-19 04:52:46 CDT):

lvmlockctl[6506]: Immediately deactivate LVs in VG bbsan.

Broadcast message from systemd-journald@null-03 (Mon 2021-07-19 04:52:46 CDT):

lvmlockctl[6506]: Once VG is unused, run lvmlockctl --drop bbsan.

Message from syslogd@null-03 at Jul 19 04:52:46 ...
 lvmlockctl: lvmlockd lost access to locks in VG bbsan.

Message from syslogd@null-03 at Jul 19 04:52:46 ...
 lvmlockctl: Immediately deactivate LVs in VG bbsan.

Message from syslogd@null-03 at Jul 19 04:52:46 ...
 lvmlockctl: Once VG is unused, run lvmlockctl --drop bbsan.


[root@null-03 ~]# journalctl  | grep lvmlockctl                                 
Jul 19 04:52:46 null-03 lvmlockctl[6506]: lvmlockd lost access to locks in VG bbsan.                                                                            
Jul 19 04:52:46 null-03 lvmlockctl[6506]: Immediately deactivate LVs in VG bbsan.                                                                               
Jul 19 04:52:46 null-03 lvmlockctl[6506]: Once VG is unused, run lvmlockctl --drop bbsan.

Comment 14 Corey Marthaler 2021-07-22 16:16:03 UTC
We learned that the discrepancies we experienced were due to SELinux bug 1985000. With SELinux set to permissive, I am now able to verify the correct behavior.

kernel-4.18.0-310.el8    BUILT: Thu May 27 14:24:00 CDT 2021
lvm2-2.03.12-5.el8    BUILT: Tue Jul 13 11:50:03 CDT 2021
lvm2-libs-2.03.12-5.el8    BUILT: Tue Jul 13 11:50:03 CDT 2021
sanlock-3.8.4-1.el8    BUILT: Tue Jun  1 16:16:52 CDT 2021
sanlock-lib-3.8.4-1.el8    BUILT: Tue Jun  1 16:16:52 CDT 2021


[root@host-162 ~]# getenforce
Permissive
[root@host-162 ~]# echo "offline" > /sys/block/sdb/device/state
[root@host-162 ~]# pvs
  Error reading device /dev/sdb1 at 0 length 4096.
  VG vg lock skipped: error -221
  WARNING: Couldn't find device with uuid VzqX0V-SNRI-CzkW-VGU6-rozY-BWmN-InTaWH.
  WARNING: VG vg is missing PV VzqX0V-SNRI-CzkW-VGU6-rozY-BWmN-InTaWH (last written to /dev/sdb1).
  WARNING: Couldn't find all devices for LV vg/lvol0 while checking used and assumed devices.
  WARNING: Couldn't find all devices for LV vg/lvol1 while checking used and assumed devices.
  WARNING: Couldn't find all devices for LV vg/lvol2 while checking used and assumed devices.
  WARNING: Couldn't find all devices for LV vg/lvmlock while checking used and assumed devices.
  Reading VG vg without a lock.
  PV         VG     Fmt  Attr PSize   PFree  
  /dev/sda1  vg     lvm2 a--  <15.00g <15.00g
  /dev/sda2         lvm2 ---  <15.00g <15.00g
  /dev/sdc1  vg     lvm2 a--  <15.00g <15.00g
  /dev/sdc2         lvm2 ---  <15.00g <15.00g
  /dev/sdd1         lvm2 ---  <15.00g <15.00g
  /dev/sdd2  global lvm2 a--  <15.00g <14.75g
  /dev/sde1  vg     lvm2 a--  <15.00g <15.00g
  /dev/sde2         lvm2 ---  <15.00g <15.00g
  /dev/sdf1  vg     lvm2 a--  <15.00g <15.00g
  /dev/sdg1  vg     lvm2 a--  <15.00g <15.00g
  /dev/sdh1  vg     lvm2 a--  <15.00g <15.00g
  /dev/sdh2         lvm2 ---  <15.00g <15.00g
  [unknown]  vg     lvm2 a-m  <15.00g  14.73g
[root@host-162 ~]# 
Broadcast message from systemd-journald.lab.msp.redhat.com (Thu 2021-07-22 11:08:18 CDT):

lvmlockctl[1554]: lvmlockd lost access to locks in VG vg.


Broadcast message from systemd-journald.lab.msp.redhat.com (Thu 2021-07-22 11:08:18 CDT):

lvmlockctl[1554]: Immediately deactivate LVs in VG vg.


Broadcast message from systemd-journald.lab.msp.redhat.com (Thu 2021-07-22 11:08:18 CDT):

lvmlockctl[1554]: Once VG is unused, run lvmlockctl --drop vg.


Message from syslogd@host-162 at Jul 22 11:08:18 ...
 lvmlockctl[1554]:lvmlockd lost access to locks in VG vg.

Message from syslogd@host-162 at Jul 22 11:08:18 ...
 lvmlockctl[1554]:Immediately deactivate LVs in VG vg.

Message from syslogd@host-162 at Jul 22 11:08:18 ...
 lvmlockctl[1554]:Once VG is unused, run lvmlockctl --drop vg.

[root@host-162 ~]# lvchange -an vg
  VG vg lock skipped: storage failed for sanlock leases
  WARNING: Couldn't find device with uuid VzqX0V-SNRI-CzkW-VGU6-rozY-BWmN-InTaWH.
  WARNING: VG vg is missing PV VzqX0V-SNRI-CzkW-VGU6-rozY-BWmN-InTaWH (last written to /dev/sdb1).
  WARNING: Couldn't find all devices for LV vg/lvol0 while checking used and assumed devices.
  WARNING: Couldn't find all devices for LV vg/lvol1 while checking used and assumed devices.
  WARNING: Couldn't find all devices for LV vg/lvol2 while checking used and assumed devices.
  WARNING: Couldn't find all devices for LV vg/lvmlock while checking used and assumed devices.
  Reading VG vg without a lock.
[root@host-162 ~]#  lvmlockctl --drop vg


# Other node
[root@host-161 ~]# lvchange -aye vg/lvol0
[root@host-161 ~]# lvs -a -o +devices
  LV        VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices      
  [lvmlock] global -wi-ao---- 256.00m                                                     /dev/sdd2(0) 
  [lvmlock] vg     -wi-ao---- 256.00m                                                     /dev/sdb1(0) 
  lvol0     vg     -wi-a-----   4.00m                                                     /dev/sdb1(64)
  lvol1     vg     -wi-------   4.00m                                                     /dev/sdb1(65)
  lvol2     vg     -wi-------   4.00m                                                     /dev/sdb1(66)

Comment 18 Corey Marthaler 2021-08-05 19:06:46 UTC
Marking this VERIFIED with the latest rpms, with the caveat that SELinux was set to permissive, since SELinux bug 1985000 blocks this feature. The automated script was verified to drop locks automatically so the host avoids a watchdog reset, and to allow the other nodes in the cluster to exclusively activate LVs in the shared VG.


kernel-4.18.0-310.el8    BUILT: Thu May 27 14:24:00 CDT 2021
lvm2-2.03.12-6.el8    BUILT: Tue Aug  3 07:23:05 CDT 2021
lvm2-libs-2.03.12-6.el8    BUILT: Tue Aug  3 07:23:05 CDT 2021
sanlock-3.8.4-1.el8    BUILT: Tue Jun  1 16:16:52 CDT 2021
sanlock-lib-3.8.4-1.el8    BUILT: Tue Jun  1 16:16:52 CDT 2021


[root@host-161 ~]# cat /usr/sbin/my_vg_kill_script.sh
#!/bin/bash
VG=$1
# replace dm table with the error target for top level LVs
dmsetup wipe_table -S "uuid=~LVM && vgname=$VG && lv_layer=\"\""
# check that the error target is in place
dmsetup table -c -S "uuid=~LVM && vgname=$VG && lv_layer=\"\"" |grep -vw error
if [[ $? -ne 0 ]] ; then
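  # grep found no lines without "error", i.e. every matching top-level LV
  # table has been replaced with the error target, so report success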
  exit 0
fi
exit 1

[root@host-161 ~]# grep lvmlockctl_kill_command /etc/lvm/lvm.conf
        # Configuration option global/lvmlockctl_kill_command.
        # lvmlockctl_kill_command = ""
        lvmlockctl_kill_command="/usr/sbin/my_vg_kill_script.sh"

[root@host-161 ~]# lvs -a -o +devices
  LV        VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices      
  [lvmlock] global -wi-ao---- 256.00m                                                     /dev/sdd2(0) 
  [lvmlock] vg     -wi-ao---- 256.00m                                                     /dev/sdb1(0) 
  lvol0     vg     -wi-a-----   4.00m                                                     /dev/sdb1(64)
  lvol1     vg     -wi-a-----   4.00m                                                     /dev/sdb1(65)
  lvol2     vg     -wi-a-----   4.00m                                                     /dev/sdb1(66)

[root@host-161 ~]# echo "offline" > /sys/block/sdb/device/state
[root@host-161 ~]# pvs
  Error reading device /dev/sdb1 at 0 length 4096.
  VG vg lock skipped: error -221
  WARNING: Couldn't find device with uuid VzqX0V-SNRI-CzkW-VGU6-rozY-BWmN-InTaWH.
  WARNING: VG vg is missing PV VzqX0V-SNRI-CzkW-VGU6-rozY-BWmN-InTaWH (last written to /dev/sdb1).
  WARNING: Couldn't find all devices for LV vg/lvol0 while checking used and assumed devices.
  WARNING: Couldn't find all devices for LV vg/lvol1 while checking used and assumed devices.
  WARNING: Couldn't find all devices for LV vg/lvol2 while checking used and assumed devices.
  WARNING: Couldn't find all devices for LV vg/lvmlock while checking used and assumed devices.
  Reading VG vg without a lock.
  PV         VG     Fmt  Attr PSize   PFree  
  /dev/sda1  vg     lvm2 a--  <15.00g <15.00g
  /dev/sda2         lvm2 ---  <15.00g <15.00g
  /dev/sdc1  vg     lvm2 a--  <15.00g <15.00g
  /dev/sdc2         lvm2 ---  <15.00g <15.00g
  /dev/sdd1         lvm2 ---  <15.00g <15.00g
  /dev/sdd2  global lvm2 a--  <15.00g <14.75g
  /dev/sde1  vg     lvm2 a--  <15.00g <15.00g
  /dev/sde2         lvm2 ---  <15.00g <15.00g
  /dev/sdf1  vg     lvm2 a--  <15.00g <15.00g
  /dev/sdg1  vg     lvm2 a--  <15.00g <15.00g
  /dev/sdh1  vg     lvm2 a--  <15.00g <15.00g
  /dev/sdh2         lvm2 ---  <15.00g <15.00g
  [unknown]  vg     lvm2 a-m  <15.00g  14.73g

[root@host-161 ~]# 
Broadcast message from systemd-journald.lab.msp.redhat.com (Thu 2021-08-05 13:40:51 CDT):

lvmlockctl[1604]: lvmlockd lost access to locks in VG vg.


Message from syslogd@host-161 at Aug  5 13:40:51 ...
 lvmlockctl[1604]:lvmlockd lost access to locks in VG vg.

Aug  5 13:40:51 host-161 sanlock[1424]: 2021-08-05 13:40:51 4034 [1424]: s2 check_our_lease failed 80
Aug  5 13:40:51 host-161 sanlock[1424]: 2021-08-05 13:40:51 4034 [1424]: s2 kill 1453 sig 100 count 1
Aug  5 13:40:51 host-161 lvmlockctl[1604]: lvmlockd lost access to locks in VG vg.
Aug  5 13:40:51 host-161 sanlock[1424]: 2021-08-05 13:40:51 4034 [1543]: s2 delta_renew read rv -5 offset 0 /dev/mapper/vg-lvmlock
Aug  5 13:40:51 host-161 kernel: blk_update_request: I/O error, dev sdb, sector 2088 op 0x0:(READ) flags 0x0 phys_seg 39 prio class 0
Aug  5 13:40:51 host-161 sanlock[1424]: 2021-08-05 13:40:51 4034 [1543]: s2 renewal error -5 delta_length 0 last_success 3954
Aug  5 13:40:51 host-161 kernel: Buffer I/O error on dev dm-4, logical block 1008, async page read
Aug  5 13:40:51 host-161 kernel: Buffer I/O error on dev dm-3, logical block 1008, async page read
Aug  5 13:40:51 host-161 kernel: Buffer I/O error on dev dm-2, logical block 1008, async page read
Aug  5 13:40:51 host-161 kernel: Buffer I/O error on dev dm-1, logical block 65520, async page read
Aug  5 13:40:52 host-161 lvmlockctl[1604]: Successful VG vg kill command /usr/sbin/my_vg_kill_script.sh vg
Aug  5 13:40:52 host-161 wdmd[1438]: /dev/watchdog0 reopen
Aug  5 13:40:52 host-161 sanlock[1424]: 2021-08-05 13:40:52 4035 [1424]: s2 all pids clear
Aug  5 13:40:52 host-161 lvmlockd[1453]: 1628188852 S lvm_vg rem_lockspace_san error -115
Aug  5 13:40:52 host-161 sanlock[1424]: 2021-08-05 13:40:52 4035 [1543]: s2 delta_renew read rv -5 offset 0 /dev/mapper/vg-lvmlock
Aug  5 13:40:52 host-161 sanlock[1424]: 2021-08-05 13:40:52 4035 [1543]: s2 renewal error -5 delta_length 0 last_success 3954
Aug  5 13:40:52 host-161 kernel: Buffer I/O error on dev dm-1, logical block 65520, async page read


# Other node
[root@host-162 ~]# lvchange -aye vg
[root@host-162 ~]# lvs -a -o +devices
  LV        VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices      
  [lvmlock] global -wi-ao---- 256.00m                                                     /dev/sdd2(0) 
  [lvmlock] vg     -wi-ao---- 256.00m                                                     /dev/sdb1(0) 
  lvol0     vg     -wi-a-----   4.00m                                                     /dev/sdb1(64)
  lvol1     vg     -wi-a-----   4.00m                                                     /dev/sdb1(65)
  lvol2     vg     -wi-a-----   4.00m                                                     /dev/sdb1(66)

Comment 21 errata-xmlrpc 2021-11-09 19:45:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (lvm2 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4431

