Bug 1194667

Summary: LVM locks up due to orphaned locks with parallel pvs commands
Product: Red Hat Enterprise Linux 7 Reporter: Nenad Peric <nperic>
Component: lvm2Assignee: David Teigland <teigland>
lvm2 sub component: Clustering / clvmd QA Contact: cluster-qe <cluster-qe>
Status: CLOSED WONTFIX Docs Contact:
Severity: unspecified    
Priority: unspecified CC: agk, cluster-qe, cmarthal, heinzm, jbrassow, msnitzer, nperic, prajnoha, richard, zkabelac
Version: 7.1   
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-08-20 21:01:30 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Nenad Peric 2015-02-20 13:37:10 UTC
Description of problem:

If you have a cluster of, say, three nodes and you run pvs commands on them simultaneously there is some timing issue which causes an orphaned lock to remain thus preventing any other LVM commands from executing afterwards.

Version-Release number of selected component (if applicable):


How reproducible:

Everytime

Steps to Reproduce:
Create clustered VG
Create a normal linear LV (should be enough)

do a pvs -a loop on all nodes:

for i in {1..600};
do
    pvs -a 2>&1 > /dev/null;
done &

Actual results:

[root@tardis-01 ~]#   Error locking on node 2: Command timed out
  Can't get lock for #orphans_lvm2
  Cannot process volume group #orphans_lvm2


Expected results:

locks should not be orphaned and cause the cluster-wide lockup of LVM commands.

Additional info:

Here are the dlm_tool outputs and locking debug from all the nodes:

Node 1:

[root@tardis-01 ~]# dlm_tool lockdump clvmd && grep "" /sys/kernel/debug/dlm/*
id 00000003 gr PR rq IV pid 12873 master 3 "P_#global"
id 00000002 gr CR rq IV pid 3982 master 2 "LdWV10hcX8m4af9zdg7M72gSt6keWHSDjduMBz0U4T3R2GL9KFpeaE4cmIMkdWfU"


/sys/kernel/debug/dlm/clvmd:
/sys/kernel/debug/dlm/clvmd:Resource ffff881819289a40 Name (len=9) "P_#global"  
/sys/kernel/debug/dlm/clvmd:Local Copy, Master is node 3
/sys/kernel/debug/dlm/clvmd:Granted Queue
/sys/kernel/debug/dlm/clvmd:00000003 PR Master:     00000003
/sys/kernel/debug/dlm/clvmd:Conversion Queue
/sys/kernel/debug/dlm/clvmd:Waiting Queue
/sys/kernel/debug/dlm/clvmd:
/sys/kernel/debug/dlm/clvmd:Resource ffff880c1d45d040 Name (len=64) "LdWV10hcX8m4af9zdg7M72gSt6keWHSDjduMBz0U4T3R2GL9KFpeaE4cmIMkdWfU"  
/sys/kernel/debug/dlm/clvmd:Local Copy, Master is node 2
/sys/kernel/debug/dlm/clvmd:Granted Queue
/sys/kernel/debug/dlm/clvmd:00000002 CR Master:     00000003
/sys/kernel/debug/dlm/clvmd:Conversion Queue
/sys/kernel/debug/dlm/clvmd:Waiting Queue
/sys/kernel/debug/dlm/clvmd_all:version rsb 1.1 lvb 1.1 lkb 1.1
/sys/kernel/debug/dlm/clvmd_all:rsb ffff881819289a40 3 0 0 0 0 0 9 str P_#global
/sys/kernel/debug/dlm/clvmd_all:lkb 3 3 3 12873 0 0 1 2 3 -1 0 0 0 0 10025950304649 0
/sys/kernel/debug/dlm/clvmd_all:rsb ffff880c1d45d040 2 0 0 0 0 0 64 str LdWV10hcX8m4af9zdg7M72gSt6keWHSDjduMBz0U4T3R2GL9KFpeaE4cmIMkdWfU
/sys/kernel/debug/dlm/clvmd_all:lkb 2 2 3 3982 0 1 1 2 1 -1 0 0 0 0 8941659034964 0
/sys/kernel/debug/dlm/clvmd_locks:id nodeid remid pid xid exflags flags sts grmode rqmode time_ms r_nodeid r_len r_name
/sys/kernel/debug/dlm/clvmd_locks:3 3 3 12873 0 0 1 2 3 -1 49926820 3 9 "P_#global"
/sys/kernel/debug/dlm/clvmd_locks:2 2 3 3982 0 1 1 2 1 -1 1134218110 2 64 "LdWV10hcX8m4af9zdg7M72gSt6keWHSDjduMBz0U4T3R2GL9KFpeaE4cmIMkdWfU"


Node 2:

[root@tardis-02 ~]# dlm_tool lockdump clvmd && grep "" /sys/kernel/debug/dlm/*
id 00000004 gr PR rq IV pid 12677 master 3 "P_#global"
id 00000001 gr CR rq IV pid 3953 master 0 "LdWV10hcX8m4af9zdg7M72gSt6keWHSDjduMBz0U4T3R2GL9KFpeaE4cmIMkdWfU"


/sys/kernel/debug/dlm/clvmd:
/sys/kernel/debug/dlm/clvmd:Resource ffff88121dd51900 Name (len=9) "P_#global"  
/sys/kernel/debug/dlm/clvmd:Local Copy, Master is node 3
/sys/kernel/debug/dlm/clvmd:Granted Queue
/sys/kernel/debug/dlm/clvmd:00000004 PR Master:     00000002
/sys/kernel/debug/dlm/clvmd:Conversion Queue
/sys/kernel/debug/dlm/clvmd:Waiting Queue
/sys/kernel/debug/dlm/clvmd:
/sys/kernel/debug/dlm/clvmd:Resource ffff880614a9c140 Name (len=64) "LdWV10hcX8m4af9zdg7M72gSt6keWHSDjduMBz0U4T3R2GL9KFpeaE4cmIMkdWfU"  
/sys/kernel/debug/dlm/clvmd:Master Copy
/sys/kernel/debug/dlm/clvmd:Granted Queue
/sys/kernel/debug/dlm/clvmd:00000003 CR Remote:   1 00000002
/sys/kernel/debug/dlm/clvmd:00000002 CR Remote:   3 00000001
/sys/kernel/debug/dlm/clvmd:00000001 CR
/sys/kernel/debug/dlm/clvmd:Conversion Queue
/sys/kernel/debug/dlm/clvmd:Waiting Queue
/sys/kernel/debug/dlm/clvmd_all:version rsb 1.1 lvb 1.1 lkb 1.1
/sys/kernel/debug/dlm/clvmd_all:rsb ffff88121dd51900 3 0 0 0 0 0 9 str P_#global
/sys/kernel/debug/dlm/clvmd_all:lkb 4 3 2 12677 0 0 1 2 3 -1 0 0 0 0 10053835634830 0
/sys/kernel/debug/dlm/clvmd_all:rsb ffff880614a9c140 0 0 0 0 0 0 64 str LdWV10hcX8m4af9zdg7M72gSt6keWHSDjduMBz0U4T3R2GL9KFpeaE4cmIMkdWfU
/sys/kernel/debug/dlm/clvmd_all:lkb 3 1 2 3982 0 1 10001 2 1 -1 0 0 0 0 8969539437904 0
/sys/kernel/debug/dlm/clvmd_all:lkb 2 3 1 3925 0 1 10001 2 1 -1 0 0 0 0 8969534962051 0
/sys/kernel/debug/dlm/clvmd_all:lkb 1 0 0 3953 0 1 1 2 1 -1 0 0 0 0 8969534794751 0
/sys/kernel/debug/dlm/clvmd_locks:id nodeid remid pid xid exflags flags sts grmode rqmode time_ms r_nodeid r_len r_name
/sys/kernel/debug/dlm/clvmd_locks:4 3 2 12677 0 0 1 2 3 -1 49929956 3 9 "P_#global"
/sys/kernel/debug/dlm/clvmd_locks:3 1 2 3982 0 1 10001 2 1 -1 1134226172 0 64 "LdWV10hcX8m4af9zdg7M72gSt6keWHSDjduMBz0U4T3R2GL9KFpeaE4cmIMkdWfU"
/sys/kernel/debug/dlm/clvmd_locks:2 3 1 3925 0 1 10001 2 1 -1 1134230650 0 64 "LdWV10hcX8m4af9zdg7M72gSt6keWHSDjduMBz0U4T3R2GL9KFpeaE4cmIMkdWfU"
/sys/kernel/debug/dlm/clvmd_locks:1 0 0 3953 0 1 1 2 1 -1 1134230819 0 64 "LdWV10hcX8m4af9zdg7M72gSt6keWHSDjduMBz0U4T3R2GL9KFpeaE4cmIMkdWfU"
[root@tardis-02 ~]# 



Node 3:

[root@tardis-03 ~]# dlm_tool lockdump clvmd && grep "" /sys/kernel/debug/dlm/*
id 00000004 gr PR rq IV pid 10678 master 0 "P_#global"
id 00000001 gr CR rq IV pid 3925 master 2 "LdWV10hcX8m4af9zdg7M72gSt6keWHSDjduMBz0U4T3R2GL9KFpeaE4cmIMkdWfU"


/sys/kernel/debug/dlm/clvmd:
/sys/kernel/debug/dlm/clvmd:Resource ffff8812171e8280 Name (len=9) "P_#global"  
/sys/kernel/debug/dlm/clvmd:Master Copy
/sys/kernel/debug/dlm/clvmd:Granted Queue
/sys/kernel/debug/dlm/clvmd:00000003 PR Remote:   1 00000003
/sys/kernel/debug/dlm/clvmd:00000002 PR Remote:   2 00000004
/sys/kernel/debug/dlm/clvmd:00000004 PR
/sys/kernel/debug/dlm/clvmd:Conversion Queue
/sys/kernel/debug/dlm/clvmd:Waiting Queue
/sys/kernel/debug/dlm/clvmd:
/sys/kernel/debug/dlm/clvmd:Resource ffff88061deb4140 Name (len=64) "LdWV10hcX8m4af9zdg7M72gSt6keWHSDjduMBz0U4T3R2GL9KFpeaE4cmIMkdWfU"  
/sys/kernel/debug/dlm/clvmd:Local Copy, Master is node 2
/sys/kernel/debug/dlm/clvmd:Granted Queue
/sys/kernel/debug/dlm/clvmd:00000001 CR Master:     00000002
/sys/kernel/debug/dlm/clvmd:Conversion Queue
/sys/kernel/debug/dlm/clvmd:Waiting Queue
/sys/kernel/debug/dlm/clvmd_all:version rsb 1.1 lvb 1.1 lkb 1.1
/sys/kernel/debug/dlm/clvmd_all:rsb ffff8812171e8280 0 0 0 0 0 0 9 str P_#global
/sys/kernel/debug/dlm/clvmd_all:lkb 3 1 3 12873 0 0 10001 2 3 -1 0 0 0 0 10157543798535 0
/sys/kernel/debug/dlm/clvmd_all:lkb 2 2 4 12677 0 0 10001 2 3 -1 0 0 0 0 10157540983497 0
/sys/kernel/debug/dlm/clvmd_all:lkb 4 0 0 10678 0 0 1 2 3 -1 0 0 0 0 9436517404230 0
/sys/kernel/debug/dlm/clvmd_all:rsb ffff88061deb4140 2 0 0 0 0 0 64 str LdWV10hcX8m4af9zdg7M72gSt6keWHSDjduMBz0U4T3R2GL9KFpeaE4cmIMkdWfU
/sys/kernel/debug/dlm/clvmd_all:lkb 1 2 2 3925 0 1 1 2 1 -1 0 0 0 0 9073234139285 0
/sys/kernel/debug/dlm/clvmd_locks:id nodeid remid pid xid exflags flags sts grmode rqmode time_ms r_nodeid r_len r_name
/sys/kernel/debug/dlm/clvmd_locks:3 1 3 12873 0 0 10001 2 3 -1 49927378 0 9 "P_#global"
/sys/kernel/debug/dlm/clvmd_locks:2 2 4 12677 0 0 10001 2 3 -1 49930196 0 9 "P_#global"
/sys/kernel/debug/dlm/clvmd_locks:4 0 0 10678 0 0 1 2 3 -1 770953777 0 9 "P_#global"
/sys/kernel/debug/dlm/clvmd_locks:1 2 2 3925 0 1 1 2 1 -1 1134237061 2 64 "LdWV10hcX8m4af9zdg7M72gSt6keWHSDjduMBz0U4T3R2GL9KFpeaE4cmIMkdWfU"
[root@tardis-03 ~]#

Comment 3 Peter Rajnoha 2015-10-08 13:46:43 UTC
Let's try this with latest builds - lvm2-2.02.130-2.el7+. There have been a few changes in clvmd code since 7.1 (as for me, I can't reproduce this).

Comment 4 Peter Rajnoha 2015-10-08 13:48:04 UTC
...might be related or even same as bug #1263356.

Comment 6 Peter Rajnoha 2016-06-06 12:23:15 UTC
Still reproducible?

Comment 7 Corey Marthaler 2016-06-08 16:49:06 UTC
I added a comment to bug 1263356 that it was still easily reproducible. If bug 1263356 is a dup of this one, then yes, it's still reproducible.

Comment 8 David Teigland 2016-06-30 17:13:40 UTC
The last time this happened we studied it closely and then needed debug info to get any further, so we are waiting to reproduce with debugging.