RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1194667 - LVM locks up due to orphaned locks with parallel pvs commands
Summary: LVM locks up due to orphaned locks with parallel pvs commands
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.1
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: rc
: ---
Assignee: David Teigland
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2015-02-20 13:37 UTC by Nenad Peric
Modified: 2021-09-03 12:40 UTC (History)
10 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-08-20 21:01:30 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Nenad Peric 2015-02-20 13:37:10 UTC
Description of problem:

If you have a cluster of, say, three nodes and you run pvs commands on them simultaneously there is some timing issue which causes an orphaned lock to remain thus preventing any other LVM commands from executing afterwards.

Version-Release number of selected component (if applicable):


How reproducible:

Everytime

Steps to Reproduce:
Create clustered VG
Create a normal linear LV (should be enough)

do a pvs -a loop on all nodes:

for i in {1..600};
do
    pvs -a 2>&1 > /dev/null;
done &

Actual results:

[root@tardis-01 ~]#   Error locking on node 2: Command timed out
  Can't get lock for #orphans_lvm2
  Cannot process volume group #orphans_lvm2


Expected results:

locks should not be orphaned and cause the cluster-wide lockup of LVM commands.

Additional info:

Here are the dlm_tool outputs and locking debug from all the nodes:

Node 1:

[root@tardis-01 ~]# dlm_tool lockdump clvmd && grep "" /sys/kernel/debug/dlm/*
id 00000003 gr PR rq IV pid 12873 master 3 "P_#global"
id 00000002 gr CR rq IV pid 3982 master 2 "LdWV10hcX8m4af9zdg7M72gSt6keWHSDjduMBz0U4T3R2GL9KFpeaE4cmIMkdWfU"


/sys/kernel/debug/dlm/clvmd:
/sys/kernel/debug/dlm/clvmd:Resource ffff881819289a40 Name (len=9) "P_#global"  
/sys/kernel/debug/dlm/clvmd:Local Copy, Master is node 3
/sys/kernel/debug/dlm/clvmd:Granted Queue
/sys/kernel/debug/dlm/clvmd:00000003 PR Master:     00000003
/sys/kernel/debug/dlm/clvmd:Conversion Queue
/sys/kernel/debug/dlm/clvmd:Waiting Queue
/sys/kernel/debug/dlm/clvmd:
/sys/kernel/debug/dlm/clvmd:Resource ffff880c1d45d040 Name (len=64) "LdWV10hcX8m4af9zdg7M72gSt6keWHSDjduMBz0U4T3R2GL9KFpeaE4cmIMkdWfU"  
/sys/kernel/debug/dlm/clvmd:Local Copy, Master is node 2
/sys/kernel/debug/dlm/clvmd:Granted Queue
/sys/kernel/debug/dlm/clvmd:00000002 CR Master:     00000003
/sys/kernel/debug/dlm/clvmd:Conversion Queue
/sys/kernel/debug/dlm/clvmd:Waiting Queue
/sys/kernel/debug/dlm/clvmd_all:version rsb 1.1 lvb 1.1 lkb 1.1
/sys/kernel/debug/dlm/clvmd_all:rsb ffff881819289a40 3 0 0 0 0 0 9 str P_#global
/sys/kernel/debug/dlm/clvmd_all:lkb 3 3 3 12873 0 0 1 2 3 -1 0 0 0 0 10025950304649 0
/sys/kernel/debug/dlm/clvmd_all:rsb ffff880c1d45d040 2 0 0 0 0 0 64 str LdWV10hcX8m4af9zdg7M72gSt6keWHSDjduMBz0U4T3R2GL9KFpeaE4cmIMkdWfU
/sys/kernel/debug/dlm/clvmd_all:lkb 2 2 3 3982 0 1 1 2 1 -1 0 0 0 0 8941659034964 0
/sys/kernel/debug/dlm/clvmd_locks:id nodeid remid pid xid exflags flags sts grmode rqmode time_ms r_nodeid r_len r_name
/sys/kernel/debug/dlm/clvmd_locks:3 3 3 12873 0 0 1 2 3 -1 49926820 3 9 "P_#global"
/sys/kernel/debug/dlm/clvmd_locks:2 2 3 3982 0 1 1 2 1 -1 1134218110 2 64 "LdWV10hcX8m4af9zdg7M72gSt6keWHSDjduMBz0U4T3R2GL9KFpeaE4cmIMkdWfU"


Node 2:

[root@tardis-02 ~]# dlm_tool lockdump clvmd && grep "" /sys/kernel/debug/dlm/*
id 00000004 gr PR rq IV pid 12677 master 3 "P_#global"
id 00000001 gr CR rq IV pid 3953 master 0 "LdWV10hcX8m4af9zdg7M72gSt6keWHSDjduMBz0U4T3R2GL9KFpeaE4cmIMkdWfU"


/sys/kernel/debug/dlm/clvmd:
/sys/kernel/debug/dlm/clvmd:Resource ffff88121dd51900 Name (len=9) "P_#global"  
/sys/kernel/debug/dlm/clvmd:Local Copy, Master is node 3
/sys/kernel/debug/dlm/clvmd:Granted Queue
/sys/kernel/debug/dlm/clvmd:00000004 PR Master:     00000002
/sys/kernel/debug/dlm/clvmd:Conversion Queue
/sys/kernel/debug/dlm/clvmd:Waiting Queue
/sys/kernel/debug/dlm/clvmd:
/sys/kernel/debug/dlm/clvmd:Resource ffff880614a9c140 Name (len=64) "LdWV10hcX8m4af9zdg7M72gSt6keWHSDjduMBz0U4T3R2GL9KFpeaE4cmIMkdWfU"  
/sys/kernel/debug/dlm/clvmd:Master Copy
/sys/kernel/debug/dlm/clvmd:Granted Queue
/sys/kernel/debug/dlm/clvmd:00000003 CR Remote:   1 00000002
/sys/kernel/debug/dlm/clvmd:00000002 CR Remote:   3 00000001
/sys/kernel/debug/dlm/clvmd:00000001 CR
/sys/kernel/debug/dlm/clvmd:Conversion Queue
/sys/kernel/debug/dlm/clvmd:Waiting Queue
/sys/kernel/debug/dlm/clvmd_all:version rsb 1.1 lvb 1.1 lkb 1.1
/sys/kernel/debug/dlm/clvmd_all:rsb ffff88121dd51900 3 0 0 0 0 0 9 str P_#global
/sys/kernel/debug/dlm/clvmd_all:lkb 4 3 2 12677 0 0 1 2 3 -1 0 0 0 0 10053835634830 0
/sys/kernel/debug/dlm/clvmd_all:rsb ffff880614a9c140 0 0 0 0 0 0 64 str LdWV10hcX8m4af9zdg7M72gSt6keWHSDjduMBz0U4T3R2GL9KFpeaE4cmIMkdWfU
/sys/kernel/debug/dlm/clvmd_all:lkb 3 1 2 3982 0 1 10001 2 1 -1 0 0 0 0 8969539437904 0
/sys/kernel/debug/dlm/clvmd_all:lkb 2 3 1 3925 0 1 10001 2 1 -1 0 0 0 0 8969534962051 0
/sys/kernel/debug/dlm/clvmd_all:lkb 1 0 0 3953 0 1 1 2 1 -1 0 0 0 0 8969534794751 0
/sys/kernel/debug/dlm/clvmd_locks:id nodeid remid pid xid exflags flags sts grmode rqmode time_ms r_nodeid r_len r_name
/sys/kernel/debug/dlm/clvmd_locks:4 3 2 12677 0 0 1 2 3 -1 49929956 3 9 "P_#global"
/sys/kernel/debug/dlm/clvmd_locks:3 1 2 3982 0 1 10001 2 1 -1 1134226172 0 64 "LdWV10hcX8m4af9zdg7M72gSt6keWHSDjduMBz0U4T3R2GL9KFpeaE4cmIMkdWfU"
/sys/kernel/debug/dlm/clvmd_locks:2 3 1 3925 0 1 10001 2 1 -1 1134230650 0 64 "LdWV10hcX8m4af9zdg7M72gSt6keWHSDjduMBz0U4T3R2GL9KFpeaE4cmIMkdWfU"
/sys/kernel/debug/dlm/clvmd_locks:1 0 0 3953 0 1 1 2 1 -1 1134230819 0 64 "LdWV10hcX8m4af9zdg7M72gSt6keWHSDjduMBz0U4T3R2GL9KFpeaE4cmIMkdWfU"
[root@tardis-02 ~]# 



Node 3:

[root@tardis-03 ~]# dlm_tool lockdump clvmd && grep "" /sys/kernel/debug/dlm/*
id 00000004 gr PR rq IV pid 10678 master 0 "P_#global"
id 00000001 gr CR rq IV pid 3925 master 2 "LdWV10hcX8m4af9zdg7M72gSt6keWHSDjduMBz0U4T3R2GL9KFpeaE4cmIMkdWfU"


/sys/kernel/debug/dlm/clvmd:
/sys/kernel/debug/dlm/clvmd:Resource ffff8812171e8280 Name (len=9) "P_#global"  
/sys/kernel/debug/dlm/clvmd:Master Copy
/sys/kernel/debug/dlm/clvmd:Granted Queue
/sys/kernel/debug/dlm/clvmd:00000003 PR Remote:   1 00000003
/sys/kernel/debug/dlm/clvmd:00000002 PR Remote:   2 00000004
/sys/kernel/debug/dlm/clvmd:00000004 PR
/sys/kernel/debug/dlm/clvmd:Conversion Queue
/sys/kernel/debug/dlm/clvmd:Waiting Queue
/sys/kernel/debug/dlm/clvmd:
/sys/kernel/debug/dlm/clvmd:Resource ffff88061deb4140 Name (len=64) "LdWV10hcX8m4af9zdg7M72gSt6keWHSDjduMBz0U4T3R2GL9KFpeaE4cmIMkdWfU"  
/sys/kernel/debug/dlm/clvmd:Local Copy, Master is node 2
/sys/kernel/debug/dlm/clvmd:Granted Queue
/sys/kernel/debug/dlm/clvmd:00000001 CR Master:     00000002
/sys/kernel/debug/dlm/clvmd:Conversion Queue
/sys/kernel/debug/dlm/clvmd:Waiting Queue
/sys/kernel/debug/dlm/clvmd_all:version rsb 1.1 lvb 1.1 lkb 1.1
/sys/kernel/debug/dlm/clvmd_all:rsb ffff8812171e8280 0 0 0 0 0 0 9 str P_#global
/sys/kernel/debug/dlm/clvmd_all:lkb 3 1 3 12873 0 0 10001 2 3 -1 0 0 0 0 10157543798535 0
/sys/kernel/debug/dlm/clvmd_all:lkb 2 2 4 12677 0 0 10001 2 3 -1 0 0 0 0 10157540983497 0
/sys/kernel/debug/dlm/clvmd_all:lkb 4 0 0 10678 0 0 1 2 3 -1 0 0 0 0 9436517404230 0
/sys/kernel/debug/dlm/clvmd_all:rsb ffff88061deb4140 2 0 0 0 0 0 64 str LdWV10hcX8m4af9zdg7M72gSt6keWHSDjduMBz0U4T3R2GL9KFpeaE4cmIMkdWfU
/sys/kernel/debug/dlm/clvmd_all:lkb 1 2 2 3925 0 1 1 2 1 -1 0 0 0 0 9073234139285 0
/sys/kernel/debug/dlm/clvmd_locks:id nodeid remid pid xid exflags flags sts grmode rqmode time_ms r_nodeid r_len r_name
/sys/kernel/debug/dlm/clvmd_locks:3 1 3 12873 0 0 10001 2 3 -1 49927378 0 9 "P_#global"
/sys/kernel/debug/dlm/clvmd_locks:2 2 4 12677 0 0 10001 2 3 -1 49930196 0 9 "P_#global"
/sys/kernel/debug/dlm/clvmd_locks:4 0 0 10678 0 0 1 2 3 -1 770953777 0 9 "P_#global"
/sys/kernel/debug/dlm/clvmd_locks:1 2 2 3925 0 1 1 2 1 -1 1134237061 2 64 "LdWV10hcX8m4af9zdg7M72gSt6keWHSDjduMBz0U4T3R2GL9KFpeaE4cmIMkdWfU"
[root@tardis-03 ~]#

Comment 3 Peter Rajnoha 2015-10-08 13:46:43 UTC
Let's try this with latest builds - lvm2-2.02.130-2.el7+. There have been a few changes in clvmd code since 7.1 (as for me, I can't reproduce this).

Comment 4 Peter Rajnoha 2015-10-08 13:48:04 UTC
...might be related or even same as bug #1263356.

Comment 6 Peter Rajnoha 2016-06-06 12:23:15 UTC
Still reproducible?

Comment 7 Corey Marthaler 2016-06-08 16:49:06 UTC
I added a comment to bug 1263356 that it was still easily reproducible. If bug 1263356 is a dup of this one, then yes, it's still reproducible.

Comment 8 David Teigland 2016-06-30 17:13:40 UTC
The last time this happened we studied it closely and then needed debug info to get any further, so we are waiting to reproduce with debugging.


Note You need to log in before you can comment on or make changes to this bug.