Bug 918944

Summary: Deadlock in lk calls of stripe subvolume
Product: [Community] GlusterFS Reporter: Pranith Kumar K <pkarampu>
Component: stripe Assignee: bugs <bugs>
Status: CLOSED EOL QA Contact:
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: mainline CC: bugs, gluster-bugs
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-10-22 15:46:38 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Pranith Kumar K 2013-03-07 09:38:37 UTC
Description of problem:
Running the command below eventually deadlocks the setlkw calls at the bricks, because the calls reach different bricks in different orders.

Execute the following command:
while tests/bugs/bug-905864.t; do :; done
After an hour or so the test hangs. The statedump shows that on two stripe subvolumes the setlkw calls arrived in swapped order, which causes the deadlock.
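
The effect of arrival order can be illustrated with a small model. This is a sketch, not GlusterFS code: it assumes each brick grants the write lock to the first request it sees and queues later ones, and that a client can only proceed once it holds the lock on every brick. The names Brick, request and run are hypothetical.

```python
from collections import deque


class Brick:
    """Toy model of one brick: a single write lock with a FIFO wait queue."""

    def __init__(self, name):
        self.name = name
        self.holder = None      # client that currently holds the lock
        self.waiters = deque()  # clients blocked behind the holder

    def request(self, client):
        """Grant the lock if it is free, otherwise queue the client (blocking request)."""
        if self.holder is None:
            self.holder = client
            return True
        self.waiters.append(client)
        return False


def run(arrivals):
    """arrivals maps a brick name to the order in which client requests reach it.

    Returns the set of clients that hold the lock on every brick.
    """
    bricks = {name: Brick(name) for name in arrivals}
    for name, order in arrivals.items():
        for client in order:
            bricks[name].request(client)
    clients = {c for order in arrivals.values() for c in order}
    return {c for c in clients if all(b.holder == c for b in bricks.values())}


# Same arrival order on both bricks: client A holds both locks and can proceed.
same = run({"brick-a": ["A", "B"], "brick-b": ["A", "B"]})
# Swapped arrival order: A holds one brick, B holds the other, nobody proceeds.
swapped = run({"brick-a": ["A", "B"], "brick-b": ["B", "A"]})
print(same, swapped)
```

In the swapped case neither client ever releases what it holds, since each is waiting on the other, which matches the hang seen in the test.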

On brick-5, the locks are:
[xlator.features.locks.patchy-locks.inode]
path=/file1
mandatory=0
posixlk-count=2
posixlk.posixlk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=2, pid = 32312, owner=c753d2d946fd5b45, transport=0x150cf30, , granted at Thu Mar  7 14:03:57 2013

posixlk.posixlk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=2, pid = 32309, owner=9a4f25833ec8c200, transport=0x150d5b0, , blocked at Thu Mar  7 14:03:57 2013

On brick7, the locks are:
[xlator.features.locks.patchy-locks.inode]
path=/file1
mandatory=0
posixlk-count=2
posixlk.posixlk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=2, pid = 32309, owner=9a4f25833ec8c200, transport=0xe82b90, , granted at Thu Mar  7 14:03:57 2013

posixlk.posixlk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=2, pid = 32312, owner=c753d2d946fd5b45, transport=0xe84480, , blocked at Thu Mar  7 14:03:57 2013
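
When comparing dumps from several bricks it can help to pull out just the state, pid and owner of each posixlk entry. The sketch below does that with a regular expression; the pattern is inferred only from the lines quoted in this report and is an assumption about the statedump format, so other versions may need adjustments. The function name parse_posixlk is hypothetical.

```python
import re

# Pattern inferred from the posixlk lines in this report only; other statedump
# versions may format these entries differently.
POSIXLK_RE = re.compile(
    r"posixlk\.posixlk\[(?P<index>\d+)\]\((?P<state>ACTIVE|BLOCKED)\)="
    r"type=(?P<type>\w+),.*?pid = (?P<pid>\d+), owner=(?P<owner>[0-9a-f]+)"
)


def parse_posixlk(dump_text):
    """Return one dict per posixlk entry found in dump_text."""
    locks = []
    for line in dump_text.splitlines():
        match = POSIXLK_RE.search(line)
        if match:
            entry = match.groupdict()
            entry["index"] = int(entry["index"])
            entry["pid"] = int(entry["pid"])
            locks.append(entry)
    return locks


# The two lock lines reported for brick7.
brick7_dump = """\
posixlk.posixlk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=2, pid = 32309, owner=9a4f25833ec8c200, transport=0xe82b90, , granted at Thu Mar  7 14:03:57 2013
posixlk.posixlk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=2, pid = 32312, owner=c753d2d946fd5b45, transport=0xe84480, , blocked at Thu Mar  7 14:03:57 2013
"""

for lock in parse_posixlk(brick7_dump):
    print(lock["state"], lock["pid"], lock["owner"])
```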

As we can see, this causes a deadlock: each owner holds the lock on one brick and is blocked behind the other owner on the other brick.
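
The circular wait can also be checked mechanically. The sketch below builds a wait-for graph from the lock owners in the two dumps and searches it for a cycle. It is only an illustration of the reasoning, not something GlusterFS does; the helper names wait_for_graph and find_cycle are hypothetical, and the dictionary layout is an assumption made for the example.

```python
# Lock state copied from the two statedumps in this report: for each brick, the
# owner holding the lock (ACTIVE) and the owner queued behind it (BLOCKED).
bricks = {
    "brick-5": {"active": "c753d2d946fd5b45", "blocked": "9a4f25833ec8c200"},
    "brick7": {"active": "9a4f25833ec8c200", "blocked": "c753d2d946fd5b45"},
}


def wait_for_graph(bricks):
    """Map each blocked owner to the set of owners it is waiting on."""
    graph = {}
    for state in bricks.values():
        graph.setdefault(state["blocked"], set()).add(state["active"])
    return graph


def find_cycle(graph):
    """Return a list of owners forming a wait cycle, or None if there is none."""

    def visit(node, path):
        if node in path:
            # Reached an owner already on the current path: that slice is the cycle.
            return path[path.index(node):]
        for nxt in graph.get(node, ()):
            cycle = visit(nxt, path + [node])
            if cycle:
                return cycle
        return None

    for start in graph:
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None


cycle = find_cycle(wait_for_graph(bricks))
print("deadlock between owners:", cycle)
```

A cycle containing both owners means neither blocked request can ever be granted without one side giving up its lock.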

Version-Release number of selected component (if applicable):


How reproducible:
1/1

Steps to Reproduce:
1. As per the description
  
Actual results:
The test hangs.

Expected results:
Tests should not hang.

Additional info:

Comment 1 Kaleb KEITHLEY 2015-10-22 15:46:38 UTC
Because of the large number of bugs filed against it, the mainline version is ambiguous and is about to be removed as a choice.

If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.