Bug 1008173

Summary: Running a second gluster command from the same node clears locks held by the first gluster command, even before the first command has completed execution
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: glusterfs
Version: 2.1
Reporter: Avra Sengupta <asengupt>
Assignee: Avra Sengupta <asengupt>
QA Contact: SATHEESARAN <sasundar>
CC: amarts, gluster-bugs, kparthas, shaines, vagarwal, vbellur
Status: CLOSED ERRATA
Severity: high
Priority: high
Keywords: ZStream
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-3.4.0.35rhs-1
Doc Type: Bug Fix
Type: Bug
Clone Of: 1008172
Bug Depends On: 1008172
Last Closed: 2013-11-27 15:38:14 UTC

Description Avra Sengupta 2013-09-15 13:09:27 UTC
+++ This bug was initially created as a clone of Bug #1008172 +++

Description of problem:

While a gluster command holding the cluster lock is still executing,
any other gluster command that tries to run will fail to
acquire the lock. As a result, command#2 follows the
cleanup code path, which also unlocks the held
locks. Because both commands are run from the same node,
command#2 ends up releasing the locks held by command#1
before command#1 has completed.
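
For illustration, here is a minimal C sketch of that flow, using hypothetical
simplified names (a plain owner string instead of glusterd's uuid-tagged
cluster lock); it is not the actual glusterd code, only a model of the
cleanup path described above.

/* Minimal sketch of the flawed flow (hypothetical simplified names,
 * not the actual glusterd code): a failed lock acquisition still runs
 * the common cleanup path, which releases the lock unconditionally. */
#include <stdio.h>

static char lock_owner[64] = "";          /* empty string => lock is free */

/* Acquire the lock for 'txn'; fails if another owner already holds it. */
static int acquire_lock(const char *txn)
{
        if (lock_owner[0] != '\0') {
                fprintf(stderr, "Unable to get lock, held by: %s\n", lock_owner);
                return -1;
        }
        snprintf(lock_owner, sizeof(lock_owner), "%s", txn);
        return 0;
}

/* Flawed release: clears the lock without checking who holds it. */
static void release_lock_unchecked(void)
{
        lock_owner[0] = '\0';
}

/* Flawed command skeleton: the cleanup label always releases the lock,
 * even when this command never managed to acquire it. */
static void run_command(const char *txn)
{
        if (acquire_lock(txn) != 0)
                goto out;                 /* lock held by another transaction */
        printf("%s: long-running work while holding the lock\n", txn);
out:
        release_lock_unchecked();         /* BUG: clobbers the other txn's lock */
}

int main(void)
{
        acquire_lock("command#1");        /* command #1 holds the lock ...     */
        run_command("command#2");         /* ... command #2 fails, yet unlocks */
        printf("lock owner after command#2: '%s' (expected 'command#1')\n",
               lock_owner);
        return 0;
}

Running the sketch shows the lock owner cleared after command#2 fails, which
is exactly the symptom described above.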


Version-Release number of selected component (if applicable):


How reproducible:
Every time


Steps to Reproduce:
1. Make a gluster command take a long time to execute (e.g., put in a hack that makes it call a script which sleeps for 2-3 minutes)
2. Meanwhile, run another gluster command from the same node. This command fails to acquire the locks but ends up releasing the locks already held.
3. Command#1 is still executing, yet the locks have been released, so another command can now be run in parallel with the first gluster command (which is still in execution)

Actual results:
Second gluster transaction from the same node releases the locks held by another transaction.


Expected results:
The locks should only be released by the transaction that acquired them.
Additional info:
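As additional context, a minimal C sketch of the ownership check this implies,
again with hypothetical simplified names rather than the real glusterd lock
API: an unlock request is honoured only when it comes from the transaction
that acquired the lock.

/* Sketch of an owner-checked unlock (hypothetical simplified names,
 * not the actual glusterd API). */
#include <stdio.h>
#include <string.h>

static char lock_owner[64] = "";          /* empty string => lock is free */

static int acquire_lock(const char *txn)
{
        if (lock_owner[0] != '\0')
                return -1;                /* another transaction holds it */
        snprintf(lock_owner, sizeof(lock_owner), "%s", txn);
        return 0;
}

/* Checked release: only the owning transaction may clear the lock. */
static int release_lock(const char *txn)
{
        if (strcmp(lock_owner, txn) != 0) {
                fprintf(stderr, "%s does not hold the lock (owner: '%s'), "
                        "refusing to unlock\n", txn, lock_owner);
                return -1;
        }
        lock_owner[0] = '\0';
        return 0;
}

int main(void)
{
        acquire_lock("command#1");
        release_lock("command#2");        /* refused: wrong owner      */
        printf("lock owner: '%s' (still command#1)\n", lock_owner);
        release_lock("command#1");        /* allowed: same transaction */
        return 0;
}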

Comment 3 SATHEESARAN 2013-10-16 02:00:46 UTC
While trying to verify this bug with glusterfs-3.4.0.34rhs-1, glusterd crashed.

After looking into the logs, I could see that this is a manifestation of bug https://bugzilla.redhat.com/show_bug.cgi?id=1018043

So this bug also depends on https://bugzilla.redhat.com/show_bug.cgi?id=1018043

This bug cannot be verified until the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1018043 gets in.

Steps done to verify the bug
============================
1. Created a distributed-replicate volume of 2X2
2. Started the volume
3. NFS mounted the volume in RHEL 6.4
4. Created lots of files by merely 'touch'ing them, i.e.
   for i in {1..100000}; do touch file${i}; done
5. While step 4 was in progress, ran the inode status command from an RHS node, i.e.
   gluster volume status <vol-name> inode
6. While 'gluster volume status <vol-name> inode' was still taking some time, issued a
'gluster volume status' command from the same node.

Result - glusterd crashed, with the error message captured below

[Tue Oct 15 18:42:57 UTC 2013 root.37.170:~ ] # gluster volume status distrepvol inode
Connection failed. Please check if gluster daemon is operational.

Snip of glusterd crash in glusterd log
======================================

[2013-10-15 18:44:18.687354] E [glusterd-utils.c:149:glusterd_lock] 0-management: Unable to get lock for uuid: e9c5334f-9522-4d50-946c-aeb15a6166db, lock held by: e9c5334f-9522-4d50-946c-aeb15a6166db
[2013-10-15 18:44:18.687402] E [glusterd-syncop.c:1202:gd_sync_task_begin] 0-management: Unable to acquire lock
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2013-10-15 18:44:18
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.34rhs
/lib64/libc.so.6[0x3c5d232960]
/usr/lib64/glusterfs/3.4.0.34rhs/xlator/mgmt/glusterd.so(gd_unlock_op_phase+0xae)[0x7fa44c87ffce]
/usr/lib64/glusterfs/3.4.0.34rhs/xlator/mgmt/glusterd.so(gd_sync_task_begin+0xdf)[0x7fa44c880bef]
/usr/lib64/glusterfs/3.4.0.34rhs/xlator/mgmt/glusterd.so(glusterd_op_begin_synctask+0x3b)[0x7fa44c880f0b]
/usr/lib64/glusterfs/3.4.0.34rhs/xlator/mgmt/glusterd.so(__glusterd_handle_status_volume+0x14a)[0x7fa44c80e4fa]
/usr/lib64/glusterfs/3.4.0.34rhs/xlator/mgmt/glusterd.so(glusterd_big_locked_handler+0x3f)[0x7fa44c80ea7f]
/usr/lib64/libglusterfs.so.0(synctask_wrap+0x12)[0x3a63a49a72]
/lib64/libc.so.6[0x3c5d243bb0]
---------

Expected
=========
1. When the lock is already held, it should report "Another transaction is in progress"
2. An attempt to acquire the lock, when it is already held by another op, should not unlock the previously held lock (a sketch follows this list)
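
For reference, a minimal C sketch of the expected control flow, with
hypothetical simplified names (not the actual glusterd patch): when the lock
phase fails, the transaction reports "Another transaction is in progress" and
returns without ever entering the unlock phase.

/* Sketch of the expected flow (hypothetical simplified names, not the
 * actual glusterd patch): a failed lock phase short-circuits before the
 * unlock phase is ever reached. */
#include <stdio.h>
#include <string.h>

static char lock_owner[64] = "command#1"; /* lock already held by command#1 */

static int lock_phase(const char *txn)
{
        if (lock_owner[0] != '\0')
                return -1;                /* held by another transaction */
        snprintf(lock_owner, sizeof(lock_owner), "%s", txn);
        return 0;
}

static void unlock_phase(const char *txn)
{
        if (strcmp(lock_owner, txn) == 0) /* only the owner may unlock */
                lock_owner[0] = '\0';
}

static int run_transaction(const char *txn)
{
        if (lock_phase(txn) != 0) {
                fprintf(stderr, "Another transaction is in progress\n");
                return -1;                /* do NOT fall through to unlock_phase() */
        }
        printf("%s: commit phase\n", txn);
        unlock_phase(txn);
        return 0;
}

int main(void)
{
        run_transaction("command#2");     /* refused; command#1's lock survives */
        printf("lock owner: '%s'\n", lock_owner);
        return 0;
}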

Conclusion
==========
Since glusterd crashed, contrary to the expected behaviour, marking this bug as FailedQA

Comment 5 SATHEESARAN 2013-10-18 14:01:49 UTC
VERIFIED with glusterfs-3.4.0.35rhs-1

Steps followed
===============
1. Created a distributed-replicate volume of 2X2
2. Started the volume
3. NFS mounted the volume in RHEL 6.4
4. Created lots of files by merely 'touch'ing them, i.e.
   for i in {1..100000}; do touch file${i}; done
5. While step 4 was in progress, ran the inode status command from an RHS node, i.e.
   gluster volume status <vol-name> inode
   This command took some time, and it holds the lock against other OPs.

6. Executed the 'gluster volume status' command from the same node and also from another node, more than 4 or 5 times.

Each time, I got "Another Transaction is in progress". This makes it clear that an OP's attempt to acquire the lock does not unlock the previously held lock.

Comment 6 SATHEESARAN 2013-10-18 14:02:50 UTC
The "Fixed In Version" should be glusterfs-3.4.0.35rhs-1, not glusterfs-3.4.0.35rhs

Comment 8 errata-xmlrpc 2013-11-27 15:38:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1769.html