Bug 1008173 - Running a second gluster command from the same node clears locks held by the first gluster command, even before the first command has completed execution
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterfs
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assigned To: Avra Sengupta
QA Contact: SATHEESARAN
Keywords: ZStream
Depends On: 1008172
Blocks:
 
Reported: 2013-09-15 09:09 EDT by Avra Sengupta
Modified: 2013-11-27 10:38 EST

See Also:
Fixed In Version: glusterfs-3.4.0.35rhs-1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1008172
Environment:
Last Closed: 2013-11-27 10:38:14 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Avra Sengupta 2013-09-15 09:09:27 EDT
+++ This bug was initially created as a clone of Bug #1008172 +++

Description of problem:

While a gluster command that holds the cluster lock is still executing, any other gluster command run from the same node will fail to acquire the lock. As a result, command#2 follows the cleanup code path, which includes releasing held locks. Because both commands are run from the same node, command#2 ends up releasing the lock held by command#1 before command#1 has completed.
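
To illustrate the flow, here is a minimal toy model written for this report (an illustration only, not the actual glusterd code): the lock records only the owning node's UUID, so the failure/cleanup path of a second command from the same node clears a lock it never acquired.

#include <stdio.h>

/* Toy model only (not glusterd source): the lock remembers just the
 * owning node's UUID, and the cleanup path releases it unconditionally. */
static char lock_owner_uuid[64] = "";              /* "" means unlocked */

static int try_lock(const char *node_uuid)
{
        if (lock_owner_uuid[0] != '\0')
                return -1;                         /* lock already held */
        snprintf(lock_owner_uuid, sizeof(lock_owner_uuid), "%s", node_uuid);
        return 0;
}

static void cleanup_and_unlock(void)
{
        /* Buggy cleanup: clears the lock without checking who owns it. */
        lock_owner_uuid[0] = '\0';
}

int main(void)
{
        const char *node = "e9c5334f-9522-4d50-946c-aeb15a6166db";

        /* command#1 acquires the lock and keeps running. */
        try_lock(node);

        /* command#2 from the same node fails to lock, then "cleans up". */
        if (try_lock(node) != 0)
                cleanup_and_unlock();

        /* command#1 is still executing, yet its lock is gone. */
        printf("lock owner now: '%s'\n", lock_owner_uuid);
        return 0;
}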


Version-Release number of selected component (if applicable):


How reproducible:
Every time


Steps to Reproduce:
1. Make a gluster command take a long time to execute (for example, add a temporary hack that makes it call a script which sleeps for 2-3 minutes; a hypothetical sketch of such a hack is shown after these steps).
2. Meanwhile, run another gluster command from the same node. This command fails to acquire the lock but still ends up releasing the lock already held.
3. Command#1 is still executing, yet the lock has been released, so another gluster command can now run in parallel with the first one (which is still in execution).
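
For step 1, a hypothetical example of such a hack (the function name, insertion point, and script path are all made up for illustration): patch any long-running glusterd op handler to call a script that just sleeps.

#include <stdlib.h>

/* Hypothetical reproduction aid only: delay one op so its cluster lock
 * stays held long enough for a second CLI command to race it.
 * /tmp/slow.sh is a made-up helper containing just "sleep 150". */
int slow_down_op_for_test(void)
{
        return system("/tmp/slow.sh");
}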

Actual results:
A second gluster transaction from the same node releases the lock held by another transaction.


Expected results:
A lock should only be released by the transaction that acquired it.

Additional info:
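As a rough sketch of the expected behaviour (illustrative only, not the shipped fix), the lock could be tagged with a per-transaction identifier so that only the owning transaction can release it:

#include <stdio.h>
#include <string.h>

/* Illustrative only: key the lock on a transaction ID instead of just
 * the node UUID, and refuse to release it for any other transaction. */
static char lock_txn_id[64] = "";                  /* "" means unlocked */

int txn_lock(const char *txn_id)
{
        if (lock_txn_id[0] != '\0')
                return -1;       /* "Another transaction is in progress" */
        snprintf(lock_txn_id, sizeof(lock_txn_id), "%s", txn_id);
        return 0;
}

int txn_unlock(const char *txn_id)
{
        if (strcmp(lock_txn_id, txn_id) != 0)
                return -1;       /* not our lock: leave it held */
        lock_txn_id[0] = '\0';
        return 0;
}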
Comment 3 SATHEESARAN 2013-10-15 22:00:46 EDT
While trying to verify this bug with glusterfs-3.4.0.34rhs-1, glusterd crashed.

After looking into the logs, I could see that this is a manifestation of bug https://bugzilla.redhat.com/show_bug.cgi?id=1018043.

So this bug also depends on https://bugzilla.redhat.com/show_bug.cgi?id=1018043.

This bug cannot be verified until the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1018043 gets in.

Steps done to verify the bug
============================
1. Created a 2x2 distributed-replicate volume
2. Started the volume
3. NFS-mounted the volume on a RHEL 6.4 client
4. Created a large number of files by simply touching them, e.g.
   for i in {1..100000}; do touch file${i}; done
5. While step 4 was in progress, ran the inode status command from an RHS node, i.e.
   gluster volume status <vol-name> inode
6. While the inode status command was still running, issued a 'gluster volume status'
   command from the same node.

Result: glusterd crashed, with the error captured below

[Tue Oct 15 18:42:57 UTC 2013 root@10.70.37.170:~ ] # gluster volume status distrepvol inode
Connection failed. Please check if gluster daemon is operational.

Snip of glusterd crash in glusterd log
======================================

[2013-10-15 18:44:18.687354] E [glusterd-utils.c:149:glusterd_lock] 0-management: Unable to get lock for uuid: e9c5334f-9522-4d50-946c-aeb15a6166db, lock held by: e9c5334f-9522-4d50-946c-aeb15a6166db
[2013-10-15 18:44:18.687402] E [glusterd-syncop.c:1202:gd_sync_task_begin] 0-management: Unable to acquire lock
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2013-10-15 18:44:18configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.34rhs
/lib64/libc.so.6[0x3c5d232960]
/usr/lib64/glusterfs/3.4.0.34rhs/xlator/mgmt/glusterd.so(gd_unlock_op_phase+0xae)[0x7fa44c87ffce]
/usr/lib64/glusterfs/3.4.0.34rhs/xlator/mgmt/glusterd.so(gd_sync_task_begin+0xdf)[0x7fa44c880bef]
/usr/lib64/glusterfs/3.4.0.34rhs/xlator/mgmt/glusterd.so(glusterd_op_begin_synctask+0x3b)[0x7fa44c880f0b]
/usr/lib64/glusterfs/3.4.0.34rhs/xlator/mgmt/glusterd.so(__glusterd_handle_status_volume+0x14a)[0x7fa44c80e4fa]
/usr/lib64/glusterfs/3.4.0.34rhs/xlator/mgmt/glusterd.so(glusterd_big_locked_handler+0x3f)[0x7fa44c80ea7f]
/usr/lib64/libglusterfs.so.0(synctask_wrap+0x12)[0x3a63a49a72]
/lib64/libc.so.6[0x3c5d243bb0]
---------

Expected
=========
1. When a lock is already held, the command should report "Another transaction is in progress"
2. An attempt to acquire a lock that is already held by another op should not release the previously held lock

Conclusion
==========
Since glusterd crashed instead of behaving as expected, marking this bug as FailedQA
Comment 5 SATHEESARAN 2013-10-18 10:01:49 EDT
VERIFIED with glusterfs-3.4.0.35rhs-1

Steps followed
===============
1. Created a 2x2 distributed-replicate volume
2. Started the volume
3. NFS-mounted the volume on a RHEL 6.4 client
4. Created a large number of files by simply touching them, e.g.
   for i in {1..100000}; do touch file${i}; done
5. While step 4 was in progress, ran the inode status command from an RHS node, i.e.
   gluster volume status <vol-name> inode
   This command took some time, and it holds the cluster lock against other ops.

6. Executed the 'gluster volume status' command from the same node, and also from another node, 4 or 5 times.

Each time, I got "Another transaction is in progress". This makes it clear that an attempt to acquire the lock by another op does not release the previously held lock.
Comment 6 SATHEESARAN 2013-10-18 10:02:50 EDT
The Fixed In Version should be glusterfs-3.4.0.35rhs-1, not glusterfs-3.4.0.35rhs.
Comment 8 errata-xmlrpc 2013-11-27 10:38:14 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1769.html
