Bug 1043862

Summary: [SNAPSHOT] : glusterd crashed while acquiring volume lock when the snap create command is executed on the same vol from all the nodes in cluster
Product: Red Hat Gluster Storage Reporter: Rahul Hinduja <rhinduja>
Component: snapshot Assignee: Avra Sengupta <asengupt>
Status: CLOSED ERRATA QA Contact: Rahul Hinduja <rhinduja>
Severity: urgent Docs Contact:
Priority: urgent    
Version: rhgs-3.0 CC: asengupt, nsathyan, rhs-bugs, sasundar, sdharane, senaik, ssamanta, storage-qa-internal
Target Milestone: ---   
Target Release: RHGS 3.0.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: SNAPSHOT
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-09-22 19:30:32 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Rahul Hinduja 2013-12-17 11:36:42 UTC
Description of problem:
=======================

While trying to create snapshots (with the same or different snap names) of the same volume from all the nodes in the cluster, glusterd crashed on every node with the following logs:

[2013-12-17 01:27:59.578772] E [glusterd-locks.c:233:glusterd_volume_lock] 0-: Unable to acquire lock. Lock for vol-snap0 held by 8795e6ed-a3b7-4c04-b397-d938a05dd1d4
[2013-12-17 01:27:59.578825] E [glusterd-locks.c:174:glusterd_multiple_volumes_lock] 0-: Failed to acquire lock for vol-snap0. Unlocking other volumes locked by this transaction
[2013-12-17 01:27:59.578842] E [glusterd-mgmt-handler.c:85:glusterd_syctasked_volume_lock] 0-: Failed to acquire volume locks on localhost
[2013-12-17 01:27:59.579111] E [rpcsvc.c:495:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2013-12-17 01:27:59.579670] E [glusterd-mgmt.c:108:gd_mgmt_v3_collate_errors] 0-: Locking volume failed on 10.70.43.20. Please check log file for details.
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash: 2013-12-17 01:27:59
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.snap.dec03.2013git
/lib64/libc.so.6[0x312ea329a0]
/lib64/libc.so.6(gsignal+0x35)[0x312ea32925]
/lib64/libc.so.6(abort+0x175)[0x312ea34105]
/lib64/libc.so.6[0x312ea70837]
/lib64/libc.so.6[0x312ea76166]
/lib64/libc.so.6[0x312ea79f8a]
/lib64/libc.so.6(__libc_calloc+0xc6)[0x312ea7a626]
/usr/lib64/libglusterfs.so.0(__gf_calloc+0x53)[0x7f0fd92e5b33]
/usr/lib64/libglusterfs.so.0(iobref_new+0x15)[0x7f0fd92e6cd5]
/usr/lib64/glusterfs/3.4.0.snap.dec03.2013git/rpc-transport/socket.so(+0x944c)[0x7f0fd51dc44c]
/usr/lib64/glusterfs/3.4.0.snap.dec03.2013git/rpc-transport/socket.so(+0xa7bd)[0x7f0fd51dd7bd]
/usr/lib64/libglusterfs.so.0(+0x66097)[0x7f0fd930c097]
/usr/sbin/glusterd(main+0x53a)[0x40680a]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x312ea1ed1d]
/usr/sbin/glusterd[0x404629]
---------

========================================================================

The following set of commands was being executed simultaneously from all the nodes in the cluster:

gluster snapshot create vol-snap0 -n snap3
gluster snapshot create vol-snap0 -n snap4
gluster snapshot create vol-snap0 -n snap5
gluster snapshot create vol-snap0 -n snap6
gluster snapshot create vol-snap0 -n snap7
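
A minimal sketch of how one node could drive this set of commands is given below. It only reuses the command syntax quoted above (which is specific to this snapshot development build); the helper names build_commands and run_all are hypothetical, and starting the script on every node at roughly the same time is left to the tester.

```python
import subprocess


def build_commands(volume, snap_numbers):
    """Build one 'gluster snapshot create' argv per snap number.

    The argument layout mirrors the commands quoted in this report;
    it is not checked against any other gluster release.
    """
    return [
        ["gluster", "snapshot", "create", volume, "-n", f"snap{n}"]
        for n in snap_numbers
    ]


def run_all(commands, runner=subprocess.run):
    """Run each command in order and collect (argv, exit code) pairs.

    'runner' is injectable so the sketch can be exercised without a
    gluster installation; by default it is subprocess.run.
    """
    results = []
    for cmd in commands:
        proc = runner(cmd, capture_output=True, text=True)
        results.append((cmd, proc.returncode))
    return results
```

On a node with the gluster CLI installed, run_all(build_commands("vol-snap0", range(3, 8))) would issue the five commands listed above, snap3 through snap7.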

Expected result:
================

1. If the snap already exists, fail the command with the message "snap already exist".
2. If the snap name does not exist, acquire the volume lock from one machine and complete the snap creation successfully.
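
The expected behaviour can be illustrated with a toy model. This is only a sketch under stated assumptions, not glusterd's implementation: the class VolumeLockTable, the function create_snapshot, and the "volume lock held by ..." message are all hypothetical, and the sketch takes the lock before checking the snap name, which is one possible ordering. The point it shows is that a node that loses the race for the volume lock gets a clean failure instead of a crash.

```python
import threading


class VolumeLockTable:
    """Toy per-volume lock table: volume name -> owner id (hypothetical)."""

    def __init__(self):
        self._owners = {}
        self._guard = threading.Lock()

    def acquire(self, volume, owner):
        """Return True if 'owner' got the lock, False if it is already held."""
        with self._guard:
            if volume in self._owners:
                return False
            self._owners[volume] = owner
            return True

    def holder(self, volume):
        """Return the current owner of the lock, or None if it is free."""
        with self._guard:
            return self._owners.get(volume)

    def release(self, volume, owner):
        """Release the lock only if 'owner' actually holds it."""
        with self._guard:
            if self._owners.get(volume) != owner:
                return False
            del self._owners[volume]
            return True


def create_snapshot(locks, existing_snaps, volume, snap, owner):
    """Return a status string instead of crashing on lock contention."""
    if not locks.acquire(volume, owner):
        # Hypothetical message; the report does not specify its wording.
        return f"volume lock held by {locks.holder(volume)}"
    try:
        if snap in existing_snaps:
            return "snap already exist"
        existing_snaps.add(snap)
        return "success"
    finally:
        locks.release(volume, owner)
```

With this model, the first request for a new snap name succeeds, a repeat of the same name fails with "snap already exist", and a request that arrives while another owner holds the volume lock is rejected without touching the snap list.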

Actual result:
==============

glusterd crashed on all the nodes in the cluster.



Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.4.0.snap.dec03.2013git-1.el6.x86_64

Comment 3 Avra Sengupta 2014-01-22 12:04:24 UTC
Fix at http://review.gluster.org/#/c/6602/

Comment 4 Rahul Hinduja 2014-01-22 12:08:42 UTC
Verified with build: glusterfs-3.4.1.snap.jan15.2014git-1.el6.x86_64

Tried various cases of multiple volume locks on the same and on different volumes. Didn't observe the glusterd crash. Moving the bug to the verified state.

Comment 6 Nagaprasad Sathyanarayana 2014-04-21 06:17:50 UTC
Marking snapshot BZs to RHS 3.0.

Comment 7 Nagaprasad Sathyanarayana 2014-05-19 10:56:40 UTC
Setting flags required to add BZs to RHS 3.0 Errata

Comment 10 errata-xmlrpc 2014-09-22 19:30:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html