Bug 1018043 - Crash in glusterd when command issued from multiple peers simultaneously
Summary: Crash in glusterd when command issued from multiple peers simultaneously
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.0.0
Assignee: Krutika Dhananjay
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-10-11 04:42 UTC by shylesh
Modified: 2015-05-13 16:55 UTC
CC List: 7 users

Fixed In Version: glusterfs-3.4.0.35rhs
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-09-22 19:29:05 UTC
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2014:1278 0 normal SHIPPED_LIVE Red Hat Storage Server 3.0 bug fix and enhancement update 2014-09-22 23:26:55 UTC

Description shylesh 2013-10-11 04:42:30 UTC
Description of problem:
Issuing commands from multiple peers at the same time causes glusterd to crash.

Version-Release number of selected component (if applicable):
glusterfs-libs-3.4.0.34rhs-1.el6rhs.x86_64
glusterfs-api-3.4.0.34rhs-1.el6rhs.x86_64
glusterfs-devel-3.4.0.34rhs-1.el6rhs.x86_64
samba-glusterfs-3.6.9-160.3.el6rhs.x86_64
gluster-swift-container-1.8.0-6.11.el6rhs.noarch
glusterfs-3.4.0.34rhs-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0.34rhs-1.el6rhs.x86_64
glusterfs-api-devel-3.4.0.34rhs-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.34rhs-1.el6rhs.x86_64
glusterfs-debuginfo-3.4.0.34rhs-1.el6rhs.x86_64
gluster-swift-proxy-1.8.0-6.11.el6rhs.noarch
gluster-swift-account-1.8.0-6.11.el6rhs.noarch
gluster-swift-plugin-1.8.0-7.el6rhs.noarch
vdsm-gluster-4.10.2-23.0.1.el6rhs.noarch
glusterfs-fuse-3.4.0.34rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.34rhs-1.el6rhs.x86_64
gluster-swift-1.8.0-6.11.el6rhs.noarch
gluster-swift-object-1.8.0-6.11.el6rhs.noarch


How reproducible:
Not tried

Steps to Reproduce:
1. On a distributed volume, run the "quota list" command in a loop.
2. From another node, issue a "quota limit-usage" command for some directory on the mount point (see the sketch below).
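
For reference, a minimal shell sketch of the reproduction on the qtest volume described below (the exact loop, directory name, and limit are not recorded in this report, so those details are illustrative):

# Node 1: keep listing quota limits on the distributed volume in a loop
while true; do gluster volume quota qtest list; done

# Node 2: while the loop is running, set a limit on a directory
# under the mount point (directory and size are examples)
gluster volume quota qtest limit-usage /dir1 1GB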


Actual results:
glusterd crashed

Expected results:
glusterd should not crash; when commands collide, the later one should fail gracefully with an error such as "Another transaction is in progress."

Additional info:
#0  gd_unlock_op_phase (peers=0xb7e940, op=<value optimized out>, op_ret=-1, req=0x7f5b88999920, op_ctx=0x7f5b8af8090c,
    op_errstr=0x7f5b700386d0 "Another transaction is in progress. Please try again after sometime.", npeers=0, is_locked=_gf_false)
    at glusterd-syncop.c:1085
1085            if (conf->pending_quorum_action)
Missing separate debuginfos, use: debuginfo-install device-mapper-event-libs-1.02.77-9.el6.x86_64 device-mapper-libs-1.02.77-9.el6.x86_64 glibc-2.12-1.107.el6_4.4.x86_64 keyutils-libs-1.4-4.el6.x86_64 krb5-libs-1.10.3-10.el6_4.4.x86_64 libcom_err-1.41.12-14.el6_4.2.x86_64 libgcc-4.4.7-3.el6.x86_64 libselinux-2.0.94-5.3.el6_4.1.x86_64 libsepol-2.0.41-4.el6.x86_64 libudev-147-2.46.el6.x86_64 libxml2-2.7.6-12.el6_4.1.x86_64 lvm2-libs-2.02.98-9.el6.x86_64 openssl-1.0.0-27.el6_4.2.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0  gd_unlock_op_phase (peers=0xb7e940, op=<value optimized out>, op_ret=-1, req=0x7f5b88999920, op_ctx=0x7f5b8af8090c,
    op_errstr=0x7f5b700386d0 "Another transaction is in progress. Please try again after sometime.", npeers=0, is_locked=_gf_false)
    at glusterd-syncop.c:1085
#1  0x00007f5b88d5fbef in gd_sync_task_begin (op_ctx=0x7f5b8af8090c, req=0x7f5b88999920) at glusterd-syncop.c:1246
#2  0x00007f5b88d5ff0b in glusterd_op_begin_synctask (req=0x7f5b88999920, op=<value optimized out>, dict=0x7f5b8af8090c) at glusterd-syncop.c:1273
#3  0x00007f5b88d3deec in __glusterd_handle_quota (req=0x7f5b88999920) at glusterd-quota.c:116
#4  0x00007f5b88ceda7f in glusterd_big_locked_handler (req=0x7f5b88999920, actor_fn=0x7f5b88d3dcc0 <__glusterd_handle_quota>)
    at glusterd-handler.c:77
#5  0x00007f5b8c7aca72 in synctask_wrap (old_task=<value optimized out>) at syncop.c:132
#6  0x0000003bcde43bb0 in ?? () from /lib64/libc.so.6
#7  0x0000000000000000 in ?? ()


bt full
======
#0  gd_unlock_op_phase (peers=0xb7e940, op=<value optimized out>, op_ret=-1, req=0x7f5b88999920, op_ctx=0x7f5b8af8090c,
    op_errstr=0x7f5b700386d0 "Another transaction is in progress. Please try again after sometime.", npeers=0, is_locked=_gf_false)
    at glusterd-syncop.c:1085
        peerinfo = <value optimized out>
        tmp = <value optimized out>
        tmp_uuid = '\000' <repeats 15 times>
        peer_cnt = <value optimized out>
        ret = <value optimized out>
        this = <value optimized out>
        conf = <value optimized out>
        args = {op_ret = 0, op_errno = 0, iatt1 = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {
              suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {
                read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, 
            ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, 
            ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}, iatt2 = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, 
            ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', 
                exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', 
                exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, 
            ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}, xattr = 0x0, entries = {{list = {next = 0x0, 
                prev = 0x0}, {next = 0x0, prev = 0x0}}, d_ino = 0, d_off = 0, d_len = 0, d_type = 0, d_stat = {ia_ino = 0, 
              ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', 
                owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, 
                other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, 
              ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}, 
            dict = 0x0, inode = 0x0, d_name = 0x11aa070 ""}, statvfs_buf = {f_bsize = 0, f_frsize = 0, f_blocks = 0, f_bfree = 0, f_bavail = 0, 
            f_files = 0, f_ffree = 0, f_favail = 0, f_fsid = 0, f_flag = 0, f_namemax = 0, __f_spare = {0, 0, 0, 0, 0, 0}}, vector = 0x0, 
          count = 0, iobref = 0x0, buffer = 0x0, xdata = 0x0, flock = {l_type = 0, l_whence = 0, l_start = 0, l_len = 0, l_pid = 0, l_owner = {
              len = 0, data = '\000' <repeats 1023 times>}}, uuid = '\000' <repeats 15 times>, errstr = 0x0, dict = 0x0, lock_dict = {__data = {
              __lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, 
            __size = '\000' <repeats 39 times>, __align = 0}, barrier = {guard = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, 
                __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, cond = {
              __data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, 
                __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}, waitq = {next = 0x0, prev = 0x0}, count = 0}, task = 0x0, 
          mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, 
            __size = '\000' <repeats 39 times>, __align = 0}, cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, 
              __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}, done = 0}
        __FUNCTION__ = "gd_unlock_op_phase"
#1  0x00007f5b88d5fbef in gd_sync_task_begin (op_ctx=0x7f5b8af8090c, req=0x7f5b88999920) at glusterd-syncop.c:1246
        ret = -1
   npeers = <value optimized out>
        req_dict = 0x0
        conf = <value optimized out>
        op = GD_OP_QUOTA
        tmp_op = 17
        op_errstr = 0x7f5b700386d0 "Another transaction is in progress. Please try again after sometime."
        this = 0xb74f10
        is_locked = <value optimized out>
        __FUNCTION__ = "gd_sync_task_begin"
#2  0x00007f5b88d5ff0b in glusterd_op_begin_synctask (req=0x7f5b88999920, op=<value optimized out>, dict=0x7f5b8af8090c) at glusterd-syncop.c:1273
        ret = 0
        __FUNCTION__ = "glusterd_op_begin_synctask"
#3  0x00007f5b88d3deec in __glusterd_handle_quota (req=0x7f5b88999920) at glusterd-quota.c:116
        ret = 0
        cli_req = {dict = {dict_len = 92, dict_val = 0x7f5b700380f0 "\340l\001p[\177"}}
        dict = 0x7f5b8af8090c
        volname = 0x7f5b70021030 "\300l\001p[\177"
        type = 5
        msg = '\000' <repeats 2047 times>
        this = 0xb74f10
        conf = 0xb7e900
        __FUNCTION__ = "__glusterd_handle_quota"
#4  0x00007f5b88ceda7f in glusterd_big_locked_handler (req=0x7f5b88999920, actor_fn=0x7f5b88d3dcc0 <__glusterd_handle_quota>)
    at glusterd-handler.c:77
        priv = 0xb7e900
        ret = -1
#5  0x00007f5b8c7aca72 in synctask_wrap (old_task=<value optimized out>) at syncop.c:132
        task = 0xbbeb20
#6  0x0000003bcde43bb0 in ?? () from /lib64/libc.so.6
No symbol table info available.
#7  0x0000000000000000 in ?? ()
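
The trace above appears to have been captured with gdb against the glusterd core; a minimal sketch of collecting such a trace (core path and debuginfo package list are illustrative, see the gdb hint above for the exact packages):

# Install the debuginfo packages gdb asks for, e.g.:
debuginfo-install glibc openssl libgcc

# Print the backtrace and full locals from the core file:
gdb -batch -ex 'bt' -ex 'bt full' /usr/sbin/glusterd /path/to/core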




cluster info
-----------
RHS nodes
======
rhs-client40.lab.eng.blr.redhat.com
rhs-client39.lab.eng.blr.redhat.com
rhs-client9.lab.eng.blr.redhat.com
rhs-client4.lab.eng.blr.redhat.com 

mount point
-----------
rhs-client4.lab.eng.blr.redhat.com:/mnt

[root@rhs-client4 mnt]# gluster v info qtest
 
Volume Name: qtest
Type: Distribute
Volume ID: f09ad949-5470-4a0a-a238-5ef74b09499f
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: rhs-client4.lab.eng.blr.redhat.com:/home/qtest0
Brick2: rhs-client9.lab.eng.blr.redhat.com:/home/qtest1
Brick3: rhs-client39.lab.eng.blr.redhat.com:/home/qtest2
Options Reconfigured:
features.quota: on
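
The exact commands used to create this volume are not in the report; a plausible setup matching the info above would be along these lines:

# Create and start the 3-brick distribute volume (brick paths from the
# volume info above); append "force" if the bricks sit on the root filesystem
gluster volume create qtest \
    rhs-client4.lab.eng.blr.redhat.com:/home/qtest0 \
    rhs-client9.lab.eng.blr.redhat.com:/home/qtest1 \
    rhs-client39.lab.eng.blr.redhat.com:/home/qtest2
gluster volume start qtest
gluster volume quota qtest enable

# FUSE-mount it on the client node used for the test
mount -t glusterfs rhs-client4.lab.eng.blr.redhat.com:/qtest /mnt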


Attaching the sosreports.

Comment 4 SATHEESARAN 2013-10-11 08:49:36 UTC
Hit a similar crash:

[2013-10-11 07:04:34.867610] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now
[2013-10-11 07:09:28.452428] E [glusterd-utils.c:149:glusterd_lock] 0-management: Unable to get lock for uuid: a189b813-ddd5-406d-b363-31857e3f93f8, lock held by: a189b813-ddd5-406d-b363-31857e3f93f8
[2013-10-11 07:09:28.466916] E [glusterd-syncop.c:1202:gd_sync_task_begin] 0-management: Unable to acquire lock
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2013-10-11 07:09:28
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.34rhs
/lib64/libc.so.6[0x3621032960]
/usr/lib64/glusterfs/3.4.0.34rhs/xlator/mgmt/glusterd.so(gd_unlock_op_phase+0xae)[0x7f3b41b5bfce]
/usr/lib64/glusterfs/3.4.0.34rhs/xlator/mgmt/glusterd.so(gd_sync_task_begin+0xdf)[0x7f3b41b5cbef]
/usr/lib64/glusterfs/3.4.0.34rhs/xlator/mgmt/glusterd.so(glusterd_op_begin_synctask+0x3b)[0x7f3b41b5cf0b]
/usr/lib64/glusterfs/3.4.0.34rhs/xlator/mgmt/glusterd.so(__glusterd_handle_quota+0x22c)[0x7f3b41b3aeec]
/usr/lib64/glusterfs/3.4.0.34rhs/xlator/mgmt/glusterd.so(glusterd_big_locked_handler+0x3f)[0x7f3b41aeaa7f]
/usr/lib64/libglusterfs.so.0(synctask_wrap+0x12)[0x7f3b455a9a72]
/lib64/libc.so.6[0x3621043bb0]
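
A minimal sketch for spotting this crash signature on an affected node (the log path assumes the default glusterd log location; adjust if configured differently):

# Look for the segfault marker and the frames that follow it
grep -A 10 "signal received: 11" /var/log/glusterfs/etc-glusterfs-glusterd.vol.log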

Comment 5 shylesh 2013-10-17 05:11:35 UTC
Now issuing commands simultaneously gives the message "quota command failed : Another transaction is in progress. Please try again after sometime."
Marking it as verified on 3.4.0.35rhs-1.el6rhs.x86_64.
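
On the fixed build, repeating the concurrent-command sketch from the steps to reproduce should make one side fail cleanly instead of crashing glusterd, e.g.:

# Node 2, while the "quota list" loop runs on node 1:
gluster volume quota qtest limit-usage /dir1 1GB
# quota command failed : Another transaction is in progress. Please try again after sometime.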

Comment 7 errata-xmlrpc 2014-09-22 19:29:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html

