Description of problem:
Issuing commands from multiple peers at the same time causes a glusterd crash.

Version-Release number of selected component (if applicable):
glusterfs-libs-3.4.0.34rhs-1.el6rhs.x86_64
glusterfs-api-3.4.0.34rhs-1.el6rhs.x86_64
glusterfs-devel-3.4.0.34rhs-1.el6rhs.x86_64
samba-glusterfs-3.6.9-160.3.el6rhs.x86_64
gluster-swift-container-1.8.0-6.11.el6rhs.noarch
glusterfs-3.4.0.34rhs-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0.34rhs-1.el6rhs.x86_64
glusterfs-api-devel-3.4.0.34rhs-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.34rhs-1.el6rhs.x86_64
glusterfs-debuginfo-3.4.0.34rhs-1.el6rhs.x86_64
gluster-swift-proxy-1.8.0-6.11.el6rhs.noarch
gluster-swift-account-1.8.0-6.11.el6rhs.noarch
gluster-swift-plugin-1.8.0-7.el6rhs.noarch
vdsm-gluster-4.10.2-23.0.1.el6rhs.noarch
glusterfs-fuse-3.4.0.34rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.34rhs-1.el6rhs.x86_64
gluster-swift-1.8.0-6.11.el6rhs.noarch
gluster-swift-object-1.8.0-6.11.el6rhs.noarch

How reproducible:
Not tried

Steps to Reproduce:
1. On a distributed volume, run the "quota list" command in a loop.
2. From another node, issue the "quota limit-usage" command for some directory on the mount point.

Actual results:
glusterd crashed.

Expected results:
Both commands should either succeed or fail gracefully; glusterd should not crash.

Additional info:

#0  gd_unlock_op_phase (peers=0xb7e940, op=<value optimized out>, op_ret=-1, req=0x7f5b88999920,
    op_ctx=0x7f5b8af8090c,
    op_errstr=0x7f5b700386d0 "Another transaction is in progress. Please try again after sometime.",
    npeers=0, is_locked=_gf_false) at glusterd-syncop.c:1085
1085            if (conf->pending_quorum_action)

Missing separate debuginfos, use: debuginfo-install device-mapper-event-libs-1.02.77-9.el6.x86_64 device-mapper-libs-1.02.77-9.el6.x86_64 glibc-2.12-1.107.el6_4.4.x86_64 keyutils-libs-1.4-4.el6.x86_64 krb5-libs-1.10.3-10.el6_4.4.x86_64 libcom_err-1.41.12-14.el6_4.2.x86_64 libgcc-4.4.7-3.el6.x86_64 libselinux-2.0.94-5.3.el6_4.1.x86_64 libsepol-2.0.41-4.el6.x86_64 libudev-147-2.46.el6.x86_64 libxml2-2.7.6-12.el6_4.1.x86_64 lvm2-libs-2.02.98-9.el6.x86_64 openssl-1.0.0-27.el6_4.2.x86_64 zlib-1.2.3-29.el6.x86_64

(gdb) bt
#0  gd_unlock_op_phase (peers=0xb7e940, op=<value optimized out>, op_ret=-1, req=0x7f5b88999920,
    op_ctx=0x7f5b8af8090c,
    op_errstr=0x7f5b700386d0 "Another transaction is in progress. Please try again after sometime.",
    npeers=0, is_locked=_gf_false) at glusterd-syncop.c:1085
#1  0x00007f5b88d5fbef in gd_sync_task_begin (op_ctx=0x7f5b8af8090c, req=0x7f5b88999920) at glusterd-syncop.c:1246
#2  0x00007f5b88d5ff0b in glusterd_op_begin_synctask (req=0x7f5b88999920, op=<value optimized out>, dict=0x7f5b8af8090c) at glusterd-syncop.c:1273
#3  0x00007f5b88d3deec in __glusterd_handle_quota (req=0x7f5b88999920) at glusterd-quota.c:116
#4  0x00007f5b88ceda7f in glusterd_big_locked_handler (req=0x7f5b88999920, actor_fn=0x7f5b88d3dcc0 <__glusterd_handle_quota>) at glusterd-handler.c:77
#5  0x00007f5b8c7aca72 in synctask_wrap (old_task=<value optimized out>) at syncop.c:132
#6  0x0000003bcde43bb0 in ?? () from /lib64/libc.so.6
#7  0x0000000000000000 in ?? ()
(gdb) bt full
#0  gd_unlock_op_phase (peers=0xb7e940, op=<value optimized out>, op_ret=-1, req=0x7f5b88999920,
    op_ctx=0x7f5b8af8090c,
    op_errstr=0x7f5b700386d0 "Another transaction is in progress. Please try again after sometime.",
    npeers=0, is_locked=_gf_false) at glusterd-syncop.c:1085
        peerinfo = <value optimized out>
        tmp = <value optimized out>
        tmp_uuid = '\000' <repeats 15 times>
        peer_cnt = <value optimized out>
        ret = <value optimized out>
        this = <value optimized out>
        conf = <value optimized out>
        args = {op_ret = 0, op_errno = 0,
          iatt1 = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0},
          iatt2 = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0},
          xattr = 0x0,
          entries = {{list = {next = 0x0, prev = 0x0}, {next = 0x0, prev = 0x0}}, d_ino = 0, d_off = 0, d_len = 0, d_type = 0, d_stat = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}, dict = 0x0, inode = 0x0, d_name = 0x11aa070 ""},
          statvfs_buf = {f_bsize = 0, f_frsize = 0, f_blocks = 0, f_bfree = 0, f_bavail = 0, f_files = 0, f_ffree = 0, f_favail = 0, f_fsid = 0, f_flag = 0, f_namemax = 0, __f_spare = {0, 0, 0, 0, 0, 0}},
          vector = 0x0, count = 0, iobref = 0x0, buffer = 0x0, xdata = 0x0,
          flock = {l_type = 0, l_whence = 0, l_start = 0, l_len = 0, l_pid = 0, l_owner = {len = 0, data = '\000' <repeats 1023 times>}},
          uuid = '\000' <repeats 15 times>, errstr = 0x0, dict = 0x0,
          lock_dict = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0},
          barrier = {guard = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}, waitq = {next = 0x0, prev = 0x0}, count = 0},
          task = 0x0,
          mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0},
          cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0},
          done = 0}
        __FUNCTION__ = "gd_unlock_op_phase"
#1  0x00007f5b88d5fbef in gd_sync_task_begin (op_ctx=0x7f5b8af8090c, req=0x7f5b88999920) at glusterd-syncop.c:1246
        ret = -1
        npeers = <value optimized out>
        req_dict = 0x0
        conf = <value optimized out>
        op = GD_OP_QUOTA
        tmp_op = 17
        op_errstr = 0x7f5b700386d0 "Another transaction is in progress. Please try again after sometime."
        this = 0xb74f10
        is_locked = <value optimized out>
        __FUNCTION__ = "gd_sync_task_begin"
#2  0x00007f5b88d5ff0b in glusterd_op_begin_synctask (req=0x7f5b88999920, op=<value optimized out>, dict=0x7f5b8af8090c) at glusterd-syncop.c:1273
        ret = 0
        __FUNCTION__ = "glusterd_op_begin_synctask"
#3  0x00007f5b88d3deec in __glusterd_handle_quota (req=0x7f5b88999920) at glusterd-quota.c:116
        ret = 0
        cli_req = {dict = {dict_len = 92, dict_val = 0x7f5b700380f0 "\340l\001p[\177"}}
        dict = 0x7f5b8af8090c
        volname = 0x7f5b70021030 "\300l\001p[\177"
        type = 5
        msg = '\000' <repeats 2047 times>
        this = 0xb74f10
        conf = 0xb7e900
        __FUNCTION__ = "__glusterd_handle_quota"
#4  0x00007f5b88ceda7f in glusterd_big_locked_handler (req=0x7f5b88999920, actor_fn=0x7f5b88d3dcc0 <__glusterd_handle_quota>) at glusterd-handler.c:77
        priv = 0xb7e900
        ret = -1
#5  0x00007f5b8c7aca72 in synctask_wrap (old_task=<value optimized out>) at syncop.c:132
        task = 0xbbeb20
#6  0x0000003bcde43bb0 in ?? () from /lib64/libc.so.6
No symbol table info available.
#7  0x0000000000000000 in ?? ()

cluster info
-----------
RHS nodes:
rhs-client40.lab.eng.blr.redhat.com
rhs-client39.lab.eng.blr.redhat.com
rhs-client9.lab.eng.blr.redhat.com
rhs-client4.lab.eng.blr.redhat.com

mount point:
rhs-client4.lab.eng.blr.redhat.com:/mnt

[root@rhs-client4 mnt]# gluster v info qtest

Volume Name: qtest
Type: Distribute
Volume ID: f09ad949-5470-4a0a-a238-5ef74b09499f
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: rhs-client4.lab.eng.blr.redhat.com:/home/qtest0
Brick2: rhs-client9.lab.eng.blr.redhat.com:/home/qtest1
Brick3: rhs-client39.lab.eng.blr.redhat.com:/home/qtest2
Options Reconfigured:
features.quota: on

Attaching the sosreports.
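The two-node race described in the steps to reproduce can be sketched as a pair of shell sessions. This is a hypothetical sketch, not taken from the report verbatim: it assumes a live cluster, uses the volume name qtest from this report, and /dir1 and the 1GB limit are illustrative values.

```shell
# Node A: run the quota list command in a tight loop against volume "qtest".
while true; do
    gluster volume quota qtest list
done

# Node B (concurrently): set a quota limit on a directory of the volume.
# "/dir1" (a path relative to the volume root) and "1GB" are illustrative.
gluster volume quota qtest limit-usage /dir1 1GB
```

Both commands start a glusterd management transaction, so running them concurrently exercises the cluster-lock contention path where the crash occurs.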
Hit a similar crash:

[2013-10-11 07:04:34.867610] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now
[2013-10-11 07:09:28.452428] E [glusterd-utils.c:149:glusterd_lock] 0-management: Unable to get lock for uuid: a189b813-ddd5-406d-b363-31857e3f93f8, lock held by: a189b813-ddd5-406d-b363-31857e3f93f8
[2013-10-11 07:09:28.466916] E [glusterd-syncop.c:1202:gd_sync_task_begin] 0-management: Unable to acquire lock
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2013-10-11 07:09:28
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.34rhs
/lib64/libc.so.6[0x3621032960]
/usr/lib64/glusterfs/3.4.0.34rhs/xlator/mgmt/glusterd.so(gd_unlock_op_phase+0xae)[0x7f3b41b5bfce]
/usr/lib64/glusterfs/3.4.0.34rhs/xlator/mgmt/glusterd.so(gd_sync_task_begin+0xdf)[0x7f3b41b5cbef]
/usr/lib64/glusterfs/3.4.0.34rhs/xlator/mgmt/glusterd.so(glusterd_op_begin_synctask+0x3b)[0x7f3b41b5cf0b]
/usr/lib64/glusterfs/3.4.0.34rhs/xlator/mgmt/glusterd.so(__glusterd_handle_quota+0x22c)[0x7f3b41b3aeec]
/usr/lib64/glusterfs/3.4.0.34rhs/xlator/mgmt/glusterd.so(glusterd_big_locked_handler+0x3f)[0x7f3b41aeaa7f]
/usr/lib64/libglusterfs.so.0(synctask_wrap+0x12)[0x7f3b455a9a72]
/lib64/libc.so.6[0x3621043bb0]
Now issuing commands simultaneously gives the message "quota command failed : Another transaction is in progress. Please try again after sometime." instead of crashing glusterd.

Marking it as verified on 3.4.0.35rhs-1.el6rhs.x86_64.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html