Description of problem:
Quota list on a volume hangs after a glusterd restart on a node.

Version-Release number of selected component (if applicable):
[root@rhsqa14-vm1 ~]# rpm -qa | grep gluster
gluster-nagios-addons-0.2.3-1.el6rhs.x86_64
glusterfs-3.7.1-8.el6rhs.x86_64
glusterfs-cli-3.7.1-8.el6rhs.x86_64
gluster-nagios-common-0.2.0-1.el6rhs.noarch
glusterfs-libs-3.7.1-8.el6rhs.x86_64
glusterfs-client-xlators-3.7.1-8.el6rhs.x86_64
glusterfs-api-3.7.1-8.el6rhs.x86_64
glusterfs-server-3.7.1-8.el6rhs.x86_64
glusterfs-rdma-3.7.1-8.el6rhs.x86_64
vdsm-gluster-4.16.20-1.1.el6rhs.noarch
glusterfs-fuse-3.7.1-8.el6rhs.x86_64
glusterfs-geo-replication-3.7.1-8.el6rhs.x86_64
[root@rhsqa14-vm1 ~]#

How reproducible:
Easily.

Steps to Reproduce:
1. Create a volume (the transcript below uses a plain two-brick distribute volume).
2. Enable quota on it and set a quota limit.
3. Check the configured quota limit.
4. Restart glusterd on the node.
5. Check the quota list immediately. It hangs indefinitely; after stopping the command with Ctrl+C and re-running it, the quota limit is displayed properly.

Additional info:
[root@rhsqa14-vm1 ~]# gluster v create good 10.70.47.165:/rhs/brick1/g0 10.70.47.163:/rhs/brick1/g0
volume create: good: success: please start the volume to access data
[root@rhsqa14-vm1 ~]# gluster v start good
volume start: good: success
[root@rhsqa14-vm1 ~]# gluster v info good

Volume Name: good
Type: Distribute
Volume ID: 03bf7a59-aa79-4ff2-80ec-57e33ff37d88
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 10.70.47.165:/rhs/brick1/g0
Brick2: 10.70.47.163:/rhs/brick1/g0
Options Reconfigured:
performance.readdir-ahead: on

[root@rhsqa14-vm1 ~]# gluster v status good
Status of volume: good
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.165:/rhs/brick1/g0           49156     0          Y       10746
Brick 10.70.47.163:/rhs/brick1/g0           49156     0          Y       8773
NFS Server on localhost                     2049      0          Y       10765
NFS Server on 10.70.47.163                  2049      0          Y       8792

Task Status of Volume good
------------------------------------------------------------------------------
There are no active volume tasks

[root@rhsqa14-vm1 ~]# ./options.sh good
volume set: success
volume quota : success
volume set: success
volume quota : success
volume set: success

[root@rhsqa14-vm1 ~]# service glusterd status
glusterd (pid 10465) is running...

[root@rhsqa14-vm1 ~]# gluster v quota good list
                  Path                   Hard-limit Soft-limit   Used  Available  Soft-limit exceeded? Hard-limit exceeded?
---------------------------------------------------------------------------------------------------------------------------
/                                          20.0GB       80%      0Bytes  20.0GB              No                   No

[root@rhsqa14-vm1 ~]# service glusterd restart
Starting glusterd:                                         [  OK  ]
[root@rhsqa14-vm1 ~]# gluster v quota good list
                  Path                   Hard-limit Soft-limit   Used  Available  Soft-limit exceeded? Hard-limit exceeded?
---------------------------------------------------------------------------------------------------------------------------
^C
[root@rhsqa14-vm1 ~]#
[root@rhsqa14-vm1 ~]# gluster v quota good list
                  Path                   Hard-limit Soft-limit   Used  Available  Soft-limit exceeded? Hard-limit exceeded?
---------------------------------------------------------------------------------------------------------------------------
/                                          20.0GB       80%      0Bytes  20.0GB              No                   No

[root@rhsqa14-vm1 ~]# service glusterd restart
Starting glusterd:                                         [  OK  ]
[root@rhsqa14-vm1 ~]# gluster v quota good list
                  Path                   Hard-limit Soft-limit   Used  Available  Soft-limit exceeded? Hard-limit exceeded?
---------------------------------------------------------------------------------------------------------------------------
^C
[root@rhsqa14-vm1 ~]# gluster v quota good list
quota command failed : Locking failed on 10.70.47.163. Please check log file for details.
[root@rhsqa14-vm1 ~]# gluster v quota good list
quota command failed : Locking failed on 10.70.47.163. Please check log file for details.
[root@rhsqa14-vm1 ~]#

[root@rhsqa14-vm2 ~]# gluster v quota good list
                  Path                   Hard-limit Soft-limit   Used  Available  Soft-limit exceeded? Hard-limit exceeded?
---------------------------------------------------------------------------------------------------------------------------
/                                          20.0GB       80%      0Bytes  20.0GB              No                   No

[root@rhsqa14-vm2 ~]# service glusterd restart
Starting glusterd:                                         [  OK  ]
[root@rhsqa14-vm2 ~]#
[root@rhsqa14-vm2 ~]# gluster v quota good list
^C
[root@rhsqa14-vm2 ~]# gluster v quota good list
quota command failed : Another transaction is in progress for good. Please try again after sometime.
[root@rhsqa14-vm2 ~]#

Log messages:

[2015-07-13 18:56:46.187969] W [glusterd-locks.c:677:glusterd_mgmt_v3_unlock] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7f0b2896a520] (--> /usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x551)[0x7f0b1d40b441] (--> /usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x2f0)[0x7f0b1d371290] (--> /usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7f0b1d354a00] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a3)[0x7f0b28736953] ))))) 0-management: Lock for vol good not held
[2015-07-13 18:56:46.188046] W [MSGID: 106118] [glusterd-handler.c:5059:__glusterd_peer_rpc_notify] 0-management: Lock not released for good
[2015-07-13 18:56:46.188901] W [glusterd-locks.c:677:glusterd_mgmt_v3_unlock] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7f0b2896a520] (--> /usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x551)[0x7f0b1d40b441] (--> /usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x2f0)[0x7f0b1d371290] (--> /usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7f0b1d354a00] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a3)[0x7f0b28736953] ))))) 0-management: Lock for vol test not held
[2015-07-13 18:56:46.188966] W [MSGID: 106118] [glusterd-handler.c:5059:__glusterd_peer_rpc_notify] 0-management: Lock not released for test
[2015-07-13 18:56:51.877542] I [MSGID: 106163] [glusterd-handshake.c:1198:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30703
[2015-07-13 18:56:51.988913] I [MSGID: 106490] [glusterd-handler.c:2530:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 053f0a9c-84df-4e69-9825-e480ea7f51fa
[2015-07-13 18:56:52.032972] I [MSGID: 106490] [glusterd-handler.c:2530:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 053f0a9c-84df-4e69-9825-e480ea7f51fa
[2015-07-13 18:56:56.571192] I [MSGID: 106493] [glusterd-handler.c:3757:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to 10.70.47.163 (0), ret: 0
[2015-07-13 18:56:56.822277] W [glusterd-locks.c:575:glusterd_mgmt_v3_lock] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7f0b2896a520] (--> /usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_lock+0x1d4)[0x7f0b1d40bf84] (--> /usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(+0x5b9b5)[0x7f0b1d3779b5] (--> /usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(glusterd_op_sm+0x231)[0x7f0b1d37b8e1] (--> /usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(+0x1115b9)[0x7f0b1d42d5b9] ))))) 0-management: Lock for good held by 053f0a9c-84df-4e69-9825-e480ea7f51fa
[2015-07-13 18:56:56.693505] I [MSGID: 106493] [glusterd-handler.c:3757:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to 10.70.47.163 (0), ret: 0
[2015-07-13 18:56:56.822334] E [MSGID: 106119] [glusterd-op-sm.c:3556:glusterd_op_ac_lock] 0-management: Unable to acquire lock for good
[2015-07-13 18:56:56.822557] E [MSGID: 106376] [glusterd-op-sm.c:7289:glusterd_op_sm] 0-management: handler returned: -1
[2015-07-13 18:56:58.786170] E [glusterd-op-sm.c:253:glusterd_get_txn_opinfo] 0-management: Unable to get transaction opinfo for transaction ID : c79c04fd-e35a-4bd2-8670-295679f2196e
[2015-07-13 18:56:58.786219] E [MSGID: 106402] [glusterd-op-sm.c:7270:glusterd_op_sm] 0-management: Unable to get transaction's opinfo
[2015-07-13 18:56:59.004321] I [MSGID: 106493] [glusterd-rpc-ops.c:478:__glusterd_friend_add_cbk] 0-glusterd: Received ACC from uuid: 053f0a9c-84df-4e69-9825-e480ea7f51fa, host: 10.70.47.163, port: 0
[2015-07-13 18:56:59.114350] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 053f0a9c-84df-4e69-9825-e480ea7f51fa
[2015-07-13 18:56:59.114454] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-07-13 18:56:59.114485] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-07-13 18:56:59.115160] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 053f0a9c-84df-4e69-9825-e480ea7f51fa
[2015-07-13 18:56:59.115238] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-07-13 18:56:51.919906] I [MSGID: 106163] [glusterd-handshake.c:1198:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30703
[2015-07-13 18:56:59.115289] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-07-13 18:58:31.155153] E [MSGID: 106116] [glusterd-mgmt.c:134:gd_mgmt_v3_collate_errors] 0-management: Locking failed on 10.70.47.163. Please check log file for details.
[2015-07-13 18:58:31.155257] E [MSGID: 106151] [glusterd-syncop.c:1870:gd_sync_task_begin] 0-management: Locking Peers Failed.
[2015-07-13 18:58:31.160289] E [MSGID: 106116] [glusterd-mgmt.c:134:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on 10.70.47.163. Please check log file for details.
[2015-07-13 18:58:31.160534] E [MSGID: 106152] [glusterd-syncop.c:1574:gd_unlock_op_phase] 0-management: Failed to unlock on some peer(s)
[2015-07-13 18:59:00.579592] E [MSGID: 106116] [glusterd-mgmt.c:134:gd_mgmt_v3_collate_errors] 0-management: Locking failed on 10.70.47.163. Please check log file for details.
[2015-07-13 18:59:00.579681] E [MSGID: 106151] [glusterd-syncop.c:1870:gd_sync_task_begin] 0-management: Locking Peers Failed.
[2015-07-13 18:59:00.584048] E [MSGID: 106116] [glusterd-mgmt.c:134:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on 10.70.47.163. Please check log file for details.
[2015-07-13 18:59:00.584102] E [MSGID: 106152] [glusterd-syncop.c:1574:gd_unlock_op_phase] 0-management: Failed to unlock on some peer(s)
Triveni, how many volumes do you have? Can you paste the output of "# gluster volume status" after restarting glusterd?
[Switching to thread 3 (Thread 0x7ffff01ea700 (LWP 6210))]
#0  0x0000003bc760e264 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x0000003bc760e264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003bc7609508 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x0000003bc76093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x000000000040b25d in cli_cmd_broadcast_response (status=-1) at cli-cmd.c:310
#4  0x0000000000424e1f in cli_quotad_getlimit_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>, myframe=0x7fffe000181c) at cli-rpc-ops.c:3359
#5  0x00007ffff7911ca6 in rpc_clnt_submit (rpc=0x6ac2d0, prog=<value optimized out>, procnum=<value optimized out>, cbkfn=0x424d90 <cli_quotad_getlimit_cbk>, proghdr=<value optimized out>, proghdrcount=<value optimized out>, progpayload=0x0, progpayloadcount=0, iobref=<value optimized out>, frame=0x7fffe000181c, rsphdr=0x0, rsphdr_count=0, rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x0) at rpc-clnt.c:1600
#6  0x000000000040a74b in cli_submit_request (rpc=0x6ac2d0, req=<value optimized out>, frame=0x7fffe000181c, prog=<value optimized out>, procnum=2, iobref=0x7fffe0002ba0, this=0x7ffff7ffc220, cbkfn=0x424d90 <cli_quotad_getlimit_cbk>, xdrproc=0x409388 <xdr_gf_cli_req@plt>) at cli.c:237
#7  0x000000000040b4fa in cli_cmd_submit (rpc=0x6ac2d0, req=0x7ffff01e8c00, frame=0x7fffe000181c, prog=0x659380, procnum=<value optimized out>, iobref=<value optimized out>, this=0x7ffff7ffc220, cbkfn=0x424d90 <cli_quotad_getlimit_cbk>, xdrproc=0x409388 <xdr_gf_cli_req@plt>) at cli-cmd.c:380
#8  0x0000000000415a2f in cli_quotad_getlimit (frame=0x7fffe000181c, this=0x7ffff7ffc220, data=<value optimized out>) at cli-rpc-ops.c:3389
#9  0x000000000040f7cf in cli_cmd_quota_handle_list_all (words=<value optimized out>, options=<value optimized out>) at cli-cmd-volume.c:1419
#10 0x0000000000410c41 in cli_cmd_quota_cbk (state=0x7fffffffe220, word=<value optimized out>, words=0x7fffffffe420, wordcount=4) at cli-cmd-volume.c:1571
#11 0x000000000040b70b in cli_cmd_process (state=0x7fffffffe220, argc=4, argv=0x7fffffffe420) at cli-cmd.c:135
#12 0x000000000040b170 in cli_batch (d=<value optimized out>) at input.c:29
#13 0x0000003bc76079d1 in start_thread () from /lib64/libpthread.so.0
#14 0x0000003bc72e8b6d in clone () from /lib64/libc.so.6

(gdb) p cond_mutex
$2 = {__data = {__lock = 2, __count = 0, __owner = 6210, __nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000B\030\000\000\001", '\000' <repeats 26 times>, __align = 2}

From the backtrace above: rpc_clnt_submit() invokes the callback synchronously in the submitting thread, and the callback (cli_quotad_getlimit_cbk -> cli_cmd_broadcast_response) then tries to acquire a lock that is already held by that same thread (note __owner = 6210, the LWP of the current thread). Since the mutex is not recursive, this leads to a self-deadlock.
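For illustration, here is a minimal standalone C sketch of that self-deadlock pattern. This is not GlusterFS code; the function names are stand-ins for the frames in the backtrace. With the default (non-recursive) pthread mutex on glibc, locking it a second time from the owning thread blocks forever:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t cond_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for cli_cmd_broadcast_response(): needs cond_mutex. */
static void broadcast_response(int status)
{
        pthread_mutex_lock(&cond_mutex);   /* blocks forever: this thread already owns it */
        printf("status = %d\n", status);
        pthread_mutex_unlock(&cond_mutex);
}

/* Stand-in for the submit path: holds cond_mutex, and on a local
 * failure invokes the callback synchronously in the same thread
 * instead of from a separate reply thread. */
static void submit_request(void)
{
        pthread_mutex_lock(&cond_mutex);
        broadcast_response(-1);            /* self-deadlock happens here */
        pthread_mutex_unlock(&cond_mutex); /* never reached */
}

int main(void)
{
        submit_request();                  /* hangs, like 'gluster v quota <vol> list' */
        return 0;
}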
The CLI list command hangs only when quotad is not running. That is a negative scenario, so this bug is not a blocker for 3.1.
An upstream patch is already available for this bug: http://review.gluster.org/#/c/11658/
The patch http://review.gluster.org/11658, which solves this problem, has already been merged. However, this problem might surface again, so I am assigning this to a quota team member.
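Independent of the exact change in that review, the general remedy for this class of hang is worth noting. The following is a hedged sketch only (reusing the toy functions from the backtrace analysis above, not the actual GlusterFS fix): never invoke a callback that re-acquires a mutex while that mutex is still held; release it first, or dispatch the callback from another thread.

/* Sketch of the general remedy: drop the lock before the synchronous
 * callback, so the callback is free to take it. See the review linked
 * above for the change actually merged upstream. */
static void submit_request_fixed(void)
{
        pthread_mutex_lock(&cond_mutex);
        /* ... update whatever state the lock actually protects ... */
        pthread_mutex_unlock(&cond_mutex); /* release BEFORE calling back */

        broadcast_response(-1);            /* safe: lock is free to re-acquire */
}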
Patch merged: https://code.engineering.redhat.com/gerrit/#/c/56573/
[root@darkknight glusterfs]# systemctl restart glusterd
gluster v quota testvol list
[root@darkknight glusterfs]# gluster v quota testvol list
Connection failed. Please check if quota daemon is operational.
[root@darkknight glusterfs]# ps aux | grep quotad
root     19330  0.0  0.4 461684 33248 ?   Ssl  21:33   0:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/quotad -p /var/lib/glusterd/quotad/run/quotad.pid -l /var/log/glusterfs/quotad.log -S /var/run/gluster/f077dff85992befedc232c1fc52788c8.socket --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off
root     19453  0.0  0.0 112644   928 pts/0   S+   21:36   0:00 grep --color=auto quotad
[root@darkknight glusterfs]# kill 19330
[root@darkknight glusterfs]# gluster v quota testvol list
Connection failed. Please check if quota daemon is operational.

The CLI no longer hangs when it cannot reach quotad; it now fails immediately with a clear error. Bug verified on build glusterfs-3.7.1-14.el7rhgs.x86_64.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1845.html