Description of problem:

Hi, I have a cluster with 3 nodes in pre-production. Yesterday, one node went down. The error I have seen is this:

[2015-05-28 19:04:27.305560] E [glusterd-syncop.c:1578:gd_sync_task_begin] 0-management: Unable to acquire lock for cfe-gv1
The message "I [MSGID: 106006] [glusterd-handler.c:4257:__glusterd_nodesvc_rpc_notify] 0-management: nfs has disconnected from glusterd." repeated 5 times between [2015-05-28 19:04:09.346088] and [2015-05-28 19:04:24.349191]

pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2015-05-28 19:04:27
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.6.1
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb2)[0x7fd86e2f1232]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x32d)[0x7fd86e30871d]
/usr/lib64/libc.so.6(+0x35640)[0x7fd86d30c640]
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_remove_pending_entry+0x2c)[0x7fd85f52450c]
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(+0x5ae28)[0x7fd85f511e28]
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_op_sm+0x237)[0x7fd85f50f027]
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(__glusterd_brick_op_cbk+0x2fe)[0x7fd85f53be5e]
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_big_locked_cbk+0x4c)[0x7fd85f53d48c]
/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7fd86e0c50b0]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x171)[0x7fd86e0c5321]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fd86e0c1273]
/usr/lib64/glusterfs/3.6.1/rpc-transport/socket.so(+0x8530)[0x7fd85d17d530]
/usr/lib64/glusterfs/3.6.1/rpc-transport/socket.so(+0xace4)[0x7fd85d17fce4]
/usr/lib64/libglusterfs.so.0(+0x76322)[0x7fd86e346322]
/usr/sbin/glusterd(main+0x502)[0x7fd86e79afb2]
/usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fd86d2f8af5]
/usr/sbin/glusterd(+0x6351)[0x7fd86e79b351]
---------

Version-Release number of selected component (if applicable):
glusterfs 3.6.1

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
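For anyone reading the trace: the crash is a SIGSEGV (signal 11) inside glusterd_remove_pending_entry() while the op state machine handles a brick-op callback. The small C sketch below is purely illustrative; the struct and function names are invented and are not taken from the glusterfs sources. It only shows the kind of pending-entry list walk such a helper performs, with a comment marking where a stale or already-freed entry would turn the traversal into an invalid dereference of exactly this kind.

/* Illustrative sketch only: a simplified "pending entry" list and remove
 * helper, loosely modelled on what a management daemon does while tracking
 * in-flight brick operations.  None of these names come from the glusterfs
 * sources.  Compile with: cc -o pending pending.c */
#include <stdio.h>
#include <stdlib.h>

struct pending_node {
        void                *node;   /* peer/brick this op is pending on */
        struct pending_node *next;
};

/* Remove the entry whose 'node' pointer matches.  Returns 0 on success,
 * -1 if no such entry exists.  If another code path has already freed an
 * element of this list (for example a racing transaction cleaning up),
 * the 'cur->node' read below touches freed memory and the process dies
 * with SIGSEGV - the failure mode shown in the backtrace above. */
static int
remove_pending_entry (struct pending_node **head, void *node)
{
        struct pending_node *cur = *head;
        struct pending_node *prev = NULL;

        while (cur) {
                if (cur->node == node) {
                        if (prev)
                                prev->next = cur->next;
                        else
                                *head = cur->next;
                        free (cur);
                        return 0;
                }
                prev = cur;
                cur = cur->next;
        }
        return -1;
}

int
main (void)
{
        struct pending_node *head = NULL;
        int brick1 = 1, brick2 = 2;

        struct pending_node *a = calloc (1, sizeof (*a));
        struct pending_node *b = calloc (1, sizeof (*b));
        if (!a || !b)
                return 1;
        a->node = &brick1; a->next = b;
        b->node = &brick2; b->next = NULL;
        head = a;

        printf ("remove brick2: %d\n", remove_pending_entry (&head, &brick2));
        printf ("remove brick2 again: %d\n", remove_pending_entry (&head, &brick2));
        printf ("remove brick1: %d\n", remove_pending_entry (&head, &brick1));
        /* list is empty at this point */
        return 0;
}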
Created attachment 1031880 [details] sosreport
Created attachment 1031897 [details] File1
Created attachment 1031909 [details] File2
Created attachment 1031921 [details] File 3
Created attachment 1031935 [details] File 4
Created attachment 1031949 [details] File 5
Created attachment 1031952 [details] File 6
Created attachment 1031953 [details] File 7
Created attachment 1031982 [details] File 8
Created attachment 1032015 [details] File 9
Created attachment 1032017 [details] File 10
Created attachment 1032018 [details] File 11
Created attachment 1032019 [details] File 12
Created attachment 1032022 [details] File 13
Created attachment 1032034 [details] File 14
Created attachment 1032036 [details] File 15
Created attachment 1033229 [details] Glusterd log
Created attachment 1033231 [details] Cli log
Created attachment 1033249 [details] Glustershd log
Please attach the core file and mention the steps performed to hit the crash.
Created attachment 1033252 [details] cmd history
The problem I see here is that concurrent volume status transactions were run at the same point in time. 3.6.1 is missing some fixes that take care of the issues identified along these lines. If you upgrade your cluster to the 3.6.3 beta, the problem will go away. However, 3.6.3 still misses one more fix, http://review.gluster.org/#/c/10023/, which will be released in 3.6.4. I would request that you upgrade your cluster to 3.6.3, if not 3.7.
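To make that a bit more concrete: the general class of fix in the later releases is to serialize glusterd transactions so that two of them cannot race on shared state. The C sketch below is only a generic illustration of that idea under my own assumptions; all names are invented, and it is not the actual patch linked above.

/* Illustrative sketch only: the general shape of the fix is to serialize
 * management transactions so two of them never manipulate shared state
 * at the same time.  This is NOT the patch from review.gluster.org.
 * Compile with: cc -pthread -o serialize serialize.c */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t txn_lock = PTHREAD_MUTEX_INITIALIZER;
static int pending_ops;   /* stands in for shared op-state-machine data */

/* Each "volume status"-style transaction touches the shared state only
 * while holding txn_lock, so concurrent transactions can no longer free
 * or unlink entries out from under each other. */
static void *
run_transaction (void *arg)
{
        (void) arg;
        for (int i = 0; i < 100000; i++) {
                pthread_mutex_lock (&txn_lock);
                pending_ops++;    /* enqueue a pending entry            */
                pending_ops--;    /* drop it when the brick-op cb fires */
                pthread_mutex_unlock (&txn_lock);
        }
        return NULL;
}

int
main (void)
{
        pthread_t t1, t2;

        /* two concurrent "volume status" transactions, as in this report */
        pthread_create (&t1, NULL, run_transaction, NULL);
        pthread_create (&t2, NULL, run_transaction, NULL);
        pthread_join (t1, NULL);
        pthread_join (t2, NULL);

        printf ("pending_ops at exit: %d (expected 0)\n", pending_ops);
        return 0;
}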
Could you upgrade your cluster and check whether this problem goes away? If so, would you mind closing this bug?
Since the reporter hasn't gotten back with updates, I am closing this bug; feel free to reopen it if the problem persists.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days