+++ This bug was initially created as a clone of Bug #1262291 +++

Description of problem:
-----------------------
Trying to fetch the value of the `replica.split-brain-status' attribute of a file in a volume causes the mount to hang.

The following backtrace was obtained by running gdb on the mount process:

Thread 5 (Thread 0x7facd0a94700 (LWP 24886)):
#0  0x00007facdc8a463c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007facdd7fe02b in __syncbarrier_wait (barrier=0x7facc94991c0, waitfor=2) at syncop.c:1130
#2  syncbarrier_wait (barrier=0x7facc94991c0, waitfor=2) at syncop.c:1147
#3  0x00007faccbb8a98d in afr_selfheal_unlocked_discover_on (frame=<value optimized out>, inode=<value optimized out>, gfid=<value optimized out>, replies=0x7facd0a91590, discover_on=0x7faccc05ffb0 "\001\001\r", <incomplete sequence \360\255\272>) at afr-self-heal-common.c:754
#4  0x00007faccbb97b47 in afr_is_split_brain (frame=0x7facdb371474, this=0x7faccc008c90, inode=0x7facc9b18108, gfid=0x7facbc000ca0 "\370f\262,\031", d_spb=<value optimized out>, m_spb=0x7facd0a91e40) at afr-common.c:4850
#5  0x00007faccbb9a9f1 in afr_get_split_brain_status (frame=0x7facdb371474, this=0x7faccc008c90, loc=0x7facbc000c80) at afr-common.c:4900
#6  0x00007faccbb72eb0 in afr_getxattr (frame=0x7facdb371474, this=0x7faccc008c90, loc=0x7facbc000c80, name=<value optimized out>, xdata=0x0) at afr-inode-read.c:1492
#7  0x00007faccb92bf8c in dht_getxattr (frame=0x7facdb37187c, this=<value optimized out>, loc=0x7facbc000c80, key=<value optimized out>, xdata=0x0) at dht-common.c:3264
#8  0x00007facdd7bef3b in default_getxattr (frame=0x7facdb37187c, this=0x7faccc00b4d0, loc=0x7facbc000c80, name=0x7facbc0103a0 "replica.split-brain-status", xdata=<value optimized out>) at defaults.c:1970
#9  0x00007facdd7bef3b in default_getxattr (frame=0x7facdb37187c, this=0x7faccc00c920, loc=0x7facbc000c80, name=0x7facbc0103a0 "replica.split-brain-status", xdata=<value optimized out>) at defaults.c:1970
#10 0x00007facdd7bef3b in default_getxattr (frame=0x7facdb37187c, this=0x7faccc00dc80, loc=0x7facbc000c80, name=0x7facbc0103a0 "replica.split-brain-status", xdata=<value optimized out>) at defaults.c:1970
#11 0x00007facdd7bef3b in default_getxattr (frame=0x7facdb37187c, this=0x7faccc00f0d0, loc=0x7facbc000c80, name=0x7facbc0103a0 "replica.split-brain-status", xdata=<value optimized out>) at defaults.c:1970
#12 0x00007facdd7bef3b in default_getxattr (frame=0x7facdb37187c, this=0x7faccc010430, loc=0x7facbc000c80, name=0x7facbc0103a0 "replica.split-brain-status", xdata=<value optimized out>) at defaults.c:1970
#13 0x00007facdd7bef3b in default_getxattr (frame=0x7facdb37187c, this=0x7faccc0117e0, loc=0x7facbc000c80, name=0x7facbc0103a0 "replica.split-brain-status", xdata=<value optimized out>) at defaults.c:1970
#14 0x00007faccaa9ef6d in svc_getxattr (frame=0x7facdb37187c, this=<value optimized out>, loc=0x7facbc000c80, name=0x7facbc0103a0 "replica.split-brain-status", xdata=0x0) at snapview-client.c:888
#15 0x00007facca883715 in io_stats_getxattr (frame=0x7facdb371d30, this=0x7faccc015150, loc=0x7facbc000c80, name=0x7facbc0103a0 "replica.split-brain-status", xdata=0x0) at io-stats.c:2289
#16 0x00007facdd7bef3b in default_getxattr (frame=0x7facdb371d30, this=0x7faccc016690, loc=0x7facbc000c80, name=0x7facbc0103a0 "replica.split-brain-status", xdata=<value optimized out>) at defaults.c:1970
#17 0x00007facd4adf5c8 in fuse_getxattr_resume (state=0x7facbc000c60) at fuse-bridge.c:3425
#18 0x00007facd4ad35f5 in fuse_resolve_done (state=<value optimized out>) at fuse-resolve.c:644
#19 fuse_resolve_all (state=<value optimized out>) at fuse-resolve.c:671
#20 0x00007facd4ad3326 in fuse_resolve (state=0x7facbc000c60) at fuse-resolve.c:635
#21 0x00007facd4ad363e in fuse_resolve_all (state=<value optimized out>) at fuse-resolve.c:667
#22 0x00007facd4ad36a3 in fuse_resolve_continue (state=<value optimized out>) at fuse-resolve.c:687
#23 0x00007facd4ad38a1 in fuse_resolve_gfid_cbk (frame=<value optimized out>, cookie=<value optimized out>, this=0x7facdf78a690, op_ret=0, op_errno=<value optimized out>, inode=<value optimized out>, buf=0x7facc9b688f8, xattr=0x7facdad6ac74, postparent=0x7facc9b68b28) at fuse-resolve.c:169
#24 0x00007facca8904d3 in io_stats_lookup_cbk (frame=0x7facdb371d30, cookie=<value optimized out>, this=<value optimized out>, op_ret=0, op_errno=117, inode=0x7facc8d091a4, buf=0x7facc9b688f8, xdata=0x7facdad6ac74, postparent=0x7facc9b68b28) at io-stats.c:1512
#25 0x00007faccaaa6138 in svc_lookup_cbk (frame=0x7facdb37187c, cookie=<value optimized out>, this=<value optimized out>, op_ret=0, op_errno=117, inode=0x7facc8d091a4, buf=0x7facc9b688f8, xdata=0x7facdad6ac74, postparent=0x7facc9b68b28) at snapview-client.c:371
#26 0x00007faccaeb8554 in qr_lookup_cbk (frame=0x7facdb371474, cookie=<value optimized out>, this=<value optimized out>, op_ret=0, op_errno=117, inode_ret=0x7facc8d091a4, buf=0x7facc9b688f8, xdata=0x7facdad6ac74, postparent=0x7facc9b68b28) at quick-read.c:446
#27 0x00007faccb0c4d45 in ioc_lookup_cbk (frame=0x7facdb3711c4, cookie=<value optimized out>, this=<value optimized out>, op_ret=0, op_errno=117, inode=0x7facc8d091a4, stbuf=0x7facc9b688f8, xdata=0x7facdad6ac74, postparent=0x7facc9b68b28) at io-cache.c:260
#28 0x00007faccb927d40 in dht_discover_complete (this=<value optimized out>, discover_frame=<value optimized out>) at dht-common.c:306
#29 0x00007faccb9323bc in dht_discover_cbk (frame=0x7facdb1a56d4, cookie=0x7facdb37131c, this=0x7faccc00a0f0, op_ret=<value optimized out>, op_errno=117, inode=0x7facc8d091a4, stbuf=0x7facbc011198, xattr=0x7facdad6ac74, postparent=0x7facbc011208) at dht-common.c:441
#30 0x00007faccbba2fd5 in afr_discover_done (frame=<value optimized out>, cookie=<value optimized out>, this=<value optimized out>, op_ret=<value optimized out>, op_errno=<value optimized out>, inode=<value optimized out>, buf=0x7facd0a93ab0, xdata=0x7facdad6ac74, postparent=0x7facd0a93a40) at afr-common.c:2115
#31 afr_discover_cbk (frame=<value optimized out>, cookie=<value optimized out>, this=<value optimized out>, op_ret=<value optimized out>, op_errno=<value optimized out>, inode=<value optimized out>, buf=0x7facd0a93ab0, xdata=0x7facdad6ac74, postparent=0x7facd0a93a40) at afr-common.c:2150
#32 0x00007faccbde00f7 in client3_3_lookup_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>, myframe=0x7facdb371e88) at client-rpc-fops.c:2978
#33 0x00007facdd584455 in rpc_clnt_handle_reply (clnt=0x7faccc0bdc60, pollin=0x7faccc0cf430) at rpc-clnt.c:766
#34 0x00007facdd585981 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x7faccc0bdc90, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7faccc0cf430) at rpc-clnt.c:907
#35 0x00007facdd580ad8 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:543
#36 0x00007facd20bb265 in socket_event_poll_in (this=0x7faccc0cd7f0) at socket.c:2290
#37 0x00007facd20bce3d in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x7faccc0cd7f0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2403
#38 0x00007facdd81ab20 in event_dispatch_epoll_handler (data=0x7facdf7c37a0) at event-epoll.c:575
#39 event_dispatch_epoll_worker (data=0x7facdf7c37a0) at event-epoll.c:678
#40 0x00007facdc8a0a51 in start_thread () from /lib64/libpthread.so.0
#41 0x00007facdc20a9ad in clone () from /lib64/libc.so.6

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
glusterfs-3.7.1-14.el6.x86_64

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
1. Create a replicate volume and mount it on a client via fuse.
2. Run the following command to get the value of the `replica.split-brain-status' attribute of a file.

Actual results:
---------------
The mount hangs.

Expected results:
-----------------
The mount should not hang.
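Note: the exact command in step 2 did not carry over into this clone. The usual way to query this attribute is getfattr run against the file on the fuse mount, so the command was most likely of the following form (the mount point and file name are only illustrative):

getfattr -n replica.split-brain-status /mnt/fuse-mount/file1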
REVIEW: http://review.gluster.org/12163 (afr : get split-brain-status in a synctask) posted (#1) for review on master by Anuradha Talur (atalur)
REVIEW: http://review.gluster.org/12163 (afr : get split-brain-status in a synctask) posted (#2) for review on master by Anuradha Talur (atalur)
REVIEW: http://review.gluster.org/12163 (afr : get split-brain-status in a synctask) posted (#3) for review on master by Anuradha Talur (atalur)
REVIEW: http://review.gluster.org/12163 (afr : get split-brain-status in a synctask) posted (#4) for review on master by Pranith Kumar Karampuri (pkarampu)
REVIEW: http://review.gluster.org/12169 (afr: perform replace-brick in a synctask) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)
REVIEW: http://review.gluster.org/12169 (afr: perform replace-brick in a synctask) posted (#3) for review on master by Ravishankar N (ravishankar)
REVIEW: http://review.gluster.org/12169 (afr: perform replace-brick in a synctask) posted (#4) for review on master by Pranith Kumar Karampuri (pkarampu)
COMMIT: http://review.gluster.org/12169 committed in master by Pranith Kumar Karampuri (pkarampu)

------

commit 22882a9b810c4d5c7a7be0f1fac7069422d08b57
Author: Ravishankar N <ravishankar>
Date: Mon Sep 14 15:43:31 2015 +0530

afr: perform replace-brick in a synctask

Problem: replace-brick setxattr is not performed inside a synctask. This can lead to hangs if the setxattr is executed by the epoll thread, because that thread would then be waiting for replies while it is also the thread that must epoll_ctl and read from the socket to receive them.

Fix: Move replace-brick to a synctask to prevent the epoll thread hang. This patch is in line with the fix performed in http://review.gluster.org/#/c/12163/

Change-Id: I6a71038bb7819f9e98f7098b18a6cee34805868f
BUG: 1262345
Signed-off-by: Ravishankar N <ravishankar>
Reviewed-on: http://review.gluster.org/12169
Reviewed-by: Pranith Kumar Karampuri <pkarampu>
Tested-by: Gluster Build System <jenkins.com>
Tested-by: NetBSD Build System <jenkins.org>
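The getxattr hang reported in this bug (addressed by 12163) has the same shape: work that blocks waiting for replies from the bricks must not run on the epoll thread, because that same thread has to keep reading replies off the socket. Below is a minimal sketch of that offload pattern, assuming the libglusterfs syncop API (synctask_new and the syncenv reachable through this->ctx->env) as it exists in the 3.7-era tree; my_fop_entry, my_blocking_task, my_task_done, struct my_args and do_blocking_work are illustrative names, not symbols from the actual patches.

#include "glusterfs.h"
#include "xlator.h"
#include "syncop.h"

struct my_args {                /* illustrative container for the task's arguments */
        xlator_t *this;
        loc_t     loc;
};

/* Runs on a synctask worker thread, so it may block (e.g. on
 * syncbarrier_wait(), as in frame #1 of the backtrace) while waiting
 * for replies from the bricks. */
static int
my_blocking_task (void *opaque)
{
        struct my_args *args = opaque;

        return do_blocking_work (args->this, &args->loc); /* illustrative helper */
}

/* Invoked with the task's return value once it completes; unwind the
 * fop to the caller and release the task's resources here. */
static int
my_task_done (int ret, call_frame_t *frame, void *opaque)
{
        struct my_args *args = opaque;

        loc_wipe (&args->loc);
        GF_FREE (args);
        return 0;
}

/* Entry point that may run on the epoll thread: instead of doing the
 * blocking work inline, queue it as a synctask and return immediately,
 * leaving the epoll thread free to keep servicing the socket. */
int
my_fop_entry (call_frame_t *frame, xlator_t *this, loc_t *loc)
{
        struct my_args *args = GF_CALLOC (1, sizeof (*args),
                                          gf_common_mt_char);
        if (!args)
                return -1;

        args->this = this;
        loc_copy (&args->loc, loc);

        return synctask_new (this->ctx->env, my_blocking_task,
                             my_task_done, frame, args);
}

synctask_new() hands my_blocking_task to a synctask worker and returns right away, so the epoll thread can go back to polling the socket; when the task finishes, my_task_done is called with its return value and the saved frame.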
This bug is getting closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user