Description of problem:
=======================
The GlusterFS client crashed with the backtrace below when the steps under "Steps to Reproduce" were performed.

(gdb) bt
#0  0x00007fcb2c58660b in transit_state_mb (pstate=<optimized out>, pstate=<optimized out>, mctx=0x7fcb20be5470) at regexec.c:2530
#1  transit_state (state=0x7fcb1ec9c770, mctx=0x7fcb20be5470, err=0x7fcb20be5420) at regexec.c:2285
#2  check_matching (p_match_first=0x7fcb20be5410, fl_longest_match=1, mctx=0x7fcb20be5470) at regexec.c:1171
#3  re_search_internal (preg=preg@entry=0x7fcb14071ba8, string=string@entry=0x7fcae6b9f138 "..", length=2, start=<optimized out>, start@entry=0, range=0, stop=<optimized out>, nmatch=<optimized out>, pmatch=0x7fcb20be55d0, eflags=0) at regexec.c:842
#4  0x00007fcb2c58c1f5 in __regexec (preg=0x7fcb14071ba8, string=0x7fcae6b9f138 "..", nmatch=<optimized out>, pmatch=0x7fcb20be55d0, eflags=<optimized out>) at regexec.c:250
#5  0x00007fcb1bb288c9 in dht_munge_name (original=original@entry=0x7fcae6b9f138 "..", modified=modified@entry=0x7fcb20be5640 ".", len=len@entry=3, re=re@entry=0x7fcb14071ba8) at dht-hashfn.c:49
#6  0x00007fcb1bb28ace in dht_hash_compute (this=this@entry=0x7fcad57aec20, type=0, name=name@entry=0x7fcae6b9f138 "..", hash_p=hash_p@entry=0x7fcb20be56f4) at dht-hashfn.c:86
#7  0x00007fcb1bb08c56 in dht_layout_search (this=0x7fcad57aec20, layout=0x7fcb1c4ebe30, name=0x7fcae6b9f138 "..") at dht-layout.c:166
#8  0x00007fcb1bb311bb in dht_readdirp_cbk (frame=0x7fcb2b8e24e4, cookie=0x7fcb2b8e0714, this=0x7fcad57aec20, op_ret=2, op_errno=2, orig_entries=0x7fcb20be58f0, xdata=0x0) at dht-common.c:4780
#9  0x00007fcb1bd8f97c in afr_readdir_cbk (frame=<optimized out>, cookie=<optimized out>, this=<optimized out>, op_ret=2, op_errno=2, subvol_entries=<optimized out>, xdata=0x0) at afr-dir-read.c:234
#10 0x00007fcb201ae7a1 in client3_3_readdirp_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7fcb2b8de028) at client-rpc-fops.c:2650
#11 0x00007fcb2dbcd680 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7fcb15a900d0, pollin=pollin@entry=0x7fcb1ee30a20) at rpc-clnt.c:791
#12 0x00007fcb2dbcd95f in rpc_clnt_notify (trans=<optimized out>, mydata=0x7fcb15a90100, event=<optimized out>, data=0x7fcb1ee30a20) at rpc-clnt.c:962
#13 0x00007fcb2dbc9883 in rpc_transport_notify (this=this@entry=0x7fcad6c09480, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7fcb1ee30a20) at rpc-transport.c:537
#14 0x00007fcb2248eec4 in socket_event_poll_in (this=this@entry=0x7fcad6c09480) at socket.c:2267
#15 0x00007fcb22491375 in socket_event_handler (fd=<optimized out>, idx=46, data=0x7fcad6c09480, poll_in=1, poll_out=0, poll_err=0) at socket.c:2397
#16 0x00007fcb2de5d3b0 in event_dispatch_epoll_handler (event=0x7fcb20be5e80, event_pool=0x7fcb30174f00) at event-epoll.c:571
#17 event_dispatch_epoll_worker (data=0x7fcb301bf8a0) at event-epoll.c:674
#18 0x00007fcb2cc64dc5 in start_thread (arg=0x7fcb20be6700) at pthread_create.c:308
#19 0x00007fcb2c5a973d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Version-Release number of selected component (if applicable):
3.8.4-5.el7rhgs.x86_64

Steps to Reproduce:
===================
1) Create a Distributed-Replicate volume and start it.
2) FUSE-mount the volume on multiple clients.
3) From one client, start creating a big file:
   dd if=/dev/urandom of=BIG bs=1024k count=10000
   and, in parallel, run continuous lookups from the other clients (find, stat *, ls -lRt).
4) While step 3 is still running, identify the bricks on which the file is actually stored and remove those bricks.
The client running continuous "find" commands crashed. The volume was unmounted, and "Transport endpoint is not connected" errors were seen:

find: failed to restore initial working directory: Transport endpoint is not connected
find: ‘.’: Transport endpoint is not connected
find: failed to restore initial working directory: Transport endpoint is not connected
find: ‘.’: Transport endpoint is not connected

FUSE mount logs:
================
[2016-11-28 07:14:47.325198] I [MSGID: 109086] [dht-shared.c:297:dht_parse_decommissioned_bricks] 25-distrep-dht: decommissioning subvolume distrep-replicate-2
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(READDIRP)
frame : type(1) op(OPENDIR)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2016-11-28 07:14:47
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.8.4
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7fcb2de03bd2]
/lib64/libglusterfs.so.0(gf_print_trace+0x324)[0x7fcb2de0d654]
/lib64/libc.so.6(+0x35250)[0x7fcb2c4e7250]
/lib64/libc.so.6(+0xd460b)[0x7fcb2c58660b]
/lib64/libc.so.6(regexec+0xc5)[0x7fcb2c58c1f5]
/usr/lib64/glusterfs/3.8.4/xlator/cluster/distribute.so(+0x258c9)[0x7fcb1bb288c9]
/usr/lib64/glusterfs/3.8.4/xlator/cluster/distribute.so(+0x25ace)[0x7fcb1bb28ace]
/usr/lib64/glusterfs/3.8.4/xlator/cluster/distribute.so(+0x5c56)[0x7fcb1bb08c56]
/usr/lib64/glusterfs/3.8.4/xlator/cluster/distribute.so(+0x2e1bb)[0x7fcb1bb311bb]
/usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so(+0x697c)[0x7fcb1bd8f97c]
/usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so(+0x207a1)[0x7fcb201ae7a1]
/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7fcb2dbcd680]
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1df)[0x7fcb2dbcd95f]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fcb2dbc9883]
/usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so(+0x6ec4)[0x7fcb2248eec4]
/usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so(+0x9375)[0x7fcb22491375]
/lib64/libglusterfs.so.0(+0x833b0)[0x7fcb2de5d3b0]
/lib64/libpthread.so.0(+0x7dc5)[0x7fcb2cc64dc5]
/lib64/libc.so.6(clone+0x6d)[0x7fcb2c5a973d]
---------

Actual results:
===============
Client crashed.

Expected results:
=================
There should not be any crashes.
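Note on the backtrace: the mount log shows DHT processing decommissioned bricks (i.e., the graph being reconfigured for remove-brick) at the same moment the readdirp callback was hashing entry names, and the SIGSEGV lands inside regexec() on the regex_t that dht_munge_name() uses. One plausible mechanism, consistent with the trace but not confirmed here as the exact root cause, is regexec() running against a compiled regex that another thread concurrently frees or recompiles. The minimal C sketch below (hypothetical names; the thread roles, pattern, and iteration counts are invented for illustration, and this is not glusterfs code) demonstrates that hazard and one straightforward guard, serializing access to the shared regex_t:

/* Sketch: a shared regex_t used by an I/O-path thread while a
 * reconfigure-path thread recompiles it. Without the mutex, the
 * matcher can walk freed DFA state inside regexec() and SIGSEGV. */
#include <pthread.h>
#include <regex.h>
#include <stdio.h>

static regex_t re;
static pthread_mutex_t re_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the readdirp path: hash/munge an entry name. */
static void *matcher(void *arg)
{
    regmatch_t m[2];
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&re_lock);
        (void)regexec(&re, "..", 2, m, 0);
        pthread_mutex_unlock(&re_lock);
    }
    return NULL;
}

/* Stand-in for a reconfigure path that replaces the pattern. */
static void *recompiler(void *arg)
{
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&re_lock);
        regfree(&re);
        regcomp(&re, "^\\.(.+)\\.[^.]+$", REG_EXTENDED);
        pthread_mutex_unlock(&re_lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    regcomp(&re, "^\\.(.+)\\.[^.]+$", REG_EXTENDED);
    pthread_create(&t1, NULL, matcher, NULL);
    pthread_create(&t2, NULL, recompiler, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    regfree(&re);
    puts("done");
    return 0;
}

Built with "cc -pthread", removing the lock makes the matcher thread liable to crash intermittently inside regexec(), matching frame #0 above. Whether the actual fix takes this locking approach or another is determined by the patches linked in the following comments.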
Upstream patch http://review.gluster.org/15945 posted for review.
master: http://review.gluster.org/#/c/15945/
release-3.8: http://review.gluster.org/#/c/15793/
release-3.9: http://review.gluster.org/#/c/15949/
downstream patch: https://code.engineering.redhat.com/gerrit/92555
Repeated the steps in the description three times with glusterfs version 3.8.4-10.el7rhgs.x86_64, and no client crashes were seen. Hence, moving this BZ to Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html