Description of problem:
=======================
While running rm -rf on the master volume of a geo-replication setup, a large number of crashes was observed on the slave side with the following backtrace:

(gdb) bt
#0  0x00007f570287bc00 in dht_rmdir_do (frame=frame@entry=0x7f570e68b1d0, this=this@entry=0x7f56fc00e5e0) at dht-common.c:7944
#1  0x00007f570287c4ab in dht_rmdir_cached_lookup_cbk (frame=frame@entry=0x7f570e68a06c, cookie=<optimized out>, this=0x7f56fc00e5e0, op_ret=0, op_errno=<optimized out>, inode=<optimized out>, stbuf=stbuf@entry=0x7f56f0021410, xattr=0x7f570de29e88, parent=0x7f56f0021480) at dht-common.c:8137
#2  0x00007f5702b13056 in afr_lookup_done (frame=frame@entry=0x7f570e68ac04, this=this@entry=0x7f56fc00d670) at afr-common.c:2167
#3  0x00007f5702b13a04 in afr_lookup_metadata_heal_check (frame=frame@entry=0x7f570e68ac04, this=0x7f56fc00d670, this@entry=0x8072f15dde9c0700) at afr-common.c:2410
#4  0x00007f5702b14331 in afr_lookup_entry_heal (frame=frame@entry=0x7f570e68ac04, this=0x8072f15dde9c0700, this@entry=0x7f56fc00d670) at afr-common.c:2501
#5  0x00007f5702b1469d in afr_lookup_cbk (frame=frame@entry=0x7f570e68ac04, cookie=<optimized out>, this=0x7f56fc00d670, op_ret=<optimized out>, op_errno=<optimized out>, inode=inode@entry=0x7f56fa6af20c, buf=buf@entry=0x7f56fb48e940, xdata=0x7f570de2a138, postparent=postparent@entry=0x7f56fb48e9b0) at afr-common.c:2549
#6  0x00007f5702d515dd in client3_3_lookup_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f570e68b0fc) at client-rpc-fops.c:2945
#7  0x00007f571097a860 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f56fc090970, pollin=pollin@entry=0x7f56f0004e20) at rpc-clnt.c:794
#8  0x00007f571097ab4f in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f56fc0909a0, event=<optimized out>, data=0x7f56f0004e20) at rpc-clnt.c:987
#9  0x00007f57109769f3 in rpc_transport_notify (this=this@entry=0x7f56fc0a0690, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f56f0004e20) at rpc-transport.c:538
#10 0x00007f570523b314 in socket_event_poll_in (this=this@entry=0x7f56fc0a0690) at socket.c:2272
#11 0x00007f570523d7c5 in socket_event_handler (fd=<optimized out>, idx=1, data=0x7f56fc0a0690, poll_in=1, poll_out=0, poll_err=0) at socket.c:2402
#12 0x00007f5710c0a770 in event_dispatch_epoll_handler (event=0x7f56fb48ee80, event_pool=0x7f571156de10) at event-epoll.c:571
#13 event_dispatch_epoll_worker (data=0x7f56fc07bbf0) at event-epoll.c:674
#14 0x00007f570fa11dc5 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f570f35673d in clone () from /lib64/libc.so.6
(gdb)

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-server-3.8.4-18.2.el7rhgs.x86_64

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Set up geo-replication between a master and a slave volume
2. Create data on the master
3. Perform rm -rf on the master mount (a command-line sketch of these steps follows this report)

Actual results:
===============
Multiple fs processes crashed on the slave.
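For reference, a minimal command-line sketch of the reproduction steps above. The volume names (mastervol, slavevol), hostnames (masterhost, slavehost), brick layout, and mount points are placeholders, and passwordless root SSH from the master node to the slave node is assumed to already be in place; the geo-replication commands themselves are the standard gluster CLI.

# On a master node: generate the pem keys and create/start the geo-rep session
gluster system:: execute gsec_create
gluster volume geo-replication mastervol slavehost::slavevol create push-pem
gluster volume geo-replication mastervol slavehost::slavevol start
gluster volume geo-replication mastervol slavehost::slavevol status

# Mount the master volume and create some data
mount -t glusterfs masterhost:/mastervol /mnt/mastervol
mkdir -p /mnt/mastervol/dir{1..100}
for i in $(seq 1 100); do touch /mnt/mastervol/dir$i/file{1..10}; done

# Once the data has synced to the slave, remove everything from the master mount
rm -rf /mnt/mastervol/*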
Verified the same case with the build glusterfs-3.8.4-18.4.el7rhgs.x86_64. No core was observed on the slave, and the sync completed for all fops, including rmdir. Moving this bug to verified state.
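A sketch of the kind of checks behind this verification, using the same placeholder names as above; where core files land depends on the kernel.core_pattern / abrt configuration of the slave nodes, so that path is an assumption:

# On a master node: confirm the session is active with no remaining sync backlog
gluster volume geo-replication mastervol slavehost::slavevol status detail

# On the slave nodes: confirm no new core files were generated during the test
cat /proc/sys/kernel/core_pattern
ls -l /var/spool/abrt/ 2>/dev/null

# On a slave mount: confirm the rm -rf was propagated (the mount should be empty)
mount -t glusterfs slavehost:/slavevol /mnt/slavevol
ls -A /mnt/slavevol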
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1418