Description of problem:

While running rm -rf on the master, the slave glusterfs process crashed in io-cache. Once this crash happens, the slave glusterfs process crashes again every time the worker respawns.

bt in log file:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2014-05-08 10:48:21.638483] W [fuse-bridge.c:1628:fuse_err_cbk] 0-glusterfs-fuse: 6: MKDIR() /level00 => -1 (File exists)
[2014-05-08 10:48:21.641337] E [dht-helper.c:1144:dht_inode_ctx_get] (-->/usr/lib64/glusterfs/3.4.0.60rhs/xlator/cluster/distribute.so(dht_lookup_selfheal_cbk+0x1d6) [0x7f0c9ce7aab6] (-->/usr/lib64/glusterfs/3.4.0.60rhs/xlator/cluster/distribute.so(dht_layout_set+0x4e) [0x7f0c9ce6260e] (-->/usr/lib64/glusterfs/3.4.0.60rhs/xlator/cluster/distribute.so(dht_inode_ctx_layout_get+0x1b) [0x7f0c9ce73a8b]))) 0-slave-dht: invalid argument: inode
[2014-05-08 10:48:21.641404] E [dht-helper.c:1144:dht_inode_ctx_get] (-->/usr/lib64/glusterfs/3.4.0.60rhs/xlator/cluster/distribute.so(dht_lookup_selfheal_cbk+0x1d6) [0x7f0c9ce7aab6] (-->/usr/lib64/glusterfs/3.4.0.60rhs/xlator/cluster/distribute.so(dht_layout_set+0x63) [0x7f0c9ce62623] (-->/usr/lib64/glusterfs/3.4.0.60rhs/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x34) [0x7f0c9ce62de4]))) 0-slave-dht: invalid argument: inode
[2014-05-08 10:48:21.641452] E [dht-helper.c:1163:dht_inode_ctx_set] (-->/usr/lib64/glusterfs/3.4.0.60rhs/xlator/cluster/distribute.so(dht_lookup_selfheal_cbk+0x1d6) [0x7f0c9ce7aab6] (-->/usr/lib64/glusterfs/3.4.0.60rhs/xlator/cluster/distribute.so(dht_layout_set+0x63) [0x7f0c9ce62623] (-->/usr/lib64/glusterfs/3.4.0.60rhs/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x52) [0x7f0c9ce62e02]))) 0-slave-dht: invalid argument: inode
pending frames:
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2014-05-08 10:48:21
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.60rhs
/lib64/libc.so.6[0x3584a329a0]
/lib64/libpthread.so.0(pthread_spin_lock+0x0)[0x358520c380]
/usr/lib64/glusterfs/3.4.0.60rhs/xlator/performance/io-cache.so(ioc_lookup_cbk+0x87)[0x7f0c9c837e07]
/usr/lib64/glusterfs/3.4.0.60rhs/xlator/cluster/distribute.so(dht_lookup_selfheal_cbk+0x17b)[0x7f0c9ce7aa5b]
/usr/lib64/glusterfs/3.4.0.60rhs/xlator/cluster/distribute.so(dht_selfheal_dir_finish+0x20)[0x7f0c9ce6ae60]
/usr/lib64/glusterfs/3.4.0.60rhs/xlator/cluster/distribute.so(dht_selfheal_directory_for_nameless_lookup+0x3ff)[0x7f0c9ce6c71f]
/usr/lib64/glusterfs/3.4.0.60rhs/xlator/cluster/distribute.so(dht_discover_cbk+0x273)[0x7f0c9ce896a3]
/usr/lib64/glusterfs/3.4.0.60rhs/xlator/cluster/replicate.so(afr_lookup_cbk+0x558)[0x7f0c9d0ffb58]
/usr/lib64/glusterfs/3.4.0.60rhs/xlator/protocol/client.so(client3_3_lookup_cbk+0x633)[0x7f0c9d33ca33]
/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)[0x7f0ca1b8bf45]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x147)[0x7f0ca1b8d507]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x28)[0x7f0ca1b88d88]
/usr/lib64/glusterfs/3.4.0.60rhs/rpc-transport/socket.so(+0x8dc6)[0x7f0c9e178dc6]
/usr/lib64/glusterfs/3.4.0.60rhs/rpc-transport/socket.so(+0xa6dd)[0x7f0c9e17a6dd]
/usr/lib64/libglusterfs.so.0(+0x62457)[0x7f0ca1df8457]
/usr/sbin/glusterfs(main+0x6c7)[0x4069d7]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3584a1ed1d]
/usr/sbin/glusterfs[0x404619]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.60rhs-1.el6rhs.x86_64

How reproducible:
Happened once.

Steps to Reproduce:
1. Create and start a geo-rep relationship between the master and the slave.
2. Create some 100K files on the master over a 10x10 directory tree, using crefi: "crefi -T 10 -n 100 --multi -b 10 -d 10 --random --min=1K --max=10K /mnt/master"
3. Let them sync to the slave.
4. Run rm -rf on the master mount point.

Actual results:
Some of the files failed to be removed from the slave, and the slave glusterfs process crashed.

Expected results:
It should neither fail to remove files nor crash.

Additional info:

bt from core:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Core was generated by `/usr/sbin/glusterfs --aux-gfid-mount --log-file=/var/log/glusterfs/geo-replicat'.
Program terminated with signal 11, Segmentation fault.
#0  0x000000358520c380 in pthread_spin_lock () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.132.el6.x86_64 keyutils-libs-1.4-4.el6.x86_64 krb5-libs-1.10.3-15.el6_5.1.x86_64 libcom_err-1.41.12-18.el6.x86_64 libgcc-4.4.7-4.el6.x86_64 libselinux-2.0.94-5.3.el6_4.1.x86_64 openssl-1.0.1e-16.el6_5.4.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0  0x000000358520c380 in pthread_spin_lock () from /lib64/libpthread.so.0
#1  0x00007f71a302de07 in ioc_lookup_cbk (frame=0x7f71a73ea4e0, cookie=<value optimized out>, this=0x1a9afd0, op_ret=0, op_errno=2, inode=0x0, stbuf=0x7f71a28fc0c4, xdata=0x0, postparent=0x7f71a28fc2f4) at io-cache.c:207
#2  0x00007f71a3670a5b in dht_lookup_selfheal_cbk (frame=0x7f71a73ea02c, cookie=<value optimized out>, this=<value optimized out>, op_ret=<value optimized out>, op_errno=<value optimized out>, xdata=<value optimized out>) at dht-common.c:141
#3  0x00007f71a3660e60 in dht_selfheal_dir_finish (frame=<value optimized out>, this=<value optimized out>, ret=<value optimized out>) at dht-selfheal.c:72
#4  0x00007f71a366271f in dht_selfheal_dir_xattr_for_nameless_lookup (frame=0x7f71a73ea02c, dir_cbk=<value optimized out>, loc=<value optimized out>, layout=0x7f7194002240) at dht-selfheal.c:416
#5  dht_selfheal_directory_for_nameless_lookup (frame=0x7f71a73ea02c, dir_cbk=<value optimized out>, loc=<value optimized out>, layout=0x7f7194002240) at dht-selfheal.c:1174
#6  0x00007f71a367f6a3 in dht_discover_cbk (frame=0x7f71a721fc18,
    cookie=0x7f71a73ea6e4, this=0x1a97c60, op_ret=<value optimized out>, op_errno=2, inode=0x0, stbuf=0x7f71a22e885c, xattr=0x0, postparent=0x7f71a22e88cc) at dht-common.c:341
#7  0x00007f71a38f5b58 in afr_lookup_done (frame=0x7f71a73ea6e4, cookie=0x1, this=0x1a971d0, op_ret=<value optimized out>, op_errno=2, inode=0x7f71a1391164, buf=0x7fff787383e0, xattr=0x0, postparent=0x7fff78738370) at afr-common.c:2220
#8  afr_lookup_cbk (frame=0x7f71a73ea6e4, cookie=0x1, this=0x1a971d0, op_ret=<value optimized out>, op_errno=2, inode=0x7f71a1391164, buf=0x7fff787383e0, xattr=0x0, postparent=0x7fff78738370) at afr-common.c:2451
#9  0x00007f71a3b32a33 in client3_3_lookup_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>, myframe=0x7f71a73ea83c) at client-rpc-fops.c:2610
#10 0x00007f71a8381f45 in rpc_clnt_handle_reply (clnt=0x1ac1ee0, pollin=0x1a8c600) at rpc-clnt.c:773
#11 0x00007f71a8383507 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x1ac1f10, event=<value optimized out>, data=<value optimized out>) at rpc-clnt.c:906
#12 0x00007f71a837ed88 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:512
#13 0x00007f71a496edc6 in socket_event_poll_in (this=0x1ad1970) at socket.c:2119
#14 0x00007f71a49706dd in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x1ad1970, poll_in=1, poll_out=0, poll_err=0) at socket.c:2229
#15 0x00007f71a85ee457 in event_dispatch_epoll_handler (event_pool=0x1a52ee0) at event-epoll.c:384
#16 event_dispatch_epoll (event_pool=0x1a52ee0) at event-epoll.c:445
#17 0x00000000004069d7 in main (argc=7, argv=0x7fff7873a088) at glusterfsd.c:2050
(gdb) f o
No symbol "o" in current context.
(gdb) f 0
#0  0x000000358520c380 in pthread_spin_lock () from /lib64/libpthread.so.0
(gdb) f 1
#1  0x00007f71a302de07 in ioc_lookup_cbk (frame=0x7f71a73ea4e0, cookie=<value optimized out>, this=0x1a9afd0, op_ret=0, op_errno=2, inode=0x0, stbuf=0x7f71a28fc0c4, xdata=0x0, postparent=0x7f71a28fc2f4) at io-cache.c:207
207             LOCK (&inode->lock);
(gdb) f 2
#2  0x00007f71a3670a5b in dht_lookup_selfheal_cbk (frame=0x7f71a73ea02c, cookie=<value optimized out>, this=<value optimized out>, op_ret=<value optimized out>, op_errno=<value optimized out>, xdata=<value optimized out>) at dht-common.c:141
141             DHT_STACK_UNWIND (lookup, frame, ret, local->op_errno, local->inode,
(gdb) f 3
#3  0x00007f71a3660e60 in dht_selfheal_dir_finish (frame=<value optimized out>, this=<value optimized out>, ret=<value optimized out>) at dht-selfheal.c:72
72              local->selfheal.dir_cbk (frame, NULL, frame->this, ret,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Removing blocker+ and rhs-3.0.0+ as this bug is not relevant to RHS 3.0.
Verified on build 3.4.0.65rhs.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html