Description of problem:
In a striped-replicated volume, I was untarring the Linux kernel on the mountpoint. Then when I ran ls on it, fuse client crashed.

Version-Release number of selected component (if applicable):
glusterfs-3.3.0qa40

How reproducible:
Consistent

Steps to Reproduce:
1. Create and start a striped-replicated-volume. (or distributed-striped-replicated volume).
2. Now do a fuse mount and untar the Linux kernel on the mountpoint.
3. Run ls after or even during the untarring.

Actual results:
Fuse client crashed with following back trace.

Core was generated by `/usr/local/sbin/glusterfs --volfile-id=hosdu --volfile-server=172.17.251.63 /mn'.
Program terminated with signal 11, Segmentation fault.
#0 0x00000035bbc7e3e6 in __strcmp_sse2 () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6_2.9.x86_64 libgcc-4.4.6-3.el6.x86_64 openssl-1.0.0-20.el6_2.3.x86_64 zlib-1.2.3-27.el6.x86_64
(gdb) bt
#0 0x00000035bbc7e3e6 in __strcmp_sse2 () from /lib64/libc.so.6
#1 0x00007fc20439e1a3 in afr_lookup (frame=0x7fc207bdd744, this=0x1c71e90, loc=0x7fffca8662b0, xattr_req=0x30026ec) at afr-common.c:2122
#2 0x00007fc204128452 in stripe_readdirp_cbk (frame=0x7fc207bdb9b4, cookie=0x7fc207bdb85c, this=0x1c748d0, op_ret=4, op_errno=2, orig_entries=0x7fffca866500, xdata=0x0) at stripe.c:4013
#3 0x00007fc204346366 in afr_readdirp_cbk (frame=0x7fc207bdb85c, cookie=0x1, this=0x1c71e90, op_ret=4, op_errno=2, entries=0x7fffca866500, xdata=0x0) at afr-dir-read.c:626
#4 0x00007fc2045e55ab in client3_1_readdirp_cbk (req=0x7fc1fc47bee8, iov=0x7fc1fc47bf28, count=1, myframe=0x7fc207bdb500) at client3_1-fops.c:2311
#5 0x00007fc208b84a48 in rpc_clnt_handle_reply (clnt=0x1cfc8b0, pollin=0x2f84940) at rpc-clnt.c:797
#6 0x00007fc208b84de5 in rpc_clnt_notify (trans=0x1d0c440, mydata=0x1cfc8e0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x2f84940) at rpc-clnt.c:916
#7 0x00007fc208b80ec8 in rpc_transport_notify (this=0x1d0c440, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x2f84940) at rpc-transport.c:498
#8 0x00007fc20542d280 in socket_event_poll_in (this=0x1d0c440) at socket.c:1686
#9 0x00007fc20542d804 in socket_event_handler (fd=13, idx=6, data=0x1d0c440, poll_in=1, poll_out=0, poll_err=0) at socket.c:1801
#10 0x00007fc208ddbc48 in event_dispatch_epoll_handler (event_pool=0x1c55500, events=0x1c63560, i=0) at event.c:794
#11 0x00007fc208ddbe6b in event_dispatch_epoll (event_pool=0x1c55500) at event.c:856
#12 0x00007fc208ddc1f6 in event_dispatch (event_pool=0x1c55500) at event.c:956
#13 0x00000000004082a4 in main (argc=4, argv=0x7fffca866b68) at glusterfsd.c:1652
(gdb) f 1
#1 0x00007fc20439e1a3 in afr_lookup (frame=0x7fc207bdd744, this=0x1c71e90, loc=0x7fffca8662b0, xattr_req=0x30026ec) at afr-common.c:2122
2122 if (!strcmp (loc->path, "/" GF_REPLICATE_TRASH_DIR)) {
(gdb) p loc->path
$1 = 0x0
(gdb) f 2
#2 0x00007fc204128452 in stripe_readdirp_cbk (frame=0x7fc207bdb9b4, cookie=0x7fc207bdb85c, this=0x1c748d0, op_ret=4, op_errno=2, orig_entries=0x7fffca866500, xdata=0x0) at stripe.c:4013
4013 STACK_WIND (local_frame, stripe_readdirp_lookup_cbk,
(gdb)

Expected results:
glusterfs client should not crash.

Additional info:
I have archived the log files and core.
The same thing happened to me in geo-rep testing with a dist-striped-replicate volume. All the glusterfs processes went down, which crashed all the aux mounts and put the geo-rep status into the faulty state. This is the backtrace in the log file.

[2012-05-14 00:20:33.462141] I [afr-common.c:1971:afr_set_root_inode_on_first_lookup] 0-doa-replicate-2: added root inode
[2012-05-14 00:20:33.462201] I [afr-common.c:1971:afr_set_root_inode_on_first_lookup] 0-doa-replicate-3: added root inode
[2012-05-14 00:28:38.071932] I [glusterfsd-mgmt.c:64:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2012-05-14 00:28:40.678708] I [glusterfsd-mgmt.c:1565:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
pending frames:
frame : type(1) op(READDIR)
frame : type(1) op(READDIR)
frame : type(1) op(READDIR)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2012-05-14 00:54:27
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.3.0qa40
/lib64/libc.so.6[0x39db832900]
/usr/lib64/glusterfs/3.3.0qa40/xlator/cluster/replicate.so(afr_lookup+0xa5)[0x7f30115ceac5]
/usr/lib64/glusterfs/3.3.0qa40/xlator/cluster/stripe.so(stripe_readdirp_cbk+0x536)[0x7f301136b346]
/usr/lib64/glusterfs/3.3.0qa40/xlator/cluster/replicate.so(afr_readdirp_cbk+0x1ca)[0x7f301158a69a]
/usr/lib64/glusterfs/3.3.0qa40/xlator/protocol/client.so(client3_1_readdirp_cbk+0x170)[0x7f3011803b00]
/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa2)[0x306240f302]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xb6)[0x306240f516]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x27)[0x306240ae17]
/usr/lib64/glusterfs/3.3.0qa40/rpc-transport/socket.so(socket_event_poll_in+0x3f)[0x7f3012648c8f]
/usr/lib64/glusterfs/3.3.0qa40/rpc-transport/socket.so(socket_event_handler+0x188)[0x7f3012648e38]
/usr/lib64/libglusterfs.so.0[0x3061c3e941]
/usr/sbin/glusterfs(main+0x502)[0x4066c2]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x39db81ecdd]
/usr/sbin/glusterfs[0x404349]
Blocking many test cases. Moving the severity to high.
Please see if the patch http://review.gluster.com/3325 fixes the issue, and continue your tests with the patch included.
With the patch applied, I didn't see any crash. But the client hung a few times, and I see a lot of the warnings below in the client log file.

[2012-05-14 06:30:56.504861] W [client3_1-fops.c:1058:client3_1_getxattr_cbk] 0-hosdu-client-1: remote operation failed: No data available. Path: (null) (--)
[2012-05-14 06:30:56.505110] W [client3_1-fops.c:1058:client3_1_getxattr_cbk] 0-hosdu-client-0: remote operation failed: No data available. Path: (null) (--)
[2012-05-14 06:30:56.508427] W [client3_1-fops.c:1058:client3_1_getxattr_cbk] 0-hosdu-client-1: remote operation failed: No data available. Path: (null) (--)
[2012-05-14 06:30:56.508720] W [client3_1-fops.c:1058:client3_1_getxattr_cbk] 0-hosdu-client-0: remote operation failed: No data available. Path: (null) (--)
[2012-05-14 06:30:56.513763] W [client3_1-fops.c:1058:client3_1_getxattr_cbk] 0-hosdu-client-0: remote operation failed: No data available. Path: (null) (--)
[2012-05-14 06:30:56.514073] W [client3_1-fops.c:1058:client3_1_getxattr_cbk] 0-hosdu-client-1: remote operation failed: No data available. Path: (null) (--)
[2012-05-14 06:30:56.514676] W [client3_1-fops.c:1058:client3_1_getxattr_cbk] 0-hosdu-client-0: remote operation failed: No data available. Path: (null) (--)
[2012-05-14 06:30:56.514935] W [client3_1-fops.c:1058:client3_1_getxattr_cbk] 0-hosdu-client-1: remote operation failed: No data available. Path: (null) (--)
[2012-05-14 06:30:56.515498] W [client3_1-fops.c:1058:client3_1_getxattr_cbk] 0-hosdu-client-0: remote operation failed: No data available. Path: (null) (--)
[2012-05-14 06:30:56.515787] W [client3_1-fops.c:1058:client3_1_getxattr_cbk] 0-hosdu-client-1: remote operation failed: No data available. Path: (null) (--)
[2012-05-14 06:30:56.516336] W [client3_1-fops.c:1058:client3_1_getxattr_cbk] 0-hosdu-client-0: remote operation failed: No data available. Path: (null) (--)
[2012-05-14 06:30:56.516614] W [client3_1-fops.c:1058:client3_1_getxattr_cbk] 0-hosdu-client-1: remote operation failed: No data available. Path: (null) (--)
[2012-05-14 06:30:56.517262] W [client3_1-fops.c:1058:client3_1_getxattr_cbk] 0-hosdu-client-0: remote operation failed: No data available. Path: (null) (--)
[2012-05-14 06:30:56.517584] W [client3_1-fops.c:1058:client3_1_getxattr_cbk] 0-hosdu-client-1: remote operation failed: No data available. Path: (null) (--)
[2012-05-14 06:30:56.518198] W [client3_1-fops.c:1058:client3_1_getxattr_cbk] 0-hosdu-client-0: remote operation failed: No data available. Path: (null) (--)
[2012-05-14 06:30:56.518618] W [client3_1-fops.c:1058:client3_1_getxattr_cbk] 0-hosdu-client-1: remote operation failed: No data available. Path: (null) (--)
CHANGE: http://review.gluster.com/3325 (cluster/replicate: check for 'loc->path' before dereferencing it) merged in master by Anand Avati (avati)
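For readers following along: the gdb session earlier in this report shows strcmp() being called with loc->path == 0x0 at afr-common.c:2122. The snippet below is only a minimal sketch of the kind of NULL guard the change title describes. It is not the merged patch. The names example_loc, example_is_trash_dir and TRASH_DIR_NAME are hypothetical stand-ins, and the real loc_t and GF_REPLICATE_TRASH_DIR are defined in the GlusterFS sources.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical placeholder; the real GF_REPLICATE_TRASH_DIR value
 * lives in the GlusterFS sources and is not reproduced here. */
#define TRASH_DIR_NAME ".example_trash"

/* Hypothetical, pared-down stand-in for loc_t: only the path field. */
struct example_loc {
    const char *path;
};

/* Returns 1 if loc names the trash directory, 0 otherwise.
 * A NULL loc or NULL path is reported as "not the trash directory"
 * rather than being passed to strcmp(), which would dereference it. */
static int example_is_trash_dir(const struct example_loc *loc)
{
    if (loc == NULL || loc->path == NULL)
        return 0;
    return strcmp(loc->path, "/" TRASH_DIR_NAME) == 0;
}
```

With a guard like this, a lookup wound from readdirp with only a gfid and no path falls through to the normal lookup code instead of crashing in strcmp().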
Please verify the crash issue.
CHANGE: http://review.gluster.com/3374 (cluster/afr: Assign gfid path if path is NULL in lookup) merged in release-3.3 by Vijay Bellur (vijay)
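A rough illustration of what "assign gfid path if path is NULL" could look like follows. This is a sketch under assumptions, not the code from the change. The "<gfid:...>" string form, the function names and the buffer handling are all hypothetical choices made for the example, and the real gfid type and helpers are in the GlusterFS sources.

```c
#include <stddef.h>
#include <stdio.h>

/* Formats a 16-byte identifier as "<gfid:" + 8-4-4-4-12 hex + ">" into buf.
 * Returns 0 on success, -1 if buf cannot hold the whole string.
 * The "<gfid:...>" form is an assumption made for this sketch. */
static int example_gfid_path(const unsigned char gfid[16], char *buf, size_t len)
{
    int n = snprintf(buf, len,
                     "<gfid:%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
                     "%02x%02x%02x%02x%02x%02x>",
                     gfid[0], gfid[1], gfid[2], gfid[3],
                     gfid[4], gfid[5], gfid[6], gfid[7],
                     gfid[8], gfid[9], gfid[10], gfid[11],
                     gfid[12], gfid[13], gfid[14], gfid[15]);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Returns path unchanged when it is set; otherwise builds a gfid-based
 * placeholder in buf and returns buf. Returns NULL if buf is too small. */
static const char *example_path_or_gfid(const char *path,
                                        const unsigned char gfid[16],
                                        char *buf, size_t len)
{
    if (path != NULL)
        return path;
    if (example_gfid_path(gfid, buf, len) != 0)
        return NULL;
    return buf;
}
```

The placeholder needs 44 bytes including the terminating NUL. Giving the lookup a non-NULL path this way would also keep "(null)" out of log messages such as the getxattr warnings quoted earlier, although whether the merged change affects those warnings is not stated in this report.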
On glusterfs-3.3.0qa42, there are no crashes and no hangs on clients.