+++ This bug was initially created as a clone of Bug #864401 +++

Description of problem:

On a 2x2 distributed-replicate volume, some tests were run on the FUSE and NFS mounts and the NFS mount was then unmounted. A shell remained cd'ed into the FUSE mount and the mount point stayed idle for 3-4 days. When the mount point was accessed later, the FUSE client process asserted with the backtrace below.

Core was generated by `/usr/local/sbin/glusterfs --volfile-id=mirror --volfile-server=10.70.36.4 /mnt/'.
Program terminated with signal 6, Aborted.
#0  0x0000003a68432885 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6_2.12.x86_64 libgcc-4.4.6-3.el6.x86_64 openssl-1.0.0-20.el6_2.5.x86_64 zlib-1.2.3-27.el6.x86_64
(gdb) bt
#0  0x0000003a68432885 in raise () from /lib64/libc.so.6
#1  0x0000003a68434065 in abort () from /lib64/libc.so.6
#2  0x0000003a6842b9fe in __assert_fail_base () from /lib64/libc.so.6
#3  0x0000003a6842bac0 in __assert_fail () from /lib64/libc.so.6
#4  0x00007f99a8e7d68f in afr_lookup_update_lk_counts (local=0x7f99a390ec40, this=0xb3ddb0, child_index=1, xattr=0x0) at ../../../../../xlators/cluster/afr/src/afr-common.c:1122
#5  0x00007f99a8e80520 in afr_lookup_handle_success (local=0x7f99a390ec40, this=0xb3ddb0, child_index=1, op_ret=0, op_errno=0, inode=0x7f99a2a8304c, buf=0x7fffe9b6d450, xattr=0x0, postparent=0x7fffe9b6d3e0) at ../../../../../xlators/cluster/afr/src/afr-common.c:2005
#6  0x00007f99a8e8062d in afr_lookup_cbk (frame=0x7f99ac6c7cf8, cookie=0x1, this=0xb3ddb0, op_ret=0, op_errno=0, inode=0x7f99a2a8304c, buf=0x7fffe9b6d450, xattr=0x0, postparent=0x7fffe9b6d3e0) at ../../../../../xlators/cluster/afr/src/afr-common.c:2034
#7  0x00007f99a90ca44a in client3_1_lookup_cbk (req=0x7f99a327b4dc, iov=0x7f99a327b51c, count=1, myframe=0x7f99ac6c845c) at ../../../../../xlators/protocol/client/src/client3_1-fops.c:2636
#8  0x00007f99ad665a3a in rpc_clnt_handle_reply (clnt=0xb73b90, pollin=0x1b84430) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:786
#9  0x00007f99ad665dd7 in rpc_clnt_notify (trans=0xb83720, mydata=0xb73bc0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x1b84430) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:905
#10 0x00007f99ad661ec8 in rpc_transport_notify (this=0xb83720, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x1b84430) at ../../../../rpc/rpc-lib/src/rpc-transport.c:489
#11 0x00007f99a9f0f250 in socket_event_poll_in (this=0xb83720) at ../../../../../rpc/rpc-transport/socket/src/socket.c:1677
#12 0x00007f99a9f0f7d4 in socket_event_handler (fd=13, idx=4, data=0xb83720, poll_in=1, poll_out=0, poll_err=0) at ../../../../../rpc/rpc-transport/socket/src/socket.c:1792
#13 0x00007f99ad8bcd78 in event_dispatch_epoll_handler (event_pool=0xb25510, events=0xb335e0, i=0) at ../../../libglusterfs/src/event.c:785
#14 0x00007f99ad8bcf9b in event_dispatch_epoll (event_pool=0xb25510) at ../../../libglusterfs/src/event.c:847
#15 0x00007f99ad8bd326 in event_dispatch (event_pool=0xb25510) at ../../../libglusterfs/src/event.c:947
#16 0x00000000004085a7 in main (argc=4, argv=0x7fffe9b6db48) at ../../../glusterfsd/src/glusterfsd.c:1689
(gdb) f 4
#4  0x00007f99a8e7d68f in afr_lookup_update_lk_counts (local=0x7f99a390ec40, this=0xb3ddb0, child_index=1, xattr=0x0) at ../../../../../xlators/cluster/afr/src/afr-common.c:1122
1122            GF_ASSERT (xattr);
(gdb) p xattr
$1 = (dict_t *) 0x0
(gdb) f 7
#7  0x00007f99a90ca44a in client3_1_lookup_cbk (req=0x7f99a327b4dc, iov=0x7f99a327b51c, count=1, myframe=0x7f99ac6c845c) at ../../../../../xlators/protocol/client/src/client3_1-fops.c:2636
2636            CLIENT_STACK_UNWIND (lookup, frame, rsp.op_ret, rsp.op_errno, inode,
(gdb) l
2631            else
2632                    gf_log (this->name, GF_LOG_TRACE, "not found on remote node");
2633
2634    }
2635
2636            CLIENT_STACK_UNWIND (lookup, frame, rsp.op_ret, rsp.op_errno, inode,
2637                                 &stbuf, xdata, &postparent);
2638
2639            if (xdata)
2640                    dict_unref (xdata);
(gdb) p xdata
$2 = (dict_t *) 0x0
(gdb)

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
The glusterfs client asserted due to a NULL xattr dict.

Expected results:
The glusterfs client should not crash.

Additional info:

gluster volume info

Volume Name: mirror
Type: Distributed-Replicate
Volume ID: c059eea3-74a5-496e-bd17-db651fb910c6
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.36.3:/mnt/export/mirror
Brick2: 10.70.36.4:/mnt/export/mirror
Brick3: 10.70.36.3:/mnt/export2/mirror
Brick4: 10.70.36.4:/mnt/export2/mirror

[2012-10-03 21:26:25.960328] W [client3_1-fops.c:474:client3_1_stat_cbk] 0-mirror-client-0: remote operation failed: No such file or directory
[2012-10-03 21:26:25.960537] W [client3_1-fops.c:474:client3_1_stat_cbk] 0-mirror-client-3: remote operation failed: No such file or directory
[2012-10-03 21:26:25.960723] W [client3_1-fops.c:474:client3_1_stat_cbk] 0-mirror-client-1: remote operation failed: No such file or directory
[2012-10-03 21:26:25.960890] W [client3_1-fops.c:474:client3_1_stat_cbk] 0-mirror-client-2: remote operation failed: No such file or directory
[2012-10-07 03:16:02.424353] I [glusterfsd.c:889:reincarnate] 0-glusterfsd: Fetching the volume file from server...
[2012-10-07 03:16:02.444727] I [glusterfsd-mgmt.c:1569:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing

pending frames:
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)

patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash: 2012-10-08 19:47:30
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.3.1qa3

--- Additional comment from Raghavendra Bhat on 2012-10-10 04:51:55 EDT ---

The crash was caused by a problem in the backend XFS filesystem: lstat() returned 5 instead of 0 or -1. Because of that, the posix xlator did not create the dictionary that carries the xattr information for the lookup, and a NULL dictionary was returned up the stack (a defensive-check sketch follows the dmesg output below).

This is the output of dmesg:

XFS (dm-8): xfs_log_force: error 5 returned.
XFS (dm-7): xfs_log_force: error 5 returned.
XFS (dm-8): xfs_log_force: error 5 returned.
XFS (dm-7): xfs_log_force: error 5 returned.
XFS (dm-8): xfs_log_force: error 5 returned.
XFS (dm-7): xfs_log_force: error 5 returned.
XFS (dm-8): xfs_log_force: error 5 returned.
XFS (dm-7): xfs_log_force: error 5 returned.
XFS (dm-8): xfs_log_force: error 5 returned.
XFS (dm-7): xfs_log_force: error 5 returned.
XFS (dm-8): xfs_log_force: error 5 returned.
XFS (dm-7): xfs_log_force: error 5 returned.
XFS (dm-8): xfs_log_force: error 5 returned.
XFS (dm-7): xfs_log_force: error 5 returned.
XFS (dm-8): xfs_log_force: error 5 returned.
XFS (dm-7): xfs_log_force: error 5 returned.
XFS (dm-8): xfs_log_force: error 5 returned.
XFS (dm-7): xfs_log_force: error 5 returned.
XFS (dm-8): xfs_log_force: error 5 returned.
XFS (dm-7): xfs_log_force: error 5 returned.
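[Editorial sketch, not part of the original report.] The abort comes from GF_ASSERT (xattr) in afr_lookup_update_lk_counts() firing when the lookup reports success but carries no xdata. The minimal, self-contained C program below illustrates the defensive pattern of skipping the lock-count update instead of asserting; it is not the actual GlusterFS patch, and the names afr_lookup_update_lk_counts_safe, dict_get_int32_stub, the toy dict_t, and the "glusterfs.inodelk-count"/"glusterfs.entrylk-count" keys are illustrative stand-ins.

/* safe_lk_counts.c — hypothetical sketch, NOT the actual GlusterFS fix.
 * Shows a lookup-side helper that tolerates a NULL xattr dict (the case
 * that triggered the GF_ASSERT above) by skipping the lock-count update.
 * dict_t here is a toy stand-in for the real GlusterFS dict_t. */
#include <stdio.h>
#include <string.h>

typedef struct {
        const char *key;
        int         value;
} dict_entry_t;

typedef struct {
        dict_entry_t *entries;
        int           count;
} dict_t;                         /* toy stand-in, not the real dict_t */

typedef struct {
        int inodelk_count;
        int entrylk_count;
} afr_local_t;                    /* only the fields this sketch needs */

/* Returns 0 and fills *value if the key is present, -1 otherwise. */
static int
dict_get_int32_stub (dict_t *xattr, const char *key, int *value)
{
        for (int i = 0; i < xattr->count; i++) {
                if (strcmp (xattr->entries[i].key, key) == 0) {
                        *value = xattr->entries[i].value;
                        return 0;
                }
        }
        return -1;
}

static void
afr_lookup_update_lk_counts_safe (afr_local_t *local, dict_t *xattr)
{
        int count = 0;

        /* The crash above came from GF_ASSERT (xattr) firing when the
         * brick returned success but no xdata.  Treating that as "no
         * lock counts to record" keeps the client alive. */
        if (!xattr)
                return;

        if (dict_get_int32_stub (xattr, "glusterfs.inodelk-count", &count) == 0)
                local->inodelk_count += count;
        if (dict_get_int32_stub (xattr, "glusterfs.entrylk-count", &count) == 0)
                local->entrylk_count += count;
}

int
main (void)
{
        afr_local_t  local = { 0, 0 };
        dict_entry_t entry = { "glusterfs.inodelk-count", 2 };
        dict_t       xattr = { &entry, 1 };

        afr_lookup_update_lk_counts_safe (&local, NULL);    /* no abort   */
        afr_lookup_update_lk_counts_safe (&local, &xattr);  /* counts: +2 */

        printf ("inodelk=%d entrylk=%d\n", local.inodelk_count,
                local.entrylk_count);
        return 0;
}

Built standalone (for example, gcc -std=c99 safe_lk_counts.c), it prints "inodelk=2 entrylk=0" instead of aborting on the NULL dict.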
Please provide the fixed-in version and the steps to verify.
Moving to MODIFIED as the patch has been accepted.

Steps to reproduce: simulate a backend XFS failure such that, when a system call (lstat() in this case) is executed, some value other than 0 (for success) or -1 (for failure) is returned. A minimal way to approximate this in a test setup is sketched below.
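[Editorial sketch, not from the original report.] One hedged way to approximate "lstat() returns a value other than 0 or -1" in a throwaway test environment is an LD_PRELOAD shim around the brick process. Everything below is an assumption for illustration: the FAULT_PREFIX path, the file name fault_lstat.c, and even the interposed symbol (on RHEL 6-era glibc, lstat() calls may be routed through __lxstat(), in which case that symbol would have to be overridden instead). This is not the procedure that was used to verify the bug.

/* fault_lstat.c — hypothetical LD_PRELOAD shim for a scratch test setup.
 * Makes lstat() return 5 (neither 0 nor -1, matching the value reported
 * in this bug) for paths under FAULT_PREFIX, approximating the backend
 * misbehaviour described above.  Never use on a production brick.
 *
 * Build:  gcc -shared -fPIC -o fault_lstat.so fault_lstat.c -ldl
 * Use:    LD_PRELOAD=$PWD/fault_lstat.so <brick process>
 */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <string.h>
#include <sys/stat.h>

#define FAULT_PREFIX "/mnt/export"   /* illustrative brick path (assumption) */

int
lstat (const char *path, struct stat *buf)
{
        static int (*real_lstat) (const char *, struct stat *);

        if (!real_lstat)
                real_lstat = (int (*) (const char *, struct stat *))
                             dlsym (RTLD_NEXT, "lstat");

        /* Return an out-of-contract value for files under the chosen
         * brick directory, mimicking the misbehaving backend. */
        if (strncmp (path, FAULT_PREFIX, strlen (FAULT_PREFIX)) == 0)
                return 5;

        return real_lstat (path, buf);
}

Returning 5 specifically mirrors the value observed in this bug; any value outside {0, -1} exercises the same unexpected-return path.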
Raghavendra,

This bug has been added to the Update 4 errata. Could you provide your input in the Doc Text field so that I can update the errata?

Thanks,
Divya
Marking the bug as resolved; the issue could no longer be reproduced. Tried with both NFS and FUSE clients on the volume and was not able to induce the crash.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0691.html