Bug 895841 - [glusterfs-3.3.1qa3]: glusterfs client asserted
Summary: [glusterfs-3.3.1qa3]: glusterfs client asserted
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Raghavendra Bhat
QA Contact: Sachidananda Urs
URL:
Whiteboard:
Depends On: 864401
Blocks:
 
Reported: 2013-01-16 05:54 UTC by Raghavendra Bhat
Modified: 2022-07-09 05:49 UTC
CC List: 11 users

Fixed In Version: glusterfs-3.4.0qa8, glusterfs-3.3.0.5rhs-42
Doc Type: Bug Fix
Doc Text:
Cause: Due to a backend filesystem (XFS) issue, the lstat system call returned 5 instead of either 0 (success) or -1 (failure).
Consequence: Because lstat's return value was checked only against -1, any other non-zero value was assumed to be success, and success was returned to the other xlators. The fop had actually failed, however, and many of its pointers were NULL; when another component received "success" and accessed one of those pointers, the process crashed.
Fix: The return value of lstat is now checked against 0, and any value other than 0 is treated as a failure and reported as such to the other components.
Result: If the backend fails and a system call returns some value other than 0 or -1, the call is treated as a failure and an error is returned to the application. (A sketch of this check appears after the field list below.)
Clone Of: 864401
Environment:
Last Closed: 2013-03-28 22:28:36 UTC
Embargoed:
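To make the Doc Text above concrete, here is a minimal sketch of the return-value handling it describes. This is illustrative only: the function name and structure are hypothetical, and the real change lives in the glusterfs posix xlator.

/* Sketch of the lstat() return-value handling described in the Doc Text.
 * Function and variable names are hypothetical; the real fix is in the
 * glusterfs posix xlator. */
#include <errno.h>
#include <sys/stat.h>

int
backend_lstat (const char *real_path, struct stat *stbuf)
{
        int ret = lstat (real_path, stbuf);

        /*
         * Buggy pattern: only -1 counts as failure, so a bogus return
         * value such as 5 from a broken backend filesystem falls through
         * to the success path with no stat/xattr data filled in:
         *
         *         if (ret == -1)
         *                 return -errno;
         */

        /* Fixed pattern: anything other than 0 is treated as a failure. */
        if (ret != 0)
                return (errno != 0) ? -errno : -EIO;

        return 0;
}

With the old check, the bogus return of 5 was reported upwards as success, which is how a lookup reply carrying a NULL xattr dictionary reached afr and tripped GF_ASSERT (xattr) in the backtrace below.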




Links
System: Red Hat Product Errata
ID: RHSA-2013:0691
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Storage 2.0 security, bug fix, and enhancement update #4
Last Updated: 2013-03-29 02:21:19 UTC

Description Raghavendra Bhat 2013-01-16 05:54:01 UTC
+++ This bug was initially created as a clone of Bug #864401 +++

Description of problem:
A 2x2 distributed-replicate volume. Ran some tests on the FUSE and NFS mounts, then unmounted the NFS mount. A shell was cd'ed into the FUSE mount and the mount point was left idle for 3-4 days. When the mount point was later accessed, the FUSE client process asserted with the backtrace below.

Core was generated by `/usr/local/sbin/glusterfs --volfile-id=mirror --volfile-server=10.70.36.4 /mnt/'.
Program terminated with signal 6, Aborted.
#0  0x0000003a68432885 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6_2.12.x86_64 libgcc-4.4.6-3.el6.x86_64 openssl-1.0.0-20.el6_2.5.x86_64 zlib-1.2.3-27.el6.x86_64
(gdb) bt
#0  0x0000003a68432885 in raise () from /lib64/libc.so.6
#1  0x0000003a68434065 in abort () from /lib64/libc.so.6
#2  0x0000003a6842b9fe in __assert_fail_base () from /lib64/libc.so.6
#3  0x0000003a6842bac0 in __assert_fail () from /lib64/libc.so.6
#4  0x00007f99a8e7d68f in afr_lookup_update_lk_counts (local=0x7f99a390ec40, this=0xb3ddb0, child_index=1, xattr=0x0)
    at ../../../../../xlators/cluster/afr/src/afr-common.c:1122
#5  0x00007f99a8e80520 in afr_lookup_handle_success (local=0x7f99a390ec40, this=0xb3ddb0, child_index=1, op_ret=0, 
    op_errno=0, inode=0x7f99a2a8304c, buf=0x7fffe9b6d450, xattr=0x0, postparent=0x7fffe9b6d3e0)
    at ../../../../../xlators/cluster/afr/src/afr-common.c:2005
#6  0x00007f99a8e8062d in afr_lookup_cbk (frame=0x7f99ac6c7cf8, cookie=0x1, this=0xb3ddb0, op_ret=0, op_errno=0, 
    inode=0x7f99a2a8304c, buf=0x7fffe9b6d450, xattr=0x0, postparent=0x7fffe9b6d3e0)
    at ../../../../../xlators/cluster/afr/src/afr-common.c:2034
#7  0x00007f99a90ca44a in client3_1_lookup_cbk (req=0x7f99a327b4dc, iov=0x7f99a327b51c, count=1, myframe=0x7f99ac6c845c)
    at ../../../../../xlators/protocol/client/src/client3_1-fops.c:2636
#8  0x00007f99ad665a3a in rpc_clnt_handle_reply (clnt=0xb73b90, pollin=0x1b84430)
    at ../../../../rpc/rpc-lib/src/rpc-clnt.c:786
#9  0x00007f99ad665dd7 in rpc_clnt_notify (trans=0xb83720, mydata=0xb73bc0, event=RPC_TRANSPORT_MSG_RECEIVED, 
    data=0x1b84430) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:905
#10 0x00007f99ad661ec8 in rpc_transport_notify (this=0xb83720, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x1b84430)
    at ../../../../rpc/rpc-lib/src/rpc-transport.c:489
#11 0x00007f99a9f0f250 in socket_event_poll_in (this=0xb83720) at ../../../../../rpc/rpc-transport/socket/src/socket.c:1677
#12 0x00007f99a9f0f7d4 in socket_event_handler (fd=13, idx=4, data=0xb83720, poll_in=1, poll_out=0, poll_err=0)
    at ../../../../../rpc/rpc-transport/socket/src/socket.c:1792
#13 0x00007f99ad8bcd78 in event_dispatch_epoll_handler (event_pool=0xb25510, events=0xb335e0, i=0)
    at ../../../libglusterfs/src/event.c:785
#14 0x00007f99ad8bcf9b in event_dispatch_epoll (event_pool=0xb25510) at ../../../libglusterfs/src/event.c:847
#15 0x00007f99ad8bd326 in event_dispatch (event_pool=0xb25510) at ../../../libglusterfs/src/event.c:947
#16 0x00000000004085a7 in main (argc=4, argv=0x7fffe9b6db48) at ../../../glusterfsd/src/glusterfsd.c:1689
(gdb)  f 4
#4  0x00007f99a8e7d68f in afr_lookup_update_lk_counts (local=0x7f99a390ec40, this=0xb3ddb0, child_index=1, xattr=0x0)
    at ../../../../../xlators/cluster/afr/src/afr-common.c:1122
1122            GF_ASSERT (xattr);
(gdb) p xattr
$1 = (dict_t *) 0x0
(gdb) f 7
#7  0x00007f99a90ca44a in client3_1_lookup_cbk (req=0x7f99a327b4dc, iov=0x7f99a327b51c, count=1, myframe=0x7f99ac6c845c)
    at ../../../../../xlators/protocol/client/src/client3_1-fops.c:2636
2636            CLIENT_STACK_UNWIND (lookup, frame, rsp.op_ret, rsp.op_errno, inode,
(gdb) l
2631                    else
2632                            gf_log (this->name, GF_LOG_TRACE, "not found on remote node");
2633
2634            }
2635
2636            CLIENT_STACK_UNWIND (lookup, frame, rsp.op_ret, rsp.op_errno, inode,
2637                                 &stbuf, xdata, &postparent);
2638
2639            if (xdata)
2640                    dict_unref (xdata);
(gdb) p xdata
$2 = (dict_t *) 0x0
(gdb) 




Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:

glusterfs client asserted due to NULL xattr

Expected results:

glusterfs client should not crash

Additional info:

 gluster volume info

Volume Name: mirror
Type: Distributed-Replicate
Volume ID: c059eea3-74a5-496e-bd17-db651fb910c6
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.36.3:/mnt/export/mirror
Brick2: 10.70.36.4:/mnt/export/mirror
Brick3: 10.70.36.3:/mnt/export2/mirror
Brick4: 10.70.36.4:/mnt/export2/mirror


[2012-10-03 21:26:25.960328] W [client3_1-fops.c:474:client3_1_stat_cbk] 0-mirror-client-0: remote operation failed: No such file or directory
[2012-10-03 21:26:25.960537] W [client3_1-fops.c:474:client3_1_stat_cbk] 0-mirror-client-3: remote operation failed: No such file or directory
[2012-10-03 21:26:25.960723] W [client3_1-fops.c:474:client3_1_stat_cbk] 0-mirror-client-1: remote operation failed: No such file or directory
[2012-10-03 21:26:25.960890] W [client3_1-fops.c:474:client3_1_stat_cbk] 0-mirror-client-2: remote operation failed: No such file or directory
[2012-10-07 03:16:02.424353] I [glusterfsd.c:889:reincarnate] 0-glusterfsd: Fetching the volume file from server...
[2012-10-07 03:16:02.444727] I [glusterfsd-mgmt.c:1569:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
pending frames:
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)

patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash: 2012-10-08 19:47:30
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.3.1qa3

--- Additional comment from Raghavendra Bhat on 2012-10-10 04:51:55 EDT ---

The crash is caused by a problem in the backend XFS filesystem: lstat() was returning 5 instead of 0 or -1. As a result, the posix xlator did not create the dictionary that carries the xattr information for the lookup, and a NULL dictionary was returned.
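For completeness, the abort itself happens when that NULL dictionary reaches GF_ASSERT (xattr) in afr_lookup_update_lk_counts() (frame #4 of the backtrace above). Below is a minimal standalone sketch of that failure path, assuming a debug build where GF_ASSERT behaves like assert(); the stand-in dict_t type and function are illustrative, not the real glusterfs code.

/* Standalone illustration of the assert path; not glusterfs source. */
#include <assert.h>
#include <stddef.h>

/* Assumed debug-build behaviour of GF_ASSERT: plain assert(). */
#define GF_ASSERT(x) assert (x)

typedef struct dict dict_t;     /* opaque stand-in for the glusterfs dict_t */

static void
lookup_update_lk_counts (dict_t *xattr)
{
        GF_ASSERT (xattr);      /* xattr == NULL -> abort(), i.e. signal 6 */
        /* ... lock counts would be read out of xattr here ... */
}

int
main (void)
{
        /* Mimics the failed lookup reply that carried no xattr dictionary. */
        lookup_update_lk_counts (NULL);
        return 0;
}

Compiled and run without NDEBUG, this aborts with SIGABRT, matching the signal 6 seen in the crash log.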


This is the o/p of dmesg:

XFS (dm-8): xfs_log_force: error 5 returned.
XFS (dm-7): xfs_log_force: error 5 returned.
XFS (dm-8): xfs_log_force: error 5 returned.
XFS (dm-7): xfs_log_force: error 5 returned.
XFS (dm-8): xfs_log_force: error 5 returned.
XFS (dm-7): xfs_log_force: error 5 returned.
XFS (dm-8): xfs_log_force: error 5 returned.
XFS (dm-7): xfs_log_force: error 5 returned.
XFS (dm-8): xfs_log_force: error 5 returned.
XFS (dm-7): xfs_log_force: error 5 returned.
XFS (dm-8): xfs_log_force: error 5 returned.
XFS (dm-7): xfs_log_force: error 5 returned.
XFS (dm-8): xfs_log_force: error 5 returned.
XFS (dm-7): xfs_log_force: error 5 returned.
XFS (dm-8): xfs_log_force: error 5 returned.
XFS (dm-7): xfs_log_force: error 5 returned.
XFS (dm-8): xfs_log_force: error 5 returned.
XFS (dm-7): xfs_log_force: error 5 returned.
XFS (dm-8): xfs_log_force: error 5 returned.
XFS (dm-7): xfs_log_force: error 5 returned.

Comment 3 Gowrishankar Rajaiyan 2013-01-17 08:07:12 UTC
Please provide the fixed-in version and the steps to verify.

Comment 5 Raghavendra Bhat 2013-01-21 06:40:45 UTC
Moving to modified as the patch has been accepted. Steps to reproduce:

Simulate a backend XFS failure such that a system call (lstat in this case) returns some value other than 0 (for success) or -1 (for failure).
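One way to approximate this in a test environment (a sketch only; the file name, the hard-coded brick path, and the interposer approach below are not part of the original report) is to start the brick process with an LD_PRELOAD library that makes lstat() return a bogus value such as 5 for files under the brick directory. On glibc 2.12 (RHEL 6, as in this setup) lstat() and lstat64() are routed through __lxstat()/__lxstat64(), so those are the symbols interposed here; other glibc versions may need lstat() itself overridden.

/* fault_lstat.c - hypothetical LD_PRELOAD shim that injects a bogus
 * lstat() return value (5) for one brick directory, to mimic the
 * misbehaving backend.  Not part of the original bug report. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <string.h>
#include <sys/stat.h>

#define FAULTY_BRICK "/mnt/export/mirror"   /* brick path from 'gluster volume info' below */

int
__lxstat (int ver, const char *path, struct stat *buf)
{
        static int (*real) (int, const char *, struct stat *);

        if (!real)
                real = (int (*) (int, const char *, struct stat *))
                        dlsym (RTLD_NEXT, "__lxstat");

        if (strncmp (path, FAULTY_BRICK, strlen (FAULTY_BRICK)) == 0)
                return 5;               /* neither 0 (success) nor -1 (failure) */

        return real (ver, path, buf);
}

int
__lxstat64 (int ver, const char *path, struct stat64 *buf)
{
        static int (*real) (int, const char *, struct stat64 *);

        if (!real)
                real = (int (*) (int, const char *, struct stat64 *))
                        dlsym (RTLD_NEXT, "__lxstat64");

        if (strncmp (path, FAULTY_BRICK, strlen (FAULTY_BRICK)) == 0)
                return 5;

        return real (ver, path, buf);
}

Building it with something like gcc -shared -fPIC fault_lstat.c -o fault_lstat.so -ldl and starting that brick's glusterfsd with LD_PRELOAD pointing at the resulting library should make lookups from the client hit the faulty code path. This only approximates the original failure, which came from XFS itself erroring out (the error 5 / EIO messages in dmesg above), but it is enough to check that the client now gets an error instead of asserting.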

Comment 6 Divya 2013-02-12 12:06:10 UTC
Raghavendra,

This bug has been added to the Update 4 errata. Could you provide your input in the Doc Text field so that I can update the errata?

Thanks,
Divya

Comment 7 Sachidananda Urs 2013-03-04 10:47:41 UTC
Marking the bug as resolved; unable to reproduce the issue. Tried with NFS and FUSE clients on the volume and was not able to induce the crash.

Comment 9 errata-xmlrpc 2013-03-28 22:28:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0691.html

