Bug 820355 - [glusterfs-3.3.0qa40] - glusterfs fuse client crashed with loc->path being zero.
Summary: [glusterfs-3.3.0qa40] - glusterfs fuse client crashed with loc->path being zero.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: unclassified
Version: pre-release
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Assignee: Amar Tumballi
QA Contact: M S Vishwanath Bhat
URL:
Whiteboard:
Depends On:
Blocks: 817967
 
Reported: 2012-05-09 18:24 UTC by M S Vishwanath Bhat
Modified: 2016-06-01 01:55 UTC
CC List: 5 users

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 17:29:22 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: 3.3.0qa42
Embargoed:



Description M S Vishwanath Bhat 2012-05-09 18:24:31 UTC
Description of problem:
In a striped-replicated volume, I was untarring the Linux kernel source on the mountpoint. When I then ran ls on it, the fuse client crashed.

Version-Release number of selected component (if applicable):
glusterfs-3.3.0qa40

How reproducible:
Consistent

Steps to Reproduce:
1. Create and start a striped-replicated volume (or a distributed-striped-replicated volume).
2. Do a fuse mount and untar the Linux kernel source on the mountpoint.
3. Run ls after, or even during, the untarring.
  
Actual results:
The fuse client crashed with the following backtrace.

Core was generated by `/usr/local/sbin/glusterfs --volfile-id=hosdu --volfile-server=172.17.251.63 /mn'.
Program terminated with signal 11, Segmentation fault.
#0  0x00000035bbc7e3e6 in __strcmp_sse2 () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6_2.9.x86_64 libgcc-4.4.6-3.el6.x86_64 openssl-1.0.0-20.el6_2.3.x86_64 zlib-1.2.3-27.el6.x86_64
(gdb) bt
#0  0x00000035bbc7e3e6 in __strcmp_sse2 () from /lib64/libc.so.6
#1  0x00007fc20439e1a3 in afr_lookup (frame=0x7fc207bdd744, this=0x1c71e90, loc=0x7fffca8662b0, xattr_req=0x30026ec) at afr-common.c:2122
#2  0x00007fc204128452 in stripe_readdirp_cbk (frame=0x7fc207bdb9b4, cookie=0x7fc207bdb85c, this=0x1c748d0, op_ret=4, op_errno=2, orig_entries=0x7fffca866500,
    xdata=0x0) at stripe.c:4013
#3  0x00007fc204346366 in afr_readdirp_cbk (frame=0x7fc207bdb85c, cookie=0x1, this=0x1c71e90, op_ret=4, op_errno=2, entries=0x7fffca866500, xdata=0x0)
    at afr-dir-read.c:626
#4  0x00007fc2045e55ab in client3_1_readdirp_cbk (req=0x7fc1fc47bee8, iov=0x7fc1fc47bf28, count=1, myframe=0x7fc207bdb500) at client3_1-fops.c:2311
#5  0x00007fc208b84a48 in rpc_clnt_handle_reply (clnt=0x1cfc8b0, pollin=0x2f84940) at rpc-clnt.c:797
#6  0x00007fc208b84de5 in rpc_clnt_notify (trans=0x1d0c440, mydata=0x1cfc8e0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x2f84940) at rpc-clnt.c:916
#7  0x00007fc208b80ec8 in rpc_transport_notify (this=0x1d0c440, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x2f84940) at rpc-transport.c:498
#8  0x00007fc20542d280 in socket_event_poll_in (this=0x1d0c440) at socket.c:1686
#9  0x00007fc20542d804 in socket_event_handler (fd=13, idx=6, data=0x1d0c440, poll_in=1, poll_out=0, poll_err=0) at socket.c:1801
#10 0x00007fc208ddbc48 in event_dispatch_epoll_handler (event_pool=0x1c55500, events=0x1c63560, i=0) at event.c:794
#11 0x00007fc208ddbe6b in event_dispatch_epoll (event_pool=0x1c55500) at event.c:856
#12 0x00007fc208ddc1f6 in event_dispatch (event_pool=0x1c55500) at event.c:956
#13 0x00000000004082a4 in main (argc=4, argv=0x7fffca866b68) at glusterfsd.c:1652
(gdb) f 1
#1  0x00007fc20439e1a3 in afr_lookup (frame=0x7fc207bdd744, this=0x1c71e90, loc=0x7fffca8662b0, xattr_req=0x30026ec) at afr-common.c:2122
2122            if (!strcmp (loc->path, "/" GF_REPLICATE_TRASH_DIR)) {
(gdb) p loc->path
$1 = 0x0
(gdb) f 2
#2  0x00007fc204128452 in stripe_readdirp_cbk (frame=0x7fc207bdb9b4, cookie=0x7fc207bdb85c, this=0x1c748d0, op_ret=4, op_errno=2, orig_entries=0x7fffca866500, 
    xdata=0x0) at stripe.c:4013
4013                            STACK_WIND (local_frame, stripe_readdirp_lookup_cbk,
(gdb) 
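
Frame #1 shows the direct cause: afr_lookup() at afr-common.c:2122 hands loc->path to strcmp() while the path is NULL, so the first dereference inside __strcmp_sse2 faults. A minimal standalone sketch of the pattern, with hypothetical names (not glusterfs code; the real macro is GF_REPLICATE_TRASH_DIR in afr), showing the kind of NULL guard that avoids the crash:

#include <stdio.h>
#include <string.h>

/* TRASH_DIR_PATH is a placeholder for "/" GF_REPLICATE_TRASH_DIR from the
 * afr translator; everything here is illustrative, not glusterfs code. */
#define TRASH_DIR_PATH "/.trash"

struct loc_like {
        const char *path;       /* NULL is exactly what the core dump shows */
};

/* Without the loc->path check, strcmp() dereferences NULL and the process
 * dies with SIGSEGV, as in frames #0 and #1 of the backtrace above. */
static int is_trash_dir (const struct loc_like *loc)
{
        if (!loc->path)
                return 0;
        return strcmp (loc->path, TRASH_DIR_PATH) == 0;
}

int main (void)
{
        struct loc_like with_path = { .path = TRASH_DIR_PATH };
        struct loc_like no_path   = { .path = NULL };

        printf ("with path: %d\n", is_trash_dir (&with_path));  /* prints 1 */
        printf ("no path:   %d\n", is_trash_dir (&no_path));    /* prints 0, no crash */
        return 0;
}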


Expected results:
The glusterfs client should not crash.

Additional info:

I have archived the log files and core.

Comment 1 Vijaykumar Koppad 2012-05-14 07:18:50 UTC
The same thing happened to me in geo-rep testing with a dist-striped-replicate volume.
All the glusterfs processes went down, which resulted in all the aux mounts crashing and the geo-rep status going to faulty.

This is the backtrace from the log file:



[2012-05-14 00:20:33.462141] I [afr-common.c:1971:afr_set_root_inode_on_first_lookup] 0-doa-replicate-2: added root inode
[2012-05-14 00:20:33.462201] I [afr-common.c:1971:afr_set_root_inode_on_first_lookup] 0-doa-replicate-3: added root inode
[2012-05-14 00:28:38.071932] I [glusterfsd-mgmt.c:64:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2012-05-14 00:28:40.678708] I [glusterfsd-mgmt.c:1565:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
pending frames:
frame : type(1) op(READDIR)
frame : type(1) op(READDIR)
frame : type(1) op(READDIR)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2012-05-14 00:54:27
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.3.0qa40
/lib64/libc.so.6[0x39db832900]
/usr/lib64/glusterfs/3.3.0qa40/xlator/cluster/replicate.so(afr_lookup+0xa5)[0x7f30115ceac5]
/usr/lib64/glusterfs/3.3.0qa40/xlator/cluster/stripe.so(stripe_readdirp_cbk+0x536)[0x7f301136b346]
/usr/lib64/glusterfs/3.3.0qa40/xlator/cluster/replicate.so(afr_readdirp_cbk+0x1ca)[0x7f301158a69a]
/usr/lib64/glusterfs/3.3.0qa40/xlator/protocol/client.so(client3_1_readdirp_cbk+0x170)[0x7f3011803b00]
/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa2)[0x306240f302]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xb6)[0x306240f516]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x27)[0x306240ae17]
/usr/lib64/glusterfs/3.3.0qa40/rpc-transport/socket.so(socket_event_poll_in+0x3f)[0x7f3012648c8f]
/usr/lib64/glusterfs/3.3.0qa40/rpc-transport/socket.so(socket_event_handler+0x188)[0x7f3012648e38]
/usr/lib64/libglusterfs.so.0[0x3061c3e941]
/usr/sbin/glusterfs(main+0x502)[0x4066c2]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x39db81ecdd]
/usr/sbin/glusterfs[0x404349]

Comment 2 M S Vishwanath Bhat 2012-05-14 07:26:14 UTC
Blocking many test cases. Moving the severity to high.

Comment 3 Amar Tumballi 2012-05-14 07:55:21 UTC
Please see if the patch http://review.gluster.com/3325 fixes the issue, and continue your tests with the patch included.

Comment 4 M S Vishwanath Bhat 2012-05-14 11:48:53 UTC
With the patch applied, I didn't see any crash. But the client hung a few times, and I see a lot of the warnings below in the client log file.


[2012-05-14 06:30:56.504861] W [client3_1-fops.c:1058:client3_1_getxattr_cbk] 0-hosdu-client-1: remote operation failed: No data available. Path: (null) (--)
[2012-05-14 06:30:56.505110] W [client3_1-fops.c:1058:client3_1_getxattr_cbk] 0-hosdu-client-0: remote operation failed: No data available. Path: (null) (--)
[2012-05-14 06:30:56.508427] W [client3_1-fops.c:1058:client3_1_getxattr_cbk] 0-hosdu-client-1: remote operation failed: No data available. Path: (null) (--)
[2012-05-14 06:30:56.508720] W [client3_1-fops.c:1058:client3_1_getxattr_cbk] 0-hosdu-client-0: remote operation failed: No data available. Path: (null) (--)
[2012-05-14 06:30:56.513763] W [client3_1-fops.c:1058:client3_1_getxattr_cbk] 0-hosdu-client-0: remote operation failed: No data available. Path: (null) (--)
[2012-05-14 06:30:56.514073] W [client3_1-fops.c:1058:client3_1_getxattr_cbk] 0-hosdu-client-1: remote operation failed: No data available. Path: (null) (--)
[2012-05-14 06:30:56.514676] W [client3_1-fops.c:1058:client3_1_getxattr_cbk] 0-hosdu-client-0: remote operation failed: No data available. Path: (null) (--)
[2012-05-14 06:30:56.514935] W [client3_1-fops.c:1058:client3_1_getxattr_cbk] 0-hosdu-client-1: remote operation failed: No data available. Path: (null) (--)
[2012-05-14 06:30:56.515498] W [client3_1-fops.c:1058:client3_1_getxattr_cbk] 0-hosdu-client-0: remote operation failed: No data available. Path: (null) (--)
[2012-05-14 06:30:56.515787] W [client3_1-fops.c:1058:client3_1_getxattr_cbk] 0-hosdu-client-1: remote operation failed: No data available. Path: (null) (--)
[2012-05-14 06:30:56.516336] W [client3_1-fops.c:1058:client3_1_getxattr_cbk] 0-hosdu-client-0: remote operation failed: No data available. Path: (null) (--)
[2012-05-14 06:30:56.516614] W [client3_1-fops.c:1058:client3_1_getxattr_cbk] 0-hosdu-client-1: remote operation failed: No data available. Path: (null) (--)
[2012-05-14 06:30:56.517262] W [client3_1-fops.c:1058:client3_1_getxattr_cbk] 0-hosdu-client-0: remote operation failed: No data available. Path: (null) (--)
[2012-05-14 06:30:56.517584] W [client3_1-fops.c:1058:client3_1_getxattr_cbk] 0-hosdu-client-1: remote operation failed: No data available. Path: (null) (--)
[2012-05-14 06:30:56.518198] W [client3_1-fops.c:1058:client3_1_getxattr_cbk] 0-hosdu-client-0: remote operation failed: No data available. Path: (null) (--)
[2012-05-14 06:30:56.518618] W [client3_1-fops.c:1058:client3_1_getxattr_cbk] 0-hosdu-client-1: remote operation failed: No data available. Path: (null) (--)

Comment 5 Anand Avati 2012-05-16 22:48:06 UTC
CHANGE: http://review.gluster.com/3325 (cluster/replicate: check for 'loc->path' before dereferencing it) merged in master by Anand Avati (avati)

Comment 6 Amar Tumballi 2012-05-17 15:56:21 UTC
Please verify the crash issue.

Comment 7 Anand Avati 2012-05-19 12:21:49 UTC
CHANGE: http://review.gluster.com/3374 (cluster/afr: Assign gfid path if path is NULL in lookup) merged in release-3.3 by Vijay Bellur (vijay)
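
The commit subject describes the release-3.3 approach: when lookup receives a loc with no path, synthesize a printable path from the gfid instead of leaving it NULL. A rough, self-contained sketch of that idea (hypothetical types and helper, not the actual change from review 3374):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins: in glusterfs the gfid is a 16-byte UUID held in
 * the loc_t, and libglusterfs has its own helpers for formatting it. */
typedef unsigned char gfid_t[16];

struct loc_like {
        char   *path;   /* may arrive NULL when only the gfid is known */
        gfid_t  gfid;
};

/* If no real path is available, build a synthetic "<gfid:...>" string so
 * that later code which logs or compares loc->path never sees NULL.
 * Illustrative only; not the code merged from review 3374. */
static int assign_gfid_path (struct loc_like *loc)
{
        char   uuid[37];        /* 32 hex digits + 4 dashes + NUL */
        size_t len;
        int    i, off = 0;

        if (loc->path)
                return 0;       /* a real path is already present */

        for (i = 0; i < 16; i++) {
                off += snprintf (uuid + off, sizeof (uuid) - off, "%02x",
                                 loc->gfid[i]);
                if (i == 3 || i == 5 || i == 7 || i == 9)
                        off += snprintf (uuid + off, sizeof (uuid) - off, "-");
        }

        len = strlen ("<gfid:>") + strlen (uuid) + 1;
        loc->path = malloc (len);
        if (!loc->path)
                return -1;
        snprintf (loc->path, len, "<gfid:%s>", uuid);
        return 0;
}

int main (void)
{
        struct loc_like loc = { .path = NULL, .gfid = { 0x12, 0x34 } };

        if (assign_gfid_path (&loc) == 0)
                printf ("synthetic path: %s\n", loc.path);
        free (loc.path);
        return 0;
}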

Comment 8 Vijaykumar Koppad 2012-05-22 09:43:06 UTC
On glusterfs-3.3.0qa42, there are no crashes and no hangs on clients.

