Description of problem:
When the FSCT tool is run against the Gluster Samba share from the Windows client, the FUSE mount on the server node crashes.

Note: This testing was done on RHEL 6.4 + glusterfs 3.4.0.1rhs build. I will retry it on the latest ISO.

Version-Release number of selected component (if applicable):
glusterfs 3.4.0.1rhs built on Apr 9 2013 12:37:53

How reproducible:
2/2

Setup:
4 node cluster
2 Windows clients
1 Windows controller

Steps to Reproduce:
1. On a 4 node cluster, create and start a distributed-replicate volume.
2. Do the FSCT test setup as mentioned in the link below:
   https://home.corp.redhat.com/node/69962
3. After the setup, run the tool on the controller machine.

The tests run fine for a load of 100 users. When the load is increased to 200 users, the FUSE mount on the server crashes, making the Samba share unavailable to FSCT.

Actual results:
The FUSE mount crashes, making the Samba share unavailable.

Expected results:
The FUSE mount should not crash.

Additional info:
Backtrace:
(gdb) bt
#0  0x00007f0f9db2c9c8 in ioc_open_cbk (frame=0x7f0fa20721a4, cookie=<value optimized out>, this=0x131e830, op_ret=0, op_errno=117, fd=0x13c56dc, xdata=0x0) at io-cache.c:554
#1  0x00007f0f9dd3bd74 in ra_open_cbk (frame=0x7f0fa2072250, cookie=<value optimized out>, this=<value optimized out>, op_ret=0, op_errno=117, fd=0x13c56dc, xdata=0x0) at read-ahead.c:103
#2  0x00007f0f9e18437b in dht_open_cbk (frame=0x7f0fa20712dc, cookie=<value optimized out>, this=<value optimized out>, op_ret=0, op_errno=117, fd=<value optimized out>, xdata=0x0) at dht-inode-read.c:55
#3  0x00007f0f9e3bd91e in afr_open_cbk (frame=0x7f0fa207102c, cookie=<value optimized out>, this=<value optimized out>, op_ret=<value optimized out>, op_errno=<value optimized out>, fd=<value optimized out>, xdata=0x0) at afr-open.c:178
#4  0x00007f0f9e62253b in client3_3_open_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>, myframe=0x7f0fa2072500) at client-rpc-fops.c:474
#5  0x0000003cb5c0ddf5 in rpc_clnt_handle_reply (clnt=0x138f520, pollin=0x1310c70) at rpc-clnt.c:771
#6  0x0000003cb5c0e9d7 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x138f550, event=<value optimized out>, data=<value optimized out>) at rpc-clnt.c:890
#7  0x0000003cb5c0a338 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:495
#8  0x00007f0f9f8872d4 in socket_event_poll_in (this=0x139ef50) at socket.c:2118
#9  0x00007f0f9f88742d in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x139ef50, poll_in=1, poll_out=0, poll_err=0) at socket.c:2230
#10 0x0000003cb545b3e7 in event_dispatch_epoll_handler (event_pool=0x12f46d0) at event-epoll.c:384
#11 event_dispatch_epoll (event_pool=0x12f46d0) at event-epoll.c:445
#12 0x0000000000406676 in main (argc=4, argv=0x7fffc7e57788) at glusterfsd.c:1902
This issue is also reproducible on the RHS 2.1 ISO. I have uploaded the sosreport to the rhsqe repo.

FUSE mount log snippet:
========================
pending frames:
frame : type(1) op(OPENDIR)
frame : type(1) op(READ)
frame : type(1) op(OPEN)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2013-04-22 05:34:45
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.1rhs
/lib64/libc.so.6[0x324d832920]
/usr/lib64/glusterfs/3.4.0.1rhs/xlator/performance/io-cache.so(ioc_open_cbk+0x98)[0x7f100ab519c8]
/usr/lib64/glusterfs/3.4.0.1rhs/xlator/performance/read-ahead.so(ra_open_cbk+0x1d4)[0x7f100ad60d74]
/usr/lib64/glusterfs/3.4.0.1rhs/xlator/cluster/distribute.so(dht_open_cbk+0xfb)[0x7f100b1a937b]
/usr/lib64/glusterfs/3.4.0.1rhs/xlator/cluster/replicate.so(afr_open_cbk+0x2de)[0x7f100b3e291e]
/usr/lib64/glusterfs/3.4.0.1rhs/xlator/protocol/client.so(client3_3_open_cbk+0x18b)[0x7f100b64753b]
/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)[0x324f00ddf5]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x127)[0x324f00e9d7]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x28)[0x324f00a338]
/usr/lib64/glusterfs/3.4.0.1rhs/rpc-transport/socket.so(socket_event_poll_in+0x34)[0x7f100c8ac2d4]
/usr/lib64/glusterfs/3.4.0.1rhs/rpc-transport/socket.so(socket_event_handler+0x13d)[0x7f100c8ac42d]
/usr/lib64/libglusterfs.so.0[0x324e85b3e7]
/usr/sbin/glusterfs(main+0x5c6)[0x406676]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x324d81ecdd]
/usr/sbin/glusterfs[0x404559]
The FUSE mount is also crashing for me. I was running arequal, iozone, and glusterfs_build on a FUSE mount from a RHEL 6.4 client.

signal received: 11
time of crash: 2013-04-26 05:43:01
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.1rhs
/lib64/libc.so.6[0x3387432920]
/usr/lib64/glusterfs/3.4.0.1rhs/xlator/performance/io-cache.so(ioc_open_cbk+0x98)[0x7fa7932009c8]
/usr/lib64/glusterfs/3.4.0.1rhs/xlator/performance/read-ahead.so(ra_open_cbk+0x1d4)[0x7fa79340fd74]
/usr/lib64/glusterfs/3.4.0.1rhs/xlator/cluster/distribute.so(dht_open_cbk+0xfb)[0x7fa79385837b]
/usr/lib64/glusterfs/3.4.0.1rhs/xlator/protocol/client.so(client3_3_open_cbk+0x18b)[0x7fa793a8753b]
/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)[0x3c4640ddf5]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x127)[0x3c4640e9d7]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x28)[0x3c4640a338]
/usr/lib64/glusterfs/3.4.0.1rhs/rpc-transport/socket.so(socket_event_poll_in+0x34)[0x7fa794ace2d4]
/usr/lib64/glusterfs/3.4.0.1rhs/rpc-transport/socket.so(socket_event_handler+0x13d)[0x7fa794ace42d]
/usr/lib64/libglusterfs.so.0[0x3c46c5b3e7]
/usr/sbin/glusterfs(main+0x5c6)[0x406676]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x338741ecdd]
/usr/sbin/glusterfs[0x404559]

Always reproducible.
*** Bug 957657 has been marked as a duplicate of this bug. ***
Lookups are not being sent for the files, so io-cache never gets a chance to fill in the inode context. This is likely caused by the new FUSE module, which supports the readdirp fop: as part of readdirp, the inode gets linked into the inode table. Having obtained the inode via readdirp, the FUSE module sends the open call directly without sending a lookup first. io-cache, which builds its inode context only in the lookup callback, then dereferences the NULL context in ioc_open_cbk and crashes. A possible fix is to make io-cache populate the inode context when the readdirp reply arrives, similar to what it already does for lookup.
Tested it on the build glusterfs 3.4.0.2rhs built on May 2 2013 06:08:46. I don't see the crash in either FSCT or smbtorture testing.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html