Testing with commit id 73eca3be5c5ccc71bbad934338c1ef58ed37c483.

Setup:

# gluster volume info

Volume Name: dsr
Type: Striped-Replicate (RAID 01)
Status: Started
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 10.1.12.172:/export/dsr1
Brick2: 10.1.12.170:/export/dsr1
Brick3: 10.1.12.173:/export/dsr1
Brick4: 10.1.12.172:/export/dsr2
Brick5: 10.1.12.170:/export/dsr2
Brick6: 10.1.12.173:/export/dsr2

After completing the kernel compile, started removing the kernel directory (rm -rf kernel-version) as a background process and started iozone (iozone -a). After some time the client crashed. (stored at root.12.172:/dsr_core)

# tail -50 export-mnt-.log

[2011-07-26 23:26:57.530378] I [client3_1-fops.c:411:client3_1_stat_cbk] 0-dsr-client-0: remote operation failed: Transport endpoint is not connected
[2011-07-26 23:26:57.530454] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x158) [0x2b5d60abdd10] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x101) [0x2b5d60abd2ab] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1c) [0x2b5d60abcdd7]))) 0-dsr-client-0: forced unwinding frame type(GlusterFS 3.1) op(STAT(1)) called at 2011-07-26 23:25:06.5708
[2011-07-26 23:26:57.530483] I [client3_1-fops.c:411:client3_1_stat_cbk] 0-dsr-client-0: remote operation failed: Transport endpoint is not connected
[2011-07-26 23:26:57.530525] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x158) [0x2b5d60abdd10] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x101) [0x2b5d60abd2ab] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1c) [0x2b5d60abcdd7]))) 0-dsr-client-0: forced unwinding frame type(GlusterFS 3.1) op(STAT(1)) called at 2011-07-26 23:25:06.5733
[2011-07-26 23:26:57.530555] I [client3_1-fops.c:411:client3_1_stat_cbk] 0-dsr-client-0: remote operation failed: Transport endpoint is not connected
[2011-07-26 23:26:57.530593] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x158) [0x2b5d60abdd10] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x101) [0x2b5d60abd2ab] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1c) [0x2b5d60abcdd7]))) 0-dsr-client-0: forced unwinding frame type(GlusterFS 3.1) op(STAT(1)) called at 2011-07-26 23:25:06.5756
[2011-07-26 23:26:57.530609] I [client3_1-fops.c:411:client3_1_stat_cbk] 0-dsr-client-0: remote operation failed: Transport endpoint is not connected
[2011-07-26 23:26:57.530644] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x158) [0x2b5d60abdd10] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x101) [0x2b5d60abd2ab] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1c) [0x2b5d60abcdd7]))) 0-dsr-client-0: forced unwinding frame type(GlusterFS 3.1) op(STAT(1)) called at 2011-07-26 23:25:06.5777
[2011-07-26 23:26:57.530659] I [client3_1-fops.c:411:client3_1_stat_cbk] 0-dsr-client-0: remote operation failed: Transport endpoint is not connected
[2011-07-26 23:26:57.530686] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x158) [0x2b5d60abdd10] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x101) [0x2b5d60abd2ab] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1c) [0x2b5d60abcdd7]))) 0-dsr-client-0: forced unwinding frame type(GlusterFS 3.1) op(STAT(1)) called at 2011-07-26 23:25:06.5802
[2011-07-26 23:26:57.530702] I [client3_1-fops.c:411:client3_1_stat_cbk] 0-dsr-client-0: remote operation failed: Transport endpoint is not connected
[2011-07-26 23:26:57.530720] W [fuse-bridge.c:2092:fuse_readdir_cbk] 0-glusterfs-fuse: 8934870: READDIR => -1 (Transport endpoint is not connected)
[2011-07-26 23:26:57.530829] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x158) [0x2b5d60abdd10] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x101) [0x2b5d60abd2ab] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1c) [0x2b5d60abcdd7]))) 0-dsr-client-0: forced unwinding frame type(GlusterFS Handshake) op(PING(3)) called at 2011-07-26 23:25:44.801875
[2011-07-26 23:26:57.530844] W [client-handshake.c:265:client_ping_cbk] 0-dsr-client-0: timer must have expired
[2011-07-26 23:26:57.530858] I [client.c:1883:client_rpc_notify] 0-dsr-client-0: disconnected
[2011-07-26 23:26:57.554560] W [fuse-bridge.c:2092:fuse_readdir_cbk] 0-glusterfs-fuse: 8934875: READDIR => -1 (Transport endpoint is not connected)
pending frames:

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2011-07-26 23:26:57
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3git
/lib64/libc.so.6[0x3a8a6302d0]
/usr/local/lib/glusterfs/3git/xlator/cluster/stripe.so(stripe_readdirp_cbk+0x60d)[0x2aaaab667a77]
/usr/local/lib/glusterfs/3git/xlator/cluster/replicate.so(afr_readdirp_cbk+0xbb3)[0x2aaaab3dfbdc]
/usr/local/lib/glusterfs/3git/xlator/protocol/client.so(client3_1_readdirp_cbk+0x2e8)[0x2aaaab1b46cb]
/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0x1f4)[0x2b5d60abdb0a]
/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x289)[0x2b5d60abde41]
/usr/local/lib/libgfrpc.so.0(rpc_transport_notify+0x149)[0x2b5d60aba444]
/usr/local/lib/glusterfs/3git/rpc-transport/socket.so(socket_event_poll_in+0x4b)[0x2aaaaad6d83c]
/usr/local/lib/glusterfs/3git/rpc-transport/socket.so(socket_event_handler+0x213)[0x2aaaaad6dd7e]
/usr/local/lib/libglusterfs.so.0[0x2b5d6086fb84]
/usr/local/lib/libglusterfs.so.0[0x2b5d6086fd89]
/usr/local/lib/libglusterfs.so.0(event_dispatch+0x85)[0x2b5d608700e3]
/usr/local/sbin/glusterfs(main+0x139)[0x407220]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x3a8a61d994]
/usr/local/sbin/glusterfs[0x403849]
CHANGE: http://review.gluster.com/115 (Scenario - The race window exists when before we wind to a stat call) merged in master by Anand Avati (avati)
Applied the above patch against commit-id "73eca3be5c5ccc71bbad934338c1ef58ed37c483". It passed. Will test again with the latest commit and move it to verified.
Verified against 8da4623f2274faa9e9d88f7d30babb9ea80fb141. Created a striped-replicated volume [RAID 01] and mounted it. Ran kernel compilation, iozone -a, and rm -rf linux-kernel in the background. Also ran dbench and an openssl untar and compile. Did not find any crash on the client. Moving to verified.