Bug 764989 (GLUSTER-3257)

Summary: stripe-replicated : fuse client core created
Product: [Community] GlusterFS
Reporter: Lakshmipathi G <lakshmipathi>
Component: stripe
Assignee: shishir gowda <sgowda>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Version: 3.2.1
CC: gluster-bugs, nsathyan, rahulcs
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Mount Type: fuse

Description Lakshmipathi G 2011-07-27 07:00:50 UTC
Testing with commit id 73eca3be5c5ccc71bbad934338c1ef58ed37c483.
Setup:
# gluster volume info
 
Volume Name: dsr
Type: Striped-Replicate (RAID 01)
Status: Started
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 10.1.12.172:/export/dsr1
Brick2: 10.1.12.170:/export/dsr1
Brick3: 10.1.12.173:/export/dsr1
Brick4: 10.1.12.172:/export/dsr2
Brick5: 10.1.12.170:/export/dsr2
Brick6: 10.1.12.173:/export/dsr2


After a kernel compile completed, I started removing the kernel directory (rm -rf kernel-version) as a background process and then started iozone (iozone -a).
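
The workload above could be scripted roughly as follows. This is only a sketch of the two commands named in the report: the mount point argument is an assumption (the report does not state it), and build_workload and run_workload are hypothetical helper names, not part of GlusterFS.

```python
import subprocess
from pathlib import Path


def build_workload(mount_point, kernel_dir="kernel-version"):
    """Return (background_cmd, foreground_cmd) as argv lists.

    mount_point is an assumed location of the fuse mount; kernel_dir is the
    directory name used in the report.
    """
    target = str(Path(mount_point) / kernel_dir)
    return ["rm", "-rf", target], ["iozone", "-a"]


def run_workload(mount_point, kernel_dir="kernel-version"):
    """Run rm -rf in the background while iozone -a runs on the mount."""
    rm_cmd, iozone_cmd = build_workload(mount_point, kernel_dir)
    # Start the delete first so it overlaps with the iozone run.
    rm_proc = subprocess.Popen(rm_cmd, cwd=mount_point)
    try:
        # Run iozone with the mount as its working directory so that its
        # test file lands on the volume under test.
        iozone_rc = subprocess.run(iozone_cmd, cwd=mount_point).returncode
    finally:
        rm_rc = rm_proc.wait()
    return rm_rc, iozone_rc
```

Calling run_workload with the path of the mounted volume starts both jobs; a client crash would then show up as "Transport endpoint is not connected" errors from either command.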

After some time the client crashed. (Core stored at root.12.172:/dsr_core)
# tail -50 export-mnt-.log
[2011-07-26 23:26:57.530378] I [client3_1-fops.c:411:client3_1_stat_cbk] 0-dsr-client-0: remote operation failed: Transport endpoint is not connected
[2011-07-26 23:26:57.530454] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x158) [0x2b5d60abdd10] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x101) [0x2b5d60abd2ab] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1c) [0x2b5d60abcdd7]))) 0-dsr-client-0: forced unwinding frame type(GlusterFS 3.1) op(STAT(1)) called at 2011-07-26 23:25:06.5708
[2011-07-26 23:26:57.530483] I [client3_1-fops.c:411:client3_1_stat_cbk] 0-dsr-client-0: remote operation failed: Transport endpoint is not connected
[2011-07-26 23:26:57.530525] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x158) [0x2b5d60abdd10] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x101) [0x2b5d60abd2ab] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1c) [0x2b5d60abcdd7]))) 0-dsr-client-0: forced unwinding frame type(GlusterFS 3.1) op(STAT(1)) called at 2011-07-26 23:25:06.5733
[2011-07-26 23:26:57.530555] I [client3_1-fops.c:411:client3_1_stat_cbk] 0-dsr-client-0: remote operation failed: Transport endpoint is not connected
[2011-07-26 23:26:57.530593] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x158) [0x2b5d60abdd10] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x101) [0x2b5d60abd2ab] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1c) [0x2b5d60abcdd7]))) 0-dsr-client-0: forced unwinding frame type(GlusterFS 3.1) op(STAT(1)) called at 2011-07-26 23:25:06.5756
[2011-07-26 23:26:57.530609] I [client3_1-fops.c:411:client3_1_stat_cbk] 0-dsr-client-0: remote operation failed: Transport endpoint is not connected
[2011-07-26 23:26:57.530644] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x158) [0x2b5d60abdd10] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x101) [0x2b5d60abd2ab] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1c) [0x2b5d60abcdd7]))) 0-dsr-client-0: forced unwinding frame type(GlusterFS 3.1) op(STAT(1)) called at 2011-07-26 23:25:06.5777
[2011-07-26 23:26:57.530659] I [client3_1-fops.c:411:client3_1_stat_cbk] 0-dsr-client-0: remote operation failed: Transport endpoint is not connected
[2011-07-26 23:26:57.530686] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x158) [0x2b5d60abdd10] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x101) [0x2b5d60abd2ab] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1c) [0x2b5d60abcdd7]))) 0-dsr-client-0: forced unwinding frame type(GlusterFS 3.1) op(STAT(1)) called at 2011-07-26 23:25:06.5802
[2011-07-26 23:26:57.530702] I [client3_1-fops.c:411:client3_1_stat_cbk] 0-dsr-client-0: remote operation failed: Transport endpoint is not connected
[2011-07-26 23:26:57.530720] W [fuse-bridge.c:2092:fuse_readdir_cbk] 0-glusterfs-fuse: 8934870: READDIR => -1 (Transport endpoint is not connected)
[2011-07-26 23:26:57.530829] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x158) [0x2b5d60abdd10] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x101) [0x2b5d60abd2ab] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1c) [0x2b5d60abcdd7]))) 0-dsr-client-0: forced unwinding frame type(GlusterFS Handshake) op(PING(3)) called at 2011-07-26 23:25:44.801875
[2011-07-26 23:26:57.530844] W [client-handshake.c:265:client_ping_cbk] 0-dsr-client-0: timer must have expired
[2011-07-26 23:26:57.530858] I [client.c:1883:client_rpc_notify] 0-dsr-client-0: disconnected
[2011-07-26 23:26:57.554560] W [fuse-bridge.c:2092:fuse_readdir_cbk] 0-glusterfs-fuse: 8934875: READDIR => -1 (Transport endpoint is not connected)
pending frames:

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2011-07-26 23:26:57
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3git
/lib64/libc.so.6[0x3a8a6302d0]
/usr/local/lib/glusterfs/3git/xlator/cluster/stripe.so(stripe_readdirp_cbk+0x60d)[0x2aaaab667a77]
/usr/local/lib/glusterfs/3git/xlator/cluster/replicate.so(afr_readdirp_cbk+0xbb3)[0x2aaaab3dfbdc]
/usr/local/lib/glusterfs/3git/xlator/protocol/client.so(client3_1_readdirp_cbk+0x2e8)[0x2aaaab1b46cb]
/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0x1f4)[0x2b5d60abdb0a]
/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x289)[0x2b5d60abde41]
/usr/local/lib/libgfrpc.so.0(rpc_transport_notify+0x149)[0x2b5d60aba444]
/usr/local/lib/glusterfs/3git/rpc-transport/socket.so(socket_event_poll_in+0x4b)[0x2aaaaad6d83c]
/usr/local/lib/glusterfs/3git/rpc-transport/socket.so(socket_event_handler+0x213)[0x2aaaaad6dd7e]
/usr/local/lib/libglusterfs.so.0[0x2b5d6086fb84]
/usr/local/lib/libglusterfs.so.0[0x2b5d6086fd89]
/usr/local/lib/libglusterfs.so.0(event_dispatch+0x85)[0x2b5d608700e3]
/usr/local/sbin/glusterfs(main+0x139)[0x407220]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x3a8a61d994]
/usr/local/sbin/glusterfs[0x403849]
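
To make the backtrace above easier to scan, the frames can be split into module, symbol, offset and address. The snippet below is a minimal sketch that assumes the frame layout seen in this log (path, optional "(symbol+offset)", then "[address]"); parse_frame and first_xlator_frame are hypothetical helper names.

```python
import re

# Matches frames such as:
#   /path/lib.so(symbol+0x60d)[0x2aaaab667a77]
#   /path/lib.so[0x2b5d6086fb84]   (no symbol available)
FRAME_RE = re.compile(
    r"^(?P<module>[^(\[]+)"
    r"(?:\((?P<symbol>[^+)]*)(?:\+(?P<offset>0x[0-9a-fA-F]+))?\))?"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


def parse_frame(line):
    """Parse one backtrace line; return a dict, or None if it is not a frame."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    return {
        # Keep only the file name of the shared object or binary.
        "module": match.group("module").rsplit("/", 1)[-1],
        "symbol": match.group("symbol") or None,
        "offset": match.group("offset"),
        "address": match.group("address"),
    }


def first_xlator_frame(lines):
    """Return the first parsed frame whose path contains /xlator/, if any."""
    for line in lines:
        if "/xlator/" in line:
            frame = parse_frame(line)
            if frame is not None:
                return frame
    return None
```

Applied to the frames in this report, the first translator frame is stripe_readdirp_cbk in stripe.so, called from afr_readdirp_cbk in replicate.so.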

Comment 1 Anand Avati 2011-07-28 07:03:15 UTC
CHANGE: http://review.gluster.com/115 (Scenario - The race window exists when before we wind to a stat call) merged in master by Anand Avati (avati)

Comment 2 Lakshmipathi G 2011-07-29 03:35:20 UTC
Applied the above patch against commit id "73eca3be5c5ccc71bbad934338c1ef58ed37c483". It passed. I will test again with the latest commit and move it to verified.

Comment 3 Rahul C S 2011-08-02 08:25:58 UTC
Verified against 8da4623f2274faa9e9d88f7d30babb9ea80fb141.

Created a striped-replicated volume [RAID 01] and mounted it.
Ran kernel compilation, iozone -a and rm -rf linux-kernel in the background.
Also ran dbench and an openssl untar and compile.

Did not find any crash on the client. Moving to verified.