Created attachment 559599
rdma fuse client log

Description of problem:
I was running sanity tests on a dist-rep volume with the rdma transport type. The rdma fuse client crashed with signal 6.

Version-Release number of selected component (if applicable):
glusterfs-3.3.0qa19

How reproducible:
Often (2/2)

Steps to Reproduce:
1. Create a dist-rep volume with rdma transport type.
2. Start sanity tests.

Actual results:
The fuse client crashed with the following backtrace.

Core was generated by `/usr/local/sbin/glusterfs --volfile-id=hosdu --volfile-server=10.1.10.24 /mnt/'.
Program terminated with signal 6, Aborted.
#0  0x0000003d4f232905 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.25.el6_1.3.x86_64 libgcc-4.4.5-6.el6.x86_64 libibverbs-1.1.4-2.el6.x86_64 libmlx4-1.0.1-7.el6.x86_64
(gdb) bt
#0  0x0000003d4f232905 in raise () from /lib64/libc.so.6
#1  0x0000003d4f2340e5 in abort () from /lib64/libc.so.6
#2  0x0000003d4f22b9be in __assert_fail_base () from /lib64/libc.so.6
#3  0x0000003d4f22ba80 in __assert_fail () from /lib64/libc.so.6
#4  0x00007fb723db987d in afr_get_call_child (this=0x17686c0, child_up=0x7fb710011720 "", read_child=-1, fresh_children=0x7fb71000cd60, call_child=0x7fb71d81986c, last_index=0x7fb71001d918) at afr-common.c:670
#5  0x00007fb723d5e599 in afr_stat (frame=0x7fb72ae67c78, this=0x17686c0, loc=0x7fb7100120e8) at afr-inode-read.c:257
#6  0x00007fb723b0e6c9 in dht_stat (frame=0x7fb72ae63ca4, this=0x176a560, loc=0x7fb7100120e8) at dht-inode-read.c:302
#7  0x00007fb72389bc55 in wb_stat (frame=0x7fb72ae66198, this=0x176b810, loc=0x7fb7100120e8) at write-behind.c:753
#8  0x00007fb72c270142 in default_stat (frame=0x7fb72ae68080, this=0x176caf0, loc=0x7fb7100120e8) at defaults.c:1147
#9  0x00007fb72c270142 in default_stat (frame=0x7fb72ae679c8, this=0x176dd20, loc=0x7fb7100120e8) at defaults.c:1147
#10 0x00007fb72c270142 in default_stat (frame=0x7fb72ae64810, this=0x176eee0, loc=0x7fb7100120e8) at defaults.c:1147
#11 0x00007fb72301d661 in sp_stat (frame=0x7fb72ae69ebc, this=0x17701b0, loc=0x7fb7100120e8) at stat-prefetch.c:3644
#12 0x00007fb722dde15b in io_stats_stat (frame=0x7fb72ae64158, this=0x1771510, loc=0x7fb7100120e8) at io-stats.c:1836
#13 0x00007fb72a9124ec in fuse_getattr_resume (state=0x7fb7100120d0) at fuse-bridge.c:536
#14 0x00007fb72a90e804 in fuse_resolve_and_resume (state=0x7fb7100120d0, fn=0x7fb72a911ef5 <fuse_getattr_resume>) at fuse-resolve.c:754
#15 0x00007fb72a913783 in fuse_getattr (this=0x1759d50, finh=0x7fb7100344c0, msg=0x7fb7100344e8) at fuse-bridge.c:615
#16 0x00007fb72a92c56e in fuse_thread_proc (data=0x1759d50) at fuse-bridge.c:3482
#17 0x0000003d4fa077e1 in start_thread () from /lib64/libpthread.so.0
#18 0x0000003d4f2e577d in clone () from /lib64/libc.so.6
(gdb) f 5
#5  0x00007fb723d5e599 in afr_stat (frame=0x7fb72ae67c78, this=0x17686c0, loc=0x7fb7100120e8) at afr-inode-read.c:257
257             ret = afr_get_call_child (this, local->child_up, read_child,
(gdb) f 4
#4  0x00007fb723db987d in afr_get_call_child (this=0x17686c0, child_up=0x7fb710011720 "", read_child=-1, fresh_children=0x7fb71000cd60, call_child=0x7fb71d81986c, last_index=0x7fb71001d918) at afr-common.c:670
670             GF_ASSERT (read_child >= 0);
(gdb)

Expected results:
There should be no crashes.

Additional info:
Entries from the client log.

[2012-02-06 01:29:10.891992] W [client3_1-fops.c:418:client3_1_stat_cbk] 0-glusterfs: remote operation failed: Transport endpoint is not connected
[2012-02-06 01:29:10.892069] E [rpc-clnt.c:380:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x186) [0x7fb72c0245d5] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x1c5) [0x7fb72c0234d6] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x45) [0x7fb72c022c0e]))) 0-hosdu-client-2: forced unwinding frame type(GlusterFS 3.1) op(RELEASEDIR(42)) called at 2012-02-06 01:29:10.890713
[2012-02-06 01:29:10.892115] E [rpc-clnt.c:380:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x186) [0x7fb72c0245d5] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x1c5) [0x7fb72c0234d6] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x45) [0x7fb72c022c0e]))) 0-hosdu-client-2: forced unwinding frame type(GlusterFS 3.1) op(LOOKUP(27)) called at 2012-02-06 01:29:10.890985
[2012-02-06 01:29:10.892135] W [client3_1-fops.c:2249:client3_1_lookup_cbk] 0-glusterfs: remote operation failed: Transport endpoint is not connected. Path: /run31647/pa/f2
[2012-02-06 01:29:10.892169] I [client.c:1885:client_rpc_notify] 0-hosdu-client-2: disconnected
[2012-02-06 01:29:10.893072] E [rpc-clnt.c:771:rpc_clnt_handle_reply] 0-hosdu-client-3: cannot lookup the saved frame for reply with xid (1440190)
[2012-02-06 01:29:10.893102] E [rpc-clnt.c:380:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x186) [0x7fb72c0245d5] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x1c5) [0x7fb72c0234d6] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x45) [0x7fb72c022c0e]))) 0-hosdu-client-3: forced unwinding frame type(GlusterFS 3.1) op(INODELK(29)) called at 2012-02-06 01:29:10.892259
[2012-02-06 01:29:10.893137] W [client3_1-fops.c:1235:client3_1_inodelk_cbk] 0-glusterfs: remote operation failed: Transport endpoint is not connected
[2012-02-06 01:29:10.893160] W [client3_1-fops.c:4721:client3_1_inodelk] 0-hosdu-client-2: failed to send the fop: Transport endpoint is not connected
[2012-02-06 01:29:10.896806] W [rpc-clnt.c:1478:rpc_clnt_submit] 0-hosdu-client-3: failed to submit rpc-request (XID: 0x1440192x Program: GlusterFS 3.1, ProgVers: 310, Proc: 29) to rpc-transport (hosdu-client-3)
[2012-02-06 01:29:10.896834] W [client3_1-fops.c:1235:client3_1_inodelk_cbk] 0-hosdu-client-3: remote operation failed: Transport endpoint is not connected
[2012-02-06 01:29:10.896852] I [afr-lk-common.c:993:afr_lock_blocking] 0-hosdu-replicate-1: unable to lock on even one child
[2012-02-06 01:29:10.896869] I [afr-transaction.c:952:afr_post_blocking_inodelk_cbk] 0-hosdu-replicate-1: Blocking inodelks failed.
[2012-02-06 01:29:10.896926] E [rpc-clnt.c:380:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x186) [0x7fb72c0245d5] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x1c5) [0x7fb72c0234d6] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x45) [0x7fb72c022c0e]))) 0-hosdu-client-3: forced unwinding frame type(GlusterFS 3.1) op(READLINK(2)) called at 2012-02-06 01:29:10.891941
[2012-02-06 01:29:10.896947] W [client3_1-fops.c:460:client3_1_readlink_cbk] 0-glusterfs: remote operation failed: Transport endpoint is not connected
[2012-02-06 01:29:10.896968] W [fuse-bridge.c:1127:fuse_readlink_cbk] 0-glusterfs-fuse: 1487166: /run31647/pd/l2 => -1 (Transport endpoint is not connected)
[2012-02-06 01:29:10.897040] E [rpc-clnt.c:380:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x186) [0x7fb72c0245d5] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x1c5) [0x7fb72c0234d6] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x45) [0x7fb72c022c0e]))) 0-hosdu-client-3: forced unwinding frame type(GlusterFS 3.1) op(STAT(1)) called at 2012-02-06 01:29:10.892036
[2012-02-06 01:29:10.897088] W [client3_1-fops.c:418:client3_1_stat_cbk] 0-glusterfs: remote operation failed: Transport endpoint is not connected
[2012-02-06 01:29:10.900609] W [rpc-clnt.c:1478:rpc_clnt_submit] 0-hosdu-client-3: failed to submit rpc-request (XID: 0x1440193x Program: GlusterFS 3.1, ProgVers: 310, Proc: 27) to rpc-transport (hosdu-client-3)
[2012-02-06 01:29:10.900638] W [client3_1-fops.c:2249:client3_1_lookup_cbk] 0-hosdu-client-3: remote operation failed: Transport endpoint is not connected. Path: /run31647/p6/f2
[2012-02-06 01:29:10.904378] W [rpc-clnt.c:1478:rpc_clnt_submit] 0-hosdu-client-3: failed to submit rpc-request (XID: 0x1440194x Program: GlusterFS 3.1, ProgVers: 310, Proc: 29) to rpc-transport (hosdu-client-3)
[2012-02-06 01:29:10.904407] W [client3_1-fops.c:1235:client3_1_inodelk_cbk] 0-hosdu-client-3: remote operation failed: Transport endpoint is not connected

I have attached the client log. I have archived the core file and other logs.
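
For illustration only: frame 4 of the backtrace shows GF_ASSERT (read_child >= 0) firing with read_child=-1, at a time when the log reports both hosdu-client-2 and hosdu-client-3 as not connected. The sketch below shows one possible defensive pattern, returning an error instead of asserting when no read child is usable. It is not the actual afr_get_call_child code or the fix applied upstream; pick_call_child, its parameters, and the choice of -ENOTCONN are hypothetical names and assumptions made for this example.

```c
#include <errno.h>

/* Hypothetical stand-in for child selection in a replicate translator.
 * child_up[i] is nonzero when child i is connected.
 * On success returns 0 and stores the chosen index in *call_child.
 * Returns -ENOTCONN when no child is usable; *call_child is untouched. */
static int pick_call_child(const unsigned char *child_up, int child_count,
                           int read_child, int *call_child)
{
    /* Use the preferred read child when it is valid and connected. */
    if (read_child >= 0 && read_child < child_count && child_up[read_child]) {
        *call_child = read_child;
        return 0;
    }

    /* read_child is negative (as in the crash), out of range, or down:
     * fall back to the first connected child instead of asserting. */
    for (int i = 0; i < child_count; i++) {
        if (child_up[i]) {
            *call_child = i;
            return 0;
        }
    }

    /* Every child is down: report it so the caller can unwind the fop
     * with an error rather than abort the client process. */
    return -ENOTCONN;
}
```

With every child down and read_child=-1, this sketch returns -ENOTCONN, which a caller could turn into the same "Transport endpoint is not connected" error already seen in the log, rather than a signal 6.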
Because of the large number of bugs filed against it, the mainline version is ambiguous and about to be removed as a choice. If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.