+++ This bug was initially created as a clone of Bug #787612 +++

Created attachment 559599 [details]
rdma fuse client log

Description of problem:
I was running sanity tests on dist-rep volume with rdma transport type. rdma fuse client crashed with signal 6.

Version-Release number of selected component (if applicable):
glusterfs-3.3.0qa19

How reproducible:
Often (2/2)

Steps to Reproduce:
1. Create a dist-rep volume with rdma transport type.
2. Start sanity tests.

Actual results:
fuse client crashed with following back trace.

Core was generated by `/usr/local/sbin/glusterfs --volfile-id=hosdu --volfile-server=10.1.10.24 /mnt/'.
Program terminated with signal 6, Aborted.
#0 0x0000003d4f232905 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.25.el6_1.3.x86_64 libgcc-4.4.5-6.el6.x86_64 libibverbs-1.1.4-2.el6.x86_64 libmlx4-1.0.1-7.el6.x86_64
(gdb) bt
#0 0x0000003d4f232905 in raise () from /lib64/libc.so.6
#1 0x0000003d4f2340e5 in abort () from /lib64/libc.so.6
#2 0x0000003d4f22b9be in __assert_fail_base () from /lib64/libc.so.6
#3 0x0000003d4f22ba80 in __assert_fail () from /lib64/libc.so.6
#4 0x00007fb723db987d in afr_get_call_child (this=0x17686c0, child_up=0x7fb710011720 "", read_child=-1, fresh_children=0x7fb71000cd60, call_child=0x7fb71d81986c, last_index=0x7fb71001d918) at afr-common.c:670
#5 0x00007fb723d5e599 in afr_stat (frame=0x7fb72ae67c78, this=0x17686c0, loc=0x7fb7100120e8) at afr-inode-read.c:257
#6 0x00007fb723b0e6c9 in dht_stat (frame=0x7fb72ae63ca4, this=0x176a560, loc=0x7fb7100120e8) at dht-inode-read.c:302
#7 0x00007fb72389bc55 in wb_stat (frame=0x7fb72ae66198, this=0x176b810, loc=0x7fb7100120e8) at write-behind.c:753
#8 0x00007fb72c270142 in default_stat (frame=0x7fb72ae68080, this=0x176caf0, loc=0x7fb7100120e8) at defaults.c:1147
#9 0x00007fb72c270142 in default_stat (frame=0x7fb72ae679c8, this=0x176dd20, loc=0x7fb7100120e8) at defaults.c:1147
#10 0x00007fb72c270142 in default_stat (frame=0x7fb72ae64810, this=0x176eee0, loc=0x7fb7100120e8) at defaults.c:1147
#11 0x00007fb72301d661 in sp_stat (frame=0x7fb72ae69ebc, this=0x17701b0, loc=0x7fb7100120e8) at stat-prefetch.c:3644
#12 0x00007fb722dde15b in io_stats_stat (frame=0x7fb72ae64158, this=0x1771510, loc=0x7fb7100120e8) at io-stats.c:1836
#13 0x00007fb72a9124ec in fuse_getattr_resume (state=0x7fb7100120d0) at fuse-bridge.c:536
#14 0x00007fb72a90e804 in fuse_resolve_and_resume (state=0x7fb7100120d0, fn=0x7fb72a911ef5 <fuse_getattr_resume>) at fuse-resolve.c:754
#15 0x00007fb72a913783 in fuse_getattr (this=0x1759d50, finh=0x7fb7100344c0, msg=0x7fb7100344e8) at fuse-bridge.c:615
#16 0x00007fb72a92c56e in fuse_thread_proc (data=0x1759d50) at fuse-bridge.c:3482
#17 0x0000003d4fa077e1 in start_thread () from /lib64/libpthread.so.0
#18 0x0000003d4f2e577d in clone () from /lib64/libc.so.6
(gdb) f 5
#5 0x00007fb723d5e599 in afr_stat (frame=0x7fb72ae67c78, this=0x17686c0, loc=0x7fb7100120e8) at afr-inode-read.c:257
257             ret = afr_get_call_child (this, local->child_up, read_child,
(gdb) f 4
#4 0x00007fb723db987d in afr_get_call_child (this=0x17686c0, child_up=0x7fb710011720 "", read_child=-1, fresh_children=0x7fb71000cd60, call_child=0x7fb71d81986c, last_index=0x7fb71001d918) at afr-common.c:670
670             GF_ASSERT (read_child >= 0);
(gdb)

Expected results:
There should be no crashes.

Additional info:
Entries from the client log.

[2012-02-06 01:29:10.891992] W [client3_1-fops.c:418:client3_1_stat_cbk] 0-glusterfs: remote operation failed: Transport endpoint is not connected
[2012-02-06 01:29:10.892069] E [rpc-clnt.c:380:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x186) [0x7fb72c0245d5] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x1c5) [0x7fb72c0234d6] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x45) [0x7fb72c022c0e]))) 0-hosdu-client-2: forced unwinding frame type(GlusterFS 3.1) op(RELEASEDIR(42)) called at 2012-02-06 01:29:10.890713
[2012-02-06 01:29:10.892115] E [rpc-clnt.c:380:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x186) [0x7fb72c0245d5] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x1c5) [0x7fb72c0234d6] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x45) [0x7fb72c022c0e]))) 0-hosdu-client-2: forced unwinding frame type(GlusterFS 3.1) op(LOOKUP(27)) called at 2012-02-06 01:29:10.890985
[2012-02-06 01:29:10.892135] W [client3_1-fops.c:2249:client3_1_lookup_cbk] 0-glusterfs: remote operation failed: Transport endpoint is not connected. Path: /run31647/pa/f2
[2012-02-06 01:29:10.892169] I [client.c:1885:client_rpc_notify] 0-hosdu-client-2: disconnected
[2012-02-06 01:29:10.893072] E [rpc-clnt.c:771:rpc_clnt_handle_reply] 0-hosdu-client-3: cannot lookup the saved frame for reply with xid (1440190)
[2012-02-06 01:29:10.893102] E [rpc-clnt.c:380:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x186) [0x7fb72c0245d5] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x1c5) [0x7fb72c0234d6] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x45) [0x7fb72c022c0e]))) 0-hosdu-client-3: forced unwinding frame type(GlusterFS 3.1) op(INODELK(29)) called at 2012-02-06 01:29:10.892259
[2012-02-06 01:29:10.893137] W [client3_1-fops.c:1235:client3_1_inodelk_cbk] 0-glusterfs: remote operation failed: Transport endpoint is not connected
[2012-02-06 01:29:10.893160] W [client3_1-fops.c:4721:client3_1_inodelk] 0-hosdu-client-2: failed to send the fop: Transport endpoint is not connected
[2012-02-06 01:29:10.896806] W [rpc-clnt.c:1478:rpc_clnt_submit] 0-hosdu-client-3: failed to submit rpc-request (XID: 0x1440192x Program: GlusterFS 3.1, ProgVers: 310, Proc: 29) to rpc-transport (hosdu-client-3)
[2012-02-06 01:29:10.896834] W [client3_1-fops.c:1235:client3_1_inodelk_cbk] 0-hosdu-client-3: remote operation failed: Transport endpoint is not connected
[2012-02-06 01:29:10.896852] I [afr-lk-common.c:993:afr_lock_blocking] 0-hosdu-replicate-1: unable to lock on even one child
[2012-02-06 01:29:10.896869] I [afr-transaction.c:952:afr_post_blocking_inodelk_cbk] 0-hosdu-replicate-1: Blocking inodelks failed.
[2012-02-06 01:29:10.896926] E [rpc-clnt.c:380:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x186) [0x7fb72c0245d5] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x1c5) [0x7fb72c0234d6] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x45) [0x7fb72c022c0e]))) 0-hosdu-client-3: forced unwinding frame type(GlusterFS 3.1) op(READLINK(2)) called at 2012-02-06 01:29:10.891941
[2012-02-06 01:29:10.896947] W [client3_1-fops.c:460:client3_1_readlink_cbk] 0-glusterfs: remote operation failed: Transport endpoint is not connected
[2012-02-06 01:29:10.896968] W [fuse-bridge.c:1127:fuse_readlink_cbk] 0-glusterfs-fuse: 1487166: /run31647/pd/l2 => -1 (Transport endpoint is not connected)
[2012-02-06 01:29:10.897040] E [rpc-clnt.c:380:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x186) [0x7fb72c0245d5] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x1c5) [0x7fb72c0234d6] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x45) [0x7fb72c022c0e]))) 0-hosdu-client-3: forced unwinding frame type(GlusterFS 3.1) op(STAT(1)) called at 2012-02-06 01:29:10.892036
[2012-02-06 01:29:10.897088] W [client3_1-fops.c:418:client3_1_stat_cbk] 0-glusterfs: remote operation failed: Transport endpoint is not connected
[2012-02-06 01:29:10.900609] W [rpc-clnt.c:1478:rpc_clnt_submit] 0-hosdu-client-3: failed to submit rpc-request (XID: 0x1440193x Program: GlusterFS 3.1, ProgVers: 310, Proc: 27) to rpc-transport (hosdu-client-3)
[2012-02-06 01:29:10.900638] W [client3_1-fops.c:2249:client3_1_lookup_cbk] 0-hosdu-client-3: remote operation failed: Transport endpoint is not connected. Path: /run31647/p6/f2
[2012-02-06 01:29:10.904378] W [rpc-clnt.c:1478:rpc_clnt_submit] 0-hosdu-client-3: failed to submit rpc-request (XID: 0x1440194x Program: GlusterFS 3.1, ProgVers: 310, Proc: 29) to rpc-transport (hosdu-client-3)
[2012-02-06 01:29:10.904407] W [client3_1-fops.c:1235:client3_1_inodelk_cbk] 0-hosdu-client-3: remote operation failed: Transport endpoint is not connected

I have attached the client log. I have archived the core file and other logs.
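
For readers following the backtrace: frame #4 shows GF_ASSERT (read_child >= 0) being reached with read_child=-1, which is what turns a disconnected-bricks condition into an abort (signal 6). The sketch below is only an illustration of that failure mode next to a defensive alternative. The function names, signatures, and the choice of ENOTCONN as the error are assumptions made for this example; it is not the GlusterFS source and not necessarily how the issue was addressed upstream.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the GlusterFS code: choose which replica
 * ("child") should serve a read. child_up[i] is non-zero when child i is
 * connected; read_child is the preferred child, or -1 when none is known. */

/* Variant A: mirrors the failure mode in the backtrace. A read_child of -1
 * trips the assertion and the process aborts with SIGABRT (signal 6). */
int pick_call_child_strict(const unsigned char *child_up, size_t child_count,
                           int read_child)
{
    assert(read_child >= 0);
    assert((size_t)read_child < child_count);
    return child_up[read_child] ? read_child : -ENOTCONN;
}

/* Variant B: treats "no usable child" as an ordinary error, so a caller
 * could fail the operation with ENOTCONN instead of crashing the client. */
int pick_call_child_checked(const unsigned char *child_up, size_t child_count,
                            int read_child)
{
    if (read_child < 0 || (size_t)read_child >= child_count)
        return -ENOTCONN;   /* nothing to read from */
    if (!child_up[read_child])
        return -ENOTCONN;   /* preferred child is disconnected */
    return read_child;
}
```

With both children marked down and read_child set to -1, the checked variant returns -ENOTCONN, which matches the "Transport endpoint is not connected" errors already visible in the client log, whereas the strict variant would abort.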
Moving out of Big Bend, since RDMA support is not available in Big Bend (2.1).
We ran the sanity tests for rdma and couldn't reproduce the bug. Hence closing the bug as current release.