Bug 787612 - glusterfs rdma fuse client crashed due to possible split brain situation.
Status: CLOSED EOL
Product: GlusterFS
Classification: Community
Component: rdma
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Assigned To: Raghavendra G
Keywords: Triaged
Depends On:
Blocks: 849133 858454
Reported: 2012-02-06 05:23 EST by M S Vishwanath Bhat
Modified: 2016-05-31 21:57 EDT
Doc Type: Bug Fix
Clones: 849133
Last Closed: 2015-10-22 11:46:38 EDT

Attachments
rdma fuse client log (205.74 KB, text/x-log)
2012-02-06 05:23 EST, M S Vishwanath Bhat

Description M S Vishwanath Bhat 2012-02-06 05:23:38 EST
Created attachment 559599
rdma fuse client log

Description of problem:
I was running sanity tests on a distributed-replicate (dist-rep) volume with the rdma transport type. The rdma fuse client crashed with signal 6 (SIGABRT).
Version-Release number of selected component (if applicable):
glusterfs-3.3.0qa19

How reproducible:
Often (2/2)

Steps to Reproduce:
1. Create a dist-rep volume with rdma transport type.
2. Start sanity tests.
  
Actual results:
The fuse client crashed with the following backtrace.

Core was generated by `/usr/local/sbin/glusterfs --volfile-id=hosdu --volfile-server=10.1.10.24 /mnt/'.
Program terminated with signal 6, Aborted.
#0  0x0000003d4f232905 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.25.el6_1.3.x86_64 libgcc-4.4.5-6.el6.x86_64 libibverbs-1.1.4-2.el6.x86_64 libmlx4-1.0.1-7.el6.x86_64
(gdb) bt
#0  0x0000003d4f232905 in raise () from /lib64/libc.so.6
#1  0x0000003d4f2340e5 in abort () from /lib64/libc.so.6
#2  0x0000003d4f22b9be in __assert_fail_base () from /lib64/libc.so.6
#3  0x0000003d4f22ba80 in __assert_fail () from /lib64/libc.so.6
#4  0x00007fb723db987d in afr_get_call_child (this=0x17686c0, child_up=0x7fb710011720 "", read_child=-1, fresh_children=0x7fb71000cd60, call_child=0x7fb71d81986c, last_index=0x7fb71001d918) at afr-common.c:670
#5  0x00007fb723d5e599 in afr_stat (frame=0x7fb72ae67c78, this=0x17686c0, loc=0x7fb7100120e8) at afr-inode-read.c:257
#6  0x00007fb723b0e6c9 in dht_stat (frame=0x7fb72ae63ca4, this=0x176a560, loc=0x7fb7100120e8) at dht-inode-read.c:302
#7  0x00007fb72389bc55 in wb_stat (frame=0x7fb72ae66198, this=0x176b810, loc=0x7fb7100120e8) at write-behind.c:753
#8  0x00007fb72c270142 in default_stat (frame=0x7fb72ae68080, this=0x176caf0, loc=0x7fb7100120e8) at defaults.c:1147
#9  0x00007fb72c270142 in default_stat (frame=0x7fb72ae679c8, this=0x176dd20, loc=0x7fb7100120e8) at defaults.c:1147
#10 0x00007fb72c270142 in default_stat (frame=0x7fb72ae64810, this=0x176eee0, loc=0x7fb7100120e8) at defaults.c:1147
#11 0x00007fb72301d661 in sp_stat (frame=0x7fb72ae69ebc, this=0x17701b0, loc=0x7fb7100120e8) at stat-prefetch.c:3644
#12 0x00007fb722dde15b in io_stats_stat (frame=0x7fb72ae64158, this=0x1771510, loc=0x7fb7100120e8) at io-stats.c:1836
#13 0x00007fb72a9124ec in fuse_getattr_resume (state=0x7fb7100120d0) at fuse-bridge.c:536
#14 0x00007fb72a90e804 in fuse_resolve_and_resume (state=0x7fb7100120d0, fn=0x7fb72a911ef5 <fuse_getattr_resume>) at fuse-resolve.c:754
#15 0x00007fb72a913783 in fuse_getattr (this=0x1759d50, finh=0x7fb7100344c0, msg=0x7fb7100344e8) at fuse-bridge.c:615
#16 0x00007fb72a92c56e in fuse_thread_proc (data=0x1759d50) at fuse-bridge.c:3482
#17 0x0000003d4fa077e1 in start_thread () from /lib64/libpthread.so.0
#18 0x0000003d4f2e577d in clone () from /lib64/libc.so.6
(gdb) f 5
#5  0x00007fb723d5e599 in afr_stat (frame=0x7fb72ae67c78, this=0x17686c0, loc=0x7fb7100120e8) at afr-inode-read.c:257
257             ret = afr_get_call_child (this, local->child_up, read_child,
(gdb) f 4
#4  0x00007fb723db987d in afr_get_call_child (this=0x17686c0, child_up=0x7fb710011720 "", read_child=-1, fresh_children=0x7fb71000cd60, call_child=0x7fb71d81986c, last_index=0x7fb71001d918) at afr-common.c:670
670             GF_ASSERT (read_child >= 0);
(gdb) 




Expected results:
There should be no crashes.
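
To illustrate the expected behavior: frame 4 of the backtrace shows read_child=-1 reaching GF_ASSERT (read_child >= 0) in afr_get_call_child, which aborts the client. Below is a minimal sketch of a defensive check that returns an error code instead of asserting. It is not the GlusterFS source and not necessarily the fix that was applied. The function pick_call_child, its signature, and the choice of EIO and ENOTCONN as return values are all assumptions made for illustration.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper (NOT the real afr_get_call_child): decide which
 * replica child a read-type call such as stat should be sent to.
 *
 * child_up    - one byte per child, non-zero if that child is connected
 * child_count - number of entries in child_up
 * read_child  - index of the preferred read child, or -1 if none is known
 * call_child  - out parameter, set to the chosen index or -1 on failure
 *
 * Returns 0 on success or a negative errno value. The caller is expected
 * to unwind the call with that error rather than abort the process.
 */
int pick_call_child(const unsigned char *child_up, size_t child_count,
                    int read_child, int *call_child)
{
    if (child_up == NULL || call_child == NULL)
        return -EINVAL;

    *call_child = -1;

    /* No valid read child is known: report an error instead of asserting. */
    if (read_child < 0 || (size_t)read_child >= child_count)
        return -EIO;

    /* The read child exists but its transport is down. */
    if (!child_up[read_child])
        return -ENOTCONN;

    *call_child = read_child;
    return 0;
}
```

With a check of this kind, a stat issued while every child of a replica set is disconnected would fail back to the application with an error, instead of terminating the mount with signal 6.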

Additional info:

Entries from the client log. 


[2012-02-06 01:29:10.891992] W [client3_1-fops.c:418:client3_1_stat_cbk] 0-glusterfs: remote operation failed: Transport endpoint is not connected
[2012-02-06 01:29:10.892069] E [rpc-clnt.c:380:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x186) [0x7fb72c0245d5] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x1c5) [0x7fb72c0234d6] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x45) [0x7fb72c022c0e]))) 0-hosdu-client-2: forced unwinding frame type(GlusterFS 3.1) op(RELEASEDIR(42)) called at 2012-02-06 01:29:10.890713
[2012-02-06 01:29:10.892115] E [rpc-clnt.c:380:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x186) [0x7fb72c0245d5] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x1c5) [0x7fb72c0234d6] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x45) [0x7fb72c022c0e]))) 0-hosdu-client-2: forced unwinding frame type(GlusterFS 3.1) op(LOOKUP(27)) called at 2012-02-06 01:29:10.890985
[2012-02-06 01:29:10.892135] W [client3_1-fops.c:2249:client3_1_lookup_cbk] 0-glusterfs: remote operation failed: Transport endpoint is not connected. Path: /run31647/pa/f2
[2012-02-06 01:29:10.892169] I [client.c:1885:client_rpc_notify] 0-hosdu-client-2: disconnected
[2012-02-06 01:29:10.893072] E [rpc-clnt.c:771:rpc_clnt_handle_reply] 0-hosdu-client-3: cannot lookup the saved frame for reply with xid (1440190)
[2012-02-06 01:29:10.893102] E [rpc-clnt.c:380:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x186) [0x7fb72c0245d5] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x1c5) [0x7fb72c0234d6] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x45) [0x7fb72c022c0e]))) 0-hosdu-client-3: forced unwinding frame type(GlusterFS 3.1) op(INODELK(29)) called at 2012-02-06 01:29:10.892259
[2012-02-06 01:29:10.893137] W [client3_1-fops.c:1235:client3_1_inodelk_cbk] 0-glusterfs: remote operation failed: Transport endpoint is not connected
[2012-02-06 01:29:10.893160] W [client3_1-fops.c:4721:client3_1_inodelk] 0-hosdu-client-2: failed to send the fop: Transport endpoint is not connected
[2012-02-06 01:29:10.896806] W [rpc-clnt.c:1478:rpc_clnt_submit] 0-hosdu-client-3: failed to submit rpc-request (XID: 0x1440192x Program: GlusterFS 3.1, ProgVers: 310, Proc: 29) to rpc-transport (hosdu-client-3)
[2012-02-06 01:29:10.896834] W [client3_1-fops.c:1235:client3_1_inodelk_cbk] 0-hosdu-client-3: remote operation failed: Transport endpoint is not connected
[2012-02-06 01:29:10.896852] I [afr-lk-common.c:993:afr_lock_blocking] 0-hosdu-replicate-1: unable to lock on even one child
[2012-02-06 01:29:10.896869] I [afr-transaction.c:952:afr_post_blocking_inodelk_cbk] 0-hosdu-replicate-1: Blocking inodelks failed.
[2012-02-06 01:29:10.896926] E [rpc-clnt.c:380:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x186) [0x7fb72c0245d5] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x1c5) [0x7fb72c0234d6] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x45) [0x7fb72c022c0e]))) 0-hosdu-client-3: forced unwinding frame type(GlusterFS 3.1) op(READLINK(2)) called at 2012-02-06 01:29:10.891941
[2012-02-06 01:29:10.896947] W [client3_1-fops.c:460:client3_1_readlink_cbk] 0-glusterfs: remote operation failed: Transport endpoint is not connected
[2012-02-06 01:29:10.896968] W [fuse-bridge.c:1127:fuse_readlink_cbk] 0-glusterfs-fuse: 1487166: /run31647/pd/l2 => -1 (Transport endpoint is not connected)
[2012-02-06 01:29:10.897040] E [rpc-clnt.c:380:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x186) [0x7fb72c0245d5] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x1c5) [0x7fb72c0234d6] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x45) [0x7fb72c022c0e]))) 0-hosdu-client-3: forced unwinding frame type(GlusterFS 3.1) op(STAT(1)) called at 2012-02-06 01:29:10.892036
[2012-02-06 01:29:10.897088] W [client3_1-fops.c:418:client3_1_stat_cbk] 0-glusterfs: remote operation failed: Transport endpoint is not connected
[2012-02-06 01:29:10.900609] W [rpc-clnt.c:1478:rpc_clnt_submit] 0-hosdu-client-3: failed to submit rpc-request (XID: 0x1440193x Program: GlusterFS 3.1, ProgVers: 310, Proc: 27) to rpc-transport (hosdu-client-3)
[2012-02-06 01:29:10.900638] W [client3_1-fops.c:2249:client3_1_lookup_cbk] 0-hosdu-client-3: remote operation failed: Transport endpoint is not connected. Path: /run31647/p6/f2
[2012-02-06 01:29:10.904378] W [rpc-clnt.c:1478:rpc_clnt_submit] 0-hosdu-client-3: failed to submit rpc-request (XID: 0x1440194x Program: GlusterFS 3.1, ProgVers: 310, Proc: 29) to rpc-transport (hosdu-client-3)
[2012-02-06 01:29:10.904407] W [client3_1-fops.c:1235:client3_1_inodelk_cbk] 0-hosdu-client-3: remote operation failed: Transport endpoint is not connected


I have attached the client log. I have archived the core file and other logs.
Comment 1 Kaleb KEITHLEY 2015-10-22 11:46:38 EDT
Because of the large number of bugs filed against it, the mainline version is ambiguous and is about to be removed as a choice.

If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.
