Bug 849131

Summary: [5303f98f674ab5cb600dde0394ff7ddd5ba3c98a] - gluster fuse client hung during sanity runs
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Vidya Sakar <vinaraya>
Component: rdma
Assignee: Raghavendra G <rgowdapp>
Status: CLOSED CURRENTRELEASE
QA Contact: shylesh <shmohan>
Severity: high
Docs Contact:
Priority: low
Version: 2.0
CC: aavati, gluster-bugs, rwheeler, sdharane, surs, vagarwal, vbhat
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 772880
: 858452
Environment:
Last Closed: 2015-02-13 09:51:09 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 772880, 822337
Bug Blocks: 858452

Description Vidya Sakar 2012-08-17 11:55:05 UTC
+++ This bug was initially created as a clone of Bug #772880 +++

Description of problem:
I was running sanity tests on a 2-way replicate system with the 'rdma' transport type. The sanity run hung, but the mount point was still accessible.

Version-Release number of selected component (if applicable):
git master with head at 5303f98f674ab5cb600dde0394ff7ddd5ba3c98a

How reproducible:
2/2

Steps to Reproduce:
1. Create a replicate volume with rdma transport type.
2. Start running the sanity tests.
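For anyone re-running step 1, here is a rough sketch of the gluster CLI commands involved, written as a small Python helper that only builds the argument lists. The volume name "hosdu" is inferred from the translator names in the client log (e.g. 0-hosdu-client-0). The server names and brick paths are placeholders, not the ones used in the original run, and the helper name build_repro_commands is made up for illustration.

```python
def build_repro_commands(volname, bricks, replica=2, transport="rdma"):
    """Return the gluster CLI invocations for creating and starting a
    replicate volume. Nothing is executed here; the caller decides how
    to run the commands (for example with subprocess.run on a server node).
    """
    if replica < 2:
        raise ValueError("a replicate volume needs a replica count of at least 2")
    if not bricks or len(bricks) % replica != 0:
        raise ValueError("number of bricks must be a multiple of the replica count")

    # gluster volume create <volname> replica <count> transport <type> <bricks...>
    create = ["gluster", "volume", "create", volname,
              "replica", str(replica),
              "transport", transport,
              *bricks]
    # gluster volume start <volname>
    start = ["gluster", "volume", "start", volname]
    return [create, start]


# Placeholder hosts and brick paths (assumptions, not from the original setup).
commands = build_repro_commands(
    "hosdu",
    ["server1:/export/brick1", "server2:/export/brick1"],
)
for argv in commands:
    print(" ".join(argv))
```

Step 2 (mounting the volume on the client and starting the sanity suite) is not sketched here, because the report does not say how the mount was done or which sanity scripts were used.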
  
Actual results:
The sanity test hung.

Expected results:
The sanity tests should not hang.

Additional info:

The following are the entries in the client log:

[2012-01-09 23:49:29.518564] W [client3_1-fops.c:373:client3_1_open_cbk] 0-hosdu-client-0: remote operation failed: Transport endpoint is not connected. Path: /run2040/_24876_tiotest.0
[2012-01-09 23:49:29.518586] E [afr-self-heal-data.c:1278:afr_sh_data_open_cbk] 0-hosdu-replicate-0: open of /run2040/_24876_tiotest.0 failed on child hosdu-client-0 (Transport endpoint is not connected)
[2012-01-09 23:49:29.551551] E [afr-self-heal-common.c:2045:afr_self_heal_completion_cbk] 0-hosdu-replicate-0: background  data self-heal failed on /run2040/_24876_tiotest.0
[2012-01-09 23:49:29.552042] W [rpc-clnt.c:1478:rpc_clnt_submit] 0-hosdu-client-0: failed to submit rpc-request (XID: 0x238212x Program: GlusterFS 3.1, ProgVers: 310, Proc: 27) to rpc-transport (hosdu-client-0)
[2012-01-09 23:49:29.552065] W [client3_1-fops.c:2249:client3_1_lookup_cbk] 0-hosdu-client-0: remote operation failed: Transport endpoint is not connected. Path: /run2040/_24876_tiotest.1
[2012-01-09 23:49:29.552697] I [afr-common.c:1297:afr_launch_self_heal] 0-hosdu-replicate-0: background  data self-heal triggered. path: /run2040/_24876_tiotest.1, reason: lookup detected pending operations
[2012-01-09 23:49:29.552753] W [rpc-clnt.c:1478:rpc_clnt_submit] 0-hosdu-client-0: failed to submit rpc-request (XID: 0x238213x Program: GlusterFS 3.1, ProgVers: 310, Proc: 11) to rpc-transport (hosdu-client-0)
[2012-01-09 23:49:29.552775] W [client3_1-fops.c:373:client3_1_open_cbk] 0-hosdu-client-0: remote operation failed: Transport endpoint is not connected. Path: /run2040/_24876_tiotest.1
[2012-01-09 23:49:29.552791] E [afr-self-heal-data.c:1278:afr_sh_data_open_cbk] 0-hosdu-replicate-0: open of /run2040/_24876_tiotest.1 failed on child hosdu-client-0 (Transport endpoint is not connected)
[2012-01-09 23:49:29.552957] E [afr-self-heal-common.c:2045:afr_self_heal_completion_cbk] 0-hosdu-replicate-0: background  data self-heal failed on /run2040/_24876_tiotest.1
[2012-01-09 23:49:29.553187] W [rpc-clnt.c:1478:rpc_clnt_submit] 0-hosdu-client-0: failed to submit rpc-request (XID: 0x238214x Program: GlusterFS 3.1, ProgVers: 310, Proc: 27) to rpc-transport (hosdu-client-0)
[2012-01-09 23:49:29.553214] W [client3_1-fops.c:2249:client3_1_lookup_cbk] 0-hosdu-client-0: remote operation failed: Transport endpoint is not connected. Path: /run2040/_24876_tiotest.2
[2012-01-09 23:49:29.553860] I [afr-common.c:1297:afr_launch_self_heal] 0-hosdu-replicate-0: background  data self-heal triggered. path: /run2040/_24876_tiotest.2, reason: lookup detected pending operations
[2012-01-09 23:49:29.553907] W [rpc-clnt.c:1478:rpc_clnt_submit] 0-hosdu-client-0: failed to submit rpc-request (XID: 0x238215x Program: GlusterFS 3.1, ProgVers: 310, Proc: 11) to rpc-transport (hosdu-client-0)
[2012-01-09 23:49:29.553927] W [client3_1-fops.c:373:client3_1_open_cbk] 0-hosdu-client-0: remote operation failed: Transport endpoint is not connected. Path: /run2040/_24876_tiotest.2
[2012-01-09 23:49:29.553943] E [afr-self-heal-data.c:1278:afr_sh_data_open_cbk] 0-hosdu-replicate-0: open of /run2040/_24876_tiotest.2 failed on child hosdu-client-0 (Transport endpoint is not connected)
[2012-01-09 23:49:29.554088] E [afr-self-heal-common.c:2045:afr_self_heal_completion_cbk] 0-hosdu-replicate-0: background  data self-heal failed on /run2040/_24876_tiotest.2
[2012-01-09 23:49:29.554285] W [rpc-clnt.c:1478:rpc_clnt_submit] 0-hosdu-client-0: failed to submit rpc-request (XID: 0x238216x Program: GlusterFS 3.1, ProgVers: 310, Proc: 27) to rpc-transport (hosdu-client-0)
[2012-01-09 23:49:29.554310] W [client3_1-fops.c:2249:client3_1_lookup_cbk] 0-hosdu-client-0: remote operation failed: Transport endpoint is not connected. Path: /run2040/_24876_tiotest.3
[2012-01-09 23:49:29.554826] I [afr-common.c:1297:afr_launch_self_heal] 0-hosdu-replicate-0: background  data self-heal triggered. path: /run2040/_24876_tiotest.3, reason: lookup detected pending operations
[2012-01-09 23:49:29.554873] W [rpc-clnt.c:1478:rpc_clnt_submit] 0-hosdu-client-0: failed to submit rpc-request (XID: 0x238217x Program: GlusterFS 3.1, ProgVers: 310, Proc: 11) to rpc-transport (hosdu-client-0)
[2012-01-09 23:49:29.554904] W [client3_1-fops.c:373:client3_1_open_cbk] 0-hosdu-client-0: remote operation failed: Transport endpoint is not connected. Path: /run2040/_24876_tiotest.3
[2012-01-09 23:49:29.554921] E [afr-self-heal-data.c:1278:afr_sh_data_open_cbk] 0-hosdu-replicate-0: open of /run2040/_24876_tiotest.3 failed on child hosdu-client-0 (Transport endpoint is not connected)
[2012-01-09 23:49:29.555072] E [afr-self-heal-common.c:2045:afr_self_heal_completion_cbk] 0-hosdu-replicate-0: background  data self-heal failed on /run2040/_24876_tiotest.3
[2012-01-09 23:49:29.555262] W [rpc-clnt.c:1478:rpc_clnt_submit] 0-hosdu-client-0: failed to submit rpc-request (XID: 0x238218x Program: GlusterFS 3.1, ProgVers: 310, Proc: 27) to rpc-transport (hosdu-client-0)
[2012-01-09 23:49:29.555287] W [client3_1-fops.c:2249:client3_1_lookup_cbk] 0-hosdu-client-0: remote operation failed: Transport endpoint is not connected. Path: /run2040/p0
[2012-01-09 23:49:29.556156] W [rpc-clnt.c:1478:rpc_clnt_submit] 0-hosdu-client-0: failed to submit rpc-request (XID: 0x238219x Program: GlusterFS 3.1, ProgVers: 310, Proc: 27) to rpc-transport (hosdu-client-0)
[2012-01-09 23:49:29.556179] W [client3_1-fops.c:2249:client3_1_lookup_cbk] 0-hosdu-client-0: remote operation failed: Transport endpoint is not connected. Path: /run2040/p1
[2012-01-09 23:49:29.556902] W [rpc-clnt.c:1478:rpc_clnt_submit] 0-hosdu-client-0: failed to submit rpc-request (XID: 0x238220x Program: GlusterFS 3.1, ProgVers: 310, Proc: 27) to rpc-transport (hosdu-client-0)
[2012-01-09 23:49:29.556925] W [client3_1-fops.c:2249:client3_1_lookup_cbk] 0-hosdu-client-0: remote operation failed: Transport endpoint is not connected. Path: /run2040/p2
[2012-01-09 23:49:29.557709] W [rpc-clnt.c:1478:rpc_clnt_submit] 0-hosdu-client-0: failed to submit rpc-request (XID: 0x238221x Program: GlusterFS 3.1, ProgVers: 310, Proc: 27) to rpc-transport (hosdu-client-0)
[2012-01-09 23:49:29.557732] W [client3_1-fops.c:2249:client3_1_lookup_cbk] 0-hosdu-client-0: remote operation failed: Transport endpoint is not connected. Path: /run2040/p3
[2012-01-09 23:49:29.558451] W [rpc-clnt.c:1478:rpc_clnt_submit] 0-hosdu-client-0: failed to submit rpc-request (XID: 0x238222x Program: GlusterFS 3.1, ProgVers: 310, Proc: 27) 
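
The excerpt above repeats the same few messages many times, so a quick tally per log level, function and translator makes the pattern easier to see. The following is a minimal sketch that assumes every line has the shape "[timestamp] LEVEL [file:line:function] translator: message", as in the lines quoted here. The names LOG_LINE and summarize are made up for this example and are not part of GlusterFS.

```python
import re
from collections import Counter

# Matches lines shaped like the ones quoted in this report:
# [timestamp] LEVEL [source-file:line:function] translator: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<xlator>\S+):\s+(?P<msg>.*)$"
)


def summarize(lines):
    """Count log entries per (level, function, translator).

    Lines that do not match the expected shape are skipped silently.
    """
    counts = Counter()
    for raw in lines:
        match = LOG_LINE.match(raw.strip())
        if match is None:
            continue
        key = (match.group("level"), match.group("func"), match.group("xlator"))
        counts[key] += 1
    return counts


def format_summary(counts):
    """Render the tally as text lines, most frequent entries first."""
    return [
        f"{n:5d}  {level}  {func}  {xlator}"
        for (level, func, xlator), n in counts.most_common()
    ]
```

To use it, pass the lines of the client log (for example `open(path).readlines()`, with the path to the log file on the client) to summarize() and print the lines returned by format_summary().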


I have attached the statedumps of client and first server brick.

Comment 3 Sachidananda Urs 2013-08-08 05:45:22 UTC
Moving out of Big Bend, since RDMA support is not available in Big Bend, 2.1.