Bug 976641 - RDMA mount fails with hang with transport type RDMA only
Status: CLOSED EOL
Product: GlusterFS
Classification: Community
Component: rdma
Version: 3.4.0-beta
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Raghavendra G
Reported: 2013-06-21 01:29 EDT by Eco
Modified: 2015-10-07 10:05 EDT
CC: 4 users

Doc Type: Bug Fix
Last Closed: 2015-10-07 10:05:55 EDT
Type: Bug


Attachments: None
Description Eco 2013-06-21 01:29:00 EDT
Description of problem:
Per summary

Version-Release number of selected component (if applicable):
3.4.0-beta3

How reproducible:
100%

Steps to Reproduce:
1. Create a single-brick volume with rdma as the only transport type
2. Mount it locally using the IP of the IB interface
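
The reproduction steps above can be sketched as shell commands (the server name, brick path, volume name, and mount point below are illustrative assumptions, not taken from this report; the brick path /home/glustermount matches the volfile later in the report):

```shell
# Create and start a single-brick volume with rdma as the only transport type.
# "server1" and the volume name "rdma-single" are hypothetical.
gluster volume create rdma-single transport rdma server1:/home/glustermount
gluster volume start rdma-single

# Mount locally using the IP of the IB interface (address is illustrative).
# transport=rdma is one common way to request the RDMA transport at mount time.
mount -t glusterfs -o transport=rdma 192.168.13.3:/rdma-single /mnt/gluster
```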

Actual results:
The mount command reports a failure, yet the mount table shows the volume as mounted; running ls on the mount point freezes the terminal.

Expected results:
Mounting locally or remotely should allow the volume to be accessed

Additional info:
Testing with the TCP transport works as expected

These errors appear repeatedly in the logs:
[2013-06-20 19:45:00.545466] W [rdma.c:1079:gf_rdma_cm_event_handler] 0-rmda-single-client-0: cma event RDMA_CM_EVENT_ROUTE_ERROR, error -110 (me:192.168.13.3:1022 peer:192.168.13.3:24008)

[2013-06-20 19:45:03.549472] W [rdma.c:1079:gf_rdma_cm_event_handler] 0-rmda-single-client-0: cma event RDMA_CM_EVENT_ROUTE_ERROR, error -110 (me:192.168.13.3:1022 peer:192.168.13.3:24008)

[2013-06-20 19:45:06.554468] W [rdma.c:1079:gf_rdma_cm_event_handler] 0-rmda-single-client-0: cma event RDMA_CM_EVENT_ROUTE_ERROR, error -110 (me:192.168.13.3:1022 peer:192.168.13.3:24008)

[2013-06-20 19:45:09.558467] W [rdma.c:1079:gf_rdma_cm_event_handler] 0-rmda-single-client-0: cma event RDMA_CM_EVENT_ROUTE_ERROR, error -110 (me:192.168.13.3:1022 peer:192.168.13.3:24008)

-----------------------------------------------------------------------------+
  1: volume rmda-single-client-0
  2:     type protocol/client
  3:     option password 40f5e31e-ea6b-49f6-83da-1666dbaa0674
  4:     option username 7679fbef-e213-4aa9-a317-1ddcd7d9e793
  5:     option transport-type rdma
  6:     option remote-subvolume /home/glustermount
  7:     option remote-host 192.168.13.3
  8: end-volume
  9: 
 10: volume rmda-single-dht
 11:     type cluster/distribute
 12:     subvolumes rmda-single-client-0
 13: end-volume
 14: 
 15: volume rmda-single-write-behind
 16:     type performance/write-behind
 17:     subvolumes rmda-single-dht
 18: end-volume
 19: 
 20: volume rmda-single-read-ahead
 21:     type performance/read-ahead
 22:     subvolumes rmda-single-write-behind
 23: end-volume
 24: 
 25: volume rmda-single-io-cache
 26:     type performance/io-cache
 27:     subvolumes rmda-single-read-ahead
 28: end-volume
 29: 
 30: volume rmda-single-quick-read
 31:     type performance/quick-read
 32:     subvolumes rmda-single-io-cache
 33: end-volume
 34: 
 35: volume rmda-single-open-behind
 36:     type performance/open-behind
 37:     subvolumes rmda-single-quick-read
 38: end-volume
 39: 
 40: volume rmda-single-md-cache
 41:     type performance/md-cache
 42:     subvolumes rmda-single-open-behind
 43: end-volume
 44: 
 45: volume rmda-single
 46:     type debug/io-stats
 47:     option count-fop-hits off
 48:     option latency-measurement off
 49:     subvolumes rmda-single-md-cache
 50: end-volume
Comment 1 Eco 2013-06-21 11:25:03 EDT
# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination     

# telnet `hostname` 24008
Trying 10.16.157.99...
telnet: connect to address 10.16.157.99: Connection refused

# telnet `hostname` 24007
Trying 10.16.157.99...
Connected to gqaib-03.sbu.lab.eng.bos.redhat.com.
Escape character is '^]'.

So for some reason the RDMA management port (24008) does not appear to be responding
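
One way to cross-check which management ports glusterd is actually listening on (a diagnostic sketch; 24007 is the glusterd TCP management port and 24008 the RDMA management port, consistent with the telnet test above):

```shell
# List listening TCP sockets and filter for the gluster management ports.
# An entry for :24007 but none for :24008 matches the telnet results above.
ss -ltn | grep -E ':2400[78]'

# The same check with netstat, where ss is not available:
netstat -ltn | grep -E ':2400[78]'
```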
Comment 2 Raghavendra G 2013-06-24 02:04:14 EDT
Hi Eco,

Is 192.168.13.3 on the server an IPoIB address? It is a requirement of rdma-cm that the initial connection establishment (done by the RDMA connection manager) happen on an IP-over-IB interface. It would be helpful if you could give us the output of ifconfig on the client and server.
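
A quick way to confirm that the address used for mounting belongs to an IPoIB interface (the interface name ib0 is a common default and an assumption here, not taken from this report):

```shell
# Show addresses on the InfiniBand interface; the address used for the mount
# (e.g. 192.168.13.3) should appear here, not on an Ethernet interface.
ip -o addr show ib0

# Check the link type: IPoIB interfaces report "link/infiniband".
ip -o link show ib0 | grep infiniband
```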

regards,
Raghavendra.
Comment 3 Eco 2013-06-26 13:07:46 EDT
Ragha,

Initiating with IPoIB resolved the mount issue. Is this requirement documented? I did not see it listed when searching for "Infiniband" or "rdma" in http://www.gluster.org/wp-content/uploads/2012/05/Gluster_File_System-3.3.0-Administration_Guide-en-US.pdf
Comment 5 Niels de Vos 2015-05-17 17:58:23 EDT
GlusterFS 3.7.0 has been released (http://www.gluster.org/pipermail/gluster-users/2015-May/021901.html), and the Gluster project maintains N-2 supported releases. The last two releases before 3.7 are still maintained, at the moment these are 3.6 and 3.5.

This bug has been filed against the 3.4 release, and will not get fixed in a 3.4 version any more. Please verify whether newer versions are affected by the reported problem. If that is the case, update the bug with a note, and update the version if you can. If updating the version is not possible, leave a comment in this bug report with the version you tested, and set the "Need additional information the selected bugs from" field below the comment box to "bugs@gluster.org".

If there is no response by the end of the month, this bug will get automatically closed.
Comment 6 Kaleb KEITHLEY 2015-10-07 10:05:55 EDT
GlusterFS 3.4.x has reached end-of-life.

If this bug still exists in a later release please reopen this and change the version or open a new bug.
