Bug 878883 - Fuse mount hangs for a volume with RDMA transport
Status: CLOSED CURRENTRELEASE
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterfs-rdma
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Assigned To: Raghavendra G
QA Contact: shylesh
Duplicates: 849122
Depends On:
Blocks:
Reported: 2012-11-21 08:16 EST by Ujjwala
Modified: 2015-02-13 05:24 EST
CC List: 12 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-02-13 05:24:04 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments: None
Description Ujjwala 2012-11-21 08:16:23 EST
Description of problem:
The FUSE mount hangs for a volume with transport type RDMA.

Version-Release number of selected component (if applicable):
glusterfs 3.4.0qa2 built on Nov  5 2012 04:15:56

How reproducible:
Every time

Steps to Reproduce:
1. Set up RDMA on the IB-capable machines.
2. Create a volume over IPoIB and start it.
3. Try a FUSE mount from one of the nodes. The mount hangs, but the mount log is created.

[root@rhs-hpc-srv4 ~]# gluster v i
 
Volume Name: dht
Type: Distribute
Volume ID: 18079413-9add-4282-b3c9-d3d135a752c6
Status: Started
Number of Bricks: 2
Transport-type: rdma
Bricks:
Brick1: 192.168.0.1:/home/bricks/dht/b1
Brick2: 192.168.0.2:/home/bricks/dht/b2


[root@rhs-hpc-srv1 ~]# ps -ef | grep gluster
root     15535     1  0 02:49 ?        00:00:01 /usr/sbin/glusterd --pid-file=/var/run/glusterd.pid
root     15910     1  0 03:21 ?        00:00:00 /usr/sbin/glusterfsd -s localhost --volfile-id dht.192.168.0.1.home-bricks-dht-b1 -p /var/lib/glusterd/vols/dht/run/192.168.0.1-home-bricks-dht-b1.pid -S /var/run/afc7e2b3a07656157587de11bbfc3110.socket --brick-name /home/bricks/dht/b1 -l /var/log/glusterfs/bricks/home-bricks-dht-b1.log --xlator-option *-posix.glusterd-uuid=e7c11fca-bd89-4d6b-ac59-c6ec0b217cc3 --brick-port 49152 --xlator-option dht-server.listen-port=49152
root     15922     1  0 03:21 ?        00:00:22 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/f9f22903a14167c8f07e3082ea2e97b4.socket
root     28309     1  0 05:52 ?        00:00:02 /usr/sbin/glusterfs --volfile-id=dht --volfile-server=192.168.0.2 /mnt/gfs
root     28318     1  0 05:52 pts/0    00:00:00 /bin/sh /sbin/mount.glusterfs 192.168.0.2:dht /mnt/gfs -o rw



Additional info:

[2012-11-21 11:04:14.301795] W [rdma.c:4518:gf_rdma_handshake_pollerr] (-->/usr/sbin/glusterfs(main+0x531) [0x406641] (-->/usr/lib64/libglusterfs.so.0() [0x32ac8596e7] (-->/usr/lib64/glusterfs/3.4.0qa2/rpc-transport/rdma.so(+0x7228) [0x7fa3fc6aa228]))) 0-rpc-transport/rdma: dht-client-1: peer () disconnected, cleaning up
[2012-11-21 11:04:17.305074] E [rdma.c:4601:tcp_connect_finish] 0-dht-client-0: tcp connect to  failed (Connection refused)
[2012-11-21 11:04:17.305130] W [rdma.c:4184:gf_rdma_disconnect] (-->/usr/sbin/glusterfs(main+0x531) [0x406641] (-->/usr/lib64/libglusterfs.so.0() [0x32ac8596e7] (-->/usr/lib64/glusterfs/3.4.0qa2/rpc-transport/rdma.so(+0x7328) [0x7fa3fc6aa328]))) 0-dht-client-0: disconnect called (peer:)
[2012-11-21 11:04:17.305159] W [rdma.c:4518:gf_rdma_handshake_pollerr] (-->/usr/sbin/glusterfs(main+0x531) [0x406641] (-->/usr/lib64/libglusterfs.so.0() [0x32ac8596e7] (-->/usr/lib64/glusterfs/3.4.0qa2/rpc-transport/rdma.so(+0x7228) [0x7fa3fc6aa228]))) 0-rpc-transport/rdma: dht-client-0: peer () disconnected, cleaning up
[2012-11-21 11:04:17.308184] E [rdma.c:4601:tcp_connect_finish] 0-dht-client-1: tcp connect to  failed (Connection refused)
[2012-11-21 11:04:17.308227] W [rdma.c:4184:gf_rdma_disconnect] (-->/usr/sbin/glusterfs(main+0x531) [0x406641] (-->/usr/lib64/libglusterfs.so.0() [0x32ac8596e7] (-->/usr/lib64/glusterfs/3.4.0qa2/rpc-transport/rdma.so(+0x7328) [0x7fa3fc6aa328]))) 0-dht-client-1: disconnect called (peer:)
[2012-11-21 11:04:17.308256] W [rdma.c:4518:gf_rdma_handshake_pollerr] (-->/usr/sbin/glusterfs(main+0x531) [0x406641] (-->/usr/lib64/libglusterfs.so.0() [0x32ac8596e7] (-->/usr/lib64/glusterfs/3.4.0qa2/rpc-transport/rdma.so(+0x7228) [0x7fa3fc6aa228]))) 0-rpc-transport/rdma: dht-client-1: peer () disconnected, cleaning up
[2012-11-21 11:04:20.311501] E [rdma.c:4601:tcp_connect_finish] 0-dht-client-0: tcp connect to  failed (Connection refused)
[2012-11-21 11:04:20.311559] W [rdma.c:4184:gf_rdma_disconnect] (-->/usr/sbin/glusterfs(main+0x531) [0x406641] (-->/usr/lib64/libglusterfs.so.0() [0x32ac8596e7] (-->/usr/lib64/glusterfs/3.4.0qa2/rpc-transport/rdma.so(+0x7328) [0x7fa3fc6aa328]))) 0-dht-client-0: disconnect called (peer:)
[2012-11-21 11:04:20.311588] W [rdma.c:4518:gf_rdma_handshake_pollerr] (-->/usr/sbin/g
Comment 1 Ujjwala 2012-11-21 08:22:14 EST
sosreport at: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/878883/
Comment 3 Raghavendra G 2012-11-27 03:41:31 EST
The problem is that the rdma transport is being bound to port 65535 instead of 24008. This is due to a bug where listen_port is initialised to -1 instead of 24008 (the default rdma listen port).
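
To illustrate why a listen_port of -1 can show up as port 65535, here is a minimal sketch in C. It is not the code from rdma.c: the function names and the RDMA_DEFAULT_LISTEN_PORT macro are hypothetical, and it assumes the port ends up in a 16-bit unsigned field, as it does in a sockaddr.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name for illustration; 24008 is the default rdma
 * listen port quoted in this bug. */
#define RDMA_DEFAULT_LISTEN_PORT 24008

/* Buggy pattern: the "unset" value -1 is narrowed straight into a
 * 16-bit port. Conversion to an unsigned type is modular in C, so
 * -1 becomes 65535. */
uint16_t port_without_default(int listen_port)
{
    return (uint16_t)listen_port;
}

/* Fixed pattern: replace the "unset" value with the default before
 * narrowing to 16 bits. */
uint16_t port_with_default(int listen_port)
{
    if (listen_port < 0)
        listen_port = RDMA_DEFAULT_LISTEN_PORT;

    /* Sanity check: the value must fit in a 16-bit port number. */
    assert(listen_port <= 65535);
    return (uint16_t)listen_port;
}
```

With these helpers, port_without_default(-1) yields 65535, while port_with_default(-1) yields 24008 and an explicitly configured port such as 49152 is passed through unchanged.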

The following patch, which uses rdma-cm for connection establishment, also contains the fix for this bug.
http://review.gluster.com/#change,149

regards,
Raghavendra.
Comment 4 Niels de Vos 2012-12-18 06:12:44 EST
If I understand this correctly, setting the option transport.rdma.listen-port to 24008 in the glusterd.vol is a workaround?
Comment 5 Vijay Bellur 2012-12-18 18:43:10 EST
CHANGE: http://review.gluster.org/4323 (rpc-transport/rdma: use 24008 as default listen port.) merged in master by Anand Avati (avati@redhat.com)
Comment 6 Raghavendra G 2013-01-24 05:19:44 EST
*** Bug 849122 has been marked as a duplicate of this bug. ***
Comment 7 Joe Julian 2013-02-18 12:28:40 EST
Please backport to release-3.3
Comment 8 Sachidananda Urs 2013-08-08 01:46:50 EDT
Moving out of Big Bend, since RDMA support is not available in Big Bend (2.1).
Comment 11 Niels de Vos 2014-02-21 11:20:12 EST
(In reply to Niels de Vos from comment #4)
> If I understand this correctly, setting the option
> transport.rdma.listen-port to 24008 in the glusterd.vol
> is a workaround?

This does not seem to be the case. Also glusterfs-3.4.0.44rhs-1.el6rhs.x86_64 already contains the patch from comment #5. Still, mounting a volume over RDMA fails.

Because these changes are not sufficient, I'm moving the state back from MODIFIED to ASSIGNED.
