Bug 878883

Summary: Fuse mount hangs for a volume with RDMA transport
Product: Red Hat Gluster Storage
Reporter: Ujjwala <ujjwala>
Component: rdma
Assignee: Raghavendra G <rgowdapp>
Status: CLOSED CURRENTRELEASE
QA Contact: shylesh <shmohan>
Severity: medium
Docs Contact:
Priority: medium
Version: 2.0
CC: aavati, joe, mozes, ndevos, poelstra, rhs-bugs, rwheeler, sdharane, surs, tnagata, vagarwal, vbellur
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-02-13 05:24:04 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:

Description Ujjwala 2012-11-21 08:16:23 EST
Description of problem:
A FUSE mount hangs for a volume with transport type RDMA.

Version-Release number of selected component (if applicable):
glusterfs 3.4.0qa2 built on Nov  5 2012 04:15:56

How reproducible:
Every time

Steps to Reproduce:
1. Set up RDMA on the IB-capable machines.
2. Create a volume using the IPoIB addresses and start the volume.
3. Try a FUSE mount from one of the nodes. The mount hangs, but the mount log is created.

[root@rhs-hpc-srv4 ~]# gluster v i
 
Volume Name: dht
Type: Distribute
Volume ID: 18079413-9add-4282-b3c9-d3d135a752c6
Status: Started
Number of Bricks: 2
Transport-type: rdma
Bricks:
Brick1: 192.168.0.1:/home/bricks/dht/b1
Brick2: 192.168.0.2:/home/bricks/dht/b2


[root@rhs-hpc-srv1 ~]# ps -ef | grep gluster
root     15535     1  0 02:49 ?        00:00:01 /usr/sbin/glusterd --pid-file=/var/run/glusterd.pid
root     15910     1  0 03:21 ?        00:00:00 /usr/sbin/glusterfsd -s localhost --volfile-id dht.192.168.0.1.home-bricks-dht-b1 -p /var/lib/glusterd/vols/dht/run/192.168.0.1-home-bricks-dht-b1.pid -S /var/run/afc7e2b3a07656157587de11bbfc3110.socket --brick-name /home/bricks/dht/b1 -l /var/log/glusterfs/bricks/home-bricks-dht-b1.log --xlator-option *-posix.glusterd-uuid=e7c11fca-bd89-4d6b-ac59-c6ec0b217cc3 --brick-port 49152 --xlator-option dht-server.listen-port=49152
root     15922     1  0 03:21 ?        00:00:22 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/f9f22903a14167c8f07e3082ea2e97b4.socket
root     28309     1  0 05:52 ?        00:00:02 /usr/sbin/glusterfs --volfile-id=dht --volfile-server=192.168.0.2 /mnt/gfs
root     28318     1  0 05:52 pts/0    00:00:00 /bin/sh /sbin/mount.glusterfs 192.168.0.2:dht /mnt/gfs -o rw



Additional info:

[2012-11-21 11:04:14.301795] W [rdma.c:4518:gf_rdma_handshake_pollerr] (-->/usr/sbin/glusterfs(main+0x531) [0x406641] (-->/usr/lib64/libglusterfs.so.0() [0x32ac8596e7] (-->/usr/lib64/glusterfs/3.4.0qa2/rpc-transport/rdma.so(+0x7228) [0x7fa3fc6aa228]))) 0-rpc-transport/rdma: dht-client-1: peer () disconnected, cleaning up
[2012-11-21 11:04:17.305074] E [rdma.c:4601:tcp_connect_finish] 0-dht-client-0: tcp connect to  failed (Connection refused)
[2012-11-21 11:04:17.305130] W [rdma.c:4184:gf_rdma_disconnect] (-->/usr/sbin/glusterfs(main+0x531) [0x406641] (-->/usr/lib64/libglusterfs.so.0() [0x32ac8596e7] (-->/usr/lib64/glusterfs/3.4.0qa2/rpc-transport/rdma.so(+0x7328) [0x7fa3fc6aa328]))) 0-dht-client-0: disconnect called (peer:)
[2012-11-21 11:04:17.305159] W [rdma.c:4518:gf_rdma_handshake_pollerr] (-->/usr/sbin/glusterfs(main+0x531) [0x406641] (-->/usr/lib64/libglusterfs.so.0() [0x32ac8596e7] (-->/usr/lib64/glusterfs/3.4.0qa2/rpc-transport/rdma.so(+0x7228) [0x7fa3fc6aa228]))) 0-rpc-transport/rdma: dht-client-0: peer () disconnected, cleaning up
[2012-11-21 11:04:17.308184] E [rdma.c:4601:tcp_connect_finish] 0-dht-client-1: tcp connect to  failed (Connection refused)
[2012-11-21 11:04:17.308227] W [rdma.c:4184:gf_rdma_disconnect] (-->/usr/sbin/glusterfs(main+0x531) [0x406641] (-->/usr/lib64/libglusterfs.so.0() [0x32ac8596e7] (-->/usr/lib64/glusterfs/3.4.0qa2/rpc-transport/rdma.so(+0x7328) [0x7fa3fc6aa328]))) 0-dht-client-1: disconnect called (peer:)
[2012-11-21 11:04:17.308256] W [rdma.c:4518:gf_rdma_handshake_pollerr] (-->/usr/sbin/glusterfs(main+0x531) [0x406641] (-->/usr/lib64/libglusterfs.so.0() [0x32ac8596e7] (-->/usr/lib64/glusterfs/3.4.0qa2/rpc-transport/rdma.so(+0x7228) [0x7fa3fc6aa228]))) 0-rpc-transport/rdma: dht-client-1: peer () disconnected, cleaning up
[2012-11-21 11:04:20.311501] E [rdma.c:4601:tcp_connect_finish] 0-dht-client-0: tcp connect to  failed (Connection refused)
[2012-11-21 11:04:20.311559] W [rdma.c:4184:gf_rdma_disconnect] (-->/usr/sbin/glusterfs(main+0x531) [0x406641] (-->/usr/lib64/libglusterfs.so.0() [0x32ac8596e7] (-->/usr/lib64/glusterfs/3.4.0qa2/rpc-transport/rdma.so(+0x7328) [0x7fa3fc6aa328]))) 0-dht-client-0: disconnect called (peer:)
[2012-11-21 11:04:20.311588] W [rdma.c:4518:gf_rdma_handshake_pollerr] (-->/usr/sbin/g
Comment 1 Ujjwala 2012-11-21 08:22:14 EST
sosreport at: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/878883/
Comment 3 Raghavendra G 2012-11-27 03:41:31 EST
The problem is that the rdma transport is being bound to port 65535 instead of 24008. This is caused by a bug where listen_port is initialised to -1 instead of 24008 (the default rdma listen port).
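
To illustrate why an initial value of -1 ends up as port 65535, here is a minimal sketch in C. It is not the code from rdma.c: the function names and the `configured_port` parameter are hypothetical, and it assumes the port is eventually stored in a 16-bit unsigned field, as TCP/IP port numbers are. Only the values -1, 65535 and 24008 come from this report.

```c
#include <stdint.h>

/* 24008 is the default rdma listen port named in this report. */
#define DEFAULT_RDMA_LISTEN_PORT 24008

/* Hypothetical buggy pattern: the "unset" sentinel -1 is used as the
 * initial value and leaks into a 16-bit port when no option is given. */
uint16_t buggy_listen_port(int configured_port)
{
    int listen_port = -1;

    if (configured_port > 0)
        listen_port = configured_port;

    /* Conversion to uint16_t is modulo 2^16, so -1 becomes 65535. */
    return (uint16_t)listen_port;
}

/* Hypothetical fixed pattern: start from the documented default and
 * override it only when a valid port was configured. */
uint16_t fixed_listen_port(int configured_port)
{
    int listen_port = DEFAULT_RDMA_LISTEN_PORT;

    if (configured_port > 0 && configured_port <= 65535)
        listen_port = configured_port;

    return (uint16_t)listen_port;
}
```

In this sketch, calling `buggy_listen_port(-1)` (no port configured) yields 65535, while `fixed_listen_port(-1)` yields 24008, which matches the behaviour described above.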

The following patch, which makes use of rdma-cm for connection establishment, also contains the fix for this bug.
http://review.gluster.com/#change,149

regards,
Raghavendra.
Comment 4 Niels de Vos 2012-12-18 06:12:44 EST
If I understand this correctly, setting the option transport.rdma.listen-port to transport.rdma.listen-port in the glusterd.vol is a workaround?
Comment 5 Vijay Bellur 2012-12-18 18:43:10 EST
CHANGE: http://review.gluster.org/4323 (rpc-transport/rdma: use 24008 as default listen port.) merged in master by Anand Avati (avati@redhat.com)
Comment 6 Raghavendra G 2013-01-24 05:19:44 EST
*** Bug 849122 has been marked as a duplicate of this bug. ***
Comment 7 Joe Julian 2013-02-18 12:28:40 EST
Please backport to release-3.3
Comment 8 Sachidananda Urs 2013-08-08 01:46:50 EDT
Moving out of Big Bend since RDMA support is not available in Big Bend, 2.1.
Comment 11 Niels de Vos 2014-02-21 11:20:12 EST
(In reply to Niels de Vos from comment #4)
> If I understand this correctly, setting the option
> transport.rdma.listen-port to transport.rdma.listen-port in the glusterd.vol
> is a workaround?

This does not seem to be the case. Also, glusterfs-3.4.0.44rhs-1.el6rhs.x86_64 already contains the patch from comment #5. Still, mounting a volume over RDMA fails.

Because these changes are not sufficient, I'm moving the state back from MODIFIED to ASSIGNED.