Bug 1411281 - Ganesha with Gluster transport RDMA does not work
Summary: Ganesha with Gluster transport RDMA does not work
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: nfs-ganesha
Classification: Retired
Component: FSAL_GLUSTER
Version: 2.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Jiffin
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-01-09 10:37 UTC by Andreas Kurzac
Modified: 2019-11-22 15:32 UTC
CC List: 7 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2019-11-22 15:32:32 UTC
Embargoed:


Attachments
rpm1 (676.18 KB, application/x-rpm)
2017-01-16 12:30 UTC, Jiffin
rpm2 (30.16 KB, application/x-rpm)
2017-01-16 12:31 UTC, Jiffin
rpm3 (1.86 MB, application/x-rpm)
2017-01-16 12:32 UTC, Jiffin

Description Andreas Kurzac 2017-01-09 10:37:02 UTC
Description of problem:
Ganesha with Gluster transport RDMA does not work when
Gluster volume transport is set to rdma only.

Environment:
GlusterFS pool of 3 servers running CentOS 7.3 and GlusterFS 3.8.5; the network is InfiniBand. Pacemaker/Corosync and NFS-Ganesha are installed and everything appears to be OK, no errors logged.
I created a replica 3 volume with transport rdma (without tcp!).
When I mount this volume via glusterfs and do some IO, no errors are logged and everything seems to go pretty well.
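
For context, a minimal sketch of the setup described above; the server names (srv1..srv3) and brick paths are hypothetical, and the volume name vmstor1 is inferred from the client translator name in the logs below:

# Create a replica 3 volume carrying RDMA transport only (no tcp),
# start it, and fuse-mount it over RDMA.
gluster volume create vmstor1 replica 3 transport rdma \
    srv1:/bricks/b1 srv2:/bricks/b1 srv3:/bricks/b1
gluster volume start vmstor1
mount -t glusterfs -o transport=rdma srv1:/vmstor1 /mnt/gluster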

How reproducible:
When I mount the volume via nfs and do some IO, NFS freezes immediately and the following entries are written to

ganesha-gfapi.log:
[2017-01-05 23:23:53.536526] W [MSGID: 103004] [rdma.c:452:gf_rdma_register_arena] 0-rdma: allocation of mr failed
[2017-01-05 23:23:53.541519] W [MSGID: 103004] [rdma.c:1463:__gf_rdma_create_read_chunks_from_vector] 0-rpc-transport/rdma: memory registration failed (peer:10.40.1.1:49152) [Permission denied]
[2017-01-05 23:23:53.541547] W [MSGID: 103029] [rdma.c:1558:__gf_rdma_create_read_chunks] 0-rpc-transport/rdma: cannot create read chunks from vector entry->prog_payload
[2017-01-05 23:23:53.541553] W [MSGID: 103033] [rdma.c:2063:__gf_rdma_ioq_churn_request] 0-rpc-transport/rdma: creation of read chunks failed
[2017-01-05 23:23:53.541557] W [MSGID: 103040] [rdma.c:2775:__gf_rdma_ioq_churn_entry] 0-rpc-transport/rdma: failed to process request ioq entry to peer(10.40.1.1:49152)
[2017-01-05 23:23:53.541562] W [MSGID: 103040] [rdma.c:2859:gf_rdma_writev] 0-vmstor1-client-0: processing ioq entry destined to (10.40.1.1:49152) failed
[2017-01-05 23:23:53.541569] W [MSGID: 103037] [rdma.c:3016:gf_rdma_submit_request] 0-rpc-transport/rdma: sending request to peer (10.40.1.1:49152) failed
[…]
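
A minimal reproduction sketch, assuming the hypothetical names from above plus a Ganesha virtual IP reachable as nfs-vip:

# Mount the Ganesha export over plain NFS and generate some IO;
# the mount hangs as soon as writes hit the rdma-only volume.
mount -t nfs -o vers=4.0 nfs-vip:/vmstor1 /mnt/nfs
dd if=/dev/zero of=/mnt/nfs/testfile bs=1M count=100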

Additional info:
Firewall is disabled, SELinux is disabled.
Different hardware running CentOS 7.1 with the Mellanox OFED 3.4 packages instead of the CentOS InfiniBand packages leads to the same results.
Just to mention: I am not trying to do NFS over RDMA; the Ganesha FSAL is just configured to "glusterfs".

Comment 1 Kaleb KEITHLEY 2017-01-10 11:44:23 UTC
Jiffin wrote in gluster-users:

By checking the code, IMO this is currently a limitation within FSAL_GLUSTER. It tries to establish the connection with the glusterfs servers using only "tcp". It is easy to fix as well.

You can raise a bug under FSAL_GLUSTER at https://bugzilla.redhat.com/enter_bug.cgi?product=nfs-ganesha. I don't have any hardware to test the fix; I can either help you write up a fix for the issue or provide test rpms with the fix.
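
A quick way to confirm the mismatch from the volume side (volume name vmstor1 assumed, as above): an rdma-only volume reports a transport type that the pre-fix FSAL_GLUSTER, hard-wired to tcp, cannot connect to.

# Show the configured transport of the volume; expect "Transport-type: rdma".
gluster volume info vmstor1 | grep -i transport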

Comment 2 Jiffin 2017-01-16 12:12:43 UTC
I have posted a fix upstream, https://review.gerrithub.io/#/c/342572/, and built rpms based on it; they are attached to this bug. Please test them.
You may need to add the following changes to the export configuration file (EXPORT {} block):
EXPORT
{
        --
        --
        --
        FSAL {
                Name = "GLUSTER";
                Hostname = localhost;
                Volume = "testvol";    # volume name
                transport = "tcp";     # transport type: tcp/rdma
        }
        ---
        ---
        ---
}
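
After editing the export block, the new configuration has to be loaded; a minimal sketch, assuming a systemd-managed nfs-ganesha service and an NFSv4 client (the export path corresponds to the Pseudo path configured for the export):

# Restart Ganesha so the changed EXPORT block takes effect, then retest.
systemctl restart nfs-ganesha
mount -t nfs -o vers=4.0 localhost:/testvol /mnt/nfs-test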

Comment 3 Jiffin 2017-01-16 12:30:56 UTC
Created attachment 1241211 [details]
rpm1

Comment 4 Jiffin 2017-01-16 12:31:34 UTC
Created attachment 1241212 [details]
rpm2

Comment 5 Jiffin 2017-01-16 12:32:10 UTC
Created attachment 1241213 [details]
rpm3

Comment 6 Andreas Kurzac 2017-01-24 15:54:37 UTC
I tested the fix, and it works as long as the client is not also one of the servers.
When I mount a volume on a node that is part of the cluster, it freezes. Here are the details:

Separate client mounts via TCP/Ethernet: OK
Separate client mounts via TCP(IPoIB)/InfiniBand: OK
Local mount (client/server is part of cluster) via TCP/Ethernet: FAILS
ganesha-gfapi.log:
[2017-01-24 13:11:30.183538] W [MSGID: 103004] [rdma.c:452:gf_rdma_register_arena] 0-rdma: allocation of mr failed
[2017-01-24 13:11:30.184588] W [MSGID: 103004] [rdma.c:1463:__gf_rdma_create_read_chunks_from_vector] 0-rpc-transport/rdma: memory registration failed (peer:10.40.1.1:49152) [Permission denied]
[2017-01-24 13:11:30.184656] W [MSGID: 103029] [rdma.c:1558:__gf_rdma_create_read_chunks] 0-rpc-transport/rdma: cannot create read chunks from vector entry->prog_payload
[2017-01-24 13:11:30.184673] W [MSGID: 103033] [rdma.c:2063:__gf_rdma_ioq_churn_request] 0-rpc-transport/rdma: creation of read chunks failed
[2017-01-24 13:11:30.184689] W [MSGID: 103040] [rdma.c:2775:__gf_rdma_ioq_churn_entry] 0-rpc-transport/rdma: failed to process request ioq entry to peer(10.40.1.1:49152)
[2017-01-24 13:11:30.184710] W [MSGID: 103040] [rdma.c:2859:gf_rdma_writev] 0-vmstor1-client-0: processing ioq entry destined to (10.40.1.1:49152) failed
[2017-01-24 13:11:30.184723] W [MSGID: 103037] [rdma.c:3016:gf_rdma_submit_request] 0-rpc-transport/rdma: sending request to peer (10.40.1.1:49152) failed
[2017-01-24 13:11:30.184739] W [rpc-clnt.c:1640:rpc_clnt_submit] 0-vmstor1-client-0: failed to submit rpc-request (XID: 0x2c Program: GlusterFS 3.3, ProgVers: 330, Proc: 13) to rpc-transport (vmstor1-client-0)

Local mount (client/server is part of cluster) via TCP(IPoIB)/InfiniBand: FAILS
ganesha-gfapi.log:
[2017-01-24 13:59:37.990454] W [MSGID: 103004] [rdma.c:452:gf_rdma_register_arena] 0-rdma: allocation of mr failed
[2017-01-24 13:59:37.991559] W [MSGID: 103004] [rdma.c:1463:__gf_rdma_create_read_chunks_from_vector] 0-rpc-transport/rdma: memory registration failed (peer:10.40.1.1:49152) [Permission denied]
[2017-01-24 13:59:37.991586] W [MSGID: 103029] [rdma.c:1558:__gf_rdma_create_read_chunks] 0-rpc-transport/rdma: cannot create read chunks from vector entry->prog_payload
[2017-01-24 13:59:37.991591] W [MSGID: 103033] [rdma.c:2063:__gf_rdma_ioq_churn_request] 0-rpc-transport/rdma: creation of read chunks failed
[2017-01-24 13:59:37.991606] W [MSGID: 103040] [rdma.c:2775:__gf_rdma_ioq_churn_entry] 0-rpc-transport/rdma: failed to process request ioq entry to peer(10.40.1.1:49152)
[2017-01-24 13:59:37.991611] W [MSGID: 103040] [rdma.c:2859:gf_rdma_writev] 0-vmstor1-client-0: processing ioq entry destined to (10.40.1.1:49152) failed
[2017-01-24 13:59:37.991615] W [MSGID: 103037] [rdma.c:3016:gf_rdma_submit_request] 0-rpc-transport/rdma: sending request to peer (10.40.1.1:49152) failed
[2017-01-24 13:59:37.991622] W [rpc-clnt.c:1640:rpc_clnt_submit] 0-vmstor1-client-0: failed to submit rpc-request (XID: 0x2d Program: GlusterFS 3.3, ProgVers: 330, Proc: 13) to rpc-transport (vmstor1-client-0)
[2017-01-24 13:59:37.991628] W [MSGID: 114031] [client-rpc-fops.c:854:client3_3_writev_cbk] 0-vmstor1-client-0: remote operation failed [The socket is not connected]

Comment 7 Jiffin 2018-01-11 09:11:25 UTC
The change got merged upstream: https://review.gerrithub.io/#/c/342572/. It will be available from 2.6 onwards, IMO.

