Bug 1411281

Summary: Ganesha with Gluster transport RDMA does not work
Product: [Retired] nfs-ganesha
Component: FSAL_GLUSTER
Version: 2.4
Hardware: x86_64
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Reporter: Andreas Kurzac <a.kurzac>
Assignee: Jiffin <jthottan>
CC: bugs, jthottan, kkeithle, ndevos, pasik, rgowdapp, srangana
Keywords: Triaged
Last Closed: 2019-11-22 15:32:32 UTC
Type: Bug
Attachments: rpm1, rpm2, rpm3 (no flags; see comments 3-5)

Description Andreas Kurzac 2017-01-09 10:37:02 UTC
Description of problem:
Ganesha with Gluster transport RDMA does not work when the Gluster volume transport is set to rdma only.

Environment:
A GlusterFS pool of 3 servers running CentOS 7.3 with GlusterFS 3.8.5; the network is InfiniBand. Pacemaker/Corosync and NFS-Ganesha are installed and everything appears to be fine, with no errors logged.
I created a replica 3 volume with transport rdma (without tcp!).
When I mount this volume via glusterfs and do some I/O, no errors are logged and everything works well.

How reproducible:
When I mount the volume via NFS and do some I/O, NFS freezes immediately and the following messages are written to

ganesha-gfapi.log:
[2017-01-05 23:23:53.536526] W [MSGID: 103004] [rdma.c:452:gf_rdma_register_arena] 0-rdma: allocation of mr failed
[2017-01-05 23:23:53.541519] W [MSGID: 103004] [rdma.c:1463:__gf_rdma_create_read_chunks_from_vector] 0-rpc-transport/rdma: memory registration failed (peer:10.40.1.1:49152) [Permission denied]
[2017-01-05 23:23:53.541547] W [MSGID: 103029] [rdma.c:1558:__gf_rdma_create_read_chunks] 0-rpc-transport/rdma: cannot create read chunks from vector entry->prog_payload
[2017-01-05 23:23:53.541553] W [MSGID: 103033] [rdma.c:2063:__gf_rdma_ioq_churn_request] 0-rpc-transport/rdma: creation of read chunks failed
[2017-01-05 23:23:53.541557] W [MSGID: 103040] [rdma.c:2775:__gf_rdma_ioq_churn_entry] 0-rpc-transport/rdma: failed to process request ioq entry to peer(10.40.1.1:49152)
[2017-01-05 23:23:53.541562] W [MSGID: 103040] [rdma.c:2859:gf_rdma_writev] 0-vmstor1-client-0: processing ioq entry destined to (10.40.1.1:49152) failed
[2017-01-05 23:23:53.541569] W [MSGID: 103037] [rdma.c:3016:gf_rdma_submit_request] 0-rpc-transport/rdma: sending request to peer (10.40.1.1:49152) failed
[…]

Additional info:
Firewall is disabled, SELinux is disabled.
Different hardware with CentOS 7.1 and the Mellanox OFED 3.4 packages instead of the CentOS InfiniBand packages leads to the same results.
Just to mention: I am not trying to do NFS over RDMA; the Ganesha FSAL is just configured to "glusterfs".
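
Since FSAL_GLUSTER talks to the volume through libgfapi, the same path can be exercised without Ganesha by a minimal gfapi client. A sketch (the volume name "vmstor1" and server "10.40.1.1" are taken from the logs above; 24007 is the default glusterd port, some rdma setups use 24008 instead):

/* Build: gcc gfapi-rdma-test.c -lgfapi -o gfapi-rdma-test */
#include <stdio.h>
#include <fcntl.h>
#include <glusterfs/api/glfs.h>

int main(void)
{
    glfs_t *fs = glfs_new("vmstor1");
    if (!fs)
        return 1;

    /* "rdma" is the transport under test; the pre-fix FSAL_GLUSTER
     * always passed "tcp" here, which an rdma-only volume cannot serve. */
    if (glfs_set_volfile_server(fs, "rdma", "10.40.1.1", 24007) != 0 ||
        glfs_init(fs) != 0) {
        fprintf(stderr, "gfapi init failed\n");
        glfs_fini(fs);
        return 1;
    }

    /* A small write is enough to hit the rdma read-chunk path
     * that fails in the logs above. */
    glfs_fd_t *fd = glfs_creat(fs, "/rdma-test", O_RDWR, 0644);
    if (fd) {
        const char msg[] = "hello over rdma\n";
        glfs_write(fd, msg, sizeof(msg) - 1, 0);
        glfs_close(fd);
    }

    glfs_fini(fs);
    return 0;
}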

Comment 1 Kaleb KEITHLEY 2017-01-10 11:44:23 UTC
Jiffin wrote in gluster-users:

By checking the code, IMO this is currently a limitation within FSAL_GLUSTER: it tries to establish the connection with the glusterfs servers using "tcp" only. It is easy to fix as well. You can raise a bug at https://bugzilla.redhat.com/enter_bug.cgi?product=nfs-ganesha under FSAL_GLUSTER. I don't have any hardware to test the fix; I can either help you write up a fix for the issue or provide test rpms with the fix.
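
For context, the limitation described here comes down to the transport argument of gfapi's glfs_set_volfile_server() being fixed to "tcp" inside FSAL_GLUSTER. A sketch of the idea behind the fix (illustrative only, not the actual patch; the helper name is made up):

#include <glusterfs/api/glfs.h>

/* Pre-fix behaviour was roughly:
 *     glfs_set_volfile_server(fs, "tcp", host, 24007);
 * The fix makes the transport a per-export option and passes it through. */
static int
connect_volume(glfs_t *fs, const char *host, const char *transport)
{
        /* transport comes from the export's FSAL block ("tcp" or "rdma");
         * defaulting to "tcp" keeps existing exports working. */
        if (!transport)
                transport = "tcp";
        return glfs_set_volfile_server(fs, transport, host, 24007);
}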

Comment 2 Jiffin 2017-01-16 12:12:43 UTC
I have posted a fix upstream (https://review.gerrithub.io/#/c/342572/) and built rpms based on it; the rpms are attached to this bug. Please test them.
You may need to add the following changes to the export configuration file (EXPORT {} block):
EXPORT
{
--
--
--
       FSAL {
               Name = "GLUSTER";
               Hostname = localhost;
               Volume = "testvol";   # volume name
               transport = "tcp";    # transport type: tcp/rdma
       }
---
---
---
}
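
With transport = "rdma" set in the FSAL block, the FSAL should hand "rdma" to gfapi when connecting to the volfile server; if the option is omitted, it presumably falls back to "tcp" so that existing exports are unaffected (the fallback behaviour is an assumption, not verified against the patch).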

Comment 3 Jiffin 2017-01-16 12:30:56 UTC
Created attachment 1241211 [details]
rpm1

Comment 4 Jiffin 2017-01-16 12:31:34 UTC
Created attachment 1241212 [details]
rpm2

Comment 5 Jiffin 2017-01-16 12:32:10 UTC
Created attachment 1241213 [details]
rpm3

Comment 6 Andreas Kurzac 2017-01-24 15:54:37 UTC
I tested the fix, and it works as long as the client is not also one of the servers.
When I mount a volume on a server itself, it freezes. Here are the details:

Separate client mounts via TCP/Ethernet: OK
Separate client mounts via TCP(IPoIB)/InfiniBand: OK
Local mount (client/server is part of the cluster) via TCP/Ethernet: FAILS
ganesha-gfapi.log:
[2017-01-24 13:11:30.183538] W [MSGID: 103004] [rdma.c:452:gf_rdma_register_arena] 0-rdma: allocation of mr failed
[2017-01-24 13:11:30.184588] W [MSGID: 103004] [rdma.c:1463:__gf_rdma_create_read_chunks_from_vector] 0-rpc-transport/rdma: memory registration failed (peer:10.40.1.1:49152) [Permission denied]
[2017-01-24 13:11:30.184656] W [MSGID: 103029] [rdma.c:1558:__gf_rdma_create_read_chunks] 0-rpc-transport/rdma: cannot create read chunks from vector entry->prog_payload
[2017-01-24 13:11:30.184673] W [MSGID: 103033] [rdma.c:2063:__gf_rdma_ioq_churn_request] 0-rpc-transport/rdma: creation of read chunks failed
[2017-01-24 13:11:30.184689] W [MSGID: 103040] [rdma.c:2775:__gf_rdma_ioq_churn_entry] 0-rpc-transport/rdma: failed to process request ioq entry to peer(10.40.1.1:49152)
[2017-01-24 13:11:30.184710] W [MSGID: 103040] [rdma.c:2859:gf_rdma_writev] 0-vmstor1-client-0: processing ioq entry destined to (10.40.1.1:49152) failed
[2017-01-24 13:11:30.184723] W [MSGID: 103037] [rdma.c:3016:gf_rdma_submit_request] 0-rpc-transport/rdma: sending request to peer (10.40.1.1:49152) failed
[2017-01-24 13:11:30.184739] W [rpc-clnt.c:1640:rpc_clnt_submit] 0-vmstor1-client-0: failed to submit rpc-request (XID: 0x2c Program: GlusterFS 3.3, ProgVers: 330, Proc: 13) to rpc-transport (vmstor1-client-0)

Local mount (client/server is part of the cluster) via TCP(IPoIB)/InfiniBand: FAILS
ganesha-gfapi.log:
[2017-01-24 13:59:37.990454] W [MSGID: 103004] [rdma.c:452:gf_rdma_register_arena] 0-rdma: allocation of mr failed
[2017-01-24 13:59:37.991559] W [MSGID: 103004] [rdma.c:1463:__gf_rdma_create_read_chunks_from_vector] 0-rpc-transport/rdma: memory registration failed (peer:10.40.1.1:49152) [Permission denied]
[2017-01-24 13:59:37.991586] W [MSGID: 103029] [rdma.c:1558:__gf_rdma_create_read_chunks] 0-rpc-transport/rdma: cannot create read chunks from vector entry->prog_payload
[2017-01-24 13:59:37.991591] W [MSGID: 103033] [rdma.c:2063:__gf_rdma_ioq_churn_request] 0-rpc-transport/rdma: creation of read chunks failed
[2017-01-24 13:59:37.991606] W [MSGID: 103040] [rdma.c:2775:__gf_rdma_ioq_churn_entry] 0-rpc-transport/rdma: failed to process request ioq entry to peer(10.40.1.1:49152)
[2017-01-24 13:59:37.991611] W [MSGID: 103040] [rdma.c:2859:gf_rdma_writev] 0-vmstor1-client-0: processing ioq entry destined to (10.40.1.1:49152) failed
[2017-01-24 13:59:37.991615] W [MSGID: 103037] [rdma.c:3016:gf_rdma_submit_request] 0-rpc-transport/rdma: sending request to peer (10.40.1.1:49152) failed
[2017-01-24 13:59:37.991622] W [rpc-clnt.c:1640:rpc_clnt_submit] 0-vmstor1-client-0: failed to submit rpc-request (XID: 0x2d Program: GlusterFS 3.3, ProgVers: 330, Proc: 13) to rpc-transport (vmstor1-client-0)
[2017-01-24 13:59:37.991628] W [MSGID: 114031] [client-rpc-fops.c:854:client3_3_writev_cbk] 0-vmstor1-client-0: remote operation failed [Socket is not connected]

Comment 7 Jiffin 2018-01-11 09:11:25 UTC
The change got merged upstream (https://review.gerrithub.io/#/c/342572/) and will be available from 2.6 onwards, IMO.