Bug 1241621

Summary: gfapi+rdma IO errors with large block sizes (Transport endpoint is not connected)
Product: [Community] GlusterFS
Component: rdma
Version: 3.7.0
Hardware: x86_64
OS: Linux
Status: CLOSED EOL
Severity: medium
Priority: medium
Reporter: dgbaley27
Assignee: Mohammed Rafi KC <rkavunga>
CC: bugs, chrisw, dgbaley27, nlevinki, rwheeler, sankarshan, smohan
Keywords: Triaged
Target Milestone: ---
Target Release: ---
Doc Type: Bug Fix
Story Points: ---
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Last Closed: 2017-03-08 10:57:06 UTC

Description dgbaley27 2015-07-09 15:48:48 UTC
Description of problem:

When running IO benchmarks using fio with libgfapi over RDMA, I encountered errors with sequential read workloads, which turned out to be caused by reads with large block sizes (16M in my case). The error is "Transport endpoint is not connected". It does not occur with an identical setup that uses TCP instead of RDMA.

Version-Release number of selected component (if applicable):

I'm using GlusterFS 3.7. I have not tried with an earlier version.

How reproducible:

This is reproducible 100% of the time.

Steps to Reproduce:
1. Create a volume with RDMA enabled. In detail, my volume has group=virt, nfs.disable=on, and server.allow-insecure=on set. Additionally, I'm using replica 3 and no striping (a command sketch follows this list).
2. Set memlock to unlimited
3. Run as root or normal user
4. fio --name=test --ioengine=gfapi --brick HOSTNAME --volume=VOLNAME --numjobs=1 --ramp_time=1 --runtime=5 --time_based --fallocate=keep --direct=1 --bs=16m --rw=read --size=2g --unlink=1 --minimal
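
For reference, a rough sketch of steps 1-2 (the hostnames, brick paths, and the tcp,rdma transport choice are placeholders/assumptions; adjust to your setup):

# step 1: replica-3 volume with RDMA transport and the options above
gluster volume create VOLNAME replica 3 transport tcp,rdma \
    server1:/bricks/brick1 server2:/bricks/brick1 server3:/bricks/brick1
gluster volume set VOLNAME group virt
gluster volume set VOLNAME nfs.disable on
gluster volume set VOLNAME server.allow-insecure on
gluster volume start VOLNAME

# step 2: raise the locked-memory limit in the shell that runs fio
ulimit -l unlimited    # or add a "memlock unlimited" entry in /etc/security/limits.conf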

Actual results:

"Transport endpoint is not connected"

Expected results:

No errors, valid benchmark data


Additional info:

The error occurs whether fio is run as root or as a normal user. My hardware is Mellanox 40G Ethernet NICs using RoCE.

Comment 2 Mohammed Rafi KC 2015-07-13 09:58:24 UTC
This happens because the process failed to register a large amount of memory with the RDMA device. Please try increasing log_num_mtt (when loading the mlx4_core driver) and check whether this helps.
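
For example, a rough sketch (24 is only an illustrative value, and on some kernels the driver only exposes log_mtts_per_seg rather than log_num_mtt):

# on the client and all servers
echo "options mlx4_core log_num_mtt=24" > /etc/modprobe.d/mlx4_core.conf
# reload the mlx4 modules (or reboot) so the new value takes effect
modprobe -r mlx4_en mlx4_ib mlx4_core && modprobe mlx4_core
# verify the running value
cat /sys/module/mlx4_core/parameters/log_num_mtt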

Comment 3 dgbaley27 2015-07-14 02:57:57 UTC
The parameter I see in mlx4_core is log_mtts_per_seg, which I increased from 3 (apparently the default) to 7. I did this on my client and on all servers. No change, though; I still get the error.

Comment 5 dgbaley27 2015-07-14 17:38:11 UTC
The problem is related to performance.io-cache=off, which is set by group=virt. If I apply group=virt and then reset performance.io-cache, I do not get the error.
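
In gluster CLI terms, roughly (VOLNAME is a placeholder):

gluster volume set VOLNAME group virt              # applies the virt option group, which includes performance.io-cache: off
gluster volume reset VOLNAME performance.io-cache  # return io-cache to its default
gluster volume info VOLNAME                        # "Options Reconfigured" should no longer list performance.io-cache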

Comment 6 dgbaley27 2015-07-17 15:01:13 UTC
Eh, I'm not sure anymore. I still hit the error with io-cache. So maybe io-cache being off can hide the issue, but not always...

Comment 7 Kaushal 2017-03-08 10:57:06 UTC
This bug is getting closed because GlusterFS-3.7 has reached its end-of-life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.