Bug 1241621

Summary: gfapi+rdma IO errors with large block sizes (Transport endpoint is not connected)
Product: [Community] GlusterFS
Component: rdma
Version: 3.7.0
Hardware: x86_64
OS: Linux
Status: CLOSED EOL
Severity: medium
Priority: medium
Reporter: dgbaley27
Assignee: Mohammed Rafi KC <rkavunga>
CC: bugs, chrisw, dgbaley27, nlevinki, rwheeler, sankarshan, smohan
Keywords: Triaged
Target Milestone: ---
Target Release: ---
Doc Type: Bug Fix
Story Points: ---
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Last Closed: 2017-03-08 10:57:06 UTC

Description dgbaley27 2015-07-09 15:48:48 UTC
Description of problem:

When running IO benchmarks using fio with libgfapi over RDMA, I encountered errors with sequential read workloads, which turned out to be caused by reads with large block sizes (16M in my case). The error is "Transport endpoint is not connected". It does not occur with an identical setup that uses TCP instead of RDMA.

Version-Release number of selected component (if applicable):

I'm using GlusterFS 3.7. I have not tried with an earlier version.

How reproducible:

This is reproducible 100% of the time.

Steps to Reproduce:
1. Create a volume with RDMA enabled. In detail, my volume has group=virt, nfs.disable=on, and server.allow-insecure=on set. Additionally, I'm using replica 3 and no striping (a command sketch follows this list).
2. Set memlock to unlimited
3. Run as root or normal user
4. fio --name=test --ioengine=gfapi --brick HOSTNAME --volume=VOLNAME --numjobs=1 --ramp_time=1 --runtime=5 --time_based --fallocate=keep --direct=1 --bs=16m --rw=read --size=2g --unlink=1 --minimal
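
For reference, a rough sketch of steps 1-2 (the hostnames, brick paths, and the tcp,rdma transport choice are placeholders/assumptions; adjust to your setup):

# step 1: replica-3 volume with RDMA transport and the options above
gluster volume create VOLNAME replica 3 transport tcp,rdma \
    server1:/bricks/brick1 server2:/bricks/brick1 server3:/bricks/brick1
gluster volume set VOLNAME group virt
gluster volume set VOLNAME nfs.disable on
gluster volume set VOLNAME server.allow-insecure on
gluster volume start VOLNAME

# step 2: raise the locked-memory limit in the shell that runs fio
ulimit -l unlimited    # or add a "memlock unlimited" entry in /etc/security/limits.conf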

Actual results:

"Transport endpoint is not connected"

Expected results:

No errors, valid benchmark data


Additional info:

The error occurs whether fio is run as root or as a normal user. My hardware is Mellanox 40G Ethernet NICs using RoCE.

Comment 2 Mohammed Rafi KC 2015-07-13 09:58:24 UTC
This happens because the process failed to register a large amount of memory with the RDMA device. Please try increasing log_num_mtt (when loading the mlx4_core driver) and check whether this helps.
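
For example, a rough sketch (24 is only an illustrative value, and on some kernels the driver only exposes log_mtts_per_seg rather than log_num_mtt):

# on the client and all servers
echo "options mlx4_core log_num_mtt=24" > /etc/modprobe.d/mlx4_core.conf
# reload the mlx4 modules (or reboot) so the new value takes effect
modprobe -r mlx4_en mlx4_ib mlx4_core && modprobe mlx4_core
# verify the running value
cat /sys/module/mlx4_core/parameters/log_num_mtt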

Comment 3 dgbaley27 2015-07-14 02:57:57 UTC
The parameter I see in mlx4_core is log_mtts_per_seg, which I increased from 3 (apparently the default) to 7. I did this on my client and on all servers. No change, though; I still get the error.

Comment 5 dgbaley27 2015-07-14 17:38:11 UTC
The problem is related to performance.io-cache=off, which is set by group=virt. If I apply group=virt and then reset performance.io-cache, I do not get the error.
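
In gluster CLI terms, roughly (VOLNAME is a placeholder):

gluster volume set VOLNAME group virt              # applies the virt option group, which includes performance.io-cache: off
gluster volume reset VOLNAME performance.io-cache  # return io-cache to its default
gluster volume info VOLNAME                        # "Options Reconfigured" should no longer list performance.io-cache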

Comment 6 dgbaley27 2015-07-17 15:01:13 UTC
Eh, I'm not sure anymore. I still hit the error with io-cache. So maybe io-cache being off can hide the issue, but not always...

Comment 7 Kaushal 2017-03-08 10:57:06 UTC
This bug is getting closed because GlusterFS-3.7 has reached its end-of-life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.