Bug 1241621 - gfapi+rdma IO errors with large block sizes (Transport endpoint is not connected)
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: rdma
Version: 3.7.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Mohammed Rafi KC
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-07-09 15:48 UTC by dgbaley27
Modified: 2017-03-08 10:57 UTC
CC: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-08 10:57:06 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description dgbaley27 2015-07-09 15:48:48 UTC
Description of problem:

When performing I/O benchmarks using fio with libgfapi over RDMA, I encountered errors with sequential read workloads, which turned out to be caused by reads with large block sizes (16M in my case). The error is "Transport endpoint is not connected". It does not occur with an identical setup using TCP instead of RDMA.

Version-Release number of selected component (if applicable):

I'm using GlusterFS 3.7. I have not tried with an earlier version.

How reproducible:

This is reproducible 100% of the time.

Steps to Reproduce:
1. Create a volume with the RDMA transport enabled. In detail, my volume has group=virt applied, nfs.disable=on, and server.allow-insecure=on; it is replica 3 with no striping. (A rough sketch of the commands is given after these steps.)
2. Set memlock to unlimited
3. Run as root or normal user
4. fio --name=test --ioengine=gfapi --brick HOSTNAME --volume=VOLNAME --numjobs=1 --ramp_time=1 --runtime=5 --time_based --fallocate=keep --direct=1 --bs=16m --rw=read --size=2g --unlink=1 --minimal
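
For reference, the setup in steps 1-3 looks roughly like the following (VOLNAME, server names, and brick paths are placeholders rather than my actual layout, and "transport tcp,rdma" is just one way to end up with an RDMA-enabled volume):

    gluster volume create VOLNAME replica 3 transport tcp,rdma \
        server1:/bricks/b1 server2:/bricks/b1 server3:/bricks/b1
    gluster volume set VOLNAME group virt            # virt profile (this is what sets performance.io-cache off, among others)
    gluster volume set VOLNAME nfs.disable on
    gluster volume set VOLNAME server.allow-insecure on
    gluster volume start VOLNAME
    ulimit -l unlimited                              # memlock unlimited in the shell that runs fio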

Actual results:

"Transport endpoint is not connected"

Expected results:

No errors, valid benchmark data


Additional info:

The error occurs whether fio is run as root or as a normal user. My hardware is Mellanox 40G Ethernet NICs using RoCE.

Comment 2 Mohammed Rafi KC 2015-07-13 09:58:24 UTC
This happens because the process failed to register a large amount of memory with the RDMA device. Please try increasing log_num_mtt (when loading the mlx4_core driver) and check whether this helps.
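
Something along these lines is the usual way to change it (the value 24 below is only an illustration; the right value depends on the HCA and available memory):

    # /etc/modprobe.d/mlx4_core.conf
    options mlx4_core log_num_mtt=24

Then reload the mlx4 modules (or reboot) so the new value takes effect, and re-run the fio test.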

Comment 3 dgbaley27 2015-07-14 02:57:57 UTC
The only parameter along those lines that I see in mlx4_core is log_mtts_per_seg, which I increased from 3 (apparently the default) to 7. I did this on the client and on all servers, but there is no change; I still get the error.
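
For what it's worth, this is roughly how I checked which parameters my mlx4_core exposes (the exact parameter set varies with the kernel/driver version):

    ls /sys/module/mlx4_core/parameters/
    cat /sys/module/mlx4_core/parameters/log_mtts_per_seg

log_num_mtt does not show up there for me, which is why I adjusted log_mtts_per_seg instead.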

Comment 5 dgbaley27 2015-07-14 17:38:11 UTC
The problem is related to performance.io-cache=off, which is set by group=virt. If I apply group=virt and then reset performance.io-cache, I do not get the error.
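
Concretely, the sequence I mean is roughly (VOLNAME is a placeholder):

    gluster volume set VOLNAME group virt                # virt profile sets performance.io-cache off -> error occurs
    gluster volume reset VOLNAME performance.io-cache    # back to the default -> no error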

Comment 6 dgbaley27 2015-07-17 15:01:13 UTC
Eh, I'm not sure anymore. I still hit the error even with io-cache re-enabled, so maybe turning io-cache on can hide the issue, but not always...

Comment 7 Kaushal 2017-03-08 10:57:06 UTC
This bug is being closed because GlusterFS-3.7 has reached its end-of-life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.

