Bug 1241621
| Summary: | gfapi+rdma IO errors with large block sizes (Transport endpoint is not connected) | | |
| --- | --- | --- | --- |
| Product: | [Community] GlusterFS | Reporter: | dgbaley27 |
| Component: | rdma | Assignee: | Mohammed Rafi KC <rkavunga> |
| Status: | CLOSED EOL | QA Contact: | |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.7.0 | CC: | bugs, chrisw, dgbaley27, nlevinki, rwheeler, sankarshan, smohan |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-03-08 10:57:06 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description dgbaley27 2015-07-09 15:48:48 UTC
This happens because the process failed to register a large amount of data with the RDMA device. Please try increasing log_num_mtt (when loading the mlx4_core driver) and check whether that helps.

The parameter I see in mlx4_core is log_mtts_per_seg, which I increased from 3 (apparently the default) to 7. I did this on my client and on all servers. No change though; I still get the error.

The problem appears to be related to performance.io-cache=off, which is set by group=virt. If I set group=virt and then reset performance.io-cache, I do not get the error.

Eh, I'm not sure anymore. I still hit the error with io-cache enabled, so maybe io-cache being off can hide the issue, but not always.

This bug is being closed because GlusterFS 3.7 has reached its end of life. Note: this bug is being closed using a script; no verification has been performed to check whether it still exists on newer releases of GlusterFS. If this bug still exists in newer GlusterFS releases, please reopen it against the newer release.
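The bug report does not include a reproducer, but a write of a large block through libgfapi over the rdma transport can be exercised with a short program along the following lines. This is only a sketch: the volume name ("testvol"), server host ("server1"), file path, and 4 MiB block size are placeholders, not values from the report.

```c
/* Sketch of a large-block gfapi write over RDMA (volume/host names are
 * placeholders, not from the bug report).  Build with: gcc repro.c -lgfapi */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <glusterfs/api/glfs.h>

int main(void)
{
    const size_t block = 4 * 1024 * 1024;   /* a "large" block size */
    glfs_t *fs = glfs_new("testvol");       /* placeholder volume name */
    if (!fs)
        return 1;

    /* Connect over the rdma transport; "server1" is a placeholder host. */
    glfs_set_volfile_server(fs, "rdma", "server1", 24007);
    glfs_set_logging(fs, "/tmp/gfapi-repro.log", 7);
    if (glfs_init(fs) != 0) {
        perror("glfs_init");
        return 1;
    }

    glfs_fd_t *fd = glfs_creat(fs, "/bigblock.bin", O_WRONLY, 0644);
    if (!fd) {
        perror("glfs_creat");
        glfs_fini(fs);
        return 1;
    }

    char *buf = calloc(1, block);
    ssize_t n = glfs_write(fd, buf, block, 0);  /* reported to fail with ENOTCONN */
    if (n < 0)
        perror("glfs_write");
    else
        printf("wrote %zd bytes\n", n);

    free(buf);
    glfs_close(fd);
    glfs_fini(fs);
    return 0;
}
```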
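Since the comments revolve around RDMA memory-registration limits (log_num_mtt and log_mtts_per_seg control how much memory mlx4 hardware can register), one way to see the limits the HCA actually reports is to query it through libibverbs. This is a minimal sketch, not part of the bug report; it assumes the libibverbs development headers are installed and only inspects the first device found.

```c
/* Minimal sketch (not from the bug report): print the first RDMA device's
 * memory-registration limits.  Build with: gcc query.c -libverbs */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    if (!ctx) {
        fprintf(stderr, "failed to open %s\n", ibv_get_device_name(devs[0]));
        ibv_free_device_list(devs);
        return 1;
    }

    struct ibv_device_attr attr;
    if (ibv_query_device(ctx, &attr) == 0) {
        printf("device:      %s\n", ibv_get_device_name(devs[0]));
        printf("max_mr_size: %llu bytes\n", (unsigned long long)attr.max_mr_size);
        printf("max_mr:      %d memory regions\n", attr.max_mr);
    }

    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```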