Bug 1582056

Summary: Input/Output errors on a disperse volume with concurrent reads and writes
Product: [Community] GlusterFS
Reporter: Xavi Hernandez <jahernan>
Component: disperse
Assignee: Xavi Hernandez <jahernan>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 4.1
CC: bugs
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-v4.1.0
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1578325
Environment:
Last Closed: 2018-06-20 18:06:39 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1578325
Bug Blocks:

Description Xavi Hernandez 2018-05-24 05:52:44 UTC
+++ This bug was initially created as a clone of Bug #1578325 +++

Description of problem:

When parallel-writes is enabled and multiple non-overlapping reads and writes are sent concurrently, writes sometimes fail with an Input/Output error or a file not found error.

Version-Release number of selected component (if applicable): mainline


How reproducible:

Randomly on a Ganesha mount. I've been unable to reproduce it on FUSE.

Steps to Reproduce:
1. Create a disperse volume
2. Create a Ganesha cluster using gfapi
3. Mount three clients to three different NFS servers
4. From one client run Bonnie++
5. From second client run ls -la in a loop
6. From third client run du -sh in a loop

Actual results:

Bonnie++ fails in the rewriting test.

Expected results:

Bonnie++ shouldn't fail

Additional info:

As a workaround, disabling parallel-writes hides the problem.

Comment 1 Worker Ant 2018-05-24 06:07:58 UTC
REVIEW: https://review.gluster.org/20075 (cluster/ec: Fix pre-op xattrop management) posted (#1) for review on release-4.1 by Xavi Hernandez

Comment 2 Worker Ant 2018-05-25 02:07:27 UTC
COMMIT: https://review.gluster.org/20075 committed in release-4.1 by "Shyamsundar Ranganathan" <srangana> with a commit message- cluster/ec: Fix pre-op xattrop management

Multiple pre-op xattrops can be processed simultaneously. In the cbk, it
was checked whether the fop was waiting for some specific data (like size
and version) and, if so, it was assumed that this answer would contain that
data.

This is not true, since a fop can be waiting for some data, but it may come
from the xattrop of another fop.

This patch differentiates between needing some information and providing it.

This is related to parallel writes. Disabling them fixed the problem, but
also prevented concurrent reads. A change has been made so that disabling
parallel writes still allows parallel reads.

Backport of:
> BUG: 1578325

Fixes: bz#1582056
Change-Id: I74772ad6b80b7b37805da93d5ec3ae099e96b041
Signed-off-by: Xavi Hernandez <xhernandez>
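
The distinction drawn in the commit message above, between a fop needing some information and an xattrop providing it, can be illustrated with a small model. This is only a sketch under stated assumptions and is not the code from cluster/ec: every name in it (INFO_SIZE, INFO_VERSION, inode_state, xattrop_req, xattrop_reply, cbk_buggy, cbk_fixed, fop_can_proceed) is hypothetical. It assumes each request records separately what its fop is waiting for and what this particular xattrop actually fetched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flags for this sketch; not identifiers from cluster/ec. */
enum {
    INFO_SIZE    = 1u << 0,
    INFO_VERSION = 1u << 1
};

/* Information cached for an inode and shared by all fops on it. */
struct inode_state {
    uint32_t have;      /* INFO_* bits that are currently valid */
    uint64_t size;
    uint64_t version;
};

/* One pre-op xattrop issued on behalf of a fop. */
struct xattrop_req {
    uint32_t needs;     /* what the owning fop is waiting for */
    uint32_t provides;  /* what this xattrop actually requested */
};

/* Answer to one xattrop; fields not requested are left as zero. */
struct xattrop_reply {
    uint64_t size;
    uint64_t version;
};

/* Pre-fix logic (as described in the commit message): the callback
 * assumes the answer contains whatever the fop is waiting for. */
void cbk_buggy(struct inode_state *st, const struct xattrop_req *req,
               const struct xattrop_reply *rep)
{
    assert(st && req && rep);
    if (req->needs & INFO_SIZE) {
        st->size = rep->size;
        st->have |= INFO_SIZE;
    }
    if (req->needs & INFO_VERSION) {
        st->version = rep->version;
        st->have |= INFO_VERSION;
    }
}

/* Post-fix idea: only consume what this xattrop really provides. */
void cbk_fixed(struct inode_state *st, const struct xattrop_req *req,
               const struct xattrop_reply *rep)
{
    assert(st && req && rep);
    if (req->provides & INFO_SIZE) {
        st->size = rep->size;
        st->have |= INFO_SIZE;
    }
    if (req->provides & INFO_VERSION) {
        st->version = rep->version;
        st->have |= INFO_VERSION;
    }
}

/* A fop may continue once everything it needs is cached, regardless
 * of which xattrop supplied it. */
bool fop_can_proceed(const struct inode_state *st,
                     const struct xattrop_req *req)
{
    assert(st && req);
    return (st->have & req->needs) == req->needs;
}
```

In this model, a second fop that needs size and version but whose own xattrop requested neither (because another fop's xattrop is already fetching them) would, with cbk_buggy, mark zeroed values as valid if its answer arrives first. With cbk_fixed it keeps waiting until the answer that actually carries the data has been processed.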

Comment 3 Shyamsundar 2018-06-20 18:06:39 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-v4.1.0, please open a new bug report.

glusterfs-v4.1.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-June/000102.html
[2] https://www.gluster.org/pipermail/gluster-users/