Bug 1730715 - An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file
Summary: An Input/Output error happens on a disperse volume when doing unaligned write...
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: disperse
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Xavi Hernandez
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1731448 1732779 1739427 1739451 1805053
 
Reported: 2019-07-17 12:44 UTC by Xavi Hernandez
Modified: 2020-03-03 14:08 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1731448 1739427 1805053
Environment:
Last Closed: 2019-11-26 10:17:32 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments


Links
System ID | Private | Priority | Status | Summary | Last Updated
Gluster.org Gerrit 23066 | 0 | None | Merged | cluster/ec: fix EIO error for concurrent writes on sparse files | 2019-07-24 10:20:47 UTC

Description Xavi Hernandez 2019-07-17 12:44:14 UTC
Description of problem:

When a write that is not aligned to the stripe size is done concurrently with other writes on a sparse file of a disperse volume, an EIO error can be returned in some cases.

Version-Release number of selected component (if applicable): mainline


How reproducible:

randomly

Steps to Reproduce:
1. Create a disperse volume
2. Create an empty file
3. Concurrently write to two non-overlapping areas of the file, using unaligned offsets
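A minimal reproducer for these steps, sketched in Python under a few assumptions: the disperse volume is mounted locally, the default path `/mnt/disperse/sparse-test` and the `EC_TEST_FILE` environment variable are hypothetical, and the offsets (10 and 1M) come from the example in the additional info. Because the failure is a race, the sketch repeats the two concurrent writes until one fails or it gives up.

```python
import os
import threading

# Hypothetical location of a file on a mounted disperse volume; override it
# with the (made-up) EC_TEST_FILE environment variable.
PATH = os.environ.get("EC_TEST_FILE", "/mnt/disperse/sparse-test")

def write_at(path, offset, data, errors):
    """Write `data` at `offset`, recording the errno of any failure (e.g. EIO)."""
    fd = os.open(path, os.O_WRONLY)
    try:
        os.pwrite(fd, data, offset)
    except OSError as exc:
        errors.append((offset, exc.errno))
    finally:
        os.close(fd)

def reproduce_once(path, low=10, high=1024 * 1024):
    """Steps 2 and 3: empty the file, then issue two concurrent unaligned writes."""
    fd = os.open(path, os.O_CREAT | os.O_TRUNC | os.O_WRONLY, 0o644)
    os.close(fd)
    errors = []  # list.append is safe to call from several threads in CPython
    threads = [
        threading.Thread(target=write_at, args=(path, low, b"x", errors)),
        threading.Thread(target=write_at, args=(path, high, b"y", errors)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors

def reproduce(path, attempts=1000):
    """The failure is random, so repeat until a write fails or we give up."""
    for attempt in range(attempts):
        errors = reproduce_once(path)
        if errors:
            return attempt, errors
    return None
```

On a local file system `reproduce(PATH)` should return `None`. On an affected disperse volume it may eventually return the attempt number together with a list containing the lower offset and `errno.EIO`.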

Actual results:

In some cases the write to the lower offset fails with EIO.

Expected results:

Both writes should succeed.

Additional info:

EC doesn't allow concurrent writes to overlapping areas; they are serialized. However, non-overlapping writes are serviced in parallel. When a write is not aligned, EC first needs to read the entire chunk from disk, apply the modified fragment, and write the chunk back.
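The overlap and alignment rules above can be sketched as follows. The stripe size is an assumption (4 data fragments of 512 bytes each, i.e. 2048 bytes for a 4+2 volume), deciding overlap on stripe-aligned ranges is a simplification of how EC actually locks regions, and all helper names are hypothetical.

```python
# Assumption: 4 data fragments of 512 bytes each, giving a 2048-byte stripe.
STRIPE = 4 * 512

def aligned_range(offset, size, stripe=STRIPE):
    """Expand [offset, offset + size) outward to whole stripes."""
    start = (offset // stripe) * stripe
    end = -(-(offset + size) // stripe) * stripe  # ceiling division
    return start, end

def needs_read_modify_write(offset, size, stripe=STRIPE):
    """A write that only partly covers a stripe must read that stripe first."""
    return offset % stripe != 0 or (offset + size) % stripe != 0

def overlaps(a, b, stripe=STRIPE):
    """True if two writes, given as (offset, size), touch a common stripe."""
    a_start, a_end = aligned_range(*a, stripe)
    b_start, b_end = aligned_range(*b, stripe)
    return a_start < b_end and b_start < a_end
```

With these helpers, a 1-byte write at offset 10 and a 1-byte write at offset 1M do not overlap, so they would be serviced in parallel, while the write at offset 10 needs a read-modify-write of the stripe covering bytes 0 to 2047.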

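Suppose we have a 4+2 disperse volume (each stripe is encoded into 6 fragments, one per brick, and any 4 of them are enough to recover the data).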

The problem appears on sparse files because a write at some offset implicitly creates data at all lower offsets that were not yet part of the file. For example, if a file is empty and we read 10 bytes from offset 10, read() will return 0 bytes. Now, if we write one byte at offset 1M and retry the same read, the system call will return 10 bytes (all zeros).
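The sparse-file behaviour described above can be checked on any local POSIX file system with a short sketch (`os.pread` and `os.pwrite` are not available on Windows). It uses a temporary file rather than a disperse volume, so it only demonstrates the read() semantics, not the bug itself.

```python
import os
import tempfile

def demo_sparse_read():
    """Read 10 bytes at offset 10 before and after writing one byte at 1M."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sparse")
        fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
        try:
            before = os.pread(fd, 10, 10)        # empty file: EOF, 0 bytes
            os.pwrite(fd, b"\x01", 1024 * 1024)  # one byte at offset 1M
            after = os.pread(fd, 10, 10)         # the hole now reads as zeros
        finally:
            os.close(fd)
    return before, after

if __name__ == "__main__":
    before, after = demo_sparse_read()
    print(len(before), len(after), after == bytes(10))  # prints: 0 10 True
```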

So if we have two writes, the first one at offset 10 and the second one at offset 1M, EC will send both in parallel because they do not overlap. However, the first one will try to read the missing data from the first chunk (i.e. offsets 0 to 9) to recombine the entire chunk and do the final write. This read happens in parallel with the write to 1M. It can then happen that 3 bricks process the write before the read, while the other 3 process the read before the write. The first 3 bricks will return 10 bytes, while the other 3 will return 0 bytes (because the file on those bricks has not been expanded yet).
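A toy model of this race, following the simplification in the text that each brick answers as if it held the whole file (real bricks store encoded fragments, so actual byte counts differ, but the 3/3 split is the same). The function names are hypothetical.

```python
import random

BRICKS = 6  # 4+2 disperse volume

def brick_read(file_size, offset, length):
    """Bytes a brick returns for a read: a short read at EOF (simplified model)."""
    return max(0, min(length, file_size - offset))

def race(write_first):
    """Simulate the read of offsets 0..9 racing with the 1-byte write at 1M.

    `write_first` is the set of brick indices that apply the write
    before serving the read.
    """
    answers = []
    for brick in range(BRICKS):
        size = 1024 * 1024 + 1 if brick in write_first else 0
        answers.append(brick_read(size, 0, 10))
    return answers

def random_race(rng=random):
    """Pick a random subset of bricks that see the write first."""
    write_first = set(rng.sample(range(BRICKS), rng.randint(0, BRICKS)))
    return race(write_first)
```

For example, `race({0, 1, 2})` yields `[10, 10, 10, 0, 0, 0]`, the split described above.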

When EC tries to recombine the answers from the bricks, it can't: it needs at least 4 consistent answers to recover the data, but each group has only 3 matching answers. So this read fails with an EIO error. The error is propagated to the parent write, which is aborted, and EIO is returned to the application.
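The recombination step can be modelled as a quorum check: in a 4+2 volume EC needs at least 4 matching answers. This sketch compares answers only for equality, a simplification of the real fragment decoding, and `combine` is a hypothetical name.

```python
import errno
from collections import Counter

DATA_FRAGMENTS = 4  # a 4+2 volume needs any 4 matching answers

def combine(answers, quorum=DATA_FRAGMENTS):
    """Return the agreed answer, or raise EIO if no answer reaches the quorum."""
    if not answers:
        raise OSError(errno.EIO, "no answers from bricks")
    value, count = Counter(answers).most_common(1)[0]
    if count < quorum:
        # Neither group of answers is large enough to recover the data.
        raise OSError(errno.EIO, "no consistent quorum of brick answers")
    return value
```

`combine([10, 10, 10, 0, 0, 0])` raises `OSError` with `errno.EIO`, matching the 3/3 split described above, while `combine([10, 10, 10, 10, 0, 0])` returns 10.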

Comment 1 Worker Ant 2019-07-17 12:58:56 UTC
REVIEW: https://review.gluster.org/23066 (cluster/ec: fix EIO error for concurrent writes on sparse files) posted (#1) for review on master by Xavi Hernandez

Comment 2 Worker Ant 2019-07-24 10:20:48 UTC
REVIEW: https://review.gluster.org/23066 (cluster/ec: fix EIO error for concurrent writes on sparse files) merged (#4) on master by Pranith Kumar Karampuri

Comment 3 Worker Ant 2019-07-27 06:41:19 UTC
REVIEW: https://review.gluster.org/23113 (cluster/ec: fix EIO error for concurrent writes on sparse files) posted (#1) for review on release-6 by lidi

Comment 4 Worker Ant 2019-08-09 11:26:11 UTC
REVISION POSTED: https://review.gluster.org/23113 (cluster/ec: fix EIO error for concurrent writes on sparse files) posted (#2) for review on release-6 by Pranith Kumar Karampuri

Comment 5 Xavi Hernandez 2019-11-26 10:17:32 UTC
This bug has been accidentally reopened. The patch is already merged.

