Bug 1243655 - Sharding - Use (f)xattrop (as opposed to (f)setxattr) to update shard size and block count
Summary: Sharding - Use (f)xattrop (as opposed to (f)setxattr) to update shard size and block count
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: sharding
Version: 3.7.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact: bugs@gluster.org
URL:
Whiteboard:
Depends On: 1232391
Blocks:
 
Reported: 2015-07-16 03:20 UTC by Krutika Dhananjay
Modified: 2015-07-30 09:48 UTC
CC List: 1 user

Fixed In Version: glusterfs-3.7.3
Clone Of: 1232391
Environment:
Last Closed: 2015-07-30 09:48:46 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments:

Description Krutika Dhananjay 2015-07-16 03:20:11 UTC
+++ This bug was initially created as a clone of Bug #1232391 +++

Description of problem:

Running iozone on a sharded volume fails with EBADFD.
From the strace output of iozone, it was found that the application read fewer bytes than it expected to and bailed out with EBADFD.
On closer inspection of the file on the backend, it was found that the 'size' xattr reflected a smaller value than the actual size of the file.

It turns out this can happen when write-behind flushes the cached writes in one go, causing them to hit the disk out of order: the different io-threads perform the writes in parallel, without any serialisation. For example, when a write in the range [0-100] races with a write on the same file in the [101-200] range, the second write may hit the disk before the first. In that case the second write request persists the file size as 200, after which the first write request persists it as 100, leading to incorrect file size accounting.

This bug can also be hit in cases where the application performing the I/O is itself multi-threaded.
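
To make the lost update concrete, here is a minimal standalone sketch: plain variables and pthreads stand in for the brick and io-threads, so this is not GlusterFS code. A setxattr-style update is a read-modify-write of an absolute value, and nothing stops a writer holding a stale size from overwriting a larger one.

#include <pthread.h>
#include <stdio.h>

static long long size_xattr; /* stand-in for the on-disk 'size' xattr */

static void *writer(void *arg)
{
    long long write_end = *(long long *)arg;
    long long cur = size_xattr;   /* 1. read the current size       */
    if (write_end > cur)          /* 2. decide whether to grow it   */
        size_xattr = write_end;   /* 3. "setxattr" an absolute size */
    return NULL;
}

int main(void)
{
    long long end_a = 100, end_b = 200;
    pthread_t a, b;

    pthread_create(&a, NULL, writer, &end_a);
    pthread_create(&b, NULL, writer, &end_b);
    pthread_join(a, NULL);
    pthread_join(b, NULL);

    /* If A reads the size before B's store lands, A later overwrites
     * B's 200 with 100 -- the lost update described above. */
    printf("size xattr: %lld (expected 200)\n", size_xattr);
    return 0;
}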

The solution involves using xattrop (adding/subtracting only the delta byte count, which the brick applies atomically) as opposed to setxattr to update the size.
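
For contrast, here is a rough sketch of a delta-based update inside an xlator, modelled on (but not copied from) the fix posted at http://review.gluster.org/11467. The function and callback names, the xattr key, the 4 x int64 layout and the memory-accounting type below are assumptions for illustration, not the exact patch.

#include "glusterfs.h"
#include "xlator.h"
#include "byte-order.h"

int32_t shard_update_size_cbk(call_frame_t *frame, void *cookie,
                              xlator_t *this, int32_t op_ret,
                              int32_t op_errno, dict_t *dict,
                              dict_t *xdata);

int
shard_update_size_sketch(call_frame_t *frame, xlator_t *this, fd_t *fd,
                         int64_t size_delta, int64_t block_count_delta)
{
    dict_t  *xattr_req = NULL;
    int64_t *size_attr = NULL;
    int      ret       = -1;

    xattr_req = dict_new();
    if (!xattr_req)
        goto out;

    /* Assumed layout: four network-byte-order int64s, with the
     * byte-size delta at index 0 and the block-count delta at index 2. */
    size_attr = GF_CALLOC(4, sizeof(int64_t), gf_common_mt_char);
    if (!size_attr)
        goto out;

    size_attr[0] = hton64(size_delta);
    size_attr[2] = hton64(block_count_delta);

    ret = dict_set_bin(xattr_req, "trusted.glusterfs.shard.file-size",
                       size_attr, 4 * sizeof(int64_t));
    if (ret) {
        GF_FREE(size_attr);
        goto out;
    }

    /* GF_XATTROP_ADD_ARRAY64 tells the brick to add the deltas to the
     * existing values under its own lock, so racing writers accumulate
     * instead of overwriting each other. */
    STACK_WIND(frame, shard_update_size_cbk, FIRST_CHILD(this),
               FIRST_CHILD(this)->fops->fxattrop, fd,
               GF_XATTROP_ADD_ARRAY64, xattr_req, NULL);
    ret = 0;
out:
    if (xattr_req)
        dict_unref(xattr_req);
    return ret;
}

The point is that the value winding down to the brick is this write's delta, not an absolute size, so the order in which racing updates land no longer matters.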

Note that even with this approach, things could go wrong if two or more threads of an application, as part of writing past the end of a file, create holes in overlapping regions: since each thread computes its delta against the same stale size, the hole's contribution to the file size could be counted more than once, again leading to incorrect file size accounting (see the sketch below). But this would be tackled through a restructuring of the /.shard backend, which is a bigger change and will come much later.
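
A quick back-of-the-envelope illustration of that hole double counting (the offsets and sizes are made up):

#include <stdio.h>

int main(void)
{
    long long cur_size = 100; /* stale size both threads read */

    /* Thread A writes 100 bytes at offset 1000: its delta covers the
     * hole [100-1000) plus its own data. */
    long long delta_a = (1000 + 100) - cur_size; /* 1000 */

    /* Thread B writes 100 bytes at offset 2000 against the same stale
     * size, so it too counts the region starting at offset 100. */
    long long delta_b = (2000 + 100) - cur_size; /* 2000 */

    /* The brick adds both deltas atomically, but the overlapping
     * region [100-1100) was counted by both threads. */
    printf("recorded: %lld, actual: %lld\n",
           cur_size + delta_a + delta_b, 2100LL); /* 3100 vs 2100 */
    return 0;
}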

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

--- Additional comment from Anand Avati on 2015-06-30 09:27:29 EDT ---

REVIEW: http://review.gluster.org/11467 (features/shard: Use xattrop (as opposed to setxattr) for updates to size xattr) posted (#1) for review on master by Krutika Dhananjay (kdhananj)

--- Additional comment from Anand Avati on 2015-07-15 03:43:31 EDT ---

REVIEW: http://review.gluster.org/11467 (features/shard: Use xattrop (as opposed to setxattr) for updates to size xattr) posted (#3) for review on master by wangzhen (linux_wz)

--- Additional comment from Anand Avati on 2015-07-15 04:27:01 EDT ---

REVIEW: http://review.gluster.org/11467 (features/shard: Use xattrop (as opposed to setxattr) for updates to size xattr) posted (#4) for review on master by Krutika Dhananjay (kdhananj)

Comment 1 Anand Avati 2015-07-16 03:27:51 UTC
REVIEW: http://review.gluster.org/11689 (features/shard: Use xattrop (as opposed to setxattr) for updates to size xattr) posted (#1) for review on release-3.7 by Krutika Dhananjay (kdhananj)

Comment 2 Anand Avati 2015-07-21 13:22:09 UTC
REVIEW: http://review.gluster.org/11689 (features/shard: Use xattrop (as opposed to setxattr) for updates to size xattr) posted (#2) for review on release-3.7 by Krutika Dhananjay (kdhananj)

Comment 3 Kaushal 2015-07-30 09:48:46 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.3, please open a new bug report.

glusterfs-3.7.3 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/12078
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

