Bug 1293265
Summary: | md5sum of files mismatch after the self-heal is complete on the file | ||
---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Krutika Dhananjay <kdhananj> |
Component: | replicate | Assignee: | bugs <bugs> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 3.7.6 | CC: | bugs, mzywusko, ravishankar, smohan, spandura |
Target Milestone: | --- | Keywords: | Triaged |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | glusterfs-3.7.7 | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | 1292379 | Environment: | |
Last Closed: | 2016-04-19 07:51:26 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1141750, 1292379, 1293240 | ||
Bug Blocks: | 1139599, 1154491, 1293239 |
Description
Krutika Dhananjay
2015-12-21 09:25:22 UTC
REVIEW: http://review.gluster.org/13035 (cluster/afr: Fix data loss due to race between sh and ongoing write) posted (#1) for review on release-3.7 by Krutika Dhananjay (kdhananj)

REVIEW: http://review.gluster.org/13036 (cluster/afr: Fix data loss due to race between sh and ongoing write) posted (#1) for review on release-3.7 by Krutika Dhananjay (kdhananj)

REVIEW: http://review.gluster.org/13036 (cluster/afr: Fix data loss due to race between sh and ongoing write) posted (#2) for review on release-3.7 by Pranith Kumar Karampuri (pkarampu)

REVIEW: http://review.gluster.org/13036 (cluster/afr: Fix data loss due to race between sh and ongoing write) posted (#3) for review on release-3.7 by Pranith Kumar Karampuri (pkarampu)

REVIEW: http://review.gluster.org/13036 (cluster/afr: Fix data loss due to race between sh and ongoing write.) posted (#4) for review on release-3.7 by Krutika Dhananjay (kdhananj)

REVIEW: http://review.gluster.org/13036 (cluster/afr: Fix data loss due to race between sh and ongoing write.) posted (#5) for review on release-3.7 by Krutika Dhananjay (kdhananj)

REVIEW: http://review.gluster.org/13036 (cluster/afr: Fix data loss due to race between sh and ongoing write.) posted (#6) for review on release-3.7 by Pranith Kumar Karampuri (pkarampu)

COMMIT: http://review.gluster.org/13036 committed in release-3.7 by Pranith Kumar Karampuri (pkarampu)

------

commit b43aa481712dab5df813050119ba6c08f50dbfd9
Author: Krutika Dhananjay <kdhananj>
Date: Thu Dec 17 17:41:08 2015 +0530

cluster/afr: Fix data loss due to race between sh and ongoing write.

Backport of: http://review.gluster.org/#/c/13001/

Problem:
When IO is happening on a file and a brick goes down and comes back up during this time, the protocol/client translator attempts to reopen the fd on the gfid handle of the file.
But if another client renames this file while the brick is down and writes are in progress on it, then once the brick is back up there can be a race between the reopening of the fd and entry self-heal replaying the effect of the rename() on the sink brick. If the fd is reopened first, the application's writes continue to go into the data blocks associated with the gfid. Entry self-heal then deletes 'src' and creates 'dst' on the sink, marking 'dst' as a 'newentry'. As a result, data self-heal is also completed on 'dst', and self-heal terminates. If the application is still writing to this fd at this point, all writes made after self-heal go into the data blocks associated with this fd, and they are lost once the fd is closed. The result: the 'dst' file differs between the source and the sink, yet there is no pending heal on the file, which leads to silent corruption on the sink.

Fix:
Leverage http://review.gluster.org/#/c/12816/ to ensure that the gfid handle path is saved in .glusterfs/unlink until the fd on the file is closed. During this time, when self-heal sends mknod() with the gfid of the file, do the following: link() the gfid handle under .glusterfs/unlink to the new path to be created in mknod(), and rename() the gfid handle back under .glusterfs/ab/cd/.

Change-Id: I5dc49c127ef0a1bf3cf4ce1b24610b1527f84d6f
BUG: 1293265
Signed-off-by: Krutika Dhananjay <kdhananj>
Reviewed-on: http://review.gluster.org/13036
Reviewed-by: Pranith Kumar Karampuri <pkarampu>
Tested-by: Pranith Kumar Karampuri <pkarampu>
Smoke: Gluster Build System <jenkins.com>
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.com>

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.7, please open a new bug report.
glusterfs-3.7.7 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-users/2016-February/025292.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
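The race and the fix described in the commit message can be modelled on a single local directory with plain POSIX hard links. The Python sketch below is illustrative only and is not GlusterFS code. It assumes a POSIX filesystem that supports hard links and lets a file be unlinked while still open. It uses ordinary directories `handles/` and `unlink/` as stand-ins for `.glusterfs/ab/cd/` and `.glusterfs/unlink/`, and a made-up gfid. Only the sink brick is modelled. With `apply_fix=False`, writes issued through the reopened fd after self-heal land in an orphaned inode and vanish on close. With `apply_fix=True`, the `link()` plus `rename()` step makes `dst` share the inode the fd points at, so those writes survive.

```python
import os
import tempfile

# Hypothetical gfid; real handles live under .glusterfs/ab/cd/<gfid>.
GFID = "example-gfid"


def simulate_sink_brick(apply_fix: bool) -> bytes:
    """Replay the rename race on a model sink brick and return the final 'dst' bytes.

    apply_fix=False models the old behaviour; apply_fix=True models the
    link()+rename() approach from the commit message. This is a sketch, not GlusterFS.
    """
    with tempfile.TemporaryDirectory() as brick:
        handle_dir = os.path.join(brick, "handles")  # stand-in for .glusterfs/ab/cd/
        unlink_dir = os.path.join(brick, "unlink")   # stand-in for .glusterfs/unlink/
        os.mkdir(handle_dir)
        os.mkdir(unlink_dir)
        src = os.path.join(brick, "src")
        dst = os.path.join(brick, "dst")
        handle = os.path.join(handle_dir, GFID)
        saved = os.path.join(unlink_dir, GFID)

        # The file and its gfid handle are two hard links to one inode.
        with open(src, "wb") as f:
            f.write(b"before-")
        os.link(src, handle)

        # The brick is back and the client reopens the fd on the gfid handle
        # before entry self-heal runs.
        fd = os.open(handle, os.O_WRONLY | os.O_APPEND)
        try:
            # Entry self-heal replays rename(src, dst): first 'src' goes away.
            os.unlink(src)
            if apply_fix:
                # Keep the inode reachable under unlink/ while the fd is open.
                os.rename(handle, saved)
            else:
                os.unlink(handle)  # the open inode now has no names left

            # Entry self-heal then creates 'dst' with the same gfid (mknod()).
            if apply_fix:
                os.link(saved, dst)       # link() the saved handle to the new path
                os.rename(saved, handle)  # move the handle back to its usual place
            else:
                open(dst, "xb").close()   # a brand-new, empty inode
                os.link(dst, handle)

            # Data self-heal copies what the source brick holds so far.
            with open(dst, "r+b") as f:
                f.write(b"before-")

            # Self-heal is done; the application keeps writing through the old fd.
            os.write(fd, b"after")
        finally:
            os.close(fd)  # without the fix, the orphaned inode is freed here

        with open(dst, "rb") as f:
            return f.read()


if __name__ == "__main__":
    print(simulate_sink_brick(apply_fix=False))  # b'before-'
    print(simulate_sink_brick(apply_fix=True))   # b'before-after'
```

The missing `after` in the unfixed run is the kind of silent source/sink mismatch that shows up as differing md5sums with no pending heal.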