Bug 1292379
| Summary: | md5sum of files mismatch after the self-heal is complete on the file | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Krutika Dhananjay <kdhananj> |
| Component: | replicate | Assignee: | Krutika Dhananjay <kdhananj> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | mainline | CC: | bugs, mzywusko, ravishankar, rmekala, smohan, spandura |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.8rc2 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1141750 | | |
| : | 1293265 (view as bug list) | Environment: | |
| Last Closed: | 2016-06-16 13:51:17 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1141750, 1293240 | | |
| Bug Blocks: | 1139599, 1154491, 1293239, 1293265 | | |
Description
Krutika Dhananjay 2015-12-17 09:22:02 UTC
REVIEW: http://review.gluster.org/13001 (cluster/afr: Fix data loss due to race between sh and ongoing write) posted (#1) for review on master by Krutika Dhananjay (kdhananj)

REVIEW: http://review.gluster.org/13001 (cluster/afr: Fix data loss due to race between sh and ongoing write) posted (#2) for review on master by Krutika Dhananjay (kdhananj)

REVIEW: http://review.gluster.org/13001 (cluster/afr: Fix data loss due to race between sh and ongoing write) posted (#3) for review on master by Krutika Dhananjay (kdhananj)

REVIEW: http://review.gluster.org/13001 (cluster/afr: Fix data loss due to race between sh and ongoing write) posted (#4) for review on master by Krutika Dhananjay (kdhananj)

REVIEW: http://review.gluster.org/13001 (cluster/afr: Fix data loss due to race between sh and ongoing write) posted (#5) for review on master by Krutika Dhananjay (kdhananj)

COMMIT: http://review.gluster.org/13001 committed in master by Pranith Kumar Karampuri (pkarampu)

------
commit 683c880a02086effc5009a8420289b445ea423f0
Author: Krutika Dhananjay <kdhananj>
Date: Thu Dec 17 17:41:08 2015 +0530

cluster/afr: Fix data loss due to race between sh and ongoing write

Problem:
When IO is happening on a file and a brick goes down and comes back up during this time, the protocol/client translator attempts reopening of the fd on the gfid handle of the file. But if another client renames this file while the brick was down and writes were in progress on it, then once this brick is back up there can be a race between the reopening of the fd and entry self-heal replaying the effect of the rename() on the sink brick. If the reopening of the fd happens first, the application's writes continue to go into the data blocks associated with the gfid. Entry self-heal then deletes 'src' and creates the 'dst' file on the sink, marking 'dst' as a 'newentry'. Data self-heal is also completed on 'dst' as a result, and self-heal terminates.
If at this point the application is still writing to this fd, all writes on the file after self-heal would go into the data blocks associated with this fd, which would be lost once the fd is closed. The result: the 'dst' file on the source and the sink are not the same, and there is no pending heal on the file, leading to silent corruption on the sink.

Fix:
Leverage http://review.gluster.org/#/c/12816/ to ensure the gfid handle path gets saved in .glusterfs/unlink until the fd is closed on the file. During this time, when self-heal sends mknod() with the gfid of the file, do the following: link() the gfid handle under .glusterfs/unlink to the new path to be created in mknod(), and rename() the gfid handle to go back under .glusterfs/ab/cd/.

Change-Id: I86ef1f97a76ffe11f32653bb995f575f7648f798
BUG: 1292379
Signed-off-by: Krutika Dhananjay <kdhananj>
Reviewed-on: http://review.gluster.org/13001
Reviewed-by: Pranith Kumar Karampuri <pkarampu>
Tested-by: NetBSD Build System <jenkins.org>
Tested-by: Gluster Build System <jenkins.com>

Tested with build glusterfs-server-3.7.5-13, and the following test passed, so marking this bug as verified:

1. Create a 1x2 volume and mount it on a client; using dd, continuously write to a file (disable self-heal, data heal and entry heal).
2. Bring down one of the bricks and rename the file from another mount or session.
3. Bring back the brick, enable self-heal and launch a heal; after the heal completes, wait for some time and then kill the dd.
4. Now the md5sum matches on both bricks.

Sorry, wrongly updated. Ignore my previous comments.

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report. glusterfs-3.8.0 has been announced on the Gluster mailinglists [1]; packages for several distributions should become available in the near future.
Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
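The link()+rename() step described in the fix can be illustrated with plain POSIX calls. This is a standalone sketch, not GlusterFS code: the directory layout mimics a brick's .glusterfs tree, and the gfid value and file names below are invented for illustration. The point it demonstrates is that hard-linking the surviving handle to the new name before moving the handle home keeps 'dst' and the still-open fd's inode the same, so post-heal writes are not lost.

```python
import os
import tempfile

# Fake brick layout; the gfid string is made up for this demo.
brick = tempfile.mkdtemp()
gfid = "abcd0000-0000-0000-0000-fakegfid0001"
unlink_dir = os.path.join(brick, ".glusterfs", "unlink")
handle_dir = os.path.join(brick, ".glusterfs", gfid[0:2], gfid[2:4])
os.makedirs(unlink_dir)
os.makedirs(handle_dir)

# While the fd is held open, the gfid handle is kept under
# .glusterfs/unlink so the inode (and its data blocks) stays alive.
unlink_handle = os.path.join(unlink_dir, gfid)
with open(unlink_handle, "w") as f:
    f.write("writes through the reopened fd land in these data blocks\n")

# Self-heal replays the rename as a create of 'dst' with the same gfid.
# Step 1: hard-link the surviving handle to the new path, so 'dst'
# refers to the very inode the client fd is still writing to.
dst = os.path.join(brick, "dst")
os.link(unlink_handle, dst)

# Step 2: move the gfid handle back to its normal .glusterfs/ab/cd/ home.
os.rename(unlink_handle, os.path.join(handle_dir, gfid))

# 'dst' and the relocated handle now share one inode: no silent corruption.
assert os.stat(dst).st_ino == os.stat(os.path.join(handle_dir, gfid)).st_ino
```

Because link() creates a second name for the same inode and rename() never touches the data blocks, any write through the old fd after this sequence is visible at 'dst' and survives the fd being closed.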