Bug 1232173

Summary: Incomplete self-heal and split-brain on directories found when self-healing files/dirs on a replaced disk
Product: [Community] GlusterFS Reporter: Anuradha <atalur>
Component: replicate Assignee: Anuradha <atalur>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.7.1 CC: atalur, bugs, gluster-bugs, mzywusko, ravishankar, smohan, spandura
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.7.3 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1207829 Environment:
Last Closed: 2015-07-30 09:47:26 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1207829, 1255611    
Bug Blocks: 1140649    

Comment 1 Anand Avati 2015-06-16 11:56:51 UTC
REVIEW: http://review.gluster.org/11253 (glusterd/ afr : set afr pending xattrs on replace brick) posted (#1) for review on release-3.7 by Anuradha Talur (atalur)

Comment 2 Anand Avati 2015-06-16 12:11:05 UTC
REVIEW: http://review.gluster.org/11254 (cluster/afr : set pending xattrs for replaced brick) posted (#2) for review on release-3.7 by Anuradha Talur (atalur)

Comment 3 Anand Avati 2015-06-22 06:52:26 UTC
REVIEW: http://review.gluster.org/11253 (glusterd/ afr : set afr pending xattrs on replace brick) posted (#2) for review on release-3.7 by Anuradha Talur (atalur)

Comment 4 Anand Avati 2015-06-26 07:09:15 UTC
REVIEW: http://review.gluster.org/11254 (cluster/afr : set pending xattrs for replaced brick) posted (#4) for review on release-3.7 by Anuradha Talur (atalur)

Comment 5 Anand Avati 2015-06-27 11:27:16 UTC
COMMIT: http://review.gluster.org/11253 committed in release-3.7 by Pranith Kumar Karampuri (pkarampu) 
------
commit e28ac41c5ffc7b87f09b5bf2fe7f43cd4d4a5af5
Author: Anuradha <atalur>
Date:   Fri Jun 5 16:46:39 2015 +0530

    glusterd/ afr : set afr pending xattrs on replace brick
    
    Backport of: http://review.gluster.org/10076/
    
    This patch is part one of a change to prevent data loss
    in a replicate volume when performing a replace-brick commit
    force operation.
    
    Problem: After doing replace-brick commit force, there is a
    chance that self heal happens from the replaced (sink) brick
    rather than the source brick leading to data loss.
    
    Solution: During the commit phase of replace brick, after old
    brick is brought down, create a temporary mount and perform
    setfattr operation (on virtual xattr) indicating AFR to mark
    the replaced brick as sink.
    
    As a part of this change replace-brick command is being changed
    to use mgmt_v3 framework rather than op-state-machine framework.
    
    Many thanks to Krishnan Parthasarathi for helping me out on this.
    
    Change-Id: If0d51b5b3cef5b34d5672d46ea12eaa9d35fd894
    BUG: 1232173
    Signed-off-by: Anuradha Talur <atalur>
    Reviewed-on: http://review.gluster.org/11253
    Tested-by: NetBSD Build System <jenkins.org>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Ravishankar N <ravishankar>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
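
The problem described in the commit above can be illustrated with a small sketch. AFR records pending operations in extended attributes whose values hold three 32-bit big-endian counters (data, metadata, entry); a brick with non-zero counters against another brick is "blaming" it, which makes the blamed brick a heal sink. The helper names below (decode_pending, find_sinks) and the counter values are hypothetical, and the real heal logic in AFR is considerably more involved. This is only a simplified model of why an unmarked replacement brick leaves the heal direction undetermined.

```python
import struct
from typing import Dict, Set, Tuple


def decode_pending(value: bytes) -> Tuple[int, int, int]:
    """Unpack a 12-byte changelog value into (data, metadata, entry) counters."""
    return struct.unpack(">III", value)


def find_sinks(changelogs: Dict[int, Dict[int, bytes]]) -> Set[int]:
    """Return the brick indices blamed by at least one other brick.

    changelogs[i][j] is the changelog value that brick i stores about brick j.
    """
    sinks = set()
    for accuser, entries in changelogs.items():
        for accused, value in entries.items():
            if accused != accuser and any(decode_pending(value)):
                sinks.add(accused)
    return sinks


CLEAN = struct.pack(">III", 0, 0, 0)
# Illustrative non-zero counters; the exact values the patch sets are an assumption.
MARKED = struct.pack(">III", 1, 1, 1)

# Right after replace-brick: brick 1 is new and empty, but nothing blames it.
before_fix = {0: {1: CLEAN}, 1: {0: CLEAN}}
# With the fix: the surviving brick 0 carries pending counters against brick 1.
after_fix = {0: {1: MARKED}, 1: {0: CLEAN}}

print(find_sinks(before_fix))  # no sink is identified
print(find_sinks(after_fix))   # brick 1 is identified as the sink
```

In the first case nothing tells AFR that the new brick is the one to be healed, which is the window in which heal could run in the wrong direction.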

Comment 6 Anand Avati 2015-06-27 13:09:02 UTC
COMMIT: http://review.gluster.org/11254 committed in release-3.7 by Pranith Kumar Karampuri (pkarampu) 
------
commit b319d712e97e1074cc6030220d00970d1262458b
Author: Anuradha <atalur>
Date:   Thu Jun 11 14:58:05 2015 +0530

    cluster/afr : set pending xattrs for replaced brick
    
    Backport of: http://review.gluster.org/10448/
                 & http://review.gluster.org/11416
    
    This patch is part two of a change to prevent data loss
    in a replicate volume when performing a replace-brick commit
    force operation.
    
    Problem: After doing replace-brick commit force, there is a
    chance that self heal might happen from the replaced (sink) brick
    rather than the source brick leading to data loss.
    
    Solution: Mark pending changelogs on afr children for
    the replaced afr-child so that heal is performed in the
    correct direction.
    
    Credits to Ravishankar N for patch 11416.
    
    Change-Id: Icb9807e49b4c1c4f1dcab115318d9a58ccf95675
    BUG: 1232173
    Reviewed-on: http://review.gluster.org/10448
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Tested-by: NetBSD Build System <jenkins.org>
    Reviewed-by: Krutika Dhananjay <kdhananj>
    Signed-off-by: Anuradha Talur <atalur>
    Reviewed-on: http://review.gluster.org/11254
    Tested-by: Gluster Build System <jenkins.com>
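
As a rough sketch of what "marking pending changelogs for the replaced afr-child" amounts to: each surviving child stores an xattr keyed by the replaced child's client index, with non-zero pending counters. The key format trusted.afr.<volname>-client-<index> and the 12-byte big-endian value layout follow AFR's changelog convention; the function names, the volume name "testvol", and the choice to set all three counters to 1 are assumptions made for illustration, since the commit message does not say which counters are set. The code only builds the key/value pairs and does not touch any brick.

```python
import struct
from typing import Dict


def pending_key(volname: str, child_index: int) -> str:
    """Name of the changelog xattr that tracks one AFR child."""
    return f"trusted.afr.{volname}-client-{child_index}"


def mark_replaced_child(
    volname: str, child_count: int, replaced_index: int
) -> Dict[int, Dict[str, bytes]]:
    """Build the xattrs each surviving child would store to blame the replaced one.

    Returns {surviving_child_index: {xattr_name: xattr_value}}.
    """
    # Assumption: all three counters (data, metadata, entry) are set to 1.
    value = struct.pack(">III", 1, 1, 1)
    key = pending_key(volname, replaced_index)
    return {
        child: {key: value}
        for child in range(child_count)
        if child != replaced_index
    }


# Hypothetical 3-way replica in which child 1 was replaced.
marks = mark_replaced_child("testvol", child_count=3, replaced_index=1)
for child, xattrs in marks.items():
    for name, raw in xattrs.items():
        print(child, name, raw.hex())
```

Because only the surviving children carry the marker, the replaced child is blamed and never blames anyone, so heal runs from the surviving children to the new brick.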

Comment 7 Kaushal 2015-07-30 09:47:26 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.3, please open a new bug report.

glusterfs-3.7.3 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/12078
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
