Bug 1276203

Summary: add-brick on a replicate volume could lead to data-loss
Product: [Community] GlusterFS Reporter: Anuradha <atalur>
Component: replicate Assignee: Anuradha <atalur>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: unspecified Docs Contact:
Priority: high    
Version: mainline CC: bugs, sabose, sasundar, smohan
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.8rc2 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1320020 Environment:
Last Closed: 2016-06-16 13:41:47 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1248998, 1277939, 1320020, 1365455, 1366440, 1366444    

Description Anuradha 2015-10-29 05:14:26 UTC
Description of problem:
When the replica count of a replicate volume is increased (with the add-brick command),
a fop that fails on an old brick but succeeds on the newly added brick can cause self-heal to run in the reverse direction, from the new brick to the old one, and hence cause data loss.

Pending xattrs should be marked to indicate that the new brick does not yet have the latest copy of the data.
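
The scenario above can be illustrated with a toy model. This is only a sketch under stated assumptions, not AFR's actual source-selection code: the brick names, the `pending` dictionary, and all function names are hypothetical, and "a brick is a heal source if no other brick blames it" is a deliberate simplification.

```python
from typing import Dict, List

# pending[a][b] > 0 means brick `a` records operations that `b` has missed.
# This structure is a hypothetical stand-in for the pending xattrs.
Pending = Dict[str, Dict[str, int]]


def find_sources(pending: Pending) -> List[str]:
    """Return the bricks that no other brick blames (candidate heal sources)."""
    bricks = list(pending)
    return [
        b for b in bricks
        if all(pending[o].get(b, 0) == 0 for o in bricks if o != b)
    ]


def add_brick(pending: Pending, new_brick: str, mark_pending: bool) -> None:
    """Add an empty brick; optionally make every old brick blame it."""
    if mark_pending:
        for blames in pending.values():
            blames[new_brick] = 1
    pending[new_brick] = {}


def record_write(pending: Pending, failed_on: List[str]) -> None:
    """Every brick where the write succeeded blames each brick where it failed."""
    for brick, blames in pending.items():
        if brick in failed_on:
            continue
        for bad in failed_on:
            blames[bad] = blames.get(bad, 0) + 1


def simulate(mark_pending: bool) -> List[str]:
    """Grow a two-brick replica to three, then fail one write on an old brick."""
    pending: Pending = {"old-1": {}, "old-2": {}}
    add_brick(pending, "new", mark_pending)
    record_write(pending, failed_on=["old-1"])
    return find_sources(pending)


print(simulate(mark_pending=False))  # ['old-2', 'new']
print(simulate(mark_pending=True))   # ['old-2']
```

Without the marker, nothing blames the still-empty new brick, so in this model it counts as a possible source for healing old-1. With the marker, only old-2 qualifies.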

Comment 1 Vijay Bellur 2015-10-29 05:18:04 UTC
REVIEW: http://review.gluster.org/12451 (glusterd / afr : Enable auto heal when replica count increases) posted (#1) for review on master by Anuradha Talur (atalur)

Comment 2 Vijay Bellur 2015-10-29 07:40:55 UTC
REVIEW: http://review.gluster.org/12451 (glusterd / afr : Enable auto heal when replica count increases) posted (#2) for review on master by Anuradha Talur (atalur)

Comment 3 Vijay Bellur 2016-01-13 06:18:27 UTC
REVIEW: http://review.gluster.org/12451 (glusterd / afr : Enable auto heal when replica count increases) posted (#3) for review on master by Anuradha Talur (atalur)

Comment 4 Vijay Bellur 2016-01-13 06:18:30 UTC
REVIEW: http://review.gluster.org/12454 (afr : Enable auto heal when replica count increases) posted (#3) for review on master by Anuradha Talur (atalur)

Comment 5 Vijay Bellur 2016-02-03 07:35:54 UTC
REVIEW: http://review.gluster.org/12454 (afr : Enable auto heal when replica count increases) posted (#4) for review on master by Anuradha Talur (atalur)

Comment 6 Vijay Bellur 2016-02-23 05:29:21 UTC
REVIEW: http://review.gluster.org/12451 (glusterd / afr : Enable auto heal when replica count increases) posted (#4) for review on master by Anuradha Talur (atalur)

Comment 7 Vijay Bellur 2016-02-23 05:29:24 UTC
REVIEW: http://review.gluster.org/12454 (afr : Enable auto heal when replica count increases) posted (#5) for review on master by Anuradha Talur (atalur)

Comment 8 Vijay Bellur 2016-02-23 06:43:38 UTC
REVIEW: http://review.gluster.org/12451 (glusterd / afr : Enable auto heal when replica count increases) posted (#5) for review on master by Anuradha Talur (atalur)

Comment 9 Vijay Bellur 2016-02-23 06:43:41 UTC
REVIEW: http://review.gluster.org/12454 (afr : Enable auto heal when replica count increases) posted (#6) for review on master by Anuradha Talur (atalur)

Comment 10 Vijay Bellur 2016-02-29 05:14:15 UTC
REVIEW: http://review.gluster.org/12451 (glusterd / afr : Enable auto heal when replica count increases) posted (#6) for review on master by Anuradha Talur (atalur)

Comment 11 Vijay Bellur 2016-03-02 07:32:46 UTC
REVIEW: http://review.gluster.org/12451 (glusterd / afr : Enable auto heal when replica count increases) posted (#7) for review on master by Anuradha Talur (atalur)

Comment 12 Vijay Bellur 2016-03-03 08:55:01 UTC
REVIEW: http://review.gluster.org/12454 (afr : Enable auto heal when replica count increases) posted (#7) for review on master by Anuradha Talur (atalur)

Comment 13 Vijay Bellur 2016-03-03 12:27:31 UTC
REVIEW: http://review.gluster.org/12454 (afr : Enable auto heal when replica count increases) posted (#8) for review on master by Anuradha Talur (atalur)

Comment 14 Vijay Bellur 2016-03-14 08:25:47 UTC
REVIEW: http://review.gluster.org/12451 (glusterd / afr : Enable auto heal when replica count increases) posted (#8) for review on master by Anuradha Talur (atalur)

Comment 15 Vijay Bellur 2016-03-14 08:25:50 UTC
REVIEW: http://review.gluster.org/12454 (afr : Enable auto heal when replica count increases) posted (#9) for review on master by Anuradha Talur (atalur)

Comment 16 Vijay Bellur 2016-03-16 05:25:46 UTC
REVIEW: http://review.gluster.org/12451 (glusterd / afr : Enable auto heal when replica count increases) posted (#9) for review on master by Anuradha Talur (atalur)

Comment 17 Vijay Bellur 2016-03-17 06:18:59 UTC
REVIEW: http://review.gluster.org/12454 (afr : Enable auto heal when replica count increases) posted (#10) for review on master by Anuradha Talur (atalur)

Comment 18 Vijay Bellur 2016-03-21 17:51:11 UTC
COMMIT: http://review.gluster.org/12451 committed in master by Atin Mukherjee (amukherj) 
------
commit 020bc022c342c4c015e29c63399757e36d653a49
Author: Anuradha Talur <atalur>
Date:   Wed Mar 16 10:55:09 2016 +0530

    glusterd / afr : Enable auto heal when replica count increases
    
    In replicate volumes, when a brick is added to a replicate
    group, heal to the new brick should be triggered.
    Also, the new brick should not be considered as source for
    healing till it is up to date.
    
    Previously, extended attributes had to be set manually on
    the bricks for this to happen. This patch is part 1 of the
    change to automate this process.
    
    Change-Id: I29958448618372bfde23bf1dac5dd23dba1ad98f
    BUG: 1276203
    Signed-off-by: Anuradha Talur <atalur>
    Reviewed-on: http://review.gluster.org/12451
    Reviewed-by: Atin Mukherjee <amukherj>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Ravishankar N <ravishankar>
    Smoke: Gluster Build System <jenkins.com>

Comment 19 Vijay Bellur 2016-03-22 05:37:28 UTC
COMMIT: http://review.gluster.org/12454 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit 8eaa3506ead4f11b81b146a9e56575c79f3aad7b
Author: Anuradha Talur <atalur>
Date:   Tue Feb 23 10:56:51 2016 +0530

    afr : Enable auto heal when replica count increases
    
    This patch is part two of the change to prevent data loss
    in a replicate volume when doing an add-brick operation.
    
    Problem: After doing add-brick, there is a chance
    that self heal might happen from the newly added
    brick rather than the source brick, leading to data loss.
    
    Solution: Mark pending changelogs on afr children for
    the new afr-child so that heal is performed in the
    correct direction.
    
    Change-Id: I11871e55eef3593aec874f92214a2d97da229b17
    BUG: 1276203
    Signed-off-by: Anuradha Talur <atalur>
    Reviewed-on: http://review.gluster.org/12454
    Smoke: Gluster Build System <jenkins.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Tested-by: Pranith Kumar Karampuri <pkarampu>
    CentOS-regression: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
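
For readers unfamiliar with the pending changelogs mentioned in the commit message, here is a minimal sketch of how such a marker might be named and encoded. Nothing in it comes from this report or the patch. It assumes the commonly described AFR convention: one xattr per afr child named `trusted.afr.<volname>-client-<index>`, with a 12-byte value made of three big-endian 32-bit counters for data, metadata, and entry operations. The volume name `testvol` and the helper names are hypothetical.

```python
import struct
from typing import Tuple


def changelog_key(volname: str, child_index: int) -> str:
    """Build the assumed xattr name that blames the given afr child."""
    return f"trusted.afr.{volname}-client-{child_index}"


def encode_pending(data: int, metadata: int, entry: int) -> bytes:
    """Pack the three pending counters as big-endian 32-bit integers."""
    return struct.pack(">III", data, metadata, entry)


def decode_pending(value: bytes) -> Tuple[int, int, int]:
    """Unpack a 12-byte changelog value into (data, metadata, entry)."""
    return struct.unpack(">III", value)


# Marker an existing brick would carry for a new third child (index 2),
# saying that one data operation is still pending on it.
key = changelog_key("testvol", 2)
value = encode_pending(data=1, metadata=0, entry=0)
print(key, "0x" + value.hex())
```

A non-zero counter on the existing bricks for the new child is what would tell self-heal that the new brick is a sink, not a source.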

Comment 20 Niels de Vos 2016-06-16 13:41:47 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed in glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user