Bug 1420982

Summary: Automatic split-brain resolution must check that all bricks are up, to avoid serving inconsistent data (visible with replica 3 or higher)
Product: [Community] GlusterFS
Component: replicate
Version: 3.10
Hardware: Unspecified
OS: Unspecified
Status: CLOSED CURRENTRELEASE
Severity: urgent
Priority: unspecified
Reporter: Ravishankar N <ravishankar>
Assignee: Ravishankar N <ravishankar>
QA Contact:
Docs Contact:
CC: bugs, ravishankar
Keywords: Triaged
Target Milestone: ---
Target Release: ---
Whiteboard:
Fixed In Version: glusterfs-3.10.0
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1417522
Environment:
Last Closed: 2017-03-06 17:45:52 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1417177, 1417522, 1420983, 1420984
Bug Blocks:

Description Ravishankar N 2017-02-10 04:12:41 UTC
+++ This bug was initially created as a clone of Bug #1417522 +++

+++ This bug was initially created as a clone of Bug #1417177 +++

Description of problem:
======================
Automatic split-brain resolution must come into effect only when all the bricks are up; otherwise we would be serving inconsistent or undesired data, as explained below.




Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create a 1x3 volume (client-side quorum is enabled by default) with bricks, say, b1, b2, b3,
and set the favorite-child-policy to, say, mtime (automatic split-brain resolution); see the shell sketch after these steps.

2. FUSE-mount the volume on three different clients, with the following connectivity:
c1: can reach only bricks b1 and b2, not b3
c2: can reach only b2 and b3, not b1
c3: can reach all bricks

3. Now create a file, say f1, from c3 ==> f1 is now present on all bricks.
4. Now append line-c1 from c1 and line-c2 from c2 to file f1.
   That means b2 will mark b1 pending (b1 missed line-c2)
   and b2 will also mark b3 pending (b3 missed line-c1).

That means b2 has the only good copy.

5. Now bring down b2.
6. heal info will now show f1 as being in split-brain, since b1 blames b3 and b3 blames b1.

Ideally the file should now return an I/O error for new writes.
7. However, automatic split-brain resolution will instead pick f1 for resolution. That is wrong, because the only good copy is on b2, which is down.

With the file thus "resolved", users can access f1 even though that must not be allowed: the contents of the actual good copy are lost when b2 comes back up, because f1 is then healed from b1 and b3, which now blame b2.
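
For concreteness, a minimal shell sketch of steps 1, 2 and 6 (host names server1..server3, brick paths and the volume name "testvol" are placeholders; the gluster and mount commands are the standard CLI):

    # step 1: 1x3 replica volume; client-side quorum is enabled by default
    gluster volume create testvol replica 3 \
        server1:/bricks/b1 server2:/bricks/b2 server3:/bricks/b3
    gluster volume start testvol
    # enable automatic split-brain resolution based on mtime
    gluster volume set testvol cluster.favorite-child-policy mtime

    # step 2: FUSE-mount (run on each of c1, c2, c3)
    mount -t glusterfs server1:/testvol /mnt/testvol

    # step 6: with b2 down, heal info reports f1 as being in split-brain
    gluster volume heal testvol info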


Expected behavior:
1) b2, which holds the good copy, is down; hence no further writes must be allowed.
2) When b2 comes back up, it must be the heal source for b1 and b3, instead of being marked as the bad copy by automatic split-brain resolution.
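
The blame in steps 4 and 6 can be verified directly from the AFR changelog xattrs on the bricks. A hedged sketch (brick paths and volume name as in the sketch above; client-0/1/2 map to b1/b2/b3): a non-zero data counter in trusted.afr.testvol-client-<n> on a brick means that brick holds changes still pending for brick n.

    # on the node hosting b1: non-zero trusted.afr.testvol-client-2 ==> b1 blames b3
    getfattr -d -m . -e hex /bricks/b1/f1
    # on the node hosting b3: non-zero trusted.afr.testvol-client-0 ==> b3 blames b1
    getfattr -d -m . -e hex /bricks/b3/f1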


Solution:
Make sure automatic split-brain resolution does not take effect on an AFR replica set when even one of the bricks is down.
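
The same guard should apply to the CLI-based resolution path: with the fix in place, an attempt such as the following is expected to be refused while b2 is down, instead of silently picking a winner from b1 and b3 (volume name and file path as above are placeholders; the exact error text is not quoted here):

    gluster volume heal testvol split-brain latest-mtime /f1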



Actual results:


Expected results:


Additional info:

--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-01-27 07:23:49 EST ---

This bug is automatically being proposed for the current release of Red Hat Gluster Storage 3 under active development, by setting the release flag 'rhgs-3.2.0' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Worker Ant on 2017-01-29 23:33:02 EST ---

REVIEW: https://review.gluster.org/16476 (afr: all children of AFR must be up to resolve s-brain) posted (#1) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Worker Ant on 2017-02-07 07:07:30 EST ---

REVIEW: https://review.gluster.org/16476 (afr: all children of AFR must be up to resolve s-brain) posted (#2) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Worker Ant on 2017-02-08 07:39:52 EST ---

REVIEW: https://review.gluster.org/16476 (afr: all children of AFR must be up to resolve s-brain) posted (#3) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Worker Ant on 2017-02-09 20:37:04 EST ---

COMMIT: https://review.gluster.org/16476 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit 0e03336a9362e5717e561f76b0c543e5a197b31b
Author: Ravishankar N <ravishankar>
Date:   Mon Jan 30 09:54:16 2017 +0530

    afr: all children of AFR must be up to resolve s-brain
    
    Problem:
    The various split-brain resolution policies (favorite-child-policy based,
    CLI based and mount (get/setfattr) based) attempt to resolve split-brain
    even when not all bricks of replica are up. This can be a problem when
    say in a replica 3, the only good copy is down and the other 2 bricks
    are up and blame each other (i.e. split-brain). We end up healing the
    file in such a  case and allow I/O on it.
    
    Fix:
    A decision on whether the file is in split-brain or not must be taken
    only if we are able to examine the afr xattrs of *all* bricks of a given
    replica.
    
    Change-Id: Icddb1268b380005799990f5379ef957d84639ef9
    BUG: 1417522
    Signed-off-by: Ravishankar N <ravishankar>
    Reviewed-on: https://review.gluster.org/16476
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
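
Of the three resolution paths named in the commit message, the mount-based one drives the heal through virtual xattrs on the FUSE mount. A hedged sketch using the documented replicate virtual xattrs (file path and client/brick name are placeholders); with this fix, these operations are likewise expected to be refused unless all bricks of the replica are up:

    # inspect split-brain status from the mount
    getfattr -n replica.split-brain-status /mnt/testvol/f1
    # pick a source brick for inspection, then finalize the heal from it
    setfattr -n replica.split-brain-choice -v testvol-client-0 /mnt/testvol/f1
    setfattr -n replica.split-brain-heal-finalize -v testvol-client-0 /mnt/testvol/f1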

Comment 1 Worker Ant 2017-02-10 04:14:23 UTC
REVIEW: https://review.gluster.org/16587 (afr: all children of AFR must be up to resolve s-brain) posted (#1) for review on release-3.10 by Ravishankar N (ravishankar)

Comment 2 Worker Ant 2017-02-15 12:31:49 UTC
COMMIT: https://review.gluster.org/16587 committed in release-3.10 by Shyamsundar Ranganathan (srangana) 
------
commit 8de5213db8771088ae214d42bcae056e409d7b6a
Author: Ravishankar N <ravishankar>
Date:   Mon Jan 30 09:54:16 2017 +0530

    afr: all children of AFR must be up to resolve s-brain
    
    Problem:
    The various split-brain resolution policies (favorite-child-policy based,
    CLI based and mount (get/setfattr) based) attempt to resolve split-brain
    even when not all bricks of replica are up. This can be a problem when
    say in a replica 3, the only good copy is down and the other 2 bricks
    are up and blame each other (i.e. split-brain). We end up healing the
    file in such a  case and allow I/O on it.
    
    Fix:
    A decision on whether the file is in split-brain or not must be taken
    only if we are able to examine the afr xattrs of *all* bricks of a given
    replica.
    
    Signed-off-by: Ravishankar N <ravishankar>
    > Reviewed-on: https://review.gluster.org/16476
    > Smoke: Gluster Build System <jenkins.org>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    
    (cherry picked from commit 0e03336a9362e5717e561f76b0c543e5a197b31b)
    Change-Id: Icddb1268b380005799990f5379ef957d84639ef9
    BUG: 1420982
    Reviewed-on: https://review.gluster.org/16587
    Tested-by: Ravishankar N <ravishankar>
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana>

Comment 3 Shyamsundar 2017-03-06 17:45:52 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/