+++ This bug was initially created as a clone of Bug #1417522 +++
+++ This bug was initially created as a clone of Bug #1417177 +++

Description of problem:
======================
Automatic split-brain resolution must come into effect only when all the bricks are up; otherwise we would be serving inconsistent or undesired data, as explained below.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create a 1x3 volume with bricks b1, b2, b3 (client-side quorum is enabled by default) and set favorite-child-policy to, say, mtime (automatic resolution of split-brain).
2. Fuse-mount the volume on three different clients as follows:
   c1: can reach only bricks b1 and b2, not b3
   c2: can reach only b2 and b3, not b1
   c3: can reach all bricks
3. Create a file, say f1, from c3 ==> f1 is now present on all bricks.
4. Append "line-c1" to f1 from c1 and "line-c2" from c2. With line-c1 (c1 cannot reach b3), b1 and b2 mark b3 as pending; with line-c2 (c2 cannot reach b1), b2 and b3 mark b1 as pending. That means b2 has the only good copy.
5. Now bring down b2.
6. heal info now shows f1 as in split-brain, since b1 blames b3 and b3 blames b1. Ideally the file should now give an I/O error for new writes.
7. However, this means automatic split-brain resolution will pick f1 for resolving. That is wrong, because the good copy is on b2, which is down. With the resolution done, users can access f1, which must not be allowed: when b2 comes back up, the contents of the actual good copy are lost, because b2 is healed from b1 and b3, which now blame it.

Expected behaviour:
1. b2 has the good copy and it is down, hence no further writes must be allowed.
2. When b2 comes back up, it must be the source for b1 and b3, instead of being marked as the bad copy by automatic split-brain resolution.

Solution: make sure automatic split-brain resolution does not take effect on an AFR replica set when even one of the bricks is down.

Actual results:

Expected results:

Additional info:

--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-01-27 07:23:49 EST ---

This bug is automatically being proposed for the current release of Red Hat Gluster Storage 3 under active development, by setting the release flag 'rhgs-3.2.0' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.
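For reference, the setup in the steps above can be sketched roughly with the standard gluster CLI. The volume name "testvol", hostnames h1-h3, brick paths and mount point below are placeholders, not taken from the report:

    # create the 1x3 replica volume and enable automatic split-brain resolution by mtime
    gluster volume create testvol replica 3 h1:/bricks/b1 h2:/bricks/b2 h3:/bricks/b3
    gluster volume start testvol
    gluster volume set testvol cluster.favorite-child-policy mtime

    # fuse-mount on each client (network reachability to the bricks differs per client)
    mount -t glusterfs h1:/testvol /mnt/testvol

    # step 6: list files reported as being in split-brain
    gluster volume heal testvol info split-brain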
--- Additional comment from Worker Ant on 2017-01-29 23:33:02 EST ---

REVIEW: https://review.gluster.org/16476 (afr: all children of AFR must be up to resolve s-brain) posted (#1) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Worker Ant on 2017-02-07 07:07:30 EST ---

REVIEW: https://review.gluster.org/16476 (afr: all children of AFR must be up to resolve s-brain) posted (#2) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Worker Ant on 2017-02-08 07:39:52 EST ---

REVIEW: https://review.gluster.org/16476 (afr: all children of AFR must be up to resolve s-brain) posted (#3) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Worker Ant on 2017-02-09 20:37:04 EST ---

COMMIT: https://review.gluster.org/16476 committed in master by Pranith Kumar Karampuri (pkarampu)
------
commit 0e03336a9362e5717e561f76b0c543e5a197b31b
Author: Ravishankar N <ravishankar>
Date: Mon Jan 30 09:54:16 2017 +0530

afr: all children of AFR must be up to resolve s-brain

Problem:
The various split-brain resolution policies (favorite-child-policy based, CLI based and mount (get/setfattr) based) attempt to resolve split-brain even when not all bricks of replica are up. This can be a problem when say in a replica 3, the only good copy is down and the other 2 bricks are up and blame each other (i.e. split-brain). We end up healing the file in such a case and allow I/O on it.

Fix:
A decision on whether the file is in split-brain or not must be taken only if we are able to examine the afr xattrs of *all* bricks of a given replica.

Change-Id: Icddb1268b380005799990f5379ef957d84639ef9
BUG: 1417522
Signed-off-by: Ravishankar N <ravishankar>
Reviewed-on: https://review.gluster.org/16476
Smoke: Gluster Build System <jenkins.org>
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu>
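The three resolution paths named in the commit message can be illustrated roughly as follows, against the hypothetical volume "testvol" and file "f1" from the sketch above (names and the chosen child "testvol-client-0" are placeholders). With the fix, none of these should resolve the split-brain while any brick of the replica is down:

    # CLI based resolution (run on a server node; path is relative to the volume root)
    gluster volume heal testvol split-brain latest-mtime /f1

    # mount (get/setfattr) based resolution (run against the file on the fuse mount)
    getfattr -n replica.split-brain-status f1
    setfattr -n replica.split-brain-choice -v testvol-client-0 f1

    # favorite-child-policy based resolution is triggered automatically
    # (e.g. by self-heal or file access) once cluster.favorite-child-policy is set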
REVIEW: https://review.gluster.org/16587 (afr: all children of AFR must be up to resolve s-brain) posted (#1) for review on release-3.10 by Ravishankar N (ravishankar)
COMMIT: https://review.gluster.org/16587 committed in release-3.10 by Shyamsundar Ranganathan (srangana)
------
commit 8de5213db8771088ae214d42bcae056e409d7b6a
Author: Ravishankar N <ravishankar>
Date: Mon Jan 30 09:54:16 2017 +0530

afr: all children of AFR must be up to resolve s-brain

Problem:
The various split-brain resolution policies (favorite-child-policy based, CLI based and mount (get/setfattr) based) attempt to resolve split-brain even when not all bricks of replica are up. This can be a problem when say in a replica 3, the only good copy is down and the other 2 bricks are up and blame each other (i.e. split-brain). We end up healing the file in such a case and allow I/O on it.

Fix:
A decision on whether the file is in split-brain or not must be taken only if we are able to examine the afr xattrs of *all* bricks of a given replica.

Signed-off-by: Ravishankar N <ravishankar>
> Reviewed-on: https://review.gluster.org/16476
> Smoke: Gluster Build System <jenkins.org>
> NetBSD-regression: NetBSD Build System <jenkins.org>
> CentOS-regression: Gluster Build System <jenkins.org>
> Reviewed-by: Pranith Kumar Karampuri <pkarampu>
(cherry picked from commit 0e03336a9362e5717e561f76b0c543e5a197b31b)
Change-Id: Icddb1268b380005799990f5379ef957d84639ef9
BUG: 1420982
Reviewed-on: https://review.gluster.org/16587
Tested-by: Ravishankar N <ravishankar>
Smoke: Gluster Build System <jenkins.org>
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>
Reviewed-by: Shyamsundar Ranganathan <srangana>
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/