Bug 1339639

Summary: RFE : Feature: Automagic unsplit-brain policies for AFR
Product: [Community] GlusterFS Reporter: Ravishankar N <ravishankar>
Component: replicate Assignee: Ravishankar N <ravishankar>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: 3.8.0 CC: bugs
Target Milestone: --- Keywords: FutureFeature, Triaged
Target Release: ---   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: glusterfs-3.8.0 Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of: 1328224 Environment:
Last Closed: 2016-06-16 12:32:47 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1262161, 1328224    
Bug Blocks:    

Description Ravishankar N 2016-05-25 13:54:47 UTC
+++ This bug was initially created as a clone of Bug #1328224 +++

+++ This bug was initially created as a clone of Bug #1262161 +++

Description of problem:
From time to time, GlusterFS users, admins (and even developers) can do unfortunate things to a volume that leave files and directories in split-brain.  When the so-called "wise fool" algorithm (aka change logs) cannot determine a clean version of the file, an I/O error is bubbled up to the user, ruining their GlusterFS clustered storage experience.

The present solution for these cases is to go into the backend and delete or move the unwanted copies of the file, or to "pin" reads to a specific replica index (which is basically choosing randomly).  For large-scale installations of GlusterFS this really isn't a workable solution, and quite often a simple heuristic based on time, size or majority will suffice to resolve things automagically to most end users' satisfaction.

This patch introduces policy-based split-brain resolution.

Version-Release number of selected component (if applicable):
v3.6.x

How reproducible:
100%

Steps to Reproduce:
N/A

Actual results:
N/A

Expected results:
N/A

Additional info:
N/A

Comment 1 Vijay Bellur 2016-05-25 15:51:56 UTC
REVIEW: http://review.gluster.org/14535 (afr: Automagic unsplit-brain by [ctime|mtime|size|majority]) posted (#1) for review on release-3.8 by Ravishankar N (ravishankar)

Comment 2 Vijay Bellur 2016-05-27 14:56:40 UTC
COMMIT: http://review.gluster.org/14535 committed in release-3.8 by Niels de Vos (ndevos) 
------
commit 6f08b9f2b006a4eafaa176cfd792038eed7f6c98
Author: Ravishankar N <ravishankar>
Date:   Wed May 25 21:18:19 2016 +0530

    afr: Automagic unsplit-brain by [ctime|mtime|size|majority]
    
    Backport of http://review.gluster.org/#/c/14026/
    
    Introduce cluster.favorite-child-policy which when enabled with
    [ctime|mtime|size|majority], automatically heals files that are in
    split-brain.
    
    The majority policy will not pick a source if there is no majority.
    The other three policies pick the first brick with a valid reply and
    non-zero ctime/mtime/size as source.
    
    Change-Id: I93623a914dce2839957fce87b514050e9d274d4c
    BUG: 1339639
    Signed-off-by: Ravishankar N <ravishankar>
    Reviewed-on: http://review.gluster.org/14535
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
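
The source selection described in the commit message above can be sketched roughly as follows. This is an illustration only, not the AFR code: the Reply class and pick_source function are hypothetical names. It assumes that the ctime, mtime and size policies prefer the largest non-zero value, with ties going to the first brick, and that "majority" means more than half of all bricks agree on mtime and size. The commit message does not spell out either detail.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reply:
    """Hypothetical stand-in for one brick's lookup reply."""
    valid: bool
    ctime: int = 0
    mtime: int = 0
    size: int = 0


def pick_source(replies: List[Reply], policy: str) -> Optional[int]:
    """Return the index of the brick to heal from, or None if no source fits."""
    if policy == "majority":
        # Assumption: two replies "agree" when mtime and size both match.
        for i, reply in enumerate(replies):
            if not reply.valid:
                continue
            votes = sum(
                1 for other in replies
                if other.valid
                and (other.mtime, other.size) == (reply.mtime, reply.size)
            )
            # Strictly more than half of all bricks must agree.
            if votes > len(replies) // 2:
                return i
        return None  # no majority, so no source is picked

    if policy not in ("ctime", "mtime", "size"):
        raise ValueError(f"unknown policy: {policy}")

    best_index, best_value = None, 0
    for i, reply in enumerate(replies):
        value = getattr(reply, policy)
        # Skip invalid replies and zero values; strict '>' keeps the
        # first brick on ties (assumed tie-break).
        if reply.valid and value > best_value:
            best_index, best_value = i, value
    return best_index


if __name__ == "__main__":
    replies = [
        Reply(valid=True, ctime=10, mtime=20, size=5),
        Reply(valid=True, ctime=12, mtime=18, size=7),
        Reply(valid=False, ctime=99, mtime=99, size=99),
    ]
    for policy in ("ctime", "mtime", "size", "majority"):
        print(policy, pick_source(replies, policy))
```

On a real volume the policy is chosen through the volume option named in the commit message, for example: gluster volume set <VOLNAME> cluster.favorite-child-policy mtime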

Comment 3 Niels de Vos 2016-06-16 12:32:47 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user