Bug 1339639 - RFE : Feature: Automagic unsplit-brain policies for AFR
Summary: RFE : Feature: Automagic unsplit-brain policies for AFR
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.8.0
Hardware: All
OS: All
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Ravishankar N
QA Contact:
URL:
Whiteboard:
Depends On: 1262161 1328224
Blocks:
Reported: 2016-05-25 13:54 UTC by Ravishankar N
Modified: 2016-06-16 12:32 UTC

Fixed In Version: glusterfs-3.8.0
Doc Type: Enhancement
Doc Text:
Clone Of: 1328224
Environment:
Last Closed: 2016-06-16 12:32:47 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Description Ravishankar N 2016-05-25 13:54:47 UTC
+++ This bug was initially created as a clone of Bug #1328224 +++

+++ This bug was initially created as a clone of Bug #1262161 +++

Description of problem:
From time to time, GlusterFS users, admins (and even developers) do unfortunate things to a volume that leave files and directories in split-brain. When the so-called "wise fool" algorithm (aka change logs) cannot determine a clean version of a file, an I/O error is bubbled up to the user, thus ruining their GlusterFS clustered storage experience.

The present solution for these cases is to go into the backend and delete or move the unwanted copies of the file, or to "pin" a specific replica index (which basically amounts to choosing a copy at random). For large-scale installations of GlusterFS this really isn't a workable solution, and quite often a simple heuristic based on time, size or majority will suffice to resolve things automagically, to most end users' satisfaction.

This patch introduces policy-based split-brain resolution.

Version-Release number of selected component (if applicable):
v3.6.x

How reproducible:
100%

Steps to Reproduce:
N/A

Actual results:
N/A

Expected results:
N/A

Additional info:
N/A

Comment 1 Vijay Bellur 2016-05-25 15:51:56 UTC
REVIEW: http://review.gluster.org/14535 (afr: Automagic unsplit-brain by [ctime|mtime|size|majority]) posted (#1) for review on release-3.8 by Ravishankar N (ravishankar)

Comment 2 Vijay Bellur 2016-05-27 14:56:40 UTC
COMMIT: http://review.gluster.org/14535 committed in release-3.8 by Niels de Vos (ndevos) 
------
commit 6f08b9f2b006a4eafaa176cfd792038eed7f6c98
Author: Ravishankar N <ravishankar>
Date:   Wed May 25 21:18:19 2016 +0530

    afr: Automagic unsplit-brain by [ctime|mtime|size|majority]
    
    Backport of http://review.gluster.org/#/c/14026/
    
    Introduce cluster.favorite-child-policy, which, when enabled with
    [ctime|mtime|size|majority], automatically heals files that are in
    split-brain.
    
    The majority policy will not pick a source if there is no majority.
    The other three policies pick the first brick with a valid reply and
    non-zero ctime/mtime/size as source.
    
    Change-Id: I93623a914dce2839957fce87b514050e9d274d4c
    BUG: 1339639
    Signed-off-by: Ravishankar N <ravishankar>
    Reviewed-on: http://review.gluster.org/14535
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
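The commit message describes the selection rules only briefly. The following is a minimal Python sketch of how a source brick could be chosen under each policy value; it is not the AFR implementation (which is written in C). Everything in it is an assumption for illustration: the `BrickReply` record, its fields and all function names are hypothetical; "first brick with a valid reply and non-zero ctime/mtime/size" is read here as "the brick with the largest non-zero value, with ties going to the lowest brick index"; and a majority is taken to mean that strictly more than half of all bricks in the replica set report the same (mtime, size) pair.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class BrickReply:
    """Hypothetical per-brick lookup result; field names are illustrative only."""
    valid: bool  # did this brick return a usable reply?
    ctime: int   # change time in seconds
    mtime: int   # modification time in seconds
    size: int    # file size in bytes


def pick_by_attr(replies: Sequence[BrickReply], attr: str) -> Optional[int]:
    """Return the index of the brick with the largest non-zero `attr`.

    Assumption: invalid replies and zero values are skipped, and on a tie
    the lowest brick index wins because only a strictly larger value
    replaces the current choice.
    """
    best_idx: Optional[int] = None
    best_val = 0
    for i, reply in enumerate(replies):
        if not reply.valid:
            continue
        val = getattr(reply, attr)
        if val > best_val:
            best_idx, best_val = i, val
    return best_idx


def pick_by_majority(replies: Sequence[BrickReply]) -> Optional[int]:
    """Return the first brick whose (mtime, size) pair is shared by a strict
    majority of all bricks in the replica set, or None if there is none.

    Assumption: (mtime, size) stands in for "same copy of the file".
    """
    total = len(replies)
    for i, reply in enumerate(replies):
        if not reply.valid:
            continue
        key = (reply.mtime, reply.size)
        agree = sum(1 for r in replies if r.valid and (r.mtime, r.size) == key)
        if agree > total // 2:  # strictly more than half
            return i
    return None  # no majority: pick no source, leave the file in split-brain


def pick_source(replies: Sequence[BrickReply], policy: str) -> Optional[int]:
    """Dispatch on a policy name; None means no source was chosen."""
    if policy == "majority":
        return pick_by_majority(replies)
    if policy in ("ctime", "mtime", "size"):
        return pick_by_attr(replies, policy)
    raise ValueError(f"unknown policy: {policy!r}")
```

For example, with three valid replies where bricks 0 and 2 both report mtime 100 and size 10 and brick 1 reports mtime 150 and size 10, `pick_source(replies, "mtime")` returns 1, `"size"` returns 0 (all sizes tie, so the first brick wins), and `"majority"` returns 0 because two of the three bricks agree. If all three bricks disagree, `"majority"` returns `None`, matching the rule that the majority policy does not pick a source when there is no majority.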

Comment 3 Niels de Vos 2016-06-16 12:32:47 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

