Bug 812963

Summary: Persistent split-brain on dir despite merge being possible
Product: [Community] GlusterFS Reporter: Jeff Darcy <jdarcy>
Component: replicate Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED CURRENTRELEASE QA Contact: Shwetha Panduranga <shwetha.h.panduranga>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: mainline CC: gluster-bugs, rfortier, rodrigo, vbellur
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.4.0 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-07-24 17:26:39 UTC Type: Bug
Bug Depends On:    
Bug Blocks: 817967    

Description Jeff Darcy 2012-04-16 16:43:06 UTC
From the mailing-list thread:

http://gluster.org/pipermail/gluster-users/2012-April/010080.html


How reproducible:

Every time.


Steps to Reproduce:

(1) Create a directory on a replica*2 volume.
(2) Kill one glusterfsd, create file_a from the client.
(3) Use "gluster volume start ... force" to restore normal operation.
(4) Kill the *other* glusterfsd, create file_b from the client.
(5) Repeat step 3.
(6) Call stat on the directory.
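
The steps above can be sketched as a toy model to show why neither brick can be chosen as the heal source afterwards. This is not GlusterFS code: the `Brick` class, the `pending_on_peer` counter (a stand-in for the changelog flags), and the helper names are all hypothetical, and the model assumes no self-heal runs on the directory between steps 3 and 4.

```python
# Toy model of the reproduction steps; NOT GlusterFS code.
# Each brick keeps its directory entries plus a counter recording
# operations the *other* brick missed (a stand-in for changelog flags).

class Brick:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.entries = set()
        self.pending_on_peer = 0  # non-zero means "my peer is stale"


def create(bricks, filename):
    """Create a file on every live brick; live bricks blame any dead ones."""
    live = [b for b in bricks if b.up]
    any_down = len(live) < len(bricks)
    for b in live:
        b.entries.add(filename)
        if any_down:
            b.pending_on_peer += 1


def is_split_brain(a, b):
    """Both bricks accuse each other, so neither is an obvious source."""
    return a.pending_on_peer > 0 and b.pending_on_peer > 0


b0, b1 = Brick("brick0"), Brick("brick1")
bricks = [b0, b1]

b1.up = False              # step 2: kill one glusterfsd
create(bricks, "file_a")   #         create file_a from the client
b1.up = True               # step 3: restore normal operation
b0.up = False              # step 4: kill the *other* glusterfsd
create(bricks, "file_b")   #         create file_b from the client
b0.up = True               # step 5: restore normal operation

# step 6: a lookup now sees two bricks that each blame the other
print(sorted(b0.entries), sorted(b1.entries), is_split_brain(b0, b1))
```

In this model each brick ends up holding one file the other lacks, and each has a non-zero counter against its peer, which is the mutual-accusation state the report describes.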
  

Actual results:

I/O error on the client, changelog flags remain set on servers.


Expected results:

Conservative merge should cross-heal both files, resulting in both file_a and file_b being present on both servers with no changelog flags set.  Actually this point is arguable because the case is indistinguishable from one where files were *deleted* on both bricks, but let's assume for the sake of argument that options are set to allow conservative (i.e. "conserve maximal data") self-heal in these cases.
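
As a rough illustration of what "conservative merge" means here, the sketch below treats each brick's directory as a set of names and heals every brick to the union. It is a minimal model under that assumption, not the replicate translator's implementation; `conservative_merge` and the brick names are hypothetical.

```python
# Minimal sketch of a conservative ("conserve maximal data") directory merge.
# Assumption: a directory is modelled as a plain set of entry names per brick.

def conservative_merge(entries_by_brick):
    """Return the union of all entries and, per brick, what it is missing."""
    merged = set().union(*entries_by_brick.values())
    missing = {
        name: merged - entries
        for name, entries in entries_by_brick.items()
    }
    return merged, missing


# State left behind by the reproduction steps in this report.
state = {"brick0": {"file_a"}, "brick1": {"file_b"}}

merged, missing = conservative_merge(state)

# Cross-heal: copy each missing entry onto the brick that lacks it.
for name, names in missing.items():
    state[name] |= names

print(sorted(merged), state)
```

The union never drops an entry, which is also why the point is arguable: a file deliberately deleted on one brick while the other was down would reappear after such a merge.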

Comment 1 Jeff Darcy 2012-04-16 16:45:20 UTC
Patch submitted.  http://review.gluster.com/3161

This is a nasty, ugly, horrible patch.  I really hope it's not adopted.  However, there is an unusual level of complexity in these code paths, and a concrete "this works no matter how ugly it is" might help others cut through the complexity to see what's necessary as we consider a better fix.

Comment 2 Pranith Kumar K 2012-04-16 18:14:22 UTC
Jeff,
This is already handled in the 3.2.6p3 release.
The forward port is below. It has not been accepted yet.
http://review.gluster.com/#change,3091,patchset=2

Pranith

Comment 3 Jeff Darcy 2012-04-16 20:51:42 UTC
I created a standalone version of Pranith's forward port (i.e. not depending on other patches which seem stuck in the Gerrit queue) and it does seem to work.

Comment 4 Pranith Kumar K 2012-05-23 02:55:00 UTC
The fix for bug 765587 also fixes this.

Comment 5 Shwetha Panduranga 2012-05-29 08:33:14 UTC
Verified the fix on 3.3.0qa43. Bug is fixed.