Bug 812963 - Persistent split-brain on dir despite merge being possible
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact: Shwetha Panduranga
URL:
Whiteboard:
Depends On:
Blocks: 817967
Reported: 2012-04-16 16:43 UTC by Jeff Darcy
Modified: 2018-11-28 20:41 UTC

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 17:26:39 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:

Description Jeff Darcy 2012-04-16 16:43:06 UTC
From the mailing-list thread:

http://gluster.org/pipermail/gluster-users/2012-April/010080.html


How reproducible:

Every time.


Steps to Reproduce:

(1) Create a directory on a replica 2 volume.
(2) Kill one glusterfsd, then create file_a in that directory from the client.
(3) Use "gluster volume start ... force" to restore normal operation.
(4) Kill the *other* glusterfsd, then create file_b in the same directory from the client.
(5) Repeat step 3.
(6) Call stat on the directory.
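
The steps above can be sketched as a toy model in Python. This is not GlusterFS code: the Brick class, the pending counters, and the helper names are hypothetical stand-ins for the replicate translator's changelog bookkeeping, and the model assumes no self-heal runs between steps. It only illustrates why each brick ends up blaming the other for the directory.

```python
# Toy model of the reproduction steps. Nothing here calls GlusterFS;
# all names are hypothetical and exist only for illustration.

class Brick:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.entries = set()   # file names present in the test directory
        self.pending = {}      # peer name -> entry operations the peer missed


def create(bricks, filename):
    """Create a file from the client: live bricks get it and blame dead ones."""
    live = [b for b in bricks if b.up]
    dead = [b for b in bricks if not b.up]
    for brick in live:
        brick.entries.add(filename)
        for missing in dead:
            brick.pending[missing.name] = brick.pending.get(missing.name, 0) + 1


def is_split_brain(a, b):
    """True when both bricks hold pending marks against each other."""
    return a.pending.get(b.name, 0) > 0 and b.pending.get(a.name, 0) > 0


def reproduce():
    b0, b1 = Brick("brick0"), Brick("brick1")
    bricks = [b0, b1]
    b1.up = False              # step 2: kill one glusterfsd
    create(bricks, "file_a")
    b1.up = True               # step 3: restore normal operation
    b0.up = False              # step 4: kill the other glusterfsd
    create(bricks, "file_b")
    b0.up = True               # step 5: restore normal operation again
    return b0, b1
```

In this model, brick0 holds only file_a and a mark against brick1, while brick1 holds only file_b and a mark against brick0, so neither side can be picked as the sole source for the directory.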
  

Actual results:

I/O error on the client, changelog flags remain set on servers.


Expected results:

Conservative merge should cross-heal both files, so that file_a and file_b are both present on both servers with no changelog flags set. This point is arguable, because the case is indistinguishable from one where files were *deleted* on both bricks, but let's assume for the sake of argument that options are set to allow conservative (i.e. "conserve maximal data") self-heal in these cases.
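
As a rough illustration of what "conserve maximal data" means for a directory, the expected result can be modeled as a set union of the entries on the two bricks. This is a minimal sketch and not the replicate translator's implementation; the function name and the returned keys are hypothetical.

```python
def conservative_merge(entries_a, entries_b):
    """Union the directory entries of two bricks and list what each one lacks.

    Hypothetical helper: it only models the outcome of a conservative merge,
    not how GlusterFS performs entry self-heal.
    """
    a, b = set(entries_a), set(entries_b)
    merged = a | b
    return {
        "merged": merged,          # entries both bricks should end up with
        "copy_to_a": merged - a,   # entries brick A is missing
        "copy_to_b": merged - b,   # entries brick B is missing
    }


result = conservative_merge({"file_a"}, {"file_b"})
```

A union never removes anything, which is exactly why the deletion case mentioned above is ambiguous: an entry that was deleted on one brick while the other was down would be recreated by this kind of merge.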

Comment 1 Jeff Darcy 2012-04-16 16:45:20 UTC
Patch submitted.  http://review.gluster.com/3161

This is a nasty, ugly, horrible patch. I really hope it's not adopted. However, there is an unusual level of complexity in these code paths, and a concrete "this works no matter how ugly it is" patch might help others cut through the complexity to see what's necessary as we consider a better fix.

Comment 2 Pranith Kumar K 2012-04-16 18:14:22 UTC
Jeff,
This is already handled in the 3.2.6p3 release.
The forward port is below. It has yet to be accepted.
http://review.gluster.com/#change,3091,patchset=2

Pranith

Comment 3 Jeff Darcy 2012-04-16 20:51:42 UTC
I created a standalone version of Pranith's forward port (i.e. not depending on other patches which seem stuck in the Gerrit queue) and it does seem to work.

Comment 4 Pranith Kumar K 2012-05-23 02:55:00 UTC
The fix for bug 765587 fixes this.

Comment 5 Shwetha Panduranga 2012-05-29 08:33:14 UTC
Verified the fix on 3.3.0qa43. Bug is fixed.
