Bug 812963 - Persistent split-brain on dir despite merge being possible
Product: GlusterFS
Classification: Community
Component: replicate
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Pranith Kumar K
QA Contact: Shwetha Panduranga
Depends On:
Blocks: 817967
Reported: 2012-04-16 12:43 EDT by Jeff Darcy
Modified: 2013-07-24 13:26 EDT
CC List: 4 users

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Last Closed: 2013-07-24 13:26:39 EDT
Type: Bug

Description Jeff Darcy 2012-04-16 12:43:06 EDT
From the mailing-list thread:


How reproducible:

Every time.

Steps to Reproduce:

(1) Create a directory on a replica-2 volume.
(2) Kill one glusterfsd, create file_a from the client.
(3) Use "gluster volume start ... force" to restore normal operation.
(4) Kill the *other* glusterfsd, create file_b from the client.
(5) Repeat step 3.
(6) Call stat on the directory.
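The steps above can be illustrated with a toy model. This is a hypothetical Python sketch, not AFR's actual code: the `Brick` class, its `pending` counters and the `create`/`is_split_brain` helpers are assumptions loosely modelled on AFR's per-brick changelog flags. It also assumes no self-heal runs between steps 3 and 4, so each brick ends up recording directory-entry changes that the other missed.

```python
from dataclasses import dataclass, field


@dataclass
class Brick:
    """Toy stand-in for one replica brick holding the single test directory."""
    name: str
    up: bool = True
    entries: set = field(default_factory=set)
    # pending[other] > 0 means "this brick has entry changes that `other` missed"
    # (hypothetical simplification of AFR's changelog flags).
    pending: dict = field(default_factory=dict)


def create(bricks, filename):
    """Create a file in the directory on every brick that is currently up."""
    live = [b for b in bricks if b.up]
    down = [b for b in bricks if not b.up]
    for b in live:
        b.entries.add(filename)
        for d in down:
            b.pending[d.name] = b.pending.get(d.name, 0) + 1


def is_split_brain(a, b):
    """Each brick blames the other, so neither can be chosen as the heal source."""
    return a.pending.get(b.name, 0) > 0 and b.pending.get(a.name, 0) > 0


# Step 1: an empty directory replicated on two bricks.
b0, b1 = Brick("brick0"), Brick("brick1")
bricks = [b0, b1]

# Steps 2-3: brick0 is down while file_a is created, then comes back.
b0.up = False
create(bricks, "file_a")
b0.up = True

# Steps 4-5: brick1 is down while file_b is created, then comes back.
b1.up = False
create(bricks, "file_b")
b1.up = True

# Step 6: a lookup now sees mutual blame instead of a clear heal source.
print(sorted(b0.entries), sorted(b1.entries), is_split_brain(b0, b1))
```

In this model neither brick is "clean", which is why a source/sink heal cannot proceed and only a merge of the two entry lists could resolve the directory.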

Actual results:

I/O error on the client, changelog flags remain set on servers.

Expected results:

Conservative merge should cross-heal both files, so that file_a and file_b are both present on both servers and no changelog flags remain set. This point is arguable, because the case is indistinguishable from one where files were *deleted* on both bricks, but let's assume for the sake of argument that options are set to allow conservative (i.e. "conserve maximal data") self-heal in these cases.
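A minimal sketch of the conservative merge described above, assuming a directory's contents can be reduced to a set of entry names per brick. `conservative_merge` is a hypothetical helper, not GlusterFS's self-heal code: it keeps the union of both bricks' entries and reports which names each brick would have to receive.

```python
def conservative_merge(entries_a, entries_b):
    """Conservative ("conserve maximal data") merge of one directory.

    Every name present on either brick is kept and copied to the brick that
    lacks it. Returns (merged, names brick A must receive, names brick B
    must receive).
    """
    a, b = set(entries_a), set(entries_b)
    merged = a | b
    to_a = merged - a  # entries missing on brick A
    to_b = merged - b  # entries missing on brick B
    return merged, to_a, to_b


# Brick contents after the reproduction steps: each side has one new file.
merged, to_a, to_b = conservative_merge({"file_b"}, {"file_a"})
print(sorted(merged), sorted(to_a), sorted(to_b))
```

Because the merge only looks at names, it also resurrects deletions: `conservative_merge({"x"}, set())` keeps `x` even if it was deliberately removed on the second brick, which is exactly the ambiguity the report raises.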
Comment 1 Jeff Darcy 2012-04-16 12:45:20 EDT
Patch submitted.  http://review.gluster.com/3161

This is a nasty, ugly, horrible patch.  I really hope it's not adopted.  However, there is an unusual level of complexity in these code paths, and a concrete "this works no matter how ugly it is" might help others cut through the complexity to see what's necessary as we consider a better fix.
Comment 2 Pranith Kumar K 2012-04-16 14:14:22 EDT
This is already handled in the 3.2.6p3 release.
This is the forward port; it has not been accepted yet.

Comment 3 Jeff Darcy 2012-04-16 16:51:42 EDT
I created a standalone version of Pranith's forward port (i.e. not depending on other patches which seem stuck in the Gerrit queue) and it does seem to work.
Comment 4 Pranith Kumar K 2012-05-22 22:55:00 EDT
The fix for bug 765587 fixes this.
Comment 5 Shwetha Panduranga 2012-05-29 04:33:14 EDT
Verified the fix on 3.3.0qa43. Bug is fixed.
