Red Hat Bugzilla – Bug 812963
Persistent split-brain on dir despite merge being possible
Last modified: 2013-07-24 13:26:39 EDT
From the mailing-list thread:
Steps to Reproduce:
(1) Create a directory on a replica*2 volume.
(2) Kill one glusterfsd, create file_a from the client.
(3) Use "gluster volume start ... force" to restore normal operation.
(4) Kill the *other* glusterfsd, create file_b from the client.
(5) Repeat step 3.
(6) Call stat on the directory.
Actual results:
I/O error on the client; changelog flags remain set on the servers.
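The way the steps above lead to split-brain can be sketched with a toy model of the changelog flags. This is only an illustration, not the replicate translator's code: `new_pending`, `write_with_down`, and `classify` are hypothetical names, and the model assumes each brick keeps a simple counter of operations the other brick missed, with no heal running between steps 2 and 4.

```python
# Toy model of changelog flags on a replica*2 volume (hypothetical, not GlusterFS code).
# pending[i][j] > 0 means brick i recorded operations that brick j missed.

def new_pending(n=2):
    return [[0] * n for _ in range(n)]

def write_with_down(pending, down):
    """Record a directory modification made while brick `down` is offline."""
    for i in range(len(pending)):
        if i != down:
            pending[i][down] += 1

def classify(pending):
    """Return 'clean', 'split-brain', or ('heal', source_index) for two bricks."""
    a_accuses_b = pending[0][1] > 0
    b_accuses_a = pending[1][0] > 0
    if a_accuses_b and b_accuses_a:
        return "split-brain"      # each brick blames the other: no single source
    if a_accuses_b:
        return ("heal", 0)        # brick 0 is the source
    if b_accuses_a:
        return ("heal", 1)        # brick 1 is the source
    return "clean"

pending = new_pending()
write_with_down(pending, down=0)  # step 2: file_a created while brick 0 is down
state_after_step_2 = classify(pending)
write_with_down(pending, down=1)  # step 4: file_b created while brick 1 is down
state_after_step_4 = classify(pending)
```

In this model the directory is healable from a single source after step 2, but after step 4 each brick accuses the other, which is the state reported on stat in step 6.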
Expected results:
Conservative merge should cross-heal both files, so that file_a and file_b are both present on both servers with no changelog flags set. This point is arguable, because the case is indistinguishable from one where files were *deleted* on both bricks, but let's assume for the sake of argument that options are set to allow conservative (i.e. "conserve maximal data") self-heal in these cases.
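A minimal sketch of what "conserve maximal data" means for a directory, assuming each brick's directory can be modeled as a plain set of names. The function `conservative_merge` is hypothetical and is not taken from the patch or from the self-heal code.

```python
# Hypothetical illustration of a conservative (union) merge of directory entries.

def conservative_merge(brick_entries):
    """Union the entries of all bricks; also list what each brick must gain."""
    merged = set().union(*brick_entries)
    missing = [sorted(merged - entries) for entries in brick_entries]
    return sorted(merged), missing

# Brick 0 was down when file_a was created, brick 1 when file_b was created.
bricks = [{"file_b"}, {"file_a"}]
merged, missing = conservative_merge(bricks)
# merged lists both files; missing[i] is what has to be copied onto brick i.
```

A union like this never removes an entry, which is exactly why it cannot tell a file created while one brick was down from a file deleted on the other brick.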
Patch submitted. http://review.gluster.com/3161
This is a nasty, ugly, horrible patch. I really hope it's not adopted. However, there is an unusual level of complexity in these code paths, and a concrete "this works no matter how ugly it is" might help others cut through the complexity to see what's necessary as we consider a better fix.
This is already handled in the 3.2.6p3 release.
This is the forward port. It is yet to be accepted.
I created a standalone version of Pranith's forward port (i.e. not depending on other patches which seem stuck in the Gerrit queue) and it does seem to work.
Fix to 765587 fixes this.
Verified the fix on 3.3.0qa43. Bug is fixed.