From the mailing-list thread: http://gluster.org/pipermail/gluster-users/2012-April/010080.html

How reproducible: Every time.

Steps to Reproduce:
(1) Create a directory on a replica 2 volume.
(2) Kill one glusterfsd, then create file_a from the client.
(3) Use "gluster volume start ... force" to restore normal operation.
(4) Kill the *other* glusterfsd, then create file_b from the client.
(5) Repeat step 3.
(6) Call stat on the directory.

Actual results: I/O error on the client; changelog flags remain set on the servers.

Expected results: Conservative merge should cross-heal both files, so that file_a and file_b are both present on both servers with no changelog flags set. This point is arguable, because the case is indistinguishable from one in which files were *deleted* on both bricks, but assume for the sake of argument that options are set to allow conservative (i.e. "conserve maximal data") self-heal in these cases.
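The reproduction steps can be modeled in a few lines of Python to show why the final stat fails. This is a hypothetical, simplified sketch, not GlusterFS code: `Brick`, `create`, `heal_directory` and `run_scenario` are invented names, each brick's changelog is reduced to a single counter of operations pending against its one peer (replica 2 only), and it assumes nothing heals the directory between the two restarts. Under those assumptions both bricks end up accusing each other, which is the EIO case under "Actual results", while a conservative merge takes the union of entries.

```python
import errno
from dataclasses import dataclass, field


@dataclass
class Brick:
    """One brick's copy of a single directory (hypothetical, simplified model)."""
    name: str
    up: bool = True
    entries: set = field(default_factory=set)
    # Count of entry operations this brick applied while its peer was down.
    # Stands in for the AFR changelog flags; not their real on-disk format.
    pending_against_peer: int = 0


def create(bricks, filename):
    """Create `filename` on every live brick; live bricks blame any down peer."""
    peer_down = any(not b.up for b in bricks)
    for b in bricks:
        if b.up:
            b.entries.add(filename)
            if peer_down:
                b.pending_against_peer += 1


def heal_directory(a, b, conservative):
    """Entry self-heal for a replica-2 directory, triggered here by stat()."""
    a_blames_b = a.pending_against_peer > 0
    b_blames_a = b.pending_against_peer > 0
    if a_blames_b and b_blames_a:
        # Each brick accuses the other, so neither is a clean source.
        if not conservative:
            # Flags are left set and the caller sees EIO.
            raise OSError(errno.EIO, "entry split-brain on directory")
        # Conservative merge: keep every entry seen on either brick.
        merged = a.entries | b.entries
        a.entries, b.entries = set(merged), set(merged)
    elif a_blames_b:
        b.entries = set(a.entries)  # a is the only source
    elif b_blames_a:
        a.entries = set(b.entries)  # b is the only source
    a.pending_against_peer = b.pending_against_peer = 0


def run_scenario():
    """Steps 1-5 of the report; returns both bricks before the final stat()."""
    b1, b2 = Brick("brick1"), Brick("brick2")  # step 1: empty directory on both
    bricks = [b1, b2]
    b1.up = False                  # step 2: kill one glusterfsd...
    create(bricks, "file_a")       # ...and create file_a (lands on brick2 only)
    b1.up = True                   # step 3: "start ... force" brings it back
    b2.up = False                  # step 4: kill the other glusterfsd...
    create(bricks, "file_b")       # ...and create file_b (lands on brick1 only)
    b2.up = True                   # step 5: restart again
    return b1, b2


if __name__ == "__main__":
    b1, b2 = run_scenario()
    try:
        heal_directory(b1, b2, conservative=False)  # step 6, default policy
    except OSError as err:
        print("stat failed:", err)  # both counters are still non-zero here
    heal_directory(b1, b2, conservative=True)       # step 6, conservative merge
    print(sorted(b1.entries), sorted(b2.entries))   # ['file_a', 'file_b'] ['file_a', 'file_b']
```

The same end state also arises from a pure-deletion history: start with file_a and file_b on both bricks, delete file_a while brick2 is down, then delete file_b while brick1 is down. brick1 again holds only file_b, brick2 only file_a, and each blames the other, so the union above would resurrect both deleted files. That is why conservative merge has to be an explicit policy choice.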
Patch submitted: http://review.gluster.com/3161
This is a nasty, ugly, horrible patch. I really hope it's not adopted. However, there is an unusual level of complexity in these code paths, and a concrete "this works no matter how ugly it is" might help others cut through the complexity to see what's necessary as we consider a better fix.
Jeff,
This is already handled in the 3.2.6p3 release. This is the forward port, which has yet to be accepted:
http://review.gluster.com/#change,3091,patchset=2
Pranith
I created a standalone version of Pranith's forward port (i.e. not depending on other patches which seem stuck in the Gerrit queue) and it does seem to work.
The fix for bug 765587 fixes this as well.
Verified the fix on 3.3.0qa43. Bug is fixed.