812963 – Persistent split-brain on dir despite merge being possible

Summary: Persistent split-brain on dir despite merge being possible

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	replicate
Sub Component:
Version:	mainline
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Pranith Kumar K
QA Contact:	Shwetha Panduranga
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	817967

Reported:	2012-04-16 16:43 UTC by Jeff Darcy
Modified:	2018-11-28 20:41 UTC
CC List:	4 users
Fixed In Version:	glusterfs-3.4.0
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2013-07-24 17:26:39 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Description Jeff Darcy 2012-04-16 16:43:06 UTC

From the mailing-list thread:

http://gluster.org/pipermail/gluster-users/2012-April/010080.html


How reproducible:

Every time.


Steps to Reproduce:

(1) Create a directory on a replica*2 volume.
(2) Kill one glusterfsd, create file_a from the client.
(3) Use "gluster volume start ... force" to restore normal operation.
(4) Kill the *other* glusterfsd, create file_b from the client.
(5) Repeat step 3.
(6) Call stat on the directory.
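
For readers who want to see why these steps leave the directory in split-brain, here is a toy Python model of the sequence. It is only a sketch, not GlusterFS code: the Brick class, the per-peer "pending" counters (a loose stand-in for the changelog flags) and every function name are made up for illustration, and it assumes no self-heal runs on the directory between steps 3 and 4.

```python
# Toy model of the reproduction steps. Hypothetical names throughout;
# "pending" loosely stands in for the changelog flags kept on each server.

class Brick:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.entries = set()   # files present in the directory on this brick
        self.pending = {}      # peer name -> number of operations that peer missed


def create(bricks, filename):
    """Create a file on every live brick and record which peers missed it."""
    live = [b for b in bricks if b.up]
    down = [b for b in bricks if not b.up]
    for b in live:
        b.entries.add(filename)
        for d in down:
            b.pending[d.name] = b.pending.get(d.name, 0) + 1


def is_split_brain(a, b):
    """True when each brick accuses the other, so neither is a clean heal source."""
    return a.pending.get(b.name, 0) > 0 and b.pending.get(a.name, 0) > 0


def reproduce():
    a, b = Brick("brick-a"), Brick("brick-b")   # step 1: directory on replica 2
    bricks = [a, b]
    b.up = False                # step 2: kill one glusterfsd
    create(bricks, "file_a")
    b.up = True                 # step 3: bring it back
    a.up = False                # step 4: kill the other glusterfsd
    create(bricks, "file_b")
    a.up = True                 # step 5: bring it back
    return a, b


if __name__ == "__main__":
    a, b = reproduce()
    # step 6: a lookup now finds both bricks blaming each other
    print(sorted(a.entries), sorted(b.entries), is_split_brain(a, b))
```

In this model each brick ends up holding one file the other lacks, and each has a non-zero counter against its peer, which is the state the stat in step 6 runs into.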
  

Actual results:

I/O error on the client, changelog flags remain set on servers.


Expected results:

Conservative merge should cross-heal both files, resulting in both file_a and file_b being present on both servers with no changelog flags set.  Actually this point is arguable because the case is indistinguishable from one where files were *deleted* on both bricks, but let's assume for the sake of argument that options are set to allow conservative (i.e. "conserve maximal data") self-heal in these cases.
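
To make the expected outcome concrete, a conservative merge of a directory can be pictured as taking the union of the entries seen on each brick and copying across whatever each side is missing. The Python below is a minimal sketch under that assumption; the function names and the dict-of-sets input are hypothetical and do not reflect how the replicate translator is actually written.

```python
# Sketch of a "conserve maximal data" directory merge. Hypothetical helper
# names; the input maps a brick name to the set of entries it holds.

def conservative_merge(entries_by_brick):
    """Return the union of the directory entries found on all bricks."""
    merged = set()
    for entries in entries_by_brick.values():
        merged |= entries
    return merged


def heal_plan(entries_by_brick):
    """For each brick, list the entries it must receive to match the union."""
    merged = conservative_merge(entries_by_brick)
    return {
        brick: sorted(merged - entries)
        for brick, entries in entries_by_brick.items()
    }


if __name__ == "__main__":
    state = {"brick-a": {"file_a"}, "brick-b": {"file_b"}}
    print(heal_plan(state))
```

The same union would also resurrect a file that had been deleted on one brick while the other was down, which is the ambiguity noted above; the sketch has no way to tell the two cases apart.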

Comment 1 Jeff Darcy 2012-04-16 16:45:20 UTC

Patch submitted.  http://review.gluster.com/3161

This is a nasty, ugly, horrible patch.  I really hope it's not adopted.  However, there is an unusual level of complexity in these code paths, and a concrete "this works no matter how ugly it is" might help others cut through the complexity to see what's necessary as we consider a better fix.

Comment 2 Pranith Kumar K 2012-04-16 18:14:22 UTC

Jeff,
     This is already handled in the 3.2.6p3 release.
The forward port is linked below. It has not been accepted yet.
http://review.gluster.com/#change,3091,patchset=2

Pranith

Comment 3 Jeff Darcy 2012-04-16 20:51:42 UTC

I created a standalone version of Pranith's forward port (i.e. not depending on other patches which seem stuck in the Gerrit queue) and it does seem to work.

Comment 4 Pranith Kumar K 2012-05-23 02:55:00 UTC

The fix for bug 765587 also fixes this.

Comment 5 Shwetha Panduranga 2012-05-29 08:33:14 UTC

Verified the fix on 3.3.0qa43. Bug is fixed.
