Bug 1379178

Summary: split brain on file recreate during "downed" brick.
Product: [Community] GlusterFS
Component: selfheal
Version: 3.7.14
Hardware: x86_64
OS: Linux
Status: CLOSED EOL
Severity: urgent
Priority: high
Keywords: Triaged
Reporter: Jaco Kroon <jaco>
Assignee: Pranith Kumar K <pkarampu>
CC: bugs, hgowtham, joe
Type: Bug
Last Closed: 2017-03-08 10:48:24 UTC

Description Jaco Kroon 2016-09-25 19:40:55 UTC
Description of problem:

If a brick goes down and a file gets removed and recreated while it is down, the two bricks end up with differing GFID values for that file, resulting in a GFID-mismatch split brain.

In essence, let's say we have a file called some_file with a GFID of 112233.  Its link count is 1 (2 on the brick: some_file plus the GFID link under .glusterfs).

We have two bricks, A and B (replicated).

Brick B goes down.

On brick A, some_file gets removed and recreated (in the case I've seen with courier-imap it's actually a rename of a different file onto some_file).

When some_file gets removed, its GFID gets discarded too.  When the file gets recreated, a new GFID gets allocated, say aabbcc.

Brick B comes back up.

At this point some_file exists on both brick A and brick B, but with differing GFID values.  Since the directory contents were modified during B's outage, the directory is marked for healing.  Due to the differing GFID values that heal fails.
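
For reference, here is a rough sketch of how the mismatch can be confirmed directly on the bricks, by comparing the trusted.gfid xattr each side stores for the file.  The brick roots and file path below are placeholders for whatever the volume actually uses, and it needs to run as root on the brick host(s):

#!/usr/bin/env python3
# Sketch only: compare the trusted.gfid xattr a file carries on each replica
# brick.  The brick roots and the relative file path are placeholders; run as
# root on the host(s) that hold the bricks.
import os
import uuid

BRICK_A = "/data/brickA"   # hypothetical brick root for brick A
BRICK_B = "/data/brickB"   # hypothetical brick root for brick B
REL_PATH = "some_file"     # affected file, relative to the brick root

def brick_gfid(brick_root, rel_path):
    """Read the 16-byte trusted.gfid xattr from the file's copy on a brick."""
    raw = os.getxattr(os.path.join(brick_root, rel_path), "trusted.gfid")
    return uuid.UUID(bytes=raw)

gfid_a = brick_gfid(BRICK_A, REL_PATH)
gfid_b = brick_gfid(BRICK_B, REL_PATH)
print("brick A:", gfid_a)
print("brick B:", gfid_b)
if gfid_a != gfid_b:
    print("GFID mismatch -- this is the split brain described above")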

Version-Release number of selected component (if applicable): 3.7.14

How reproducible:

Extremely.

Steps to Reproduce:
1.  Create a file when both bricks are up.
2.  Down a brick.
3.  rm the file.
4.  recreate the file.
5.  Up the downed brick (a scripted sketch of these steps follows).
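
For what it's worth, the steps can be driven by a small script along these lines.  The volume name, mount point and brick path are placeholders, and killing the brick's glusterfsd process is just one way of simulating the outage:

#!/usr/bin/env python3
# Rough reproduction driver.  Assumes a two-brick replica volume called
# "testvol" mounted at /mnt/testvol; the volume name, mount point and brick
# path are placeholders, and brick B is "downed" by killing its glusterfsd.
import pathlib
import subprocess

VOLUME = "testvol"
MOUNT = pathlib.Path("/mnt/testvol")
BRICK_B_PATH = "/data/brickB"   # used only to match brick B's glusterfsd process

def sh(*cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Create a file while both bricks are up.
(MOUNT / "some_file").write_text("original contents\n")

# 2. Down brick B (killing its glusterfsd process simulates the outage).
sh("pkill", "-f", f"glusterfsd.*{BRICK_B_PATH}")

# 3. Remove the file through the mount.
(MOUNT / "some_file").unlink()

# 4. Recreate it; a new GFID is allocated on the surviving brick.
(MOUNT / "some_file").write_text("recreated contents\n")

# 5. Bring the downed brick back and inspect heal state.
sh("gluster", "volume", "start", VOLUME, "force")
sh("gluster", "volume", "heal", VOLUME, "info", "split-brain")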

Actual results:

The file returns I/O errors, and the heal of the containing directory fails.

Expected results:

For the previously downed brick to track the remove and recreate, so the file heals cleanly.

Additional info:

http://jkroon.blogs.uls.co.za/filesystems/glusterfs-and-courier-imap - I've tried to put together a thorough write-up and analysis there, including possible solutions.

FYI - I definitely have an outage coming next weekend (Friday evening we've got another scheduled power outage, on the other power rail this time).  I'm pretty sure this won't be solved by then, but at this stage I'm probably better off sticking to a single-server NFS solution and taking the beatings as they come.

If there is a way to tell brick A, when it returns on Saturday morning, to completely discard its contents and heal in full from B (i.e. a full resync), that is something I'll need to consider.

Comment 1 Jaco Kroon 2016-09-25 19:43:14 UTC
I think a possible solution is to only unlink some_file in such a case (with the other brick down); that way the GFID link will still exist on A, and since its link count is 1 we know that all named links have been unlinked, so when we encounter the same GFID on the opposing brick we can follow suit.
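
To illustrate the idea (this is not the actual self-heal code): each brick keeps a hardlink to every regular file under .glusterfs/<aa>/<bb>/<gfid>, so a GFID whose backing inode has a link count of 1 has no named path left on that brick.  Roughly, with made-up paths:

#!/usr/bin/env python3
# Illustration of the heuristic only, not actual self-heal code.  A brick keeps
# a hardlink for every regular file at .glusterfs/aa/bb/<gfid>; if that inode's
# link count is 1, every named path for the GFID has been unlinked on this
# brick.  The brick root and GFID below are made-up placeholders.
import os
import uuid

BRICK = "/data/brickA"                                    # hypothetical brick root
GFID = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")  # hypothetical GFID

def gfid_backing_path(brick_root, gfid):
    """Path of the .glusterfs hardlink a brick keeps for a regular file's GFID."""
    s = str(gfid)
    return os.path.join(brick_root, ".glusterfs", s[0:2], s[2:4], s)

st = os.stat(gfid_backing_path(BRICK, GFID))
if st.st_nlink == 1:
    # Only the .glusterfs link is left: some_file itself was unlinked here,
    # so a healing peer could treat the GFID as deleted and follow suit.
    print("GFID has no named links left on this brick")
else:
    print("GFID still has %d named link(s) on this brick" % (st.st_nlink - 1))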

Comment 2 Kaushal 2017-03-08 10:48:24 UTC
This bug is being closed because GlusterFS-3.7 has reached its end of life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.