Bug 1379178 - split brain on file recreate during "downed" brick.
Summary: split brain on file recreate during "downed" brick.
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: selfheal
Version: 3.7.14
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-09-25 19:40 UTC by Jaco Kroon
Modified: 2017-03-08 10:48 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-08 10:48:24 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Jaco Kroon 2016-09-25 19:40:55 UTC
Description of problem:

If a brick goes down and a file gets removed and recreated while it is down, the two bricks end up with different GFIDs for the same path, resulting in a split brain due to the mismatching GFID values.

In essence, let's say we have a file called some_file with a GFID of 112233.  Its link count is 1 (2 on the brick: some_file itself plus its GFID hard link).

We have two bricks, A and B (replicated).

Brick B goes down.

On brick A some_file gets removed, and recreated (in the case I've seen with courier-imap it's actually a rename of a different file into some_file).

When some_file gets removed its GFID gets discarded too.  When the file gets recreated a new GFID gets allocated, say aabbcc.

Brick B comes back up.

At this point some_file exists on both Brick A and Brick B, but with differing GFID values.  Since the directory contents were modified during B's outage, the directory is marked for healing.  Due to the differing GFID values this fails.
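
For illustration only: assuming brick paths of /data/brick-a and /data/brick-b (placeholders for the real brick directories), the mismatch can be seen directly on the backend bricks by comparing the trusted.gfid extended attribute:

  # run on the respective storage nodes; paths are placeholders
  getfattr -n trusted.gfid -e hex /data/brick-a/path/to/some_file
  getfattr -n trusted.gfid -e hex /data/brick-b/path/to/some_file

The two hex values will differ (112233... versus aabbcc... in the example above).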

Version-Release number of selected component (if applicable): 3.7.14

How reproducible:

Extremely.

Steps to Reproduce:
1.  Create a file when both bricks are up.
2.  Down a brick.
3.  rm the file.
4.  recreate the file.
5.  Up the downed brick.
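
As a rough scripted sketch of the above (volume name "vol", mount point /mnt/vol and the brick locations are placeholders; killing the brick process is just one way to down a brick):

  # on a client mount, with both bricks up
  echo data > /mnt/vol/some_file

  # on node B: kill the brick process for this volume (or power the node off)

  # back on the client
  rm /mnt/vol/some_file
  echo new > /mnt/vol/some_file

  # on node B: restart the brick process, e.g.
  gluster volume start vol force

  # back on the client
  ls -l /mnt/vol/some_file    # eventually returns EIO once the mismatch is hit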

Actual results:

The file gives I/O errors and the heal of the containing directory fails.
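
If it helps triage, the affected entries may also show up in the self-heal output, e.g. (volume name is a placeholder):

  gluster volume heal vol info
  gluster volume heal vol info split-brain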

Expected results:

For the downed brick to track the remove and recreate once it comes back up.

Additional info:

http://jkroon.blogs.uls.co.za/filesystems/glusterfs-and-courier-imap - I've tried to provide a thorough write-up and analysis, including possible solutions.

FYI - I definitely have an outage coming next weekend (Friday evening we've got another scheduled power outage, on the other power rail this time).  I'm pretty sure this won't be solved by then, but at this stage I'm probably better off sticking to a single-server NFS solution and taking the beatings as they come.

If there is a way to tell brick A, when it returns on Saturday morning, to completely discard its contents and heal in full from B (i.e. a full resync), that is something I'll need to consider.
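
For individual GFID-mismatch entries, the workaround commonly described (not verified here, and paths/names below are placeholders) is to remove the stale copy and its .glusterfs hard link from the brick that should lose, then trigger a heal:

  # on the node whose copy should be discarded (brick B in the example)
  rm /data/brick-b/path/to/some_file
  # also remove the matching hard link under the brick's .glusterfs directory:
  #   .glusterfs/<first 2 hex chars>/<next 2 hex chars>/<full stale GFID>
  # then trigger healing and re-stat the file from a client mount
  gluster volume heal vol full
  stat /mnt/vol/path/to/some_file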

Comment 1 Jaco Kroon 2016-09-25 19:43:14 UTC
I think a possible solution is to only unlink some_file in such a case while the brick is down.  That way the GFID hard link will still exist on A, and since its link count is 1 we know that all related files have been unlinked, so when we encounter the GFID on the opposing brick we can follow suit.
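
To illustrate the link-count reasoning on the brick itself: the entry under .glusterfs is just another hard link to the file, so its link count shows whether any named path still references that GFID (paths below are placeholders, and the GFID is abbreviated as in the description):

  # on brick A, for the old GFID 112233...
  stat -c '%h' /data/brick-a/.glusterfs/11/22/112233...
  # a count of 1 would mean only the .glusterfs entry remains, i.e. every
  # named path for this GFID has already been unlinked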

Comment 2 Kaushal 2017-03-08 10:48:24 UTC
This bug is getting closed because GlusterFS-3.7 has reached its end-of-life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.

