Steps to Reproduce: (Not sure of the exact steps that led to this issue; here are the steps performed.)
===============================================================================
1. Create a 2 x 2 distribute-replicate volume and start the volume.
2. Create 2 FUSE and 2 NFS mounts from 2 client machines.
3. Start the script on the FUSE mounts from each client:
   "./create_dirs_files_multi_thread.py --number-of-threads 100 --num-files-per-dir 25 --min-file-size 1024 --max-file-size 10240 --starting-dir-num 1 --dir-depth 5 --dir-width 4"
   "./create_dirs_files_multi_thread.py --number-of-threads 100 --num-files-per-dir 25 --min-file-size 1024 --max-file-size 10240 --starting-dir-num 101 --dir-depth 5 --dir-width 4"
4. While creation is in progress, bring down brick1 and brick4 (the brick devices were crashed using the godown xfstests utility).
5. Start the script on the NFS mounts from each client:
   "./create_dirs_files_multi_thread.py --number-of-threads 100 --num-files-per-dir 25 --min-file-size 1024 --max-file-size 10240 --starting-dir-num 1 --dir-depth 5 --dir-width 4"
   "./create_dirs_files_multi_thread.py --number-of-threads 100 --num-files-per-dir 25 --min-file-size 1024 --max-file-size 10240 --starting-dir-num 101 --dir-depth 5 --dir-width 4"
6. On the other FUSE mount from both clients execute: "find . -type d | wc"
7. On the other NFS mount from both clients execute: "find . -type f | wc"
8. Add 4 bricks to the volume to make it a 4 x 2 distribute-replicate volume.
9. Bring back bricks brick1 and brick4.
10. Start rebalance.
11. While rebalance is in progress, rename files.
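The volume operations above can be sketched with the gluster CLI roughly as follows. This is a non-authoritative sketch: the server hostnames, brick paths, and volume name are hypothetical placeholders, and the file-creation script and godown steps are omitted.

```shell
# Step 1: 2 x 2 distribute-replicate volume (hypothetical hosts/paths)
gluster volume create testvol replica 2 \
    server1:/bricks/brick1 server2:/bricks/brick2 \
    server3:/bricks/brick3 server4:/bricks/brick4
gluster volume start testvol

# Step 2: FUSE and NFS (v3) mounts on the clients
mount -t glusterfs server1:/testvol /mnt/fuse1
mount -t nfs -o vers=3 server1:/testvol /mnt/nfs1

# Step 8: add 4 bricks to go from 2 x 2 to 4 x 2
gluster volume add-brick testvol replica 2 \
    server1:/bricks/brick5 server2:/bricks/brick6 \
    server3:/bricks/brick7 server4:/bricks/brick8

# Step 10: start rebalance
gluster volume rebalance testvol start
```

These commands only run against a live gluster cluster and are included here for orientation, not as an exact transcript of the original setup.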
RCA from https://bugzilla.redhat.com/show_bug.cgi?id=1141750#c3, which has the same RCA but a different manifestation.

Simpler test to re-create the bug:
0) Create a 1x2 replicate volume, start it and mount it.
1) Open a file 'a' from the mount and keep writing to it.
2) Bring one of the bricks down.
3) Rename the file '<mnt>/a' to '<mnt>/b'.
4) Wait for at least one write to complete while the brick is still down.
5) Restart the brick.
6) Wait until self-heal completes, then stop the writing from the mount point.

Root cause:
When the rename happens while the brick is down, then after the brick comes back up, entry self-heal is triggered on the parent directory where the rename happened, in this case <mnt>. As part of this entry self-heal, 1) file 'a' is deleted and 2) file 'b' is re-created. In parallel, 0) the writing fd needs to be re-opened on the file from the mount point. If the re-opening of the file in step 0) happens before step 1) of self-heal, this issue is observed: writes from the mount keep going to the file that was deleted, whereas the self-heal happens on the file created at step 2), so the checksums mismatch.

One more manifestation of this issue is https://bugzilla.redhat.com/show_bug.cgi?id=1139599, where writes from the mount only grow the file on the 'always up' brick while the file on the other brick does not grow. This leads to split-brain because of the size mismatch combined with an all-zero pending changelog.

It is a day-1 bug in the link-self-heal part of entry-self-heal; the bug has existed since 2012.
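The fd-versus-unlink behaviour underlying the root cause can be illustrated on any POSIX filesystem, independent of gluster: writes through an fd opened before the file was unlinked land on the orphaned inode, not on a newly created file of the same name. A minimal local sketch (plain local files, not a gluster mount; names are illustrative only):

```python
import os
import tempfile

d = tempfile.mkdtemp()
path = os.path.join(d, "a")

# Analogue of the writing mount: open 'a' and hold the fd.
fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
os.write(fd, b"before heal\n")

# Analogue of entry self-heal: delete the file and re-create it fresh.
os.unlink(path)
new_fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
os.close(new_fd)

# Writes through the stale fd go to the unlinked inode...
os.write(fd, b"written after unlink\n")
os.close(fd)

# ...so the re-created file at 'path' stays empty: the same kind of
# size/checksum mismatch the RCA describes.
print(os.path.getsize(path))  # → 0
```

In the bug, the brick-side fd that the client re-opens plays the role of the stale fd here, and the file re-created by entry self-heal plays the role of the new, never-written file.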
This is a bug in the implementation of the following commit:

commit 1936e29c3ac3d6466d391545d761ad8e60ae2e03
Author: Pranith Kumar K <pranithk>
Date:   Wed Feb 29 16:31:18 2012 +0530

    cluster/afr: Hardlink Self-heal

    Change-Id: Iea0b38011edbc7fc9d75a95d1775139a5234e514
    BUG: 765391
    Signed-off-by: Pranith Kumar K <pranithk>
    Reviewed-on: http://review.gluster.com/2841
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Amar Tumballi <amarts>
    Reviewed-by: Vijay Bellur <vijay>

Very good testing, shwetha! Thanks a lot for the help in re-creating and condensing the test case, kritika!

Pranith
http://review.gluster.org/#/c/13001/