Bug 1005485 - AFR: writes are successful on files which are in split-brain state
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: replicate
Severity: high
Assigned To: Anuradha
Depends On:
Reported: 2013-09-07 10:27 EDT by spandura
Modified: 2016-09-19 22:00 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2015-12-03 12:17:53 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments: None
Description spandura 2013-09-07 10:27:06 EDT
Description of problem:
In a replicate volume (1x2), when a file is in split-brain state, I/Os on the file still succeed, and self-heal picks as its source whichever brick has the larger copy of the file.

Version-Release number of selected component (if applicable):
glusterfs built on Sep  6 2013 10:26:11

How reproducible:

Steps to Reproduce:
1. Create a replicate volume, set cluster.self-heal-daemon to off, and start the volume:
root@fan [Sep-07-2013-14:08:58] >gluster v info
Volume Name: vol_dis_1_rep_2
Type: Replicate
Volume ID: f5c43519-b5eb-4138-8219-723c064af71c
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Brick1: fan.lab.eng.blr.redhat.com:/rhs/bricks/vol_dis_1_rep_2_b0
Brick2: mia.lab.eng.blr.redhat.com:/rhs/bricks/vol_dis_1_rep_2_b1
Options Reconfigured:
server.allow-insecure: on
performance.stat-prefetch: off
performance.write-behind: off
cluster.self-heal-daemon: off
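The volume above can be recreated with the gluster CLI along these lines (hostnames and brick paths are placeholders, not the ones from this report):

```shell
# Sketch: build a 1x2 replicate volume matching the options shown above.
# server1/server2 and the brick paths are placeholders.
gluster volume create vol_dis_1_rep_2 replica 2 \
    server1:/rhs/bricks/vol_dis_1_rep_2_b0 \
    server2:/rhs/bricks/vol_dis_1_rep_2_b1
gluster volume set vol_dis_1_rep_2 cluster.self-heal-daemon off
gluster volume set vol_dis_1_rep_2 performance.write-behind off
gluster volume set vol_dis_1_rep_2 performance.stat-prefetch off
gluster volume start vol_dis_1_rep_2
```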

2. Create fuse, nfs, and cifs mounts.

3. From all the mounts, execute the following script (pass a different file name from each mount point):

test_script.sh <filename> :

	#!/bin/bash
	filename=$1
	(
		echo "Time before flock : `date`"
		flock -x 200
		echo "Time after flock : `date`"
		echo -e "\nWriting to file : $filename"
		for i in `seq 1 1000`; do echo "Hello $i" >&200 ; sleep 1; done
		echo "Time after the writes are successful : `date`"
	) 200>"$filename"    # fd 200 must be opened on the file for flock and the writes
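The fd-200 pattern in the script (flock on a descriptor that the enclosing block redirects to the file) can be exercised locally; this sketch uses a temp file instead of a gluster mount and far fewer writes:

```shell
#!/bin/sh
# Exercise the fd-based flock pattern from test_script.sh on a local file.
tmp=$(mktemp)
(
    flock -x 200                                        # exclusive lock on fd 200
    for i in $(seq 1 5); do echo "Hello $i" >&200; done # write through the locked fd
) 200>"$tmp"
wc -l < "$tmp"    # all 5 writes landed in the file
rm -f "$tmp"
```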

4. When the writes are in progress bring down brick-1. 

5. After some time bring back brick-1 and bring down brick-0 almost at the same time.  (situation leading to split-brain)

6.  Let the writes on the file progress for some time. 

7. Bring back brick-0 online.  (split-brain state)

Actual Result:
Fuse and Cifs mount behavior:
1. Writes from the mount point succeed without reporting an I/O error.

2. Data is self-healed from whichever brick has the larger file.

3. Once the self-heal is complete, the change-logs are cleared on files. 

4. Once the writes are complete "cat testfile" is successful from mount point. 

NFS Behavior
1. Writes from the mount point succeed without reporting an I/O error.

2. Changelogs are not cleared. 

3. Once the writes are complete, "cat testfile" from the mount gives an I/O error.

Expected results:
When a file is in split-brain state, I/Os on it should fail with an I/O error.
Comment 1 spandura 2013-09-07 10:32:04 EDT
SOS Reports:  http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/1005485

fuse mount process info:
root@darrel [Sep-07-2013-14:29:36] >ps -ef | grep gluster
root      2335     1  0 07:35 ?        00:00:11 /usr/sbin/glusterfs --volfile-id=/vol_dis_1_rep_2 --volfile-server=mia /mnt/gm1
Comment 3 Scott Haines 2013-09-27 13:08:10 EDT
Targeting for 3.0.0 (Denali) release.
Comment 4 Poornima G 2013-12-02 23:54:23 EST
This issue is seen when post-op-delay is set to a non-zero value and the bricks go down and come back within the post-op-delay window.
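Given that the race window is the post-op-delay interval, one mitigation to try when reproducing (a sketch against the volume from this report; the tradeoff is that a zero delay costs the changelog batching the option exists for) is to disable the delayed post-op entirely:

```shell
# Sketch: set the delayed-changelog window to zero so the AFR post-op
# happens immediately instead of after the post-op-delay interval.
gluster volume set vol_dis_1_rep_2 cluster.post-op-delay-secs 0
```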

A patch for this has been sent upstream, but it causes performance degradation.
Comment 7 spandura 2014-11-18 02:32:46 EST
Following test case was executed on "glusterfs built on Sep  3 2014 10:13:12" 

Case :-

1. Create a 2 x 2 distribute-replicate volume and start it. Set the cluster.data-self-heal volume option to "off".

2. Create 2 fuse and 2 nfs mounts from 2 clients.

3. Create 10 files from one of the mounts.

4. From 1 fuse and 1 nfs mount on each client, open fds on all 10 files and start writing through them:

    exec 5>./file1
    exec 6>./file2
    exec 7>./file3
    exec 8>./file4
    exec 9>./file5
    exec 10>./file6
    exec 11>./file7
    exec 12>./file8
    exec 13>./file9
    exec 14>./file10

    while true ; do for i in `seq 5 14`; do echo "`date`" >&$i ; done ; done
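The point of keeping the writers behind exec-opened fds is that the descriptors stay open across the brick outages, so later writes bypass any open-time checks. The exec pattern itself can be sketched on a local file (temp path is illustrative, not from the report):

```shell
#!/bin/sh
# Open a descriptor with exec, keep it across several writes, then close it.
out=$(mktemp)
exec 5>"$out"     # fd 5 stays open until explicitly closed
echo "line 1" >&5
echo "line 2" >&5
exec 5>&-         # close fd 5
cat "$out"        # prints: line 1 / line 2
rm -f "$out"
```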

5. From the other fuse and nfs mounts on each client, cat the contents of the files and perform lookups on them in a loop:

while true ; do find . | xargs stat ; done

while true ; do for i in `seq 1 10`; do cat  file$i; done ; done

6. Bring down brick1 and brick3 (one brick per sub-vol)

7. Bring back the bricks after some time. (service glusterd restart)

8. file1 ended up in a split-brain state.

Actual result:
Writes are still successful on the split-brain files.

[root@rhsauto006 ~]# gluster v info
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 331cd4da-d234-480d-9152-a926e72369e7
Status: Started
Snap Volume: no
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Options Reconfigured:
cluster.data-self-heal: off
performance.readdir-ahead: on
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable
[root@rhsauto006 ~]# 

[root@rhsauto006 ~]# gluster v heal testvol info split-brain
Gathering list of split brain entries on volume testvol has been successful 

Number of entries: 0

Number of entries: 0

Number of entries: 2
at                    path on brick
2014-11-18 01:37:55 <gfid:5157d3c5-54fe-4573-a8d5-9dc58e10d3c7>
2014-11-18 01:37:57 <gfid:5157d3c5-54fe-4573-a8d5-9dc58e10d3c7>

Number of entries: 2
at                    path on brick
2014-11-18 01:37:58 /file1
2014-11-18 01:40:29 /file1
[root@rhsauto006 ~]# 
[root@rhsauto006 ~]# 

[root@rhsauto007 ~]# getfattr -d -e hex -m . /rhs/brick1/b3/file1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/b3/file1

[root@rhsauto014 ~]# getfattr -d -e hex -m . /rhs/brick1/b4/file1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/b4/file1

[root@rhsauto014 ~]# 
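The getfattr output above shows no xattr values (they were not captured). When present, each trusted.afr.<volume>-client-N changelog is a 12-byte value holding three big-endian 32-bit counters: pending data, metadata, and entry operations. Non-zero data counters on both bricks, each blaming the other, is what marks the data split-brain. A sketch that decodes a hypothetical value (not taken from this report):

```shell
#!/bin/bash
# Decode a hypothetical trusted.afr changelog value into its three
# big-endian 32-bit counters: pending data, metadata, and entry ops.
val=0x000000020000000100000000   # illustrative value only
hex=${val#0x}
data=$((16#${hex:0:8}))          # pending data operations
meta=$((16#${hex:8:8}))          # pending metadata operations
entry=$((16#${hex:16:8}))        # pending entry operations
echo "data=$data metadata=$meta entry=$entry"   # prints: data=2 metadata=1 entry=0
```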

From fuse mount:
[root@rhsauto001 fuse1]# ls -l
total 4896
-rw-r--r--. 1 root root 500830 Nov 18 07:13 file1
-rw-r--r--. 1 root root 500801 Nov 18 07:13 file10
-rw-r--r--. 1 root root 500830 Nov 18 07:13 file2
-rw-r--r--. 1 root root 500830 Nov 18 07:13 file3
-rw-r--r--. 1 root root 500830 Nov 18 07:13 file4
-rw-r--r--. 1 root root 500801 Nov 18 07:13 file5
-rw-r--r--. 1 root root 500801 Nov 18 07:13 file6
-rw-r--r--. 1 root root 500801 Nov 18 07:13 file7
-rw-r--r--. 1 root root 500801 Nov 18 07:13 file8
-rw-r--r--. 1 root root 500801 Nov 18 07:13 file9
-rw-r--r--. 1 root root     29 Nov 17 14:46 testfile
[root@rhsauto001 fuse1]# ls -lh file1
-rw-r--r--. 1 root root 490K Nov 18 07:13 file1
[root@rhsauto001 fuse1]# echo "Hello" > file1
[root@rhsauto001 fuse1]#
Comment 9 RajeshReddy 2015-10-13 03:21:24 EDT
Tested with build "glusterfs-libs-3.7.1-16". Once files are in split-brain state, writes still go through because of performance.write-behind; after disabling that performance translator, writes fail with an I/O error.
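The verification in this comment amounts to turning off the caching translator and retrying the writes, so that they reach AFR (and its split-brain check) synchronously; as a sketch against the volume from Comment 7:

```shell
# Sketch: disable write-behind so client writes are sent to the bricks
# synchronously instead of being absorbed by the write-behind cache.
gluster volume set testvol performance.write-behind off
```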
Comment 10 Vivek Agarwal 2015-12-03 12:17:53 EST
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release against which you requested this review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/

If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.
