Description of problem:
=======================
In a replicate volume (1x2), when a file is in split-brain state, I/Os on the file still succeed, and self-heal happens from the brick with the larger file size to the other brick.

Version-Release number of selected component (if applicable):
===============================================================
glusterfs 3.4.0.32rhs built on Sep 6 2013 10:26:11

How reproducible:
====================
Every time

1. Create a replicate volume, set self-heal-daemon to off, and start the volume.

root@fan [Sep-07-2013-14:08:58] >gluster v info

Volume Name: vol_dis_1_rep_2
Type: Replicate
Volume ID: f5c43519-b5eb-4138-8219-723c064af71c
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: fan.lab.eng.blr.redhat.com:/rhs/bricks/vol_dis_1_rep_2_b0
Brick2: mia.lab.eng.blr.redhat.com:/rhs/bricks/vol_dis_1_rep_2_b1
Options Reconfigured:
server.allow-insecure: on
performance.stat-prefetch: off
performance.write-behind: off
cluster.self-heal-daemon: off

2. Create fuse, nfs and cifs mounts.

3. From all the mounts, execute the following script (pass a different file name from each mount point):

test_script.sh <filename>:
======================
#!/bin/bash
pwd=`pwd`
filename="${pwd}/$1"
(
echo "Time before flock : `date`"
flock -x 200
echo "Time after flock : `date`"
echo -e "\nWriting to file : $filename"
for i in `seq 1 1000`; do echo "Hello $i" >&200 ; sleep 1; done
echo "Time after the writes are successful : `date`"
)200>>$filename

4. While the writes are in progress, bring down brick-1.

5. After some time, bring back brick-1 and bring down brick-0 at almost the same time (a situation leading to split-brain).

6. Let the writes on the file progress for some time.

7. Bring brick-0 back online (split-brain state).

Actual Result:
=============
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Fuse and Cifs mount behavior:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1. Writes from the mount point are successful without reporting an I/O error.
2. Data is self-healed from whichever brick has the larger file size.
3. Once the self-heal is complete, the change-logs are cleared on the files.
4. Once the writes are complete, "cat testfile" is successful from the mount point.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
NFS Behavior
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1. Writes from the mount point are successful without reporting an I/O error.
2. Changelogs are not cleared.
3. Once the writes are complete, "cat testfile" from the mount gives an I/O error.

Expected results:
====================
When a file is in split-brain state, I/Os on it should fail.
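For completeness, the split-brain state itself can be confirmed on the servers by dumping the AFR changelog xattrs on both bricks: a file is in data split-brain when each copy carries a non-zero pending-data counter pointing at the other brick. A minimal sketch, assuming the brick paths from step 1 and the testfile written by test_script.sh:

# Run on the respective servers (fan and mia); each command dumps the
# extended attributes of the copy stored on that brick.
getfattr -d -e hex -m . /rhs/bricks/vol_dis_1_rep_2_b0/testfile
getfattr -d -e hex -m . /rhs/bricks/vol_dis_1_rep_2_b1/testfile
# Look at the trusted.afr.vol_dis_1_rep_2-client-* values: when each copy
# shows a non-zero data counter in the xattr named after the opposite brick's
# client, the replicas blame each other and the file is in data split-brain.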
SOS Reports: http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/1005485

fuse mount process info:
========================
root@darrel [Sep-07-2013-14:29:36] >ps -ef | grep gluster
root 2335 1 0 07:35 ? 00:00:11 /usr/sbin/glusterfs --volfile-id=/vol_dis_1_rep_2 --volfile-server=mia /mnt/gm1
Targeting for 3.0.0 (Denali) release.
This issue will be seen if post-op-delay is set to a non-zero value and the bricks go down and come back within the post-op-delay window. A patch for this has been sent upstream: http://review.gluster.com/#/c/5635/ but that patch causes performance degradation.
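For context, the delayed post-op window is controlled by a per-volume option, so it can be checked and shrunk while evaluating the trade-off described above. A minimal sketch, assuming the option is exposed as cluster.post-op-delay-secs on the installed build (verify the name and default with `gluster volume set help` first):

# Check that the option exists on this build and see its default:
gluster volume set help | grep -A3 post-op-delay
# Setting it to 0 removes the window in which a brick can go down and come
# back before the changelog post-op is flushed, at the cost of extra xattr
# (changelog) traffic and hence some performance:
gluster volume set vol_dis_1_rep_2 cluster.post-op-delay-secs 0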
The following test case was executed on "glusterfs 3.6.0.28 built on Sep 3 2014 10:13:12".

Case:
=======
1. Create a 2 x 2 distribute-replicate volume and start the volume. Set the data-self-heal volume option to "off".

2. Create 2 fuse and 2 nfs mounts from 2 clients.

3. Create 10 files from one of the mounts.

4. From 1 fuse and 1 nfs mount on each client, open fds on all 10 files and start writing to the fds:

exec 5>./file1
exec 6>./file2
exec 7>./file3
exec 8>./file4
exec 9>./file5
exec 10>./file6
exec 11>./file7
exec 12>./file8
exec 13>./file9
exec 14>./file10

while true ; do for i in `seq 5 14`; do echo "`date`" >&$i ; done ; done

5. From the other fuse mount and nfs mount on each client, cat the contents of the files and perform lookups on the files in a loop:

while true ; do find . | xargs stat ; done
while true ; do for i in `seq 1 10`; do cat file$i; done ; done

6. Bring down brick1 and brick3 (one brick per sub-volume).

7. Bring back the bricks after some time (service glusterd restart).

8. file1 ended up in split-brain state.

Actual result:
=============
Writes are still successful on the split-brain files.

[root@rhsauto006 ~]# gluster v info

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 331cd4da-d234-480d-9152-a926e72369e7
Status: Started
Snap Volume: no
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.36.234:/rhs/brick1/b1
Brick2: 10.70.36.236:/rhs/brick1/b2
Brick3: 10.70.36.237:/rhs/brick1/b3
Brick4: 10.70.36.244:/rhs/brick1/b4
Options Reconfigured:
cluster.data-self-heal: off
performance.readdir-ahead: on
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable
[root@rhsauto006 ~]#

[root@rhsauto006 ~]# gluster v heal testvol info split-brain
Gathering list of split brain entries on volume testvol has been successful

Brick 10.70.36.234:/rhs/brick1/b1
Number of entries: 0

Brick 10.70.36.236:/rhs/brick1/b2
Number of entries: 0

Brick 10.70.36.237:/rhs/brick1/b3
Number of entries: 2
at                     path on brick
-----------------------------------
2014-11-18 01:37:55 <gfid:5157d3c5-54fe-4573-a8d5-9dc58e10d3c7>
2014-11-18 01:37:57 <gfid:5157d3c5-54fe-4573-a8d5-9dc58e10d3c7>

Brick 10.70.36.244:/rhs/brick1/b4
Number of entries: 2
at                     path on brick
-----------------------------------
2014-11-18 01:37:58 /file1
2014-11-18 01:40:29 /file1
[root@rhsauto006 ~]#

[root@rhsauto007 ~]# getfattr -d -e hex -m . /rhs/brick1/b3/file1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/b3/file1
trusted.afr.testvol-client-2=0x000000070000000000000000
trusted.afr.testvol-client-3=0x000000090000000000000000
trusted.gfid=0x5157d3c554fe4573a8d59dc58e10d3c7

[root@rhsauto014 ~]# getfattr -d -e hex -m . /rhs/brick1/b4/file1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/b4/file1
trusted.afr.testvol-client-2=0x00002e8f0000000000000000
trusted.afr.testvol-client-3=0x000000000000000000000000
trusted.gfid=0x5157d3c554fe4573a8d59dc58e10d3c7
[root@rhsauto014 ~]#

From fuse mount:
=================
[root@rhsauto001 fuse1]# ls -l
total 4896
-rw-r--r--. 1 root root 500830 Nov 18 07:13 file1
-rw-r--r--. 1 root root 500801 Nov 18 07:13 file10
-rw-r--r--. 1 root root 500830 Nov 18 07:13 file2
-rw-r--r--. 1 root root 500830 Nov 18 07:13 file3
-rw-r--r--. 1 root root 500830 Nov 18 07:13 file4
-rw-r--r--. 1 root root 500801 Nov 18 07:13 file5
-rw-r--r--. 1 root root 500801 Nov 18 07:13 file6
-rw-r--r--. 1 root root 500801 Nov 18 07:13 file7
-rw-r--r--. 1 root root 500801 Nov 18 07:13 file8
-rw-r--r--. 1 root root 500801 Nov 18 07:13 file9
-rw-r--r--. 1 root root     29 Nov 17 14:46 testfile

[root@rhsauto001 fuse1]# ls -lh file1
-rw-r--r--. 1 root root 490K Nov 18 07:13 file1

[root@rhsauto001 fuse1]# echo "Hello" > file1
[root@rhsauto001 fuse1]#
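Reading the two getfattr dumps above: each trusted.afr value is three 32-bit big-endian counters (pending data, metadata, and entry operations). The copy on b3 shows pending data operations against client-3 (the b4 replica) while the copy on b4 shows pending data operations against client-2 (the b3 replica), so each replica accuses the other, which is the data split-brain signature. A minimal sketch to pull the counters apart, assuming that standard AFR changelog layout and the brick path from the output above:

# Split each trusted.afr value on brick b3 into data/metadata/entry counters.
for x in trusted.afr.testvol-client-2 trusted.afr.testvol-client-3; do
    v=$(getfattr -n "$x" -e hex /rhs/brick1/b3/file1 2>/dev/null | awk -F= '/=0x/ {print $2}')
    v=${v#0x}
    echo "$x: data=0x${v:0:8} metadata=0x${v:8:8} entry=0x${v:16:8}"
done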
Tested with build "glusterfs-libs-3.7.1-16": once files are in split-brain state, writes still go through to the files because of performance.write-behind; after disabling that performance translator, the writes fail with an I/O error.
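Since write-behind absorbs the error on the cached write path, the EIO only reaches the application once the translator is disabled. A minimal sketch of the check, assuming the testvol volume from the earlier retest (the mount path /mnt/fuse1 is only illustrative):

gluster volume set testvol performance.write-behind off
# With write-behind disabled, a write to the split-brain file should now fail
# at the mount point instead of appearing to succeed:
echo "Hello" > /mnt/fuse1/file1     # expected: Input/output error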
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release you asked us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/ If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.