| Summary: | AFR: writes are successful on files which are in split-brain state | | |
|---|---|---|---|
| Product: | Red Hat Gluster Storage | Reporter: | spandura |
| Component: | replicate | Assignee: | Anuradha <atalur> |
| Status: | CLOSED EOL | QA Contact: | spandura |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 2.1 | CC: | nsathyan, pgurusid, rhs-bugs, rmekala, sdharane, smohan, storage-qa-internal, vagarwal, vbellur |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-12-03 17:17:53 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
SOS Reports: http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/1005485

fuse mount process info:
========================
root@darrel [Sep-07-2013-14:29:36] >ps -ef | grep gluster
root 2335 1 0 07:35 ? 00:00:11 /usr/sbin/glusterfs --volfile-id=/vol_dis_1_rep_2 --volfile-server=mia /mnt/gm1

Targeting for 3.0.0 (Denali) release.

This issue is seen when post-op-delay is set to a non-zero value and the bricks go down and come back within the post-op-delay window. A patch for this has been sent upstream (http://review.gluster.com/#/c/5635/), but it causes performance degradation.

The following test case was executed on "glusterfs 3.6.0.28 built on Sep 3 2014 10:13:12":
Case :-
=======
1. Create a 2 x 2 distributed-replicate volume and start it. Set the data-self-heal volume option to "off".
2. Create 2 fuse and 2 nfs mounts from 2 clients.
3. Create 10 files from one of the mounts.
4. From 1 fuse and 1 nfs mount on each client, open fds on all 10 files and start writing to them:
exec 5>./file1
exec 6>./file2
exec 7>./file3
exec 8>./file4
exec 9>./file5
exec 10>./file6
exec 11>./file7
exec 12>./file8
exec 13>./file9
exec 14>./file10
while true ; do for i in `seq 5 14`; do echo "`date`" >&$i ; done ; done
5. From the other fuse and nfs mounts on each client, cat the contents of the files and perform lookups on them in a loop:
while true ; do find . | xargs stat ; done
while true ; do for i in `seq 1 10`; do cat file$i; done ; done
6. Bring down brick1 and brick3 (one brick per sub-vol)
7. Bring back the bricks after some time. (service glusterd restart)
8. file1 ended up in a split-brain state.
Actual result:
=============
Writes are still successful on the split-brain files.
[root@rhsauto006 ~]# gluster v info
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 331cd4da-d234-480d-9152-a926e72369e7
Status: Started
Snap Volume: no
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.36.234:/rhs/brick1/b1
Brick2: 10.70.36.236:/rhs/brick1/b2
Brick3: 10.70.36.237:/rhs/brick1/b3
Brick4: 10.70.36.244:/rhs/brick1/b4
Options Reconfigured:
cluster.data-self-heal: off
performance.readdir-ahead: on
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable
[root@rhsauto006 ~]#
[root@rhsauto006 ~]# gluster v heal testvol info split-brain
Gathering list of split brain entries on volume testvol has been successful
Brick 10.70.36.234:/rhs/brick1/b1
Number of entries: 0
Brick 10.70.36.236:/rhs/brick1/b2
Number of entries: 0
Brick 10.70.36.237:/rhs/brick1/b3
Number of entries: 2
at path on brick
-----------------------------------
2014-11-18 01:37:55 <gfid:5157d3c5-54fe-4573-a8d5-9dc58e10d3c7>
2014-11-18 01:37:57 <gfid:5157d3c5-54fe-4573-a8d5-9dc58e10d3c7>
Brick 10.70.36.244:/rhs/brick1/b4
Number of entries: 2
at path on brick
-----------------------------------
2014-11-18 01:37:58 /file1
2014-11-18 01:40:29 /file1
[root@rhsauto006 ~]#
[root@rhsauto006 ~]#
[root@rhsauto007 ~]# getfattr -d -e hex -m . /rhs/brick1/b3/file1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/b3/file1
trusted.afr.testvol-client-2=0x000000070000000000000000
trusted.afr.testvol-client-3=0x000000090000000000000000
trusted.gfid=0x5157d3c554fe4573a8d59dc58e10d3c7
[root@rhsauto014 ~]# getfattr -d -e hex -m . /rhs/brick1/b4/file1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/b4/file1
trusted.afr.testvol-client-2=0x00002e8f0000000000000000
trusted.afr.testvol-client-3=0x000000000000000000000000
trusted.gfid=0x5157d3c554fe4573a8d59dc58e10d3c7
[root@rhsauto014 ~]#
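The trusted.afr.* xattrs above are AFR changelogs: each value packs three 32-bit big-endian counters (data, metadata and entry pending operations). Here b3 holds non-zero pending counts against client-3 and b4 holds a non-zero count against client-2, so each side blames the other, which is the data split-brain condition. A small sketch to decode the hex values (bash; the constants are copied from the getfattr output above, and the data/metadata/entry layout is the standard AFR format):

```shell
#!/bin/bash
# Decode an AFR changelog xattr value (0x + 24 hex digits) into its
# three 32-bit counters: data, metadata and entry pending operations.
decode_afr() {
    local v=${1#0x}                         # strip the 0x prefix
    printf 'data=%d metadata=%d entry=%d\n' \
        "$((16#${v:0:8}))" "$((16#${v:8:8}))" "$((16#${v:16:8}))"
}
decode_afr 0x000000070000000000000000    # b3's changelog for client-2
decode_afr 0x00002e8f0000000000000000    # b4's changelog for client-2
```

The second value decodes to 11919 pending data operations, consistent with writes continuing against the split-brain file for a long time.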
From fuse mount:
=================
[root@rhsauto001 fuse1]# ls -l
total 4896
-rw-r--r--. 1 root root 500830 Nov 18 07:13 file1
-rw-r--r--. 1 root root 500801 Nov 18 07:13 file10
-rw-r--r--. 1 root root 500830 Nov 18 07:13 file2
-rw-r--r--. 1 root root 500830 Nov 18 07:13 file3
-rw-r--r--. 1 root root 500830 Nov 18 07:13 file4
-rw-r--r--. 1 root root 500801 Nov 18 07:13 file5
-rw-r--r--. 1 root root 500801 Nov 18 07:13 file6
-rw-r--r--. 1 root root 500801 Nov 18 07:13 file7
-rw-r--r--. 1 root root 500801 Nov 18 07:13 file8
-rw-r--r--. 1 root root 500801 Nov 18 07:13 file9
-rw-r--r--. 1 root root 29 Nov 17 14:46 testfile
[root@rhsauto001 fuse1]# ls -lh file1
-rw-r--r--. 1 root root 490K Nov 18 07:13 file1
[root@rhsauto001 fuse1]# echo "Hello" > file1
[root@rhsauto001 fuse1]#
Tested with build "glusterfs-libs-3.7.1-16": once files are in a split-brain state, writes still succeed because of performance.write-behind; after disabling the write-behind performance translator, the writes fail with an I/O error.

Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release for which you requested review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/

If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.
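For reference, the volume-option commands involved in the retest above (standard gluster CLI, shown as a configuration sketch; they require a live cluster and the testvol volume from the outputs above):

```shell
# Disable the write-behind translator so application writes reach AFR
# synchronously; on split-brain files the write then fails with EIO
# instead of being acknowledged from the write-behind cache.
gluster volume set testvol performance.write-behind off

# Re-check which files are in split-brain afterwards.
gluster volume heal testvol info split-brain
```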
Description of problem:
=======================
In a replicate volume (1x2), when a file is in a split-brain state, IOs are successful on the file and self-heal happens from whichever brick has the larger copy of the file.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs 3.4.0.32rhs built on Sep 6 2013 10:26:11

How reproducible:
=================
Every time

1. Create a replicate volume, set self-heal-daemon to off, and start the volume.

root@fan [Sep-07-2013-14:08:58] >gluster v info
Volume Name: vol_dis_1_rep_2
Type: Replicate
Volume ID: f5c43519-b5eb-4138-8219-723c064af71c
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: fan.lab.eng.blr.redhat.com:/rhs/bricks/vol_dis_1_rep_2_b0
Brick2: mia.lab.eng.blr.redhat.com:/rhs/bricks/vol_dis_1_rep_2_b1
Options Reconfigured:
server.allow-insecure: on
performance.stat-prefetch: off
performance.write-behind: off
cluster.self-heal-daemon: off

2. Create fuse, nfs and cifs mounts.

3. From all the mounts, execute the following script (pass a different file name from each mount point):

test_script.sh <filename>:
==========================
#!/bin/bash
pwd=`pwd`
filename="${pwd}/$1"
(
echo "Time before flock : `date`"
flock -x 200
echo "Time after flock : `date`"
echo -e "\nWriting to file : $filename"
for i in `seq 1 1000`; do echo "Hello $i" >&200 ; sleep 1; done
echo "Time after the writes are successful : `date`"
) 200>>$filename

4. While the writes are in progress, bring down brick-1.

5. After some time, bring back brick-1 and bring down brick-0 at almost the same time (a situation leading to split-brain).

6. Let the writes on the file progress for some time.

7. Bring brick-0 back online. (split-brain state)

Actual result:
==============

Fuse and Cifs mount behavior:
1. Writes from the mount point are successful without reporting an I/O error.
2. Data is self-healed from whichever brick has the larger file.
3. Once the self-heal is complete, the change-logs on the files are cleared.
4. Once the writes are complete, "cat testfile" is successful from the mount point.

NFS behavior:
1. Writes from the mount point are successful without reporting an I/O error.
2. Changelogs are not cleared.
3. Once the writes are complete, "cat testfile" from the mount gives an I/O error.

Expected results:
=================
When a file is in a split-brain state, IOs should fail.
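The test_script.sh above relies on opening the target file once on fd 200, taking an exclusive flock on that fd, and routing every write through it. A minimal standalone sketch of the same pattern (using a temporary file and a short loop so it is self-contained; no gluster mount involved):

```shell
#!/bin/bash
# Sketch of the fd-redirection pattern used by test_script.sh:
# the compound command's redirection opens the file once on fd 200,
# flock locks that open fd, and each echo writes through it.
f=$(mktemp)
(
    flock -x 200                 # exclusive lock on the open fd
    for i in 1 2 3; do
        echo "Hello $i" >&200    # every write goes through fd 200
    done
) 200>>"$f"
cat "$f"
```

Because the file is opened once and held for the whole loop, all writes land on the same open fd; on a replicate volume this is what keeps the same fd writing across the brick up/down transitions in steps 4-7.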