Bug 830134 - NFS Mount doesn't report "I/O Error" when a file is in split-brain state
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.3-beta
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Assigned To: Jeff Darcy
Keywords: Triaged
Depends On:
Blocks: 853682 855913 858497
Reported: 2012-06-08 06:31 EDT by Shwetha Panduranga
Modified: 2013-07-24 13:55 EDT

See Also:
Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 853682 (view as bug list)
Environment:
Last Closed: 2013-07-24 13:55:46 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Description Shwetha Panduranga 2012-06-08 06:31:32 EDT
Description of problem:
-----------------------
In a replicate volume, when a file is in split-brain state, cat on that file from the NFS mount should report "I/O Error". Instead, the read succeeds and returns the contents of one of the replicas.

Version-Release number of selected component (if applicable):
------------------------------------------------------------
3.3.0qa45

How reproducible:
----------------
Often

Steps to Reproduce:
---------------------
1. Create a replicate volume (1x2: brick1 and brick2).
2. Set self-heal-daemon off for the volume.
3. Start the volume.
4. Create an NFS mount.
5. Create a directory <testdir> from the NFS mount.
6. Create a file <testdir/file> from the NFS mount.
7. Bring down "brick1".
8. From the NFS mount execute: echo "TestCase: Test Split-Brain. Brick1 is down now" > testdir/file
9. Bring "brick1" back up.
10. Bring down "brick2".
11. From the NFS mount execute: echo "TestCase: Test Split-Brain. Brick2 is down now" > testdir/file
12. Bring "brick2" back up.
13. From the mount execute: cat testdir/file
  
Actual results:
----------------
[06/08/12 - 21:18:14 root@APP-CLIENT1 ~]# cd /mnt/nfsc1; cat testdir/file
TestCase: Test Split-Brain. Brick2 is down now


Expected results:
-------------------
Should report I/O Error
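
A minimal sketch of how the expected result could be checked from a script instead of by eye. It assumes the "I/O Error" that cat should print corresponds to errno EIO on the read path. The function name read_reports_eio and its opener parameter are hypothetical, introduced only for this illustration; the path used in the usage note is the one from this report.

```python
import errno


def read_reports_eio(path, opener=open):
    """Return True if reading `path` fails with EIO, False if the read succeeds.

    `opener` is a hypothetical hook (defaults to the built-in open) so the
    check can be exercised without a real split-brain file.
    """
    try:
        with opener(path, "rb") as handle:
            handle.read()
    except OSError as exc:
        if exc.errno == errno.EIO:
            return True
        raise  # any other failure is not the split-brain signal we look for
    return False
```

On the mount from this report, read_reports_eio("/mnt/nfsc1/testdir/file") would be expected to return True once the file is in split-brain; with the behaviour described under "Actual results" it would return False.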

Additional info:
---------------

[06/08/12 - 21:13:48 root@APP-SERVER1 ~]# gluster v info
 
Volume Name: dstore
Type: Replicate
Volume ID: 03c2125d-c86a-45d3-abbe-7f83567d2d0b
Status: Created
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 192.168.2.35:/export_sdb/dir1
Brick2: 192.168.2.36:/export_sdb/dir1
Options Reconfigured:
cluster.self-heal-daemon: off

Brick1 data:-
-------------

[06/08/12 - 21:23:30 root@APP-SERVER1 ~]# getfattr -d -m . -e hex /export_sdb/dir1/testdir/file 
getfattr: Removing leading '/' from absolute path names
# file: export_sdb/dir1/testdir/file
trusted.afr.dstore-client-0=0x000000000000000000000000
trusted.afr.dstore-client-1=0x000000010000000100000000
trusted.gfid=0x910b72d06aa842efa8300b16df998741

[06/08/12 - 21:23:31 root@APP-SERVER1 ~]# getfattr -d -m . -e hex /export_sdb/dir1/testdir/
getfattr: Removing leading '/' from absolute path names
# file: export_sdb/dir1/testdir/
trusted.gfid=0x617d018b908042dbb16d14a0d084b224

[06/08/12 - 21:23:33 root@APP-SERVER1 ~]# getfattr -d -m . -e hex /export_sdb/dir1/
getfattr: Removing leading '/' from absolute path names
# file: export_sdb/dir1/
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.volume-id=0x03c2125dc86a45d3abbe7f83567d2d0b

Brick2 data:-
------------

[06/08/12 - 21:24:01 root@APP-SERVER2 ~]# getfattr -d -m . -e hex /export_sdb/dir1/testdir/file 
getfattr: Removing leading '/' from absolute path names
# file: export_sdb/dir1/testdir/file
trusted.afr.dstore-client-0=0x000000010000000100000000
trusted.afr.dstore-client-1=0x000000000000000000000000
trusted.gfid=0x910b72d06aa842efa8300b16df998741

[06/08/12 - 21:24:01 root@APP-SERVER2 ~]# getfattr -d -m . -e hex /export_sdb/dir1/testdir/
getfattr: Removing leading '/' from absolute path names
# file: export_sdb/dir1/testdir/
trusted.gfid=0x617d018b908042dbb16d14a0d084b224

[06/08/12 - 21:24:03 root@APP-SERVER2 ~]# getfattr -d -m . -e hex /export_sdb/dir1/
getfattr: Removing leading '/' from absolute path names
# file: export_sdb/dir1/
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.volume-id=0x03c2125dc86a45d3abbe7f83567d2d0b
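
The trusted.afr.* values on the two bricks are what mark this file as split-brain: each brick holds pending counts against the other. The sketch below decodes them, assuming the usual AFR changelog layout of three big-endian 32-bit counters (data, metadata, entry) and assuming dstore-client-0 maps to Brick1 and dstore-client-1 to Brick2. The helper names are hypothetical and not part of GlusterFS.

```python
import struct


def decode_afr_changelog(hex_value):
    """Decode a trusted.afr.* value as printed by `getfattr -e hex`.

    Assumed layout: three big-endian uint32 pending counters,
    in the order data, metadata, entry.
    """
    digits = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


def is_split_brain(brick1_about_brick2, brick2_about_brick1):
    """True if both bricks record pending operations against each other."""
    first = decode_afr_changelog(brick1_about_brick2)
    second = decode_afr_changelog(brick2_about_brick1)
    return any(first.values()) and any(second.values())


# Values copied from the getfattr output in this report.
BRICK1_CLIENT_1 = "0x000000010000000100000000"  # Brick1's record about Brick2
BRICK2_CLIENT_0 = "0x000000010000000100000000"  # Brick2's record about Brick1

print(decode_afr_changelog(BRICK1_CLIENT_1))
print(is_split_brain(BRICK1_CLIENT_1, BRICK2_CLIENT_0))
```

Under these assumptions both records decode to one pending data operation and one pending metadata operation, so each brick accuses the other and neither copy can be chosen as the heal source.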

Note:- Further rm on the file from mount succeeds 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Mount Output:-
------------
[06/08/12 - 21:24:22 root@APP-CLIENT1 testdir]# rm file 
rm: remove regular file `file'? y

Brick1 :-
-------

[06/08/12 - 21:24:37 root@APP-SERVER1 ~]# getfattr -d -m . -e hex /export_sdb/dir1/testdir/file
getfattr: /export_sdb/dir1/testdir/file: No such file or directory

[06/08/12 - 21:25:48 root@APP-SERVER1 ~]# getfattr -d -m . -e hex /export_sdb/dir1/testdir/
getfattr: Removing leading '/' from absolute path names
# file: export_sdb/dir1/testdir/
trusted.gfid=0x617d018b908042dbb16d14a0d084b224


Brick2:-
--------

[06/08/12 - 21:24:47 root@APP-SERVER2 ~]# getfattr -d -m . -e hex /export_sdb/dir1/testdir/file
getfattr: /export_sdb/dir1/testdir/file: No such file or directory

[06/08/12 - 21:25:52 root@APP-SERVER2 ~]# getfattr -d -m . -e hex /export_sdb/dir1/testdir/
getfattr: Removing leading '/' from absolute path names
# file: export_sdb/dir1/testdir/
trusted.gfid=0x617d018b908042dbb16d14a0d084b224
Comment 1 Krishna Srinivas 2012-09-10 08:30:23 EDT
Pranith, replicate was not returning EIO in a case like this (note that it is an anonymous fd read). Can you take a look?
Comment 2 Pranith Kumar K 2012-09-10 22:20:16 EDT
NFS does not perform lookups. AFR depends on the lookup fop to detect a split-brain and report it, so with NFS no EIOs are seen. This is a known issue.
Comment 3 Jeff Darcy 2012-10-09 11:32:03 EDT
Submitted http://review.gluster.org/4050 to bump mtime/ctime on getattr requests (which NFS uses to check cache freshness) and force a new lookup.  When the self-heal done as part of the lookup fails due to split brain or GFID mismatch, the NFS client gets EIO back.
Comment 4 Vijay Bellur 2012-10-19 10:16:44 EDT
CHANGE: http://review.gluster.org/4058 (nfs: do lookup on getattr after brick-status change) merged in master by Vijay Bellur (vbellur@redhat.com)
Comment 5 Jeff Darcy 2012-10-26 17:10:23 EDT
*** Bug 830121 has been marked as a duplicate of this bug. ***
