Bug 830121

Summary: Nfs mount doesn't report "I/O Error" when there is GFID mismatch for a file
Product: [Community] GlusterFS
Reporter: Shwetha Panduranga <shwetha.h.panduranga>
Component: replicate
Assignee: Vivek Agarwal <vagarwal>
Status: CLOSED DUPLICATE
Severity: urgent
Priority: unspecified
Version: 3.3-beta
CC: gluster-bugs, jdarcy, mailbox, sankarshan, spandura, vinaraya
Keywords: Reopened, Triaged
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
: 853683 (view as bug list) Environment:
Last Closed: 2013-08-28 11:03:59 UTC
Type: Bug
Bug Blocks: 853683, 858498    

Description Shwetha Panduranga 2012-06-08 09:48:01 UTC
Description of problem:
------------------------
When a file has mismatched GFIDs on the two bricks of a replicate volume, "cat <file_name>" from the NFS mount returns the file contents instead of reporting "Input/output error".

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
3.3.0qa45

How reproducible:
-----------------
Often


Steps to Reproduce:
---------------------
1. Create a 1x2 replicate volume (brick1 and brick2).
2. Set self-heal-daemon off for the volume.
3. Start the volume.
4. Create an NFS mount.
5. Create a directory <testdir> from the NFS mount.
6. Bring down "brick1".
7. From the NFS mount, execute: echo "Test Case: GFID Mismatch should report I/O Error" > testdir/file
8. Bring "brick1" back up.
9. Bring down "brick2".
10. From the NFS mount, execute: echo "Test Case: GFID Mismatch should report I/O Error when Brick2 is down" > testdir/file
11. Bring "brick2" back up.
12. From the NFS mount, execute: cat testdir/file
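
The steps above can be sketched as a set of shell helpers. This is only an illustration, not the reporter's actual procedure: the mount point /mnt/nfsc1, the use of ssh to reach the brick hosts, and stopping a brick by killing its glusterfsd PID are assumptions. By default the helpers only print the commands (DRY_RUN=1), so nothing is changed until the output has been reviewed.

```shell
#!/bin/sh
# Brick addresses are taken from the "gluster v info" output in this report;
# the mount point is a hypothetical placeholder.
VOL=dstore
HOST1=192.168.2.35
HOST2=192.168.2.36
BRICK_PATH=/export_sdb/dir1
MNT=/mnt/nfsc1
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

# Print a command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# Steps 1-5: create the 1x2 volume, disable the self-heal daemon,
# start the volume, mount it over NFS (v3), and create the test directory.
setup_volume() {
    run gluster volume create "$VOL" replica 2 \
        "$HOST1:$BRICK_PATH" "$HOST2:$BRICK_PATH"
    run gluster volume set "$VOL" cluster.self-heal-daemon off
    run gluster volume start "$VOL"
    run mkdir -p "$MNT"
    run mount -t nfs -o vers=3 "$HOST1:/$VOL" "$MNT"
    run mkdir "$MNT/testdir"
}

# Steps 6 and 9: stop one brick. $1 is the brick host, $2 the brick PID read
# from the Pid column of a current "gluster volume status" (assumed method).
brick_down() {
    run ssh "$1" kill "$2"
}

# Steps 8 and 11: restart any stopped brick processes of the volume.
bricks_up() {
    run gluster volume start "$VOL" force
}

# Steps 7 and 10: overwrite testdir/file through the NFS mount.
write_file() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ echo \"$1\" > $MNT/testdir/file"
    else
        echo "$1" > "$MNT/testdir/file"
    fi
}
```

A run would call setup_volume, then brick_down, write_file, and bricks_up once for each brick in turn, and finally cat the file from the mount. The PIDs must be read from the status output at that moment, since they change whenever a brick is restarted.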

Actual results:
----------------
[06/08/12 - 20:32:19 root@APP-CLIENT1 nfsc1]# cd testdir/
[06/08/12 - 20:32:38 root@APP-CLIENT1 testdir]# ls
file
[06/08/12 - 20:32:39 root@APP-CLIENT1 testdir]# cat file 
Test Case: GFID Mismatch should report I/O Error when Brick2 is down


Expected results:
-----------------
Input/output error


Additional info:
-----------------
[06/08/12 - 20:27:23 root@APP-SERVER1 ~]# gluster v info
 
Volume Name: dstore
Type: Replicate
Volume ID: ed21634a-27c8-496a-a765-de068ce9dc8e
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 192.168.2.35:/export_sdb/dir1
Brick2: 192.168.2.36:/export_sdb/dir1
Options Reconfigured:
cluster.self-heal-daemon: off

[06/08/12 - 20:32:05 root@APP-SERVER1 ~]# gluster v status
Status of volume: dstore
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick 192.168.2.35:/export_sdb/dir1			24009	Y	2830
Brick 192.168.2.36:/export_sdb/dir1			24009	Y	9539
NFS Server on localhost					38467	Y	2889
NFS Server on 192.168.2.36				38467	Y	9546

Data from Brick1 
------------------
[06/08/12 - 20:32:07 root@APP-SERVER1 ~]# 
[06/08/12 - 20:33:05 root@APP-SERVER1 ~]# 
[06/08/12 - 20:33:05 root@APP-SERVER1 ~]# getfattr -d -m . -e hex /export_sdb/dir1/testdir/file
getfattr: Removing leading '/' from absolute path names
# file: export_sdb/dir1/testdir/file
trusted.afr.dstore-client-0=0x000000000000000000000000
trusted.afr.dstore-client-1=0x000000010000000000000000
trusted.gfid=0x2dc5c2a385984b98828b9393fd4873db

[06/08/12 - 20:33:07 root@APP-SERVER1 ~]# getfattr -d -m . -e hex /export_sdb/dir1/testdir/
getfattr: Removing leading '/' from absolute path names
# file: export_sdb/dir1/testdir/
trusted.afr.dstore-client-0=0x000000000000000000000000
trusted.afr.dstore-client-1=0x000000000000000000000001
trusted.gfid=0xc20d53aa528644a8a6d74cd155413f19

[06/08/12 - 20:33:09 root@APP-SERVER1 ~]# 
[06/08/12 - 20:33:38 root@APP-SERVER1 ~]# cat /export_sdb/dir1/testdir/file
Test Case: GFID Mismatch should report I/O Error when all bricks are up

Data from Brick2:-
------------------

[06/08/12 - 20:31:47 root@APP-SERVER2 ~]# getfattr -d -m . -e hex /export_sdb/dir1/testdir/file 
getfattr: Removing leading '/' from absolute path names
# file: export_sdb/dir1/testdir/file
trusted.afr.dstore-client-0=0x000000010000000000000000
trusted.afr.dstore-client-1=0x000000000000000000000000
trusted.gfid=0x9f334349bcac445a9c8479a629068da1

[06/08/12 - 20:33:12 root@APP-SERVER2 ~]# getfattr -d -m . -e hex /export_sdb/dir1/testdir/
getfattr: Removing leading '/' from absolute path names
# file: export_sdb/dir1/testdir/
trusted.afr.dstore-client-0=0x000000000000000000000001
trusted.afr.dstore-client-1=0x000000000000000000000000
trusted.gfid=0xc20d53aa528644a8a6d74cd155413f19

[06/08/12 - 20:33:14 root@APP-SERVER2 ~]# cat /export_sdb/dir1/testdir/file
Test Case: GFID Mismatch should report I/O Error
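
The mismatch shown in the two getfattr dumps above can be checked mechanically. The following is a minimal sketch, assuming the output of "getfattr -d -m . -e hex" from each brick has been saved to a file; the helper names gfid_of and gfid_mismatch are made up for this illustration.

```shell
# Extract the trusted.gfid value from saved getfattr output on stdin.
gfid_of() {
    sed -n 's/^trusted\.gfid=//p'
}

# Compare the GFIDs recorded in two saved getfattr dumps.
# $1 and $2 are hypothetical file names, one dump per brick.
# Returns 0 (and prints both values) when the GFIDs differ.
gfid_mismatch() {
    g1=$(gfid_of < "$1")
    g2=$(gfid_of < "$2")
    if [ "$g1" != "$g2" ]; then
        echo "GFID mismatch: $g1 vs $g2"
        return 0
    fi
    echo "GFIDs match: $g1"
    return 1
}
```

With the dumps for testdir/file, this reports a mismatch (0x2dc5c2a3... on Brick1, 0x9f334349... on Brick2). With the dumps for testdir/ itself, it reports a match, since the directory has the same GFID on both bricks.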

Comment 1 Shwetha Panduranga 2012-06-08 09:53:33 UTC
[06/08/12 - 20:49:16 root@APP-CLIENT1 nfsc1]# rm testdir/file 
rm: remove regular file `testdir/file'? y
rm: cannot remove `testdir/file': Input/output error

Comment 2 Shwetha Panduranga 2012-06-08 09:55:06 UTC
[06/08/12 - 20:48:59 root@APP-CLIENT1 nfsc1]# rm testdir/file 
rm: remove regular file `testdir/file'? y
rm: cannot remove `testdir/file': Input/output error

[06/08/12 - 20:49:04 root@APP-CLIENT1 nfsc1]# cat testdir/file 
Test Case: GFID Mismatch should report I/O Error when all bricks are up

[06/08/12 - 20:49:16 root@APP-CLIENT1 nfsc1]# rm testdir/file 
rm: remove regular file `testdir/file'? y
rm: cannot remove `testdir/file': Input/output error

Comment 3 Jeff Darcy 2012-10-26 21:10:23 UTC
The symptom's not quite the same, but it's very close and the underlying cause is identical.

*** This bug has been marked as a duplicate of bug 830134 ***

Comment 4 spandura 2013-07-19 10:25:37 UTC
Reopening this bug because it is not a duplicate of bug 830134.

In bug 830134, EIO is not reported on the NFS mount even when files are in data split-brain.

This bug concerns entry split-brain, not data split-brain.
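
The entry split-brain can be read from the trusted.afr.* values posted in the description. As a hedged sketch: my understanding is that each value holds three 32-bit pending counters in the order data, metadata, entry. The helper name decode_afr is made up for this illustration.

```shell
# Split a trusted.afr.* hex value into its three 32-bit pending counters.
# Assumed layout: bytes 0-3 data, bytes 4-7 metadata, bytes 8-11 entry.
decode_afr() {
    hex=${1#0x}                                    # drop the 0x prefix
    data=$(printf '%s' "$hex" | cut -c1-8)
    meta=$(printf '%s' "$hex" | cut -c9-16)
    entry=$(printf '%s' "$hex" | cut -c17-24)
    echo "data=$((0x$data)) metadata=$((0x$meta)) entry=$((0x$entry))"
}

# testdir/ on Brick1, trusted.afr.dstore-client-1
decode_afr 0x000000000000000000000001
# testdir/ on Brick2, trusted.afr.dstore-client-0
decode_afr 0x000000000000000000000001
```

Under that assumption, both calls show a nonzero entry counter: each brick's copy of testdir/ records a pending entry operation against the other brick, which is the entry split-brain described here.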

Comment 5 Vivek Agarwal 2013-08-28 11:03:59 UTC
The root cause looks similar to that of bug 853682.

*** This bug has been marked as a duplicate of bug 853682 ***

Comment 6 Vivek Agarwal 2013-08-28 11:04:45 UTC
NFS lookups are cached by the NFS client, so not every NFS lookup call reaches the server.

To test this behavior, we can mount NFS with the lookupcache=none option, which disables the client-side lookup cache.
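
A mount command along these lines could be used for that test. This is a sketch only: the server address is Brick1's host from the volume info in the description, the mount point /mnt/nfsc1 is a hypothetical placeholder, and vers=3 is given on the assumption that the Gluster NFS server is being accessed over NFSv3.

```shell
# Values are placeholders; replace them with the real server, volume,
# and mount point.
SERVER=192.168.2.35
VOLUME=dstore
MOUNTPOINT=/mnt/nfsc1

# Build the mount command with the client-side lookup cache disabled.
mount_cmd="mount -t nfs -o vers=3,lookupcache=none ${SERVER}:/${VOLUME} ${MOUNTPOINT}"

# Print the command for review; run it as root once it has been checked.
echo "$mount_cmd"
```

With the lookup cache disabled, the "cat testdir/file" from the reproduction steps should trigger a fresh lookup on the server each time, which is where the split-brain check is expected to happen.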

The FUSE mount does not seem to cache lookups, and the split-brain check is done only at lookup time, so the FUSE mount appears to work correctly.

I think AFR should handle this scenario (cached lookups) as well.