Description of problem:
DHT: If the brick where the root directory hashes is down, then a lookup on the NFS mount gives the error 'cannot open directory .: Input/output error'.

Version-Release number of selected component (if applicable):
3.3.0.3rhs-33.el6rhs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a distributed volume with 3 or more sub-volumes across multiple servers and start that volume.

[root@Rhs1 ~]# gluster volume info San_11

Volume Name: San_11
Type: Distribute
Volume ID: 59df39cb-f6f1-4514-a8a1-d7163f06d962
Status: Started
Number of Bricks: 5
Transport-type: tcp
Bricks:
Brick1: 10.70.35.81:/home/san1
Brick2: 10.70.35.81:/home/san2
Brick3: 10.70.35.85:/home/san1
Brick4: 10.70.35.85:/home/san2
Brick5: 10.70.35.86:/home/san1

2. NFS-mount the volume from client-1, and also FUSE-mount the same volume.

3. From the mount point, create some directories with files inside them.

4. Find where the root directory is hashing:

[root@Rhs1 ~]# getfattr -d -m . -e hex /home/san2
getfattr: Removing leading '/' from absolute path names
# file: home/san2
security.selinux=0x73797374656d5f753a6f626a6563745f723a686f6d655f726f6f745f743a733000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000000000000033333332
trusted.glusterfs.volume-id=0x59df39cbf6f14514a8a1d7163f06d962

5. Bring that brick down by killing its process.

[root@Rhs1 ~]# gluster volume status San_11
Status of volume: San_11
Gluster process                              Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.81:/home/san1                 24019   Y       12761
Brick 10.70.35.81:/home/san2                 24020   N       12767
Brick 10.70.35.85:/home/san1                 24223   Y       12845
Brick 10.70.35.85:/home/san2                 24224   Y       12850
Brick 10.70.35.86:/home/san1                 24230   Y       12819
NFS Server on localhost                      38467   Y       12857
NFS Server on 10.70.35.86                    38467   Y       12825
NFS Server on 10.70.35.85                    38467   Y       12857

6. Execute ls on both mount points.

The FUSE mount lists the directories, but the NFS mount gives the error below:

[root@client nfs1]# ls
ls: cannot open directory .: Input/output error

Actual results:
Input/output error

Expected results:
It should list all directories and files (those not hashed to the down sub-volume).

Additional info:
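For repeatability, steps 4 and 5 can be scripted. This is only a minimal sketch: the brick paths and volume name San_11 are taken from this report, and the awk parsing of the status output is an assumption about the CLI's column layout, which may differ across versions.

    # Dump the root directory's layout range on every local brick
    # (brick paths are the ones from this report; adjust to your setup)
    for b in /home/san1 /home/san2; do
        getfattr -n trusted.glusterfs.dht -e hex "$b"
    done

    # Kill the brick process that holds the hash range of interest.
    # The PID extraction assumes 'gluster volume status' prints the PID
    # in the last column of the matching Brick line.
    pid=$(gluster volume status San_11 | awk '$2 == "10.70.35.81:/home/san2" {print $NF}')
    kill "$pid"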
Please attach the nfs server logs.
Created attachment 629846 [details] server log
It might be related to the above-mentioned bug, but there are no similar failure error messages. Seeing the errors below in the log; need input from NFS SMEs.

[2012-10-17 12:19:40.383182] W [client3_1-fops.c:2630:client3_1_lookup_cbk] 0-San_11-client-1: remote operation failed: Transport endpoint is not connected. Path: / (00000000-0000-0000-0000-000000000001)
[2012-10-17 12:19:40.383876] W [client3_1-fops.c:1332:client3_1_access_cbk] 0-San_11-client-1: remote operation failed: Transport endpoint is not connected
[2012-10-17 12:19:40.383925] W [nfs3.c:1491:nfs3svc_access_cbk] 0-nfs: 3bb08886: / => -1 (Transport endpoint is not connected)
[2012-10-17 12:19:40.383953] W [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: 3bb08886, ACCESS: NFS: 5(I/O error), POSIX: 107(Transport endpoint is not connected)
The reporter confirmed that a parallel access on the FUSE mount listed all directories and files (except those hashed to the down sub-volume).
The behaviour of both the NFS and FUSE mounts is the same as of the latest git HEAD 232adb88512274863c9f5ad51569695af80bd6c0. Rachana, could you confirm the finding?
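For anyone re-checking this, the two mounts can be compared side by side. A minimal sketch; the server name server1 and the mount points are placeholders, and vers=3 is used because Gluster's built-in NFS server speaks NFSv3 only:

    mkdir -p /mnt/nfs /mnt/fuse
    mount -t nfs -o vers=3 server1:/San_11 /mnt/nfs
    mount -t glusterfs server1:/San_11 /mnt/fuse

    # With the hashed brick down, both listings should now behave the same
    ls /mnt/nfs
    ls /mnt/fuse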
This is reproducible. Found that dht_access returns the EIO error as-is if one brick is down. Re-assigning.
http://review.gluster.org/4240 has been posted for review upstream; once it is in, it will be backported and merged downstream.
CHANGE: http://review.gluster.org/4240 (cluster/dht: send ACCESS call on dir to first_up_subvol if cached is down) merged in master by Vijay Bellur (vbellur)
Verified this on 3.4.0qa5. It no longer gives the error 'cannot open directory .: Input/output error' and lists the files and directories, but if the hashed sub-volume of a directory is down, ls reports 'ls: cannot access d37: Invalid argument'. We already have a defect for that issue (https://bugzilla.redhat.com/show_bug.cgi?id=856459), so closing this as verified.
CHANGE: http://review.gluster.org/4421 (bug-867253.t: do a clean umount at the end) merged in master by Anand Avati (avati)
Found this defect on 3.3.0.6rhs-4.el6.x86_64.

DHT: If the brick where the root directory hashes is down, then a lookup on the NFS mount gives the error 'cannot open directory .: Input/output error' — same as the original defect. The FUSE mount gives no error, but the NFS mount does, so reopening the defect.

Info:

[root@cutlass tmp]# gluster v status 64-fuse
Status of volume: 64-fuse
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick fred.lab.eng.blr.redhat.com:/brick1/6.4-fuse      24025   Y       6063
Brick fan.lab.eng.blr.redhat.com:/brick1/6.4-fuse       24020   N       18113
Brick mia.lab.eng.blr.redhat.com:/brick1/6.4-fuse       24017   Y       27197
NFS Server on localhost                                 38467   Y       31344
NFS Server on fred.lab.eng.blr.redhat.com               38467   Y       6102
NFS Server on 10.70.34.91                               38467   Y       18196
NFS Server on mia.lab.eng.blr.redhat.com                38467   Y       17908

[root@fan tmp]# getfattr -d -m . -e hex /brick1/6.4-fuse/
getfattr: Removing leading '/' from absolute path names
# file: brick1/6.4-fuse/
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000000000000055555554
trusted.glusterfs.volume-id=0x8a5d6a111cda4406b818c413e5ae0968

[root@fred tmp]# getfattr -d -m . -e hex /brick1/6.4-fuse/
getfattr: Removing leading '/' from absolute path names
# file: brick1/6.4-fuse/
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000aaaaaaaaffffffff
trusted.glusterfs.volume-id=0x8a5d6a111cda4406b818c413e5ae0968

[root@mia tmp]# getfattr -d -m . -e hex /brick1/6.4-fuse/
getfattr: Removing leading '/' from absolute path names
# file: brick1/6.4-fuse/
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000055555555aaaaaaa9
trusted.glusterfs.volume-id=0x8a5d6a111cda4406b818c413e5ae0968

NFS mount:

[root@rhsauto037 test2]# ls
ls: cannot open directory .: Input/output error

FUSE mount:

[root@rhsauto037 test1]# ls
d12  d15  d2   d23  d30  d33  d36  d40  d43  d48  d50  d9   f11  f16  f2   f22  f26  f3   f32  f37  f42  f46  f5  f9
d13  d16  d21  d25  d31  d34  d39  d41  d45  d49  d6   f1   f14  f18  f20  f23  f27  f30  f33  f38  f43  f47  f6
d14  d17  d22  d28  d32  d35  d4   d42  d47  d5   d7   f10  f15  f19  f21  f25  f29  f31  f36  f40  f45  f49  f7
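For reference, the hash range each brick covers can be decoded from the xattr values above. A minimal sketch, assuming the common single-range layout in which the last two 32-bit words of trusted.glusterfs.dht are the start and end of the brick's range; the brick path is the one from this comment, to be run on each server:

    # Decode this brick's root-directory layout range from the xattr value
    brick=/brick1/6.4-fuse
    v=$(getfattr -n trusted.glusterfs.dht -e hex "$brick" 2>/dev/null |
        awk -F= '/^trusted.glusterfs.dht/ {print $2}')
    v=${v#0x}
    # e.g. on fan this prints: /brick1/6.4-fuse: start=0x00000000 end=0x55555554
    printf '%s: start=0x%s end=0x%s\n' "$brick" "${v:16:8}" "${v:24:8}"

On this setup that gives fan 0x00000000-0x55555554, mia 0x55555555-0xaaaaaaa9 and fred 0xaaaaaaaa-0xffffffff, i.e. the down brick (fan) is the one holding the start of the hash space.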
Verified on 3.4.0.4rhs-1.el6rhs.x86_64; working as expected, hence marking it as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html