Bug 903476 - DHT: - If brick/Sub-volume is down then any attempt to create/access/modify file which is hashed and cached on down sub-volume should give error “ Transport endpoint is not connected” (rather than “No such file or directory” or "Input/output error")
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: 2.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: shishir gowda
QA Contact: amainkar
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-01-24 05:07 UTC by Rachana Patel
Modified: 2015-04-20 11:56 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-09-23 22:29:52 UTC
Embargoed:


Attachments
log (12.84 KB, application/x-gzip)
2013-01-24 05:09 UTC, Rachana Patel

Description Rachana Patel 2013-01-24 05:07:18 UTC
Description of problem:
DHT: If a brick/sub-volume is down, any attempt to access or modify a file that is hashed and cached on the down sub-volume should fail with “Transport endpoint is not connected” (rather than “No such file or directory”).

Version-Release number of selected component (if applicable):
3.3.0.5rhs-40.el6rhs.x86_64

How reproducible:
always

Steps to Reproduce:
1. Create a Distributed volume with 3 or more sub-volumes spread across multiple servers, and start the volume.

2. FUSE-mount the volume from client-1 using “mount -t glusterfs  server:/<volume> <client-1_mount_point>”

3. From the mount point, create some directories and files inside them.

4. Bring one sub-volume down by killing its brick process.
[root@dhcp159-165 glusterd]# gluster volume status bdown
Status of volume: bdown
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick 10.16.159.157:/home/bdown1			24011	Y	26213
Brick 10.16.159.165:/home/bdown1			24013	N	28792
Brick 10.16.159.158:/home/bdown1			24011	Y	26189
NFS Server on localhost					38467	Y	16589
NFS Server on 10.16.159.223				38467	Y	16876
NFS Server on 10.16.159.207				38467	Y	16848
NFS Server on 10.16.159.158				38467	Y	13990
NFS Server on 10.16.159.157				38467	Y	14035
 
5. From the mount point, try to access/modify files that are hashed and cached on the down sub-volume.

Files on the down brick (server .165):
[root@dhcp159-165 glusterd]# ls /home/bdown1/f*
/home/bdown1/f12  /home/bdown1/f34  /home/bdown1/f44  /home/bdown1/f55
/home/bdown1/f13  /home/bdown1/f35  /home/bdown1/f48  /home/bdown1/f58
/home/bdown1/f17  /home/bdown1/f39  /home/bdown1/f50  /home/bdown1/f60
/home/bdown1/f24  /home/bdown1/f4   /home/bdown1/f52  /home/bdown1/f8
/home/bdown1/f28  /home/bdown1/f41  /home/bdown1/f53

[root@dhcp159-165 glusterd]# ls /home/bdown1/d1/f*
/home/bdown1/d1/f1   /home/bdown1/d1/f22  /home/bdown1/d1/f5
/home/bdown1/d1/f10  /home/bdown1/d1/f23  /home/bdown1/d1/f51
/home/bdown1/d1/f18  /home/bdown1/d1/f29  /home/bdown1/d1/f56
/home/bdown1/d1/f19  /home/bdown1/d1/f42  /home/bdown1/d1/f59
/home/bdown1/d1/f2   /home/bdown1/d1/f46
/home/bdown1/d1/f20  /home/bdown1/d1/f47


from mount point remove file hashed/cached to down sub-volume
[root@dhcp159-207 bdown]# rm f12 
rm: cannot remove `f12': No such file or directory 

remove file hashed/cached  to down sub-volume
[root@dhcp159-207 bdown]# unlink f12 
unlink: cannot unlink `f12': No such file or directory 

remove file hashed/cached to down sub-volume
[root@dhcp159-207 bdown]# unlink d1/f1 
unlink: cannot unlink `d1/f1': No such file or directory 

remove file hashed/cached  to down sub-volume
[root@dhcp159-207 bdown]# rm d1/f10 
rm: cannot remove `d1/f10': No such file or directory 

rename file hashed and cached  to down sub-volume
[root@dhcp159-207 bdown]# mv f12 f11 
mv: cannot stat `f12': No such file or directory 

rename file hashed and cached  to down sub-volume
[root@dhcp159-207 bdown]# mv d1/f10 d1/f11 
mv: cannot stat `d1/f10': No such file or directory 


rename – destination  file hashed and cached  to down sub-volume
[root@dhcp159-207 bdown]# mv f15 f12 
mv: cannot move `f15' to `f12': Transport endpoint is not connected 

file hashed and cached  to down sub-volume
[root@dhcp159-207 bdown]# cat f12 
cat: f12: No such file or directory 
[root@dhcp159-207 bdown]# less f12 
f12: No such file or directory 
[root@dhcp159-207 bdown]# echo abc > f12 
-bash: f12: Transport endpoint is not connected 

copy file hashed and cached  to down sub-volume
[root@dhcp159-207 bdown]# cp f12 f11 
cp: cannot stat `f12': No such file or directory 

copy file hashed and cached  to down sub-volume
[root@dhcp159-207 bdown]# cp d1/f10 d1/f11 
cp: cannot stat `d1/f10': No such file or directory 

change metadata for file hashed and cached  to down sub-volume
[root@dhcp159-207 bdown]# chmod 777 f12 
chmod: cannot access `f12': No such file or directory







Actual results:
A few operations fail with “No such file or directory”.

Expected results:
If a brick/sub-volume is down, any attempt to access or modify a file that is hashed and cached on the down sub-volume should fail with “Transport endpoint is not connected” (rather than “No such file or directory”).
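
The expectation above can be checked programmatically instead of by reading command output. The following is only a minimal sketch: the helper names `failure_errno` and `classify` are hypothetical, and the mount path in the usage note is a placeholder, not a path from this report.

```python
import errno
import os


def failure_errno(operation, *args):
    """Run operation(*args); return the symbolic errno name if it fails, else None."""
    try:
        operation(*args)
    except OSError as exc:
        # errno.errorcode maps numeric codes (e.g. 107 on Linux) to names (ENOTCONN).
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None


def classify(errno_name):
    """Label an errno name according to the expectation stated in this report."""
    if errno_name == "ENOTCONN":
        return "expected"      # "Transport endpoint is not connected"
    if errno_name in ("ENOENT", "EIO"):
        return "misleading"    # "No such file or directory" / "Input/output error"
    return "other"
```

For example, `classify(failure_errno(os.stat, "<client-1_mount_point>/f12"))` would be expected to return "expected" once the behaviour requested here is in place, and "misleading" for the behaviour shown in the steps above.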

Additional info:

Comment 1 Rachana Patel 2013-01-24 05:09:16 UTC
Created attachment 686454
log

Comment 3 Scott Haines 2013-02-06 20:07:43 UTC
Per Feb-06 bug triage meeting, targeting for 2.1.0.

Comment 4 Scott Haines 2013-02-06 20:10:41 UTC
Per Feb-06 bug triage meeting, targeting for 2.1.0.

Comment 5 shishir gowda 2013-02-18 10:43:14 UTC
Fixed as part of bug 893378 (http://review.gluster.org/#change,4383).

*** This bug has been marked as a duplicate of bug 893378 ***

Comment 7 Rachana Patel 2013-06-04 14:43:16 UTC
Able to reproduce in:
3.4.0.8rhs-1.el6rhs.x86_64

Server:


[root@mia ~]# gluster v status test1
Status of volume: test1
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick fred.lab.eng.blr.redhat.com:/rhs/brick1/t1	49154	Y	32380
Brick mia.lab.eng.blr.redhat.com:/rhs/brick1/t1		N/A	N	11173
Brick cutlass.lab.eng.blr.redhat.com:/rhs/brick1/t1	49154	Y	8989
NFS Server on localhost					2049	Y	11183
NFS Server on c5154da1-be15-40e2-b5f3-9be6dadafd43	2049	Y	8999
NFS Server on a37ff566-da82-4ae4-90c6-17763466fd36	2049	Y	15188
NFS Server on 292b158a-7650-4e09-9bc0-71e392f0d0c1	2049	Y	32390
 
There are no active volume tasks
[root@cutlass ~]# ls -l /rhs/brick1/t1/newf52
ls: cannot access /rhs/brick1/t1/newf52: No such file or directory

[root@mia ~]# ls -l /rhs/brick1/t1/newf52
-rw-r--r-- 2 root root 0 Jun  4 02:14 /rhs/brick1/t1/newf52

[root@fred ~]# ls -l /rhs/brick1/t1/newf52
ls: cannot access /rhs/brick1/t1/newf52: No such file or directory


For the file newf52, the hashed and cached sub-volume is down.

On the mount point:

[root@rhsauto037 test1nfs]# touch newf52
touch: cannot touch `newf52': Input/output error

[root@rhsauto037 test1nfs]# cp file109 newf52
cp: cannot create regular file `newf52': Input/output error


Expected results:
If a brick/sub-volume is down, any attempt to access or modify a file that is hashed and cached on the down sub-volume should fail with “Transport endpoint is not connected”.

Comment 10 shishir gowda 2013-06-10 06:39:25 UTC
Looks like NFS is converting ENOTCONN error to EIO. The failures are not seen on fuse clients:

Create related:

[2013-06-10 06:33:06.002267] W [client-rpc-fops.c:2058:client3_3_create_cbk] 0-sng-client-0: remote operation failed: Transport endpoint is not connected. Path: /new3
[2013-06-10 06:33:06.002307] W [nfs3.c:2354:nfs3svc_create_cbk] 0-nfs: 7f08d02a: /new3 => -1 (Transport endpoint is not connected) <========ENOTCONN error
[2013-06-10 06:33:06.002356] W [nfs3-helpers.c:3460:nfs3_log_newfh_res] 0-nfs-nfsv3: XID: 7f08d02a, CREATE: NFS: 5(I/O error), POSIX: 107(Transport endpoint is not connected), FH: exportid 00000000-0000-0000-0000-000000000000, gfid 00000000-0000-0000-0000-000000000000 <=====EIO error

Rename related:
[2013-06-10 06:37:01.918575] W [nfs3.c:3663:nfs3svc_rename_cbk] 0-nfs: a108d02a: rename /new1 -> /new3 => -1 (Transport endpoint is not connected) <==========ENOTCONN error
[2013-06-10 06:37:01.918615] W [nfs3-helpers.c:3391:nfs3_log_common_res] 0-nfs-nfsv3: XID: a108d02a, RENAME: NFS: 5(I/O error), POSIX: 14(Bad address) <=========EIO error
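
The conversion seen in these logs can be illustrated with a small sketch. This is not the glusterfs NFS server code: the function name `posix_to_nfs3` and the fall-back-to-I/O-error policy are assumptions made to match the log lines above. The NFSv3 status values themselves are the ones defined in RFC 1813, which has no status corresponding to ENOTCONN.

```python
import errno

# NFSv3 status codes as defined in RFC 1813 (subset).
NFS3_OK = 0
NFS3ERR_NOENT = 2
NFS3ERR_IO = 5
NFS3ERR_ACCES = 13
NFS3ERR_EXIST = 17

# POSIX errnos that have a direct NFSv3 counterpart (illustrative subset).
_DIRECT_MAPPING = {
    0: NFS3_OK,
    errno.ENOENT: NFS3ERR_NOENT,
    errno.EIO: NFS3ERR_IO,
    errno.EACCES: NFS3ERR_ACCES,
    errno.EEXIST: NFS3ERR_EXIST,
}


def posix_to_nfs3(posix_errno):
    """Translate a POSIX errno to an NFSv3 status.

    Assumption: anything without a direct counterpart (such as ENOTCONN)
    is reported as NFS3ERR_IO, which the client shows as "Input/output error".
    """
    return _DIRECT_MAPPING.get(posix_errno, NFS3ERR_IO)
```

Under this assumed policy, `posix_to_nfs3(errno.ENOTCONN)` and `posix_to_nfs3(errno.EFAULT)` both yield 5, which matches the "NFS: 5(I/O error)" entries logged for CREATE (POSIX 107) and RENAME (POSIX 14).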

Comment 11 santosh pradhan 2013-06-10 08:31:34 UTC
BZ 903476 and BZ 860915 are similar, so marking this as duplicate of 860915. Will update the analysis in BZ 860915.

*** This bug has been marked as a duplicate of bug 860915 ***

Comment 12 Rachana Patel 2013-06-11 06:25:55 UTC
I must say, the detailed information with the necessary links helped a lot in understanding the root cause (comment #11 of bug 860915). Thanks, Santosh.

Note:
1) Removing the duplicate marking against bug 860915.
Reason: agreed that the root cause is the same, but the steps are different. One is about directory creation and the other is about file access/modify/creation.
(As per Amar's mail to storage-eng, 'Important: Steps to marking a bug as duplicate', dated May 13, 2013.)

2) Opening a new bug for documentation and reassigning this defect to Shishir.
Reason: this problem has been fixed in DHT for FUSE mounts, so this defect can be used to track that problem in future. For NFS mounts, a new defect will be opened and assigned to the Doc team.

3) As this defect is not reproducible with the latest build (3.4.0.9rhs-1.el6.x86_64) on a FUSE mount, marking it as verified.

Comment 13 Scott Haines 2013-09-23 22:29:52 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html

