Bug 903476
| Summary: | DHT: if a brick/sub-volume is down, then any attempt to create/access/modify a file which is hashed and cached on the down sub-volume should give the error "Transport endpoint is not connected" (rather than "No such file or directory" or "Input/output error") | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Rachana Patel <racpatel> | ||||
| Component: | glusterfs | Assignee: | shishir gowda <sgowda> | ||||
| Status: | CLOSED ERRATA | QA Contact: | amainkar | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | medium | ||||||
| Version: | 2.0 | CC: | nsathyan, rhs-bugs, vbellur | ||||
| Target Milestone: | --- | Keywords: | Reopened | ||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2013-09-23 22:29:52 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
Description
Rachana Patel
2013-01-24 05:07:18 UTC
Created attachment 686454 [details]
log
Per Feb-06 bug triage meeting, targeting for 2.1.0.

Fixed as part of bug 893378 (http://review.gluster.org/#change,4383).

*** This bug has been marked as a duplicate of bug 893378 ***

Able to reproduce in 3.4.0.8rhs-1.el6rhs.x86_64.

Server:

    [root@mia ~]# gluster v status test1
    Status of volume: test1
    Gluster process                                        Port   Online  Pid
    ------------------------------------------------------------------------------
    Brick fred.lab.eng.blr.redhat.com:/rhs/brick1/t1       49154  Y       32380
    Brick mia.lab.eng.blr.redhat.com:/rhs/brick1/t1        N/A    N       11173
    Brick cutlass.lab.eng.blr.redhat.com:/rhs/brick1/t1    49154  Y       8989
    NFS Server on localhost                                2049   Y       11183
    NFS Server on c5154da1-be15-40e2-b5f3-9be6dadafd43     2049   Y       8999
    NFS Server on a37ff566-da82-4ae4-90c6-17763466fd36     2049   Y       15188
    NFS Server on 292b158a-7650-4e09-9bc0-71e392f0d0c1     2049   Y       32390

    There are no active volume tasks

    [root@cutlass ~]# ls -l /rhs/brick1/t1/newf52
    ls: cannot access /rhs/brick1/t1/newf52: No such file or directory
    [root@mia ~]# ls -l /rhs/brick1/t1/newf52
    -rw-r--r-- 2 root root 0 Jun  4 02:14 /rhs/brick1/t1/newf52
    [root@fred ~]# ls -l /rhs/brick1/t1/newf52
    ls: cannot access /rhs/brick1/t1/newf52: No such file or directory

For the file newf52, the hashed and cached sub-volume is down.

On the mount point:

    [root@rhsauto037 test1nfs]# touch newf52
    touch: cannot touch `newf52': Input/output error
    [root@rhsauto037 test1nfs]# cp file109 newf52
    cp: cannot create regular file `newf52': Input/output error

Expected results: if a brick/sub-volume is down, then any attempt to access/modify a file which is hashed and cached on the down sub-volume should give the error "Transport endpoint is not connected".

Looks like NFS is converting the ENOTCONN error to EIO. The failures are not seen on FUSE clients:
Create related:

    [2013-06-10 06:33:06.002267] W [client-rpc-fops.c:2058:client3_3_create_cbk] 0-sng-client-0: remote operation failed: Transport endpoint is not connected. Path: /new3
    [2013-06-10 06:33:06.002307] W [nfs3.c:2354:nfs3svc_create_cbk] 0-nfs: 7f08d02a: /new3 => -1 (Transport endpoint is not connected)   <======== ENOTCONN error
    [2013-06-10 06:33:06.002356] W [nfs3-helpers.c:3460:nfs3_log_newfh_res] 0-nfs-nfsv3: XID: 7f08d02a, CREATE: NFS: 5(I/O error), POSIX: 107(Transport endpoint is not connected), FH: exportid 00000000-0000-0000-0000-000000000000, gfid 00000000-0000-0000-0000-000000000000   <===== EIO error

Rename related:

    [2013-06-10 06:37:01.918575] W [nfs3.c:3663:nfs3svc_rename_cbk] 0-nfs: a108d02a: rename /new1 -> /new3 => -1 (Transport endpoint is not connected)   <========== ENOTCONN error
    [2013-06-10 06:37:01.918615] W [nfs3-helpers.c:3391:nfs3_log_common_res] 0-nfs-nfsv3: XID: a108d02a, RENAME: NFS: 5(I/O error), POSIX: 14(Bad address)   <========= EIO error
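The log pairs above show a POSIX error of 107 (ENOTCONN) arriving at the NFS translator but the outgoing NFSv3 status being 5 (I/O error). A minimal sketch of how a translation table produces this collapse (illustrative Python only; the table contents and the `nfs3_status` name are assumptions for illustration, not the actual glusterfs NFS translator code):

```python
import errno

# NFSv3 status codes from RFC 1813 (partial list).
NFS3_OK = 0
NFS3ERR_NOENT = 2
NFS3ERR_IO = 5
NFS3ERR_ACCES = 13

# Hypothetical errno -> NFSv3 status table. ENOTCONN has no entry,
# so it falls through to the generic default below.
ERRNO_TO_NFS3 = {
    0: NFS3_OK,
    errno.ENOENT: NFS3ERR_NOENT,
    errno.EACCES: NFS3ERR_ACCES,
}

def nfs3_status(posix_err):
    # Any errno without an explicit mapping collapses to NFS3ERR_IO.
    # That is how ENOTCONN (107) reaches the NFS client as EIO,
    # i.e. "NFS: 5(I/O error)" in the logs above.
    return ERRNO_TO_NFS3.get(posix_err, NFS3ERR_IO)

print(nfs3_status(errno.ENOTCONN))  # -> 5 (NFS3ERR_IO)
```

Under this model, fixing the symptom for NFS clients would mean either adding an explicit ENOTCONN entry or choosing a less lossy default than NFS3ERR_IO.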
BZ 903476 and BZ 860915 are similar, so marking this as a duplicate of 860915. Will update the analysis in BZ 860915.

*** This bug has been marked as a duplicate of bug 860915 ***

I must say, the detailed information with the necessary links helped a lot in understanding the root cause (comment #11 of bug 860915). Thanks, Santosh.

Note:

1) Removing the duplicate marking against bug 860915. Reason: agreed that the root cause is the same, but the steps are different: one is about directory creation, the other about file access/modify/creation. (Per Amar's mail to storage-eng, 'Important: Steps to marking a bug as duplicate', May 13, 2013.)

2) Opening a new bug for documentation and reassigning this defect to Shishir. Reason: this problem has been fixed in DHT for the FUSE mount, so this defect can be used to track that problem; for the NFS mount, a new defect will be opened and assigned to the Doc team.

3) As this defect is not reproducible with the latest build (3.4.0.9rhs-1.el6.x86_64) on a FUSE mount, marking it as verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html
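For reference, the fixed FUSE-side behavior this bug asked for (report ENOTCONN when the hashed sub-volume is down, not ENOENT or EIO) can be sketched with a toy model of DHT's hashed sub-volume selection. This is an illustrative Python sketch under loud assumptions: the `Subvol` class, the crc32 ring, and `dht_create` are all hypothetical stand-ins, not glusterfs code (real DHT hashes the file name and matches it against per-directory 32-bit hash ranges stored in the trusted.glusterfs.dht xattr).

```python
import errno
import os
import zlib

class Subvol:
    """Hypothetical stand-in for a DHT sub-volume (one brick/replica set)."""
    def __init__(self, name, up=True):
        self.name = name
        self.up = up

def hashed_subvol(subvols, fname):
    # Toy layout hash: pick a sub-volume from the file name. Real DHT
    # uses a Davies-Meyer hash against per-directory hash ranges.
    return subvols[zlib.crc32(fname.encode()) % len(subvols)]

def dht_create(subvols, fname):
    sv = hashed_subvol(subvols, fname)
    if not sv.up:
        # The behavior requested in this bug: surface ENOTCONN
        # ("Transport endpoint is not connected"), not ENOENT or EIO.
        raise OSError(errno.ENOTCONN, os.strerror(errno.ENOTCONN), fname)
    return sv.name
```

With the hashed sub-volume up, `dht_create` returns the selected sub-volume's name; with it down, it raises OSError(ENOTCONN), which a FUSE client surfaces directly as "Transport endpoint is not connected".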