Description of problem:
DHT: If a brick/sub-volume is down, any attempt to access/modify a file which is hashed and cached on the down sub-volume should give the error “Transport endpoint is not connected” (rather than “No such file or directory”).

Version-Release number of selected component (if applicable):
3.3.0.5rhs-40.el6rhs.x86_64

How reproducible:
always

Steps to Reproduce:
1. Create a Distributed volume having 3 or more sub-volumes on multiple servers and start that volume.
2. Fuse mount the volume from client-1 using “mount -t glusterfs server:/<volume> <client-1_mount_point>”
3. From the mount point, create some dirs and files inside it.
4. Bring one sub-volume down by killing the brick process.

    [root@dhcp159-165 glusterd]# gluster volume status bdown
    Status of volume: bdown
    Gluster process                          Port    Online  Pid
    ------------------------------------------------------------------------------
    Brick 10.16.159.157:/home/bdown1         24011   Y       26213
    Brick 10.16.159.165:/home/bdown1         24013   N       28792
    Brick 10.16.159.158:/home/bdown1         24011   Y       26189
    NFS Server on localhost                  38467   Y       16589
    NFS Server on 10.16.159.223              38467   Y       16876
    NFS Server on 10.16.159.207              38467   Y       16848
    NFS Server on 10.16.159.158              38467   Y       13990
    NFS Server on 10.16.159.157              38467   Y       14035

5. From the mount point, try to access/modify files which are hashed and cached on the down sub-volume.

Files on the down sub-volume (server .165):

    [root@dhcp159-165 glusterd]# ls /home/bdown1/f*
    /home/bdown1/f12  /home/bdown1/f34  /home/bdown1/f44  /home/bdown1/f55
    /home/bdown1/f13  /home/bdown1/f35  /home/bdown1/f48  /home/bdown1/f58
    /home/bdown1/f17  /home/bdown1/f39  /home/bdown1/f50  /home/bdown1/f60
    /home/bdown1/f24  /home/bdown1/f4   /home/bdown1/f52  /home/bdown1/f8
    /home/bdown1/f28  /home/bdown1/f41  /home/bdown1/f53

    [root@dhcp159-165 glusterd]# ls /home/bdown1/d1/f*
    /home/bdown1/d1/f1   /home/bdown1/d1/f22  /home/bdown1/d1/f5
    /home/bdown1/d1/f10  /home/bdown1/d1/f23  /home/bdown1/d1/f51
    /home/bdown1/d1/f18  /home/bdown1/d1/f29  /home/bdown1/d1/f56
    /home/bdown1/d1/f19  /home/bdown1/d1/f42  /home/bdown1/d1/f59
    /home/bdown1/d1/f2   /home/bdown1/d1/f46
    /home/bdown1/d1/f20  /home/bdown1/d1/f47

From the mount point:

remove file hashed/cached to down sub-volume
    [root@dhcp159-207 bdown]# rm f12
    rm: cannot remove `f12': No such file or directory

remove file hashed/cached to down sub-volume
    [root@dhcp159-207 bdown]# unlink f12
    unlink: cannot unlink `f12': No such file or directory

remove file hashed/cached to down sub-volume
    [root@dhcp159-207 bdown]# unlink d1/f1
    unlink: cannot unlink `d1/f1': No such file or directory

remove file hashed/cached to down sub-volume
    [root@dhcp159-207 bdown]# rm d1/f10
    rm: cannot remove `d1/f10': No such file or directory

rename file hashed and cached to down sub-volume
    [root@dhcp159-207 bdown]# mv f12 f11
    mv: cannot stat `f12': No such file or directory

rename file hashed and cached to down sub-volume
    [root@dhcp159-207 bdown]# mv d1/f10 d1/f11
    mv: cannot stat `d1/f10': No such file or directory

rename: destination file hashed and cached to down sub-volume
    [root@dhcp159-207 bdown]# mv f15 f12
    mv: cannot move `f15' to `f12': Transport endpoint is not connected

file hashed and cached to down sub-volume
    [root@dhcp159-207 bdown]# cat f12
    cat: f12: No such file or directory
    [root@dhcp159-207 bdown]# less f12
    f12: No such file or directory
    [root@dhcp159-207 bdown]# echo abc > f12
    -bash: f12: Transport endpoint is not connected

copy file hashed and cached to down sub-volume
    [root@dhcp159-207 bdown]# cp f12 f11
    cp: cannot stat `f12': No such file or directory

copy file hashed and cached to down sub-volume
    [root@dhcp159-207 bdown]# cp d1/f10 d1/f11
    cp: cannot stat `d1/f10': No such file or directory

change metadata for file hashed and cached to down sub-volume
    [root@dhcp159-207 bdown]# chmod 777 f12
    chmod: cannot access `f12': No such file or directory

Actual results:
Several operations (rm, unlink, mv with the file as source, cat, less, cp, chmod) give the error “No such file or directory”. Only mv with the file as destination and the shell redirect (echo abc > f12) give “Transport endpoint is not connected”.

Expected results:
If a brick/sub-volume is down, any attempt to access/modify a file which is hashed and cached on the down sub-volume should give the error “Transport endpoint is not connected” (rather than “No such file or directory”).

Additional info:
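For anyone re-running these steps, a small helper along the following lines may make it easier to tell the two errors apart without reading each command's message. It is only a sketch: the function name `classify_access` is hypothetical and not part of any Gluster tooling, and it checks a single `stat` call rather than the full set of operations listed above.

```python
import errno
import os


def classify_access(path):
    """Stat `path` and report which error class (if any) came back.

    Returns "ok", "enoent", "enotconn", or "other:<ERRNO_NAME>".
    Hypothetical helper for reproducing the report; not Gluster code.
    """
    try:
        os.stat(path)
    except OSError as exc:
        if exc.errno == errno.ENOENT:
            # What the report observed: "No such file or directory"
            return "enoent"
        if exc.errno == errno.ENOTCONN:
            # What the report expects: "Transport endpoint is not connected"
            return "enotconn"
        return "other:" + errno.errorcode.get(exc.errno, str(exc.errno))
    return "ok"
```

Called on a file under the mount point that lives on the down sub-volume (for example `f12` in the transcript above), the expected result per this report would be "enotconn", while the buggy behaviour would show up as "enoent".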
Created attachment 686454: log
Per Feb-06 bug triage meeting, targeting for 2.1.0.
Fixed as part of bug 893378 (http://review.gluster.org/#change,4383). *** This bug has been marked as a duplicate of bug 893378 ***
Able to reproduce in 3.4.0.8rhs-1.el6rhs.x86_64

server:

    [root@mia ~]# gluster v status test1
    Status of volume: test1
    Gluster process                                         Port    Online  Pid
    ------------------------------------------------------------------------------
    Brick fred.lab.eng.blr.redhat.com:/rhs/brick1/t1        49154   Y       32380
    Brick mia.lab.eng.blr.redhat.com:/rhs/brick1/t1         N/A     N       11173
    Brick cutlass.lab.eng.blr.redhat.com:/rhs/brick1/t1     49154   Y       8989
    NFS Server on localhost                                 2049    Y       11183
    NFS Server on c5154da1-be15-40e2-b5f3-9be6dadafd43      2049    Y       8999
    NFS Server on a37ff566-da82-4ae4-90c6-17763466fd36      2049    Y       15188
    NFS Server on 292b158a-7650-4e09-9bc0-71e392f0d0c1      2049    Y       32390

    There are no active volume tasks

    [root@cutlass ~]# ls -l /rhs/brick1/t1/newf52
    ls: cannot access /rhs/brick1/t1/newf52: No such file or directory

    [root@mia ~]# ls -l /rhs/brick1/t1/newf52
    -rw-r--r-- 2 root root 0 Jun 4 02:14 /rhs/brick1/t1/newf52

    [root@fred ~]# ls -l /rhs/brick1/t1/newf52
    ls: cannot access /rhs/brick1/t1/newf52: No such file or directory

For the file newf52, the hashed and cached sub-volume is down.

On the mount point:

    [root@rhsauto037 test1nfs]# touch newf52
    touch: cannot touch `newf52': Input/output error
    [root@rhsauto037 test1nfs]# cp file109 newf52
    cp: cannot create regular file `newf52': Input/output error

Expected results:
If a brick/sub-volume is down, any attempt to access/modify a file which is hashed and cached on the down sub-volume should give the error “Transport endpoint is not connected”.
Looks like NFS is converting the ENOTCONN error to EIO. The failures are not seen on fuse clients.

Create related:

    [2013-06-10 06:33:06.002267] W [client-rpc-fops.c:2058:client3_3_create_cbk] 0-sng-client-0: remote operation failed: Transport endpoint is not connected. Path: /new3
    [2013-06-10 06:33:06.002307] W [nfs3.c:2354:nfs3svc_create_cbk] 0-nfs: 7f08d02a: /new3 => -1 (Transport endpoint is not connected)    <======== ENOTCONN error
    [2013-06-10 06:33:06.002356] W [nfs3-helpers.c:3460:nfs3_log_newfh_res] 0-nfs-nfsv3: XID: 7f08d02a, CREATE: NFS: 5(I/O error), POSIX: 107(Transport endpoint is not connected), FH: exportid 00000000-0000-0000-0000-000000000000, gfid 00000000-0000-0000-0000-000000000000    <===== EIO error

Rename related:

    [2013-06-10 06:37:01.918575] W [nfs3.c:3663:nfs3svc_rename_cbk] 0-nfs: a108d02a: rename /new1 -> /new3 => -1 (Transport endpoint is not connected)    <========== ENOTCONN error
    [2013-06-10 06:37:01.918615] W [nfs3-helpers.c:3391:nfs3_log_common_res] 0-nfs-nfsv3: XID: a108d02a, RENAME: NFS: 5(I/O error), POSIX: 14(Bad address)    <========= EIO error
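To illustrate why the NFS client sees "Input/output error", here is a rough sketch of an errno-to-NFSv3 status translation. The status values are the ones defined in RFC 1813; the table, the function name `posix_to_nfs3` and the fall-through to NFS3ERR_IO are assumptions made for illustration (consistent with the "NFS: 5(I/O error), POSIX: 107" log line above), not the actual Gluster NFS server code.

```python
import errno

# NFSv3 status codes as defined in RFC 1813 (subset only).
NFS3_OK = 0
NFS3ERR_PERM = 1
NFS3ERR_NOENT = 2
NFS3ERR_IO = 5
NFS3ERR_ACCES = 13
NFS3ERR_EXIST = 17

# Hypothetical translation table; the real server's table may differ.
_ERRNO_TO_NFS3 = {
    0: NFS3_OK,
    errno.EPERM: NFS3ERR_PERM,
    errno.ENOENT: NFS3ERR_NOENT,
    errno.EIO: NFS3ERR_IO,
    errno.EACCES: NFS3ERR_ACCES,
    errno.EEXIST: NFS3ERR_EXIST,
}


def posix_to_nfs3(err):
    """Map a POSIX errno to an NFSv3 status.

    NFSv3 defines no status for ENOTCONN, so in this sketch any errno
    without a direct equivalent is assumed to fall back to NFS3ERR_IO.
    """
    return _ERRNO_TO_NFS3.get(err, NFS3ERR_IO)
```

Under that assumption, `posix_to_nfs3(errno.ENOTCONN)` yields NFS3ERR_IO (5), which the NFS client then surfaces as EIO, matching the touch/cp failures seen on the NFS mount.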
BZ 903476 and BZ 860915 are similar, so marking this as duplicate of 860915. Will update the analysis in BZ 860915. *** This bug has been marked as a duplicate of bug 860915 ***
I must say, the detailed information with the necessary links helped a lot in understanding the root cause (comment #11 of bug 860915). Thanks, Santosh.

Note:
1) Removing the duplicate marking against bug 860915.
   Reason: agreed that the root cause is the same, but the steps are different. One is about directory creation and the other is about file access/modify/creation (as per Amar's mail to storage-eng, 'Important: Steps to marking a bug as duplicate', dated May 13, 2013).
2) Opening a new bug for documentation and reassigning this defect to Shishir.
   Reason: this problem has been fixed in DHT for FUSE mounts, so this defect can be used to track that problem in future. For the NFS mount, a new defect will be opened and assigned to the Doc team.
3) As this defect is not reproducible with the latest build (3.4.0.9rhs-1.el6.x86_64) on a FUSE mount, marking it as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html