Description of problem:
DHT: If the brick where the root directory hashes is down, then a lookup on the NFS mount gives the error 'cannot open directory .: Input/output error'.

Version-Release number of selected component (if applicable):
3.3.0.3rhs-33.el6rhs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a distributed volume with 3 or more sub-volumes across multiple servers and start that volume.

[root@Rhs1 ~]# gluster volume info San_11

Volume Name: San_11
Type: Distribute
Volume ID: 59df39cb-f6f1-4514-a8a1-d7163f06d962
Status: Started
Number of Bricks: 5
Transport-type: tcp
Bricks:
Brick1: 10.70.35.81:/home/san1
Brick2: 10.70.35.81:/home/san2
Brick3: 10.70.35.85:/home/san1
Brick4: 10.70.35.85:/home/san2
Brick5: 10.70.35.86:/home/san1

2. NFS-mount the volume from client-1, and also FUSE-mount the same volume.

3. From the mount point, create some directories with files inside them.

4. Find where the root directory is hashing:

[root@Rhs1 ~]# getfattr -d -m . -e hex /home/san2
getfattr: Removing leading '/' from absolute path names
# file: home/san2
security.selinux=0x73797374656d5f753a6f626a6563745f723a686f6d655f726f6f745f743a733000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000000000000033333332
trusted.glusterfs.volume-id=0x59df39cbf6f14514a8a1d7163f06d962

5. Bring that brick down by killing its process.

[root@Rhs1 ~]# gluster volume status San_11
Status of volume: San_11
Gluster process                              Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.81:/home/san1                 24019   Y       12761
Brick 10.70.35.81:/home/san2                 24020   N       12767
Brick 10.70.35.85:/home/san1                 24223   Y       12845
Brick 10.70.35.85:/home/san2                 24224   Y       12850
Brick 10.70.35.86:/home/san1                 24230   Y       12819
NFS Server on localhost                      38467   Y       12857
NFS Server on 10.70.35.86                    38467   Y       12825
NFS Server on 10.70.35.85                    38467   Y       12857

6. Execute ls on both mount points.

The FUSE mount lists the directories, but the NFS mount gives the error below:

[root@client nfs1]# ls
ls: cannot open directory .: Input/output error

Actual results:
Input/output error

Expected results:
It should list all directories and files (those not hashed to the down sub-volume).

Additional info:
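For repeatability, steps 4 and 5 can be scripted. This is only a minimal sketch: the brick paths and volume name San_11 are taken from this report, and the awk parsing of the status output is an assumption about the CLI's column layout, which may differ across versions.

    # Dump the root directory's layout range on every local brick
    # (brick paths are the ones from this report; adjust to your setup)
    for b in /home/san1 /home/san2; do
        getfattr -n trusted.glusterfs.dht -e hex "$b"
    done

    # Kill the brick process that holds the hash range of interest.
    # The PID extraction assumes 'gluster volume status' prints the PID
    # in the last column of the matching Brick line.
    pid=$(gluster volume status San_11 | awk '$2 == "10.70.35.81:/home/san2" {print $NF}')
    kill "$pid"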
Please attach the nfs server logs.
Created attachment 629846 [details] server log
It might be related to the above-mentioned bug, but there are no similar failure error messages. Seeing the errors below in the log; need input from NFS SMEs.

[2012-10-17 12:19:40.383182] W [client3_1-fops.c:2630:client3_1_lookup_cbk] 0-San_11-client-1: remote operation failed: Transport endpoint is not connected. Path: / (00000000-0000-0000-0000-000000000001)
[2012-10-17 12:19:40.383876] W [client3_1-fops.c:1332:client3_1_access_cbk] 0-San_11-client-1: remote operation failed: Transport endpoint is not connected
[2012-10-17 12:19:40.383925] W [nfs3.c:1491:nfs3svc_access_cbk] 0-nfs: 3bb08886: / => -1 (Transport endpoint is not connected)
[2012-10-17 12:19:40.383953] W [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: 3bb08886, ACCESS: NFS: 5(I/O error), POSIX: 107(Transport endpoint is not connected)
The reporter confirmed that a parallel access on the FUSE mount listed all directories and files (except those hashed to the down sub-volume).
The behaviour of both the NFS and FUSE mounts is the same as of the latest git HEAD 232adb88512274863c9f5ad51569695af80bd6c0. Rachana, could you confirm the finding?
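For anyone re-checking this, the two mounts can be compared side by side. A minimal sketch; the server name server1 and the mount points are placeholders, and vers=3 is used because Gluster's built-in NFS server speaks NFSv3 only:

    mkdir -p /mnt/nfs /mnt/fuse
    mount -t nfs -o vers=3 server1:/San_11 /mnt/nfs
    mount -t glusterfs server1:/San_11 /mnt/fuse

    # With the hashed brick down, both listings should now behave the same
    ls /mnt/nfs
    ls /mnt/fuse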
This is reproducible. Found that dht_access returns the EIO error as-is if one brick is down. Re-assigning.
http://review.gluster.org/4240 has been posted for review upstream; once it is in, it will be backported and merged downstream.
CHANGE: http://review.gluster.org/4240 (cluster/dht: send ACCESS call on dir to first_up_subvol if cached is down) merged in master by Vijay Bellur (vbellur)
Verified this on 3.4.0qa5. It no longer gives the error 'cannot open directory .: Input/output error' and lists the files and directories, but if the hashed sub-volume of a directory is down, ls reports 'ls: cannot access d37: Invalid argument'. We already have a defect for that issue (https://bugzilla.redhat.com/show_bug.cgi?id=856459), so closing this as verified.
CHANGE: http://review.gluster.org/4421 (bug-867253.t: do a clean umount at the end) merged in master by Anand Avati (avati)
Found this defect on 3.3.0.6rhs-4.el6.x86_64.

DHT: If the brick where the root directory hashes is down, then a lookup on the NFS mount gives the error 'cannot open directory .: Input/output error' — same as the original defect. The FUSE mount gives no error, but the NFS mount does, so reopening the defect.

Info:

[root@cutlass tmp]# gluster v status 64-fuse
Status of volume: 64-fuse
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick fred.lab.eng.blr.redhat.com:/brick1/6.4-fuse      24025   Y       6063
Brick fan.lab.eng.blr.redhat.com:/brick1/6.4-fuse       24020   N       18113
Brick mia.lab.eng.blr.redhat.com:/brick1/6.4-fuse       24017   Y       27197
NFS Server on localhost                                 38467   Y       31344
NFS Server on fred.lab.eng.blr.redhat.com               38467   Y       6102
NFS Server on 10.70.34.91                               38467   Y       18196
NFS Server on mia.lab.eng.blr.redhat.com                38467   Y       17908

[root@fan tmp]# getfattr -d -m . -e hex /brick1/6.4-fuse/
getfattr: Removing leading '/' from absolute path names
# file: brick1/6.4-fuse/
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000000000000055555554
trusted.glusterfs.volume-id=0x8a5d6a111cda4406b818c413e5ae0968

[root@fred tmp]# getfattr -d -m . -e hex /brick1/6.4-fuse/
getfattr: Removing leading '/' from absolute path names
# file: brick1/6.4-fuse/
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000aaaaaaaaffffffff
trusted.glusterfs.volume-id=0x8a5d6a111cda4406b818c413e5ae0968

[root@mia tmp]# getfattr -d -m . -e hex /brick1/6.4-fuse/
getfattr: Removing leading '/' from absolute path names
# file: brick1/6.4-fuse/
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000055555555aaaaaaa9
trusted.glusterfs.volume-id=0x8a5d6a111cda4406b818c413e5ae0968

NFS mount:

[root@rhsauto037 test2]# ls
ls: cannot open directory .: Input/output error

FUSE mount:

[root@rhsauto037 test1]# ls
d12  d15  d2   d23  d30  d33  d36  d40  d43  d48  d50  d9   f11  f16  f2   f22  f26  f3   f32  f37  f42  f46  f5  f9
d13  d16  d21  d25  d31  d34  d39  d41  d45  d49  d6   f1   f14  f18  f20  f23  f27  f30  f33  f38  f43  f47  f6
d14  d17  d22  d28  d32  d35  d4   d42  d47  d5   d7   f10  f15  f19  f21  f25  f29  f31  f36  f40  f45  f49  f7
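For reference, the hash range each brick covers can be decoded from the xattr values above. A minimal sketch, assuming the common single-range layout in which the last two 32-bit words of trusted.glusterfs.dht are the start and end of the brick's range; the brick path is the one from this comment, to be run on each server:

    # Decode this brick's root-directory layout range from the xattr value
    brick=/brick1/6.4-fuse
    v=$(getfattr -n trusted.glusterfs.dht -e hex "$brick" 2>/dev/null |
        awk -F= '/^trusted.glusterfs.dht/ {print $2}')
    v=${v#0x}
    # e.g. on fan this prints: /brick1/6.4-fuse: start=0x00000000 end=0x55555554
    printf '%s: start=0x%s end=0x%s\n' "$brick" "${v:16:8}" "${v:24:8}"

On this setup that gives fan 0x00000000-0x55555554, mia 0x55555555-0xaaaaaaa9 and fred 0xaaaaaaaa-0xffffffff, i.e. the down brick (fan) is the one holding the start of the hash space.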
Verified on 3.4.0.4rhs-1.el6rhs.x86_64; working as expected, hence marking it as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html