Description of problem:
======================
Bring a sub-volume down and create directories and files inside the volume. Once the sub-volume is up again, access to that directory fails with a 'No such file or directory' error.

[root@OVM3 new]# ls
abc  down  new
[root@OVM3 new]# cd down
-bash: cd: down: No such file or directory
[root@OVM3 new]# ls down
ls: cannot open directory down: No such file or directory

Version-Release number of selected component (if applicable):
=============================================================
3.6.0.25-1.el6rhs.x86_64

How reproducible:
=================
always

Steps to Reproduce:
===================
1. Create a distributed volume and bring one or more sub-volumes down.

[root@OVM3 ~]# gluster volume status new
Status of volume: new
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.198:/brick3/n1                           49183   Y       17907
Brick 10.70.35.198:/brick3/n2                           49184   Y       17918
Brick 10.70.35.198:/brick3/n3                           49185   Y       17929
Brick 10.70.35.198:/brick3/n4                           49186   Y       17940
NFS Server on localhost                                 2049    Y       17953
NFS Server on 10.70.35.172                              2049    Y       11804
NFS Server on 10.70.35.240                              2049    Y       16587

Task Status of Volume new
------------------------------------------------------------------------------
There are no active volume tasks

[root@OVM3 ~]# kill -9 17907

2. Create directories and files inside it.

[root@OVM3 new]# mkdir down
[root@OVM3 new]# touch down/f{1..100}

3. Bring all sub-volumes up.

4. Access the directory from the mount point.

[root@OVM3 new]# cd down
-bash: cd: down: No such file or directory
[root@OVM3 new]# ls down
ls: cannot open directory down: No such file or directory

Actual results:
===============
The directory and the data inside it are not accessible. Lookup does not heal the directory on the previously down sub-volume.

Expected results:
=================
The directory and its data should be accessible when all sub-volumes are up. Lookup should heal the directory on the previously down sub-volume.

Additional info:
=================
[2014-08-01 10:27:21.738422] W [client-rpc-fops.c:1357:client3_3_access_cbk] 0-new-client-0: remote operation failed: Stale file handle
[2014-08-01 10:27:21.738565] W [client-rpc-fops.c:1357:client3_3_access_cbk] 0-new-client-0: remote operation failed: Stale file handle
[2014-08-01 10:27:21.740668] W [client-rpc-fops.c:2761:client3_3_lookup_cbk] 0-new-client-0: remote operation failed: No such file or directory. Path: <gfid:97616c4c-2aea-473e-8fae-3a53576439e3> (97616c4c-2aea-473e-8fae-3a53576439e3)
[2014-08-01 10:27:21.740836] W [client-rpc-fops.c:2761:client3_3_lookup_cbk] 0-new-client-0: remote operation failed: No such file or directory. Path: <gfid:97616c4c-2aea-473e-8fae-3a53576439e3> (97616c4c-2aea-473e-8fae-3a53576439e3)
[2014-08-01 10:27:21.740884] E [dht-helper.c:813:dht_migration_complete_check_task] 0-new-dht: <gfid:97616c4c-2aea-473e-8fae-3a53576439e3>: failed to lookup the file on new-client-0
[2014-08-01 10:27:21.740950] W [nfs3.c:1532:nfs3svc_access_cbk] 0-nfs: 198c87cc: <gfid:97616c4c-2aea-473e-8fae-3a53576439e3> => -1 (No such file or directory)
[2014-08-01 10:27:21.740974] W [nfs3-helpers.c:3401:nfs3_log_common_res] 0-nfs-nfsv3: XID: 198c87cc, ACCESS: NFS: 2(No such file or directory), POSIX: 2(No such file or directory)
[2014-08-01 10:27:21.741160] E [dht-helper.c:813:dht_migration_complete_check_task] 0-new-dht: <gfid:97616c4c-2aea-473e-8fae-3a53576439e3>: failed to lookup the file on new-client-0
[2014-08-01 10:27:21.741191] W [nfs3.c:1532:nfs3svc_access_cbk] 0-nfs: 188c87cc: <gfid:97616c4c-2aea-473e-8fae-3a53576439e3> => -1 (No such file or directory)
[2014-08-01 10:27:21.741230] W [nfs3-helpers.c:3401:nfs3_log_common_res] 0-nfs-nfsv3: XID: 188c87cc, ACCESS: NFS: 2(No such file or directory), POSIX: 2(No such file or directory)
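In step 1 the brick PID was read by eye from the `gluster volume status` output before running `kill -9`. For anyone scripting the reproduction, the lookup could be sketched roughly as below. The function name `brick_pid` is hypothetical, and the parsing assumes the single-line column layout shown in this report (Brick, host:/path, port, online flag, PID); other releases may wrap or reorder columns, so treat it as a starting point only.

```shell
#!/bin/sh
# brick_pid (hypothetical helper): read `gluster volume status <vol>` output
# on stdin and print the PID of the brick whose "host:/path" equals $1.
# Assumption: each brick is reported on one line in the form
#   Brick <host:/path> <port> <online> <pid>
# as in the status output quoted in this report.
brick_pid() {
    awk -v brick="$1" '$1 == "Brick" && $2 == brick { print $NF }'
}
```

With the volume from this report, the PID passed to `kill -9` could then be obtained with `gluster volume status new | brick_pid 10.70.35.198:/brick3/n1`. Nothing is printed if the brick is not listed, so the result should be checked before it is used.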
Created attachment 926543
Brief test case

Tested this using the attached test script after the fix presented here was applied to upstream code: http://review.gluster.org/#/c/8462/

The test case passed. Much of the test case is commented out because the kill was not working properly, so I performed the remaining steps manually from the point where things were commented out.

Susant, can we try this test case before and after the dht_access fix, so that we know we have fixed the regression?
Shyam,

Here is an update on the patch. I tried without the patch, and here is the result:

[root@vm50 mnt1]# kill -9 5881
[root@vm50 mnt1]# mkdir down
[root@vm50 mnt1]# ls down
[root@vm50 mnt1]# touch down/f{1..100}
[root@vm50 mnt1]# gluster v start test1 force
volume start: test1: success
[root@vm50 mnt1]# cd down
[root@vm50 down]# ls
ls: cannot open directory .: No such file or directory
[root@vm50 down]# ls
ls: cannot open directory .: No such file or directory
[root@vm50 down]# ls
ls: cannot open directory .: No such file or directory

And with the patch:

[root@vm50 mnt1]# kill -9 10866
[root@vm50 mnt1]# mkdir down
[root@vm50 mnt1]# touch down/f{1..100}
[root@vm50 mnt1]# ls down
[root@vm50 mnt1]# cd down/^C
[root@vm50 mnt1]# gluster v start test1 force
volume start: test1: success
[root@vm50 mnt1]# cd down/
[root@vm50 down]# ls
f1    f12  f16  f2   f23  f27  f30  f34  f38  f41  f45  f49  f52  f56  f6   f63  f67  f70  f74  f78  f81  f85  f89  f92  f96
f10   f13  f17  f20  f24  f28  f31  f35  f39  f42  f46  f5   f53  f57  f60  f64  f68  f71  f75  f79  f82  f86  f9   f93  f97
f100  f14  f18  f21  f25  f29  f32  f36  f4   f43  f47  f50  f54  f58  f61  f65  f69  f72  f76  f8   f83  f87  f90  f94  f98
f11   f15  f19  f22  f26  f3   f33  f37  f40  f44  f48  f51  f55  f59  f62  f66  f7   f73  f77  f80  f84  f88  f91  f95  f99
[root@vm50 down]#

So everything looks good :)
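The sessions above confirm the fix from the mount point. A complementary check, sketched below, is to look at the brick backends directly and confirm that the directory now exists under every brick, which is what the lookup heal should produce. The function name `missing_on_bricks` is hypothetical, and the sketch assumes all brick paths are local directories on the host where it runs (as with /brick3/n1 through /brick3/n4 in the original report).

```shell
#!/bin/sh
# missing_on_bricks (hypothetical helper): given a directory name relative to
# the volume root and a list of local brick paths, print every brick under
# which that directory is absent.
# Returns 0 if the directory exists on all bricks, 1 otherwise.
missing_on_bricks() {
    dir=$1
    shift
    rc=0
    for brick in "$@"; do
        if [ ! -d "$brick/$dir" ]; then
            printf '%s\n' "$brick"
            rc=1
        fi
    done
    return $rc
}
```

For the original reproduction this would be called as `missing_on_bricks down /brick3/n1 /brick3/n2 /brick3/n3 /brick3/n4` after restarting the volume and accessing the directory from the mount point. Any brick it prints is one where the directory was not healed.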
Verified with 3.6.0.28-1.el6rhs.x86_64; working as expected, hence moving to verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2014-1278.html