Description of problem:
Volume type: 6x2.
The errors below are logged in nfs.log when rm -rf is run from two different clients at the same time. Each client mounts the volume from a different server of the RHS cluster.

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.4rhs-1.el6rhs.x86_64

How reproducible:
The log messages are seen many times.

Steps to Reproduce:
1. Create a volume on nodes [a, b, c, d] and start it.
2. Mount the volume from node a on client c1 and from node b on client c2.
3. Create a large amount of data in the mount point (use only one mount point for creating the data). Function used for creating the data:

    for i in range(10000):
        os.mkdir(mount_path_nfs + "/" + "%d"%(i))
        for j in range(100):
            os.mkdir(mount_path_nfs + "/" + "%d"%(i) + "/" + "%d"%(j))
            commands.getoutput("touch" + " " + mount_path_nfs + "/" + "%d"%(i) + "/" + "%d"%(j) + "/" + "%d"%(j) + ".file")

4. Start "rm -rf *" on both mount points from step 2.

Actual results:
Sometimes the following errors appear in nfs.log:

[2013-05-07 22:50:05.984856] W [nfs3.c:4080:nfs3svc_readdir_fstat_cbk] 0-nfs: bc310819: <gfid:2a01dbe1-c740-4a10-8209-15ebc48db2e7> => -1 (No such file or directory)
[2013-05-07 22:50:05.984900] W [nfs3-helpers.c:3475:nfs3_log_readdir_res] 0-nfs-nfsv3: XID: bc310819, READDIR: NFS: 2(No such file or directory), POSIX: 2(No such file or directory), count: 32768, cverf: 36506500, is_eof: 0
[2013-05-07 22:50:05.988889] E [nfs3.c:3536:nfs3_rmdir_resume] 0-nfs-nfsv3: Unable to resolve FH: (10.70.35.135:827) dist-rep : 6d2aabe8-93e5-4583-b049-406fd826776c
[2013-05-07 22:50:06.016279] E [nfs3.c:3393:nfs3_remove_resume] 0-nfs-nfsv3: Unable to resolve FH: (10.70.35.135:827) dist-rep : 7faf1da8-3608-4d2e-98ed-06b511588045

Expected results:
FH resolution failures are not expected.

Additional info:
One more issue was found during these operations; it is filed as BZ 960834.
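For anyone re-running step 3 on a current system, the snippet in the report depends on the Python 2 `commands` module. The following is a rough Python 3 equivalent, offered only as a sketch: the function name `create_tree` and its parameters are made up for illustration, and the sizes default to the 10000 x 100 layout from the report. Point `root` at the NFS mount path to reproduce; the demo at the bottom uses a scratch directory instead.

```python
import os
import tempfile


def create_tree(root, n_dirs=10000, n_subdirs=100):
    """Create root/<i>/<j>/<j>.file for every i and j.

    Mirrors the layout from step 3 of the report. Returns the number
    of files created. `create_tree` is a hypothetical helper name.
    """
    created = 0
    for i in range(n_dirs):
        for j in range(n_subdirs):
            leaf = os.path.join(root, str(i), str(j))
            os.makedirs(leaf, exist_ok=True)
            # Equivalent of `touch <leaf>/<j>.file`.
            with open(os.path.join(leaf, f"{j}.file"), "a"):
                pass
            created += 1
    return created


if __name__ == "__main__":
    # Small demo in a scratch directory rather than on a real mount.
    with tempfile.TemporaryDirectory() as scratch:
        print(create_tree(scratch, n_dirs=3, n_subdirs=2))
```

Opening the file in append mode avoids spawning one `touch` process per file, which matters at a million files.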
I was able to reproduce the problem. The error is logged whenever the NFS server receives an rmdir or remove call for a file or directory that has already been deleted.
I opened https://bugzilla.redhat.com/show_bug.cgi?id=901723 during Anshi testing and it appears to be the same issue as this. Should we close 901723 as a DUP of this?
Ben: The "Unable to resolve FH" error is logged whenever the server fails to get the FH for the file. In this bug, rmdir is called from two machines. While deleting a file or directory, one of the machines may therefore see this error because the other machine has already deleted that entry. In your bug comment, however, I see that you are trying to delete a file (f5) from the client machine, and the file is not present on any of the bricks. I think that should be looked into separately.
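The race described above can be illustrated on a local filesystem. This is only an analogy, not GlusterFS or NFS server code: two "clients" each take a directory listing, the first deletes everything, and the second then works from its stale listing and gets ENOENT ("No such file or directory") for every entry, which is the POSIX error shown in the log lines of this report. The helper names `snapshot`, `remove_all` and `simulate` are hypothetical.

```python
import errno
import os
import tempfile


def snapshot(root):
    """Return the entries a client saw in its last directory listing."""
    return sorted(os.path.join(root, name) for name in os.listdir(root))


def remove_all(paths):
    """Unlink each path; return (removed, already_gone) counts."""
    removed = already_gone = 0
    for path in paths:
        try:
            os.unlink(path)
            removed += 1
        except OSError as exc:
            # ENOENT means another client deleted the entry first.
            if exc.errno != errno.ENOENT:
                raise
            already_gone += 1
    return removed, already_gone


def simulate(n_files=5):
    """Two clients list the same directory, then delete one after the other."""
    with tempfile.TemporaryDirectory() as root:
        for i in range(n_files):
            open(os.path.join(root, f"{i}.file"), "w").close()
        seen_by_c1 = snapshot(root)
        seen_by_c2 = snapshot(root)  # goes stale once c1 starts deleting
        return remove_all(seen_by_c1), remove_all(seen_by_c2)


if __name__ == "__main__":
    print(simulate())
```

With the default of five files, the first client removes all five and the second finds all five already gone. In the real setup the two rm -rf runs interleave, so each client sees a mix of successful deletions and missing entries.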
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html