Bug 960835

Summary: nfs:Unable to resolve FH(READDIR issue)
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Saurabh <saujain>
Component: glusterd Assignee: rjoseph
Status: CLOSED ERRATA QA Contact: Saurabh <saujain>
Severity: urgent Docs Contact:
Priority: high    
Version: 2.1 CC: amarts, bturner, mzywusko, rhs-bugs, vagarwal, vbellur
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.4.0.10rhs Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 965435 (view as bug list) Environment:
Last Closed: 2013-09-23 22:39:41 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 965435    

Description Saurabh 2013-05-08 06:01:01 UTC
Description of problem:
Volume type: 6x2

The issue is seen when rm -rf is executed from different mount points on two different clients.
The mount points are served by different servers of the RHS cluster.

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.4rhs-1.el6rhs.x86_64

How reproducible:
The log messages are seen many times.

Steps to Reproduce:
1. Create a volume on nodes [a, b, c, d] and start it.
2. Mount the volume from node a on client c1 and from node b on client c2.
3. Create a large amount of data on the mount point (use only one mount point for creating data).

function for creating data (mount_path_nfs is the NFS mount point):
    import os
    import subprocess
    for i in range(10000):
        top = os.path.join(mount_path_nfs, "%d" % i)
        os.mkdir(top)
        for j in range(100):
            sub = os.path.join(top, "%d" % j)
            os.mkdir(sub)
            # create an empty file named <j>.file in each subdirectory
            subprocess.call(["touch", os.path.join(sub, "%d.file" % j)])

4. Now start "rm -rf *" on both mount points from step 2.

Actual results:

Sometimes I find these errors in nfs.log:
[2013-05-07 22:50:05.984856] W [nfs3.c:4080:nfs3svc_readdir_fstat_cbk] 0-nfs: bc310819: <gfid:2a01dbe1-c740-4a10-8209-15ebc48db2e7> => -1 (No such file or directory)
[2013-05-07 22:50:05.984900] W [nfs3-helpers.c:3475:nfs3_log_readdir_res] 0-nfs-nfsv3: XID: bc310819, READDIR: NFS: 2(No such file or directory), POSIX: 2(No such file or directory), count: 32768, cverf: 36506500, is_eof: 0
[2013-05-07 22:50:05.988889] E [nfs3.c:3536:nfs3_rmdir_resume] 0-nfs-nfsv3: Unable to resolve FH: (10.70.35.135:827) dist-rep : 6d2aabe8-93e5-4583-b049-406fd826776c
[2013-05-07 22:50:06.016279] E [nfs3.c:3393:nfs3_remove_resume] 0-nfs-nfsv3: Unable to resolve FH: (10.70.35.135:827) dist-rep : 7faf1da8-3608-4d2e-98ed-06b511588045
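
To gauge how often these messages occur, the log can be summarized with a short script. The following is only a sketch in Python 3: the regular expression is modelled on the four lines quoted above and may not match other nfs.log formats, and the function name summarize_fh_errors is hypothetical.

```python
import re
from collections import Counter

# Pattern modelled on the error lines quoted in this report (an assumption,
# not an official log format):
# [timestamp] E [file:line:function] ... Unable to resolve FH: (client) volume : gfid
FH_ERROR = re.compile(
    r"^\[(?P<ts>[^\]]+)\] E \[(?P<src>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r".*Unable to resolve FH: \((?P<client>[^)]+)\) (?P<volume>\S+) : (?P<gfid>[0-9a-f-]+)"
)


def summarize_fh_errors(lines):
    """Count 'Unable to resolve FH' errors per reporting function."""
    counts = Counter()
    for line in lines:
        match = FH_ERROR.match(line)
        if match:
            counts[match.group("func")] += 1
    return counts
```

Passing an open log file (for example, open("nfs.log")) as the lines argument yields a Counter keyed by function name, such as nfs3_rmdir_resume or nfs3_remove_resume.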

Expected results:

FH resolution errors are not expected.

Additional info:
One more issue was found during these operations; it is filed as BZ 960834.

Comment 3 rjoseph 2013-05-20 11:39:42 UTC
The problem is reproduced. 

The error is logged whenever the NFS server receives an rmdir or remove call for a file or directory that has already been deleted.
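
The condition can be illustrated on a local filesystem. This is a minimal Python 3 sketch and an analogy only, not the Gluster NFS server code: the helper names remove_entry and demo are hypothetical, and the two calls stand in for the two clients issuing a remove for the same entry.

```python
import os
import tempfile


def remove_entry(path):
    """Remove a file or empty directory; return False if it was already gone."""
    try:
        if os.path.isdir(path):
            os.rmdir(path)
        else:
            os.remove(path)
        return True
    except FileNotFoundError:
        # ENOENT: someone else deleted the entry before this call
        return False


def demo():
    """Issue the same remove twice, as two competing clients would."""
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "0.file")
        open(path, "w").close()
        first = remove_entry(path)   # stands in for the client that wins
        second = remove_entry(path)  # stands in for the client that loses
        return first, second
```

The second call fails with "No such file or directory", which corresponds to the situation in which the server can no longer resolve the file handle.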

Comment 4 Ben Turner 2013-05-20 17:53:32 UTC
I opened https://bugzilla.redhat.com/show_bug.cgi?id=901723 during Anshi testing and it appears to be the same issue as this.  Should we close 901723 as a DUP of this?

Comment 5 rjoseph 2013-05-21 11:38:53 UTC
Ben:

The "Unable to resolve FH" error is logged whenever the server fails to get the FH for the file. In this bug, rmdir is called from two machines. Therefore, while deleting a file or directory, one of the machines might see this error because the other machine might have already deleted that entry.

But in your bug comment I see that you are trying to delete a file (f5) from the client machine, and the file is not present on any of the bricks. I think this should be looked into separately.

Comment 9 Scott Haines 2013-09-23 22:39:41 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html
