Bug 763129 (GLUSTER-1397)

Summary: Cached dir fd_ts are a leakin'
Product: [Community] GlusterFS
Reporter: Shehjar Tikoo <shehjart>
Component: nfs
Assignee: Shehjar Tikoo <shehjart>
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: low
Version: 3.1-alpha
CC: amarts, gluster-bugs, krishna
Hardware: All
OS: Linux
Doc Type: Bug Fix
Regression: RTP
Mount Type: nfs

Description Shehjar Tikoo 2010-08-19 03:09:49 EDT
Problem description from Krishna:
If a directory contains more than 9-10 entries, the NFS server does not close the directory fd_t. Because of this, the backend FS cannot be unmounted (the umount fails with EBUSY).


This was confirmed through the log below, which shows successive NFSv3 READDIR calls leaving an ever-increasing reference count on the directory fd:

shehjart@indus:~$ grep "call_state_wipe.*fd ref" /tmp/dirfd2.log 
[2010-08-19 12:22:59.590367] T [nfs3.c:202:nfs3_call_state_wipe] nfs-nfsv3: fd ref: 3
[2010-08-19 12:22:59.591897] T [nfs3.c:202:nfs3_call_state_wipe] nfs-nfsv3: fd ref: 3
[2010-08-19 12:22:59.593443] T [nfs3.c:202:nfs3_call_state_wipe] nfs-nfsv3: fd ref: 4
[2010-08-19 12:22:59.594955] T [nfs3.c:202:nfs3_call_state_wipe] nfs-nfsv3: fd ref: 5
[2010-08-19 12:22:59.596459] T [nfs3.c:202:nfs3_call_state_wipe] nfs-nfsv3: fd ref: 6
[2010-08-19 12:22:59.597946] T [nfs3.c:202:nfs3_call_state_wipe] nfs-nfsv3: fd ref: 7
[2010-08-19 12:22:59.599418] T [nfs3.c:202:nfs3_call_state_wipe] nfs-nfsv3: fd ref: 8
[2010-08-19 12:22:59.600871] T [nfs3.c:202:nfs3_call_state_wipe] nfs-nfsv3: fd ref: 9
[2010-08-19 12:22:59.602312] T [nfs3.c:202:nfs3_call_state_wipe] nfs-nfsv3: fd ref: 10
[2010-08-19 12:22:59.603772] T [nfs3.c:202:nfs3_call_state_wipe] nfs-nfsv3: fd ref: 11
[2010-08-19 12:22:59.604958] T [nfs3.c:202:nfs3_call_state_wipe] nfs-nfsv3: fd ref: 11


These are the refcounts on the fd just before it is unref'd as part of freeing the per-NFS-operation state.
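
This arithmetic matches the patch summary in comment 4 below ("Dont ref cached fd after fd_lookup"): fd_lookup() already returns the cached fd with a reference taken, so taking another ref and dropping only one in nfs3_call_state_wipe leaks one reference per request. Below is a standalone model of that imbalance (illustrative only; the resting refcount of 2 and the exact nfs3.c flow are assumptions chosen to line up with the log):

#include <stdio.h>

typedef struct { int ref; } fd_t;

/* Assumed resting refcount of 2, so the output lines up with the log. */
static fd_t cached_dir_fd = { .ref = 2 };

static fd_t *fd_lookup (void)    { cached_dir_fd.ref++; return &cached_dir_fd; }
static fd_t *fd_ref (fd_t *fd)   { fd->ref++; return fd; }
static void  fd_unref (fd_t *fd) { fd->ref--; }

int
main (void)
{
        for (int req = 1; req <= 5; req++) {
                fd_t *fd = fd_lookup (); /* +1: ref taken by lookup */
                fd_ref (fd);             /* BUG: extra +1 on the cached fd */
                /* ... READDIR served from the cached dir fd ... */
                fd_unref (fd);           /* nfs3_call_state_wipe: only -1 */
                printf ("fd ref after request %d: %d\n", req, fd->ref);
        }
        /* The refcount never returns to its resting value, the backend
         * directory stays open, and umount fails with EBUSY. */
        return 0;
}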
Comment 1 Anand Avati 2010-08-19 03:46:44 EDT
PATCH: http://patches.gluster.com/patch/4204 in master (protocol/client: fix ESTALE in statfs on root inode)
Comment 2 Amar Tumballi 2010-08-19 04:12:24 EDT
(In reply to comment #1)
> PATCH: http://patches.gluster.com/patch/4204 in master (protocol/client: fix
> ESTALE in statfs on root inode)

Sorry about the confusion; this patch should have been for bug 763130. This bug is not yet fixed.
Comment 3 Shehjar Tikoo 2010-08-20 06:22:51 EDT
The leaks are also present in the hard fh resolution code, where directory opens and reads are performed.
Comment 4 Vijay Bellur 2010-08-31 07:44:15 EDT
PATCH: http://patches.gluster.com/patch/4416 in master (nfs3: Dont ref cached fd after fd_lookup)
Comment 5 Vijay Bellur 2010-08-31 07:44:20 EDT
PATCH: http://patches.gluster.com/patch/4417 in master (nfs3: Dont ref dir fd_t used in hard fh resolution)
Comment 6 Vijay Bellur 2010-08-31 07:44:26 EDT
PATCH: http://patches.gluster.com/patch/4418 in master (nfs3: Unref dir fd once usage ends in hard fh resolution)
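
Taken together, the three patches describe one pattern: do not take an extra reference on the cached fd returned by fd_lookup(), and drop the lookup reference as soon as the directory fd's use in READDIR or hard fh resolution ends. A sketch of the corrected flow, reusing the model above (reconstructed from the patch summaries, not the actual diffs):

fd_t *fd = fd_lookup ();   /* the only ref held on the cached fd */
if (fd) {
        /* ... serve READDIR, or read the directory during
         * hard fh resolution ... */
        fd_unref (fd);     /* balances the lookup ref */
}
/* Net refcount change per request is zero, so the cached dir fd
 * can be destroyed and the backend FS unmounted cleanly. */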
Comment 7 Shehjar Tikoo 2010-08-31 22:37:05 EDT
Regression Test:

1. Start the nfs export with the following translator chain, all in the same volume file (a sketch of such a volume file follows):

posix -> proto/server -> proto/client -> nfs/server
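
A minimal volume-file sketch of that chain (the backend directory and remote-host are placeholders, and the proto/client volume is named posix so the export matches the mount command in step 2; option names follow the 3.x volfile format from memory and may need adjusting):

volume brick
  type storage/posix
  option directory /export/brick          # placeholder backend path
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.brick.allow *
  subvolumes brick
end-volume

volume posix
  type protocol/client
  option transport-type tcp
  option remote-host 127.0.0.1            # placeholder; host running proto/server
  option remote-subvolume brick
end-volume

volume nfs
  type nfs/server
  subvolumes posix
end-volume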

2. At the client, mount as:

mount <server>:/posix -o soft,intr,actimeo=3600 /mnt

3. Run the following command:
$ mkdir -p /mnt//2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/

4. Restart the nfs server, without remounting at the client.

5. At the nfs client:
$ touch /mnt//2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19

6. Now at the nfs server:
$ kill -USR1 <pid of gnfs>

7. Inspect the glusterfsdump file and ensure that no more than one file descriptor is open. If more than one is open, the leak has regressed.