Bug 763129 (GLUSTER-1397) - Cached dir fd_ts are a leakin'
Summary: Cached dir fd_ts are a leakin'
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-1397
Product: GlusterFS
Classification: Community
Component: nfs
Version: 3.1-alpha
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Shehjar Tikoo
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-08-19 07:09 UTC by Shehjar Tikoo
Modified: 2015-12-01 16:45 UTC (History)
3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: RTP
Mount Type: nfs
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Shehjar Tikoo 2010-08-19 07:09:49 UTC
Problem description from Krishna:
If a directory contains more than 9-10 entries, the NFS server does not
close the directory fd_t. Because of this, the backend FS cannot be
unmounted (it fails with EBUSY).


Confirmed through the log below, which shows successive NFSv3 READDIR calls taking an increasing number of references on the directory fd:

shehjart@indus:~$ grep "call_state_wipe.*fd ref" /tmp/dirfd2.log 
[2010-08-19 12:22:59.590367] T [nfs3.c:202:nfs3_call_state_wipe] nfs-nfsv3: fd ref: 3
[2010-08-19 12:22:59.591897] T [nfs3.c:202:nfs3_call_state_wipe] nfs-nfsv3: fd ref: 3
[2010-08-19 12:22:59.593443] T [nfs3.c:202:nfs3_call_state_wipe] nfs-nfsv3: fd ref: 4
[2010-08-19 12:22:59.594955] T [nfs3.c:202:nfs3_call_state_wipe] nfs-nfsv3: fd ref: 5
[2010-08-19 12:22:59.596459] T [nfs3.c:202:nfs3_call_state_wipe] nfs-nfsv3: fd ref: 6
[2010-08-19 12:22:59.597946] T [nfs3.c:202:nfs3_call_state_wipe] nfs-nfsv3: fd ref: 7
[2010-08-19 12:22:59.599418] T [nfs3.c:202:nfs3_call_state_wipe] nfs-nfsv3: fd ref: 8
[2010-08-19 12:22:59.600871] T [nfs3.c:202:nfs3_call_state_wipe] nfs-nfsv3: fd ref: 9
[2010-08-19 12:22:59.602312] T [nfs3.c:202:nfs3_call_state_wipe] nfs-nfsv3: fd ref: 10
[2010-08-19 12:22:59.603772] T [nfs3.c:202:nfs3_call_state_wipe] nfs-nfsv3: fd ref: 11
[2010-08-19 12:22:59.604958] T [nfs3.c:202:nfs3_call_state_wipe] nfs-nfsv3: fd ref: 11


These are the refcounts on the fd just before it is unreffed as part of freeing the per-NFS-op state.

Comment 1 Anand Avati 2010-08-19 07:46:44 UTC
PATCH: http://patches.gluster.com/patch/4204 in master (protocol/client: fix ESTALE in statfs on root inode)

Comment 2 Amar Tumballi 2010-08-19 08:12:24 UTC
(In reply to comment #1)
> PATCH: http://patches.gluster.com/patch/4204 in master (protocol/client: fix
> ESTALE in statfs on root inode)

Sorry about the confusion.. this patch should have been for bug 763130.. This bug is not yet fixed.

Comment 3 Shehjar Tikoo 2010-08-20 10:22:51 UTC
The leaks are also present in the hard fh resolution code, where directories are opened and read.

Comment 4 Vijay Bellur 2010-08-31 11:44:15 UTC
PATCH: http://patches.gluster.com/patch/4416 in master (nfs3: Dont ref cached fd after fd_lookup)

Comment 5 Vijay Bellur 2010-08-31 11:44:20 UTC
PATCH: http://patches.gluster.com/patch/4417 in master (nfs3: Dont ref dir fd_t used in hard fh resolution)

Comment 6 Vijay Bellur 2010-08-31 11:44:26 UTC
PATCH: http://patches.gluster.com/patch/4418 in master (nfs3: Unref dir fd once usage ends in hard fh resolution)

Comment 7 Shehjar Tikoo 2010-09-01 02:37:05 UTC
Regression Test:

1. Start nfs export as:

posix->proto/server->proto/client->nfs/server

in the same volume file.

2. At the client, mount as:

mount -o soft,intr,actimeo=3600 <server>:/posix /mnt

3. Run the following command:
$ mkdir -p /mnt//2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/

4. Restart the nfs server, without remounting at the client.

5. At the nfs client:
$ touch /mnt//2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19

6. Now at the nfs server:
$ kill -USR1 <pid of gnfs>

7. Inspect the glusterfsdump file. Ensure that no more than one file descriptor is open. If there is more than one, there is a regression.

