471254 – lockd: fix reference count leaks in async locking case (impacts GFS2)

Bug 471254 - lockd: fix reference count leaks in async locking case (impacts GFS2)

Summary: lockd: fix reference count leaks in async locking case (impacts GFS2)

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.2.z
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Jeff Layton
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	514700
TreeView+	depends on / blocked

Reported:	2008-11-12 17:11 UTC by Mike Snitzer
Modified:	2009-09-02 08:48 UTC (History)
CC List:	7 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-09-02 08:48:07 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
reproducer: lock and unlock a file in a tight loop (599 bytes, text/plain) 2009-03-18 17:21 UTC, Jeff Layton	no flags	Details
View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2009:1243	0	normal	SHIPPED_LIVE	Important: Red Hat Enterprise Linux 5.4 kernel security and bug fix update	2009-09-01 08:53:34 UTC

Description Mike Snitzer 2008-11-12 17:11:07 UTC

Description of problem:
Upstream Linux has identified a very important fix to NFS lockd that enables async locking (with GFS2) to properly close open file descriptors (by updating ref counts as part of cleanup).  Without the fix proper cleanup is short-circuited.

Version-Release number of selected component (if applicable):
Affects 5.2.z (2.6.18-92.1.13.el5) and 5.3 beta (kernel-2.6.18-120.el5).  One would hope an errata kernel can be released for 5.2 and that this will get fixed in time for 5.3's release.

How reproducible:
Always

Steps to Reproduce:
1. use async locks with GFS2
  
Actual results:
Filesystem has open file descriptors associated with async locks.

Expected results:
Open file descriptors should be closed.

Additional info:
This issue has been fixed upstream since Linux 2.6.25, see commit:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b7e6b8694

Comment 1 Jeff Layton 2009-02-02 16:39:52 UTC

Gave a shot at reproducing this today, but ran across bug 483617. Once the GFS2 folks fix that I should be able to give this another shot.

Comment 2 Jeff Layton 2009-02-05 15:14:33 UTC

Looks like Abhi posted a patch for bug 483617 and it works, but I have yet to be able to reproduce this bug.

I assume this problem will manifest itself as being unable to unmount the fs. Can you describe a way to reliably reproduce this?

Comment 3 Mike Snitzer 2009-02-10 05:44:55 UTC

Truth of the matter is I reproduced this problem on IBRIX (simply ran a few async lock, F_SETLK, tests and then couldn't unmount).  The upstream fix I referenced in comment#0 was able to eliminate it for me.  Passing it off as a GFS2 issue was wrong but I figured I'd get it on your radar considering the upstream fix was specifically developed to address the same issue when using GFS2.

Comment 4 Jeff Layton 2009-02-12 18:43:19 UTC

Ok. I'll have to walk over the code and see if I can figure out a way to make this happen so that we can verify that the patch actually fixes it. If or someone else can provide a reliable reproducer, then that would be helpful.

Comment 5 Jeff Layton 2009-03-02 12:43:15 UTC

Patch makes sense now that I look over the code.

Looking at nlm4svc_proc_test codepath...

In order to reproduce this we'd need for nlmsvc_testlock to return nlm_drop_reply. It looks like it only does this when B_QUEUED is set, but B_TIMED_OUT and B_GOT_CALLBACK isn't set. So maybe this would occur if a lock or testlock raced in for this lock while we were waiting for the callback on the lock from the lower fs.

It may be hard to reproduce this reliably...we might just have to settle for review by inspection.

Comment 6 Jeff Layton 2009-03-03 15:57:55 UTC

I've added this patch to the RHEL5 test kernels on my people.redhat.com page:

http://people.redhat.com/jlayton

...if you could test them and verify whether they fix the problem for you, it would be helpful. Even more helpful would be if you had a way to reliably reproduce this problem so we could construct a regression test.

Regardless though, the refcount leak looks pretty obvious I think so in the absence of a regression test I'll still plan to propose this internally.

Comment 8 RHEL Program Management 2009-03-18 17:18:21 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 9 Jeff Layton 2009-03-18 17:21:54 UTC

Created attachment 335737 [details]
reproducer: lock and unlock a file in a tight loop

Here's the program I used to reproduce this. It just sets a non-blocking lock and unlocks it in a tight loop. When run against a file on a GFS2 export, the filesystem becomes un-umountable.

Comment 10 Don Zickus 2009-04-16 18:38:55 UTC

in kernel-2.6.18-139.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.

Comment 17 errata-xmlrpc 2009-09-02 08:48:07 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html

Note You need to log in before you can comment on or make changes to this bug.