Found while working on the use-after-free problem in lockd...

Do the following:

1. On an NFS server, fcntl lock a file.
2. On an NFS3 client, fcntl lock the same file with F_SETLKW.
3. On the server, release the lock -- the client now has the lock.
4. Release the lock on the client -- at this point the nlm_block should be freed.
5. On the server, do a "service nfs stop".

lockd will throw this error when coming down:

lockd: couldn't shutdown host module!

...the above set of steps somehow causes a b_count leak for the nlm_block, which keeps the nlm_host refcount high. More block-callback-lock attempts seem to push the refcount even higher. This certainly leaks memory when lockd goes down, and could also be leaking memory in other situations (that needs to be confirmed).
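For anyone reproducing this, the fcntl lock and unlock steps above could be driven by something like the following. This is only a minimal sketch: the helper names lock_whole_file_blocking() and unlock_whole_file() are made up for illustration, and it assumes a whole-file write lock, which the original steps do not specify.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: take a write lock on the whole file and block
 * until it is granted (F_SETLKW), as the client does in the steps above.
 * The choice of a whole-file write lock is an assumption.
 */
static int lock_whole_file_blocking(int fd)
{
    struct flock fl = {0};

    fl.l_type = F_WRLCK;     /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;  /* l_start is relative to start of file */
    fl.l_start = 0;
    fl.l_len = 0;            /* 0 means "through end of file" */

    return fcntl(fd, F_SETLKW, &fl);
}

/* Hypothetical helper: release the lock taken above. */
static int unlock_whole_file(int fd)
{
    struct flock fl = {0};

    fl.l_type = F_UNLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;

    return fcntl(fd, F_SETLK, &fl);
}
```

Run the lock helper on the server first, then on the NFS3 client against the same file; the client call should block until the server side calls the unlock helper.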
It looks like this is a regression that was introduced by this patch: RHBZ 196318: NFS byte-range locking support for cluster file systems.

...that patch added a kref_get() call to the top of nlmsvc_grant_blocked(), but did not remove the existing kref_get() at the bottom of that function, just before nlm_async_call() is called. Each grant attempt therefore takes two references on the block. I think just removing the old kref_get() will be sufficient to fix this.
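To illustrate why a second kref_get() would make the count climb with each grant attempt, here is a user-space toy model. It is not the lockd code: struct toy_block, toy_get(), toy_put(), grant_attempt() and refs_after_grants() are all hypothetical names, and it assumes that exactly one reference is dropped when each grant callback completes.

```c
/* Toy stand-in for an nlm_block; only the reference count is modelled. */
struct toy_block {
    int refcount;
};

static void toy_get(struct toy_block *b) { b->refcount++; }
static void toy_put(struct toy_block *b) { b->refcount--; }

/*
 * Model of one grant attempt. keep_old_get != 0 models the regression:
 * the new get at the top plus the old get before the async call.
 */
static void grant_attempt(struct toy_block *b, int keep_old_get)
{
    toy_get(b);              /* get added at the top of the function */

    if (keep_old_get)
        toy_get(b);          /* old get left in place before the async call */

    toy_put(b);              /* assumption: callback completion drops one ref */
}

/* Returns the refcount left after a number of grant attempts. */
static int refs_after_grants(int attempts, int keep_old_get)
{
    struct toy_block b = { .refcount = 1 };  /* one initial reference */
    int i;

    for (i = 0; i < attempts; i++)
        grant_attempt(&b, keep_old_get);

    return b.refcount;
}
```

In this model, refs_after_grants(3, 1) leaves three extra references behind, while refs_after_grants(3, 0) returns to the initial count, which matches the observation that repeated grant attempts push the refcount higher.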
Created attachment 294822
patch -- remove extra kref_get() from nlmsvc_grant_blocked()

Proposed patch...
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
in kernel-2.6.18-85.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0314.html