Bug 461085

Summary: lockd: return NLM_LCK_DENIED_GRACE_PERIOD after long periods
Product: Red Hat Enterprise Linux 4 Reporter: Hiroaki Nakano <nakano.hiroaki>
Component: kernel Assignee: Peter Staubach <staubach>
Status: CLOSED ERRATA QA Contact: Martin Jenner <mjenner>
Severity: urgent Docs Contact:
Priority: high    
Version: 4.4 CC: anton, mgahagan, tao, vgoyal
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-05-18 19:24:19 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description                        Flags
Fix patch for kernel-2.6.9-42.EL   none
Proposed patch                     none

Description Hiroaki Nakano 2008-09-04 02:50:12 UTC
Description of problem:

From an NFS client, an application hangs when it tries to acquire a file lock with fcntl.
Analyzing the NLM protocol traffic, we found that the NFS server returns NLM_LCK_DENIED_GRACE_PERIOD.
The grace period is set to 30 seconds, but the server still returns this status long after startup
(in our case, several months later) when we request a lock.
We found this problem in production.

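We traced the cause of this problem to lockd.
lockd uses time_before() to compare jiffies with the grace-period expiry time. time_before() decides
the order from the sign of the difference, so the result is reversed once that difference exceeds
LONG_MAX jiffies (half the range of unsigned long), and lockd concludes the grace period is still in effect.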

We reported this problem and a bug fix patch to J. Bruce Fields, the NFS maintainer,
and he merged an improved version of the patch into his git repository.

We hope you will backport this bug fix.


Version-Release number of selected component (if applicable):

Red Hat AS4 Update4
kernel-2.6.9-43.EL (i386)


How reproducible:

After a machine reboot or an NFS server service restart,
the first fcntl lock request made between 25 and 50 days later hangs
(assuming HZ=1000).

Steps to Reproduce:
1. Boot the NFS server machine.
2. Wait 25 to 50 days.
3. Request a lock with fcntl.
  
Actual results:

The application requesting the fcntl lock hangs.

Expected results:

The application can acquire the fcntl lock once the 30-second grace period has passed.

Additional info:

- LKML thread head mail
http://lkml.org/lkml/2008/8/14/115

- bug fix patch at J.Bruce Fields tree
http://git.linux-nfs.org/?p=bfields/linux-topics.git;a=commitdiff;h=3ff893a7683f2a011ebcc4043604249ad610acb0

Comment 1 Hiroaki Nakano 2008-09-29 10:06:35 UTC
Created attachment 317946
Fix patch for kernel-2.6.9-42.EL

This patch fixes bug 461085 on Red Hat Enterprise Linux 4 Update 4. The bug is caused by using time_before() to compare jiffies with grace_period_expire (the jiffies value at lockd startup plus the grace period). The patch uses timer functions to address the root cause: lockd does not account for running a long time after it starts.

Comment 4 Peter Staubach 2008-12-02 18:51:37 UTC
Yes, the problem is that jiffies wrap fairly quickly.  The solution is
not to make comparisons against the jiffies value, but to schedule a
timeout to turn off the grace period once it has started.

This solution doesn't match that from upstream, but seems good enough.

Comment 5 Peter Staubach 2008-12-04 14:51:15 UTC
Created attachment 325689
Proposed patch

Comment 6 RHEL Program Management 2008-12-04 15:08:39 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 7 Hiroaki Nakano 2008-12-10 01:09:12 UTC
(In reply to comment #5)
> Created an attachment (id=325689) [details]
> Proposed patch

It looks like a good patch.
The Red Hat Enterprise Linux 5 kernel has the same problem; it could be fixed with a patch like this one or with Bruce's patch.

Comment 8 Peter Staubach 2008-12-10 14:52:48 UTC
Thanx for the feedback.

Regarding RHEL-5 -- I'm ahead of you there already.  Please see bz474590. :-)

Comment 9 Vivek Goyal 2008-12-17 16:08:22 UTC
Committed in 78.22.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/

Comment 15 errata-xmlrpc 2009-05-18 19:24:19 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html