Description of problem:
When an application on an NFS client requests a file lock with fcntl, it hangs. We analyzed the NLM protocol traffic and found that the NFS server returns NLM_LCK_DENIED_GRACE_PERIOD. The grace period is set to 30 seconds, but the server still returns this status when we request a lock several months after the server started. We found this problem in production.

We traced the cause to lockd. The time_before comparison in lockd reverses the sign of the difference once the elapsed time exceeds LONG_MAX jiffies, which is half the range of the counter. We reported the problem and a bug fix patch to J. Bruce Fields, the NFS maintainer, and he merged an improved version of the patch into his git repository. We hope that you will backport the bug fix patch.

Version-Release number of selected component (if applicable):
Red Hat AS4 Update4 kernel-2.6.9-43.EL (i386)

How reproducible:
After a machine reboot or an NFS server service restart, the problem occurs when the first fcntl lock request arrives 25 to 50 days later (assuming HZ=1000).

Steps to Reproduce:
1. Boot the NFS server machine.
2. Wait 25 to 50 days.
3. Call fcntl to request a lock.

Actual results:
The application requesting the fcntl lock freezes.

Expected results:
The application can get the fcntl lock once the 30-second grace period has passed.

Additional info:
- LKML thread head mail: http://lkml.org/lkml/2008/8/14/115
- Bug fix patch in J. Bruce Fields' tree: http://git.linux-nfs.org/?p=bfields/linux-topics.git;a=commitdiff;h=3ff893a7683f2a011ebcc4043604249ad610acb0
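The sign reversal described above can be illustrated with a small user-space sketch. It is not the lockd code: it only models a 32-bit jiffies counter (as on i386) with uint32_t, and the names time_before32 and grace_looks_active are hypothetical helpers invented for this illustration. HZ=1000 and the 30-second grace period are taken from the report.

```c
#include <stdbool.h>
#include <stdint.h>

#define HZ 1000u  /* ticks per second, as assumed in the report */

/* 32-bit model of a time_before(a, b) style check: true if a is earlier
 * than b. The unsigned difference is reinterpreted as signed, so the
 * answer is only meaningful while a and b are less than 2^31 ticks apart. */
static bool time_before32(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Hypothetical helper: does the grace period still look active when the
 * check runs `elapsed` ticks after lockd started at tick `start`? */
static bool grace_looks_active(uint32_t start, uint32_t elapsed)
{
    uint32_t expire = start + 30u * HZ;  /* 30-second grace period */
    uint32_t now = start + elapsed;      /* wraps modulo 2^32 like jiffies */

    /* The grace period is treated as over only if expire is before now. */
    return !time_before32(expire, now);
}
```

With HZ=1000, 2^31 ticks is roughly 24.9 days and 2^32 ticks is roughly 49.7 days. In this model a check made 20 days after start reports that the grace period is over, while a check made 30 days after start reports that it is still active, which is consistent with the 25 to 50 day window given in the report.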
Created attachment 317946 [details]
Fix patch for kernel-2.6.9-42.EL

This patch fixes bug 461085 on Red Hat EL 4 Update 4. The bug is caused by using time_before to compare jiffies with jiffies + grace_period_expire. I use timer functions to address the root cause of the lockd bug, which is that the code does not account for a long time passing after lockd starts.
Yes, the problem is that jiffies wrap fairly quickly. The solution is not to make comparisons against the jiffies value, but to schedule a timeout to turn off the grace period once it has started. This solution doesn't match the one from upstream, but seems good enough.
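As a rough illustration of the timeout approach, here is a minimal user-space model. It is a sketch under assumptions, not the proposed patch or the upstream fix: struct grace_timer, grace_start and grace_advance are hypothetical names, and simulated ticks stand in for a real kernel timer. The point it shows is that the grace flag is cleared by a relative delay expiring, so no absolute jiffies value is compared when a lock request arrives.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a one-shot timer that ends the grace period. */
struct grace_timer {
    bool in_grace;        /* the flag a lock handler would test */
    uint32_t ticks_left;  /* relative delay until the timer fires */
};

/* Start the grace period and arm the timer with a relative delay. */
static void grace_start(struct grace_timer *g, uint32_t delay_ticks)
{
    g->in_grace = true;
    g->ticks_left = delay_ticks;
}

/* Advance simulated time; clear the flag once the delay has elapsed. */
static void grace_advance(struct grace_timer *g, uint64_t ticks)
{
    if (!g->in_grace)
        return;  /* timer already fired, nothing left to do */

    if (ticks >= g->ticks_left) {
        g->ticks_left = 0;
        g->in_grace = false;
    } else {
        g->ticks_left -= (uint32_t)ticks;
    }
}
```

Because the flag is cleared when the delay runs out, a lock request that arrives 30 days later only tests in_grace and is not affected by how far the tick counter has advanced.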
Created attachment 325689 [details] Proposed patch
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
(In reply to comment #5)
> Created an attachment (id=325689) [details]
> Proposed patch

It seems to be a good patch. The kernel code of Red Hat Server 5 has the same problem, which can be fixed with a patch like this one or with Bruce's patch.
Thanx for the feedback. Regarding RHEL-5 -- I'm ahead of you there already. Please see bz474590. :-)
Committed in 78.22.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html