
Bug 1115545

Summary: NFS4: remove incorrect "Lock reclaim failed!" warning when delegations are used
Product: Red Hat Enterprise Linux 6 Reporter: Dave Wysochanski <dwysocha>
Component: kernel Assignee: Dave Wysochanski <dwysocha>
kernel sub component: NFS QA Contact: JianHong Yin <jiyin>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium CC: bfields, eguan
Version: 6.7 Keywords: Patch, TestCaseProvided
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: kernel-2.6.32-527.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-07-22 08:09:40 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1025441, 1156428    
Bug Blocks: 1075802, 1159933    
Attachments:
Description Flags
WIP testcase for this bug - does not currently work
none
Supporting files and WIP testcase for this bug - does not currently work
none
testcase for this bug that now works - requires a kernel which fixes local ratelimiting printk bug https://bugzilla.redhat.com/show_bug.cgi?id=1156428
none
testcase for this bug that now works - requires a kernel which fixes local ratelimiting printk bug https://bugzilla.redhat.com/show_bug.cgi?id=1156428
none
test log showing test failure on kernel with patch for printk ratelimiting bug
none
test log showing test pass on kernel with patch for printk ratelimiting bug plus patches to fix this bug
none
test log showing test pass on kernel with patch for printk ratelimiting bug plus patches to fix this bug
none

Description Dave Wysochanski 2014-07-02 15:00:51 UTC
Description of problem:
We need the following upstream patch which should be an easy backport.
commit 6686390bab6a0e049fa7040631aee08b35a55293
Author: NeilBrown <neilb>
Date:   Mon Aug 12 16:52:47 2013 +1000

    NFS: remove incorrect "Lock reclaim failed!" warning.
    
    After reclaiming state that was lost, the NFS client tries to reclaim
    any locks, and then checks that each one has NFS_LOCK_INITIALIZED set
    (which means that the server has confirmed the lock).
    However if the client holds a delegation, nfs_reclaim_locks() simply aborts
    (or more accurately it called nfs_lock_reclaim() and that returns without
    doing anything).
    
    This is because when a delegation is held, the server doesn't need to
    know about locks.
    
    So if a delegation is held, NFS_LOCK_INITIALIZED is not expected, and
    its absence is certainly not an error.
    
    So don't print the warnings if NFS_DELEGATED_STATE is set.


Version-Release number of selected component (if applicable):
2.6.32-488.el6


How reproducible:
Should be relatively easy to reproduce.  Requires:
1. NFS4 server with delegations
2. NFS4 client doing locks
3. Some method for triggering lock reclaim


Steps to Reproduce:
TBD

Actual results:
"Lock reclaim failed" printed in /var/log/messages


Expected results:
No message should be printed since the condition being checked for is not relevant to NFS4 delegations.

Additional info:

Comment 1 Dave Wysochanski 2014-07-02 15:22:21 UTC
Looking at that commit, the delegation test it adds is inverted, so a second commit is needed.

commit 1acd1c301f4faae80f4d2c7bbd9a4553b131c0e3
Author: Jeff Layton <jlayton>
Date:   Thu Oct 31 13:03:04 2013 -0400

    nfs: fix inverted test for delegation in nfs4_reclaim_open_state
...
-                               if (test_bit(NFS_DELEGATED_STATE, &state->flags) != 0) {
+                               if (!test_bit(NFS_DELEGATED_STATE, &state->flags)) {
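The effect of the two versions of the test can be sketched in userspace C. This is a minimal model, not the kernel code: the demo_* names, flag values, and structs are hypothetical stand-ins for nfs4_state, NFS_DELEGATED_STATE, and NFS_LOCK_INITIALIZED, and the loop only counts how many warnings each form of the condition would produce.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's state and lock flag bits. */
enum { DEMO_DELEGATED_STATE = 1u << 0 };
enum { DEMO_LOCK_INITIALIZED = 1u << 0 };

struct demo_lock {
    unsigned flags;
};

struct demo_state {
    unsigned flags;
    const struct demo_lock *locks;
    int nlocks;
};

/*
 * Count the "Lock reclaim failed!" warnings that would be printed for one
 * state after lock reclaim. With inverted_test set, the locks are checked
 * only when a delegation IS held (the first commit's condition); otherwise
 * they are checked only when no delegation is held (the follow-up fix).
 */
int demo_count_warnings(const struct demo_state *st, bool inverted_test)
{
    bool delegated = (st->flags & DEMO_DELEGATED_STATE) != 0;
    bool check_locks = inverted_test ? delegated : !delegated;
    int warnings = 0;

    if (!check_locks)
        return 0;

    for (int i = 0; i < st->nlocks; i++) {
        /* A lock the server has not confirmed counts as a failed reclaim. */
        if (!(st->locks[i].flags & DEMO_LOCK_INITIALIZED))
            warnings++;
    }
    return warnings;
}
```

In this model, a delegated state holding one unconfirmed lock yields one spurious warning with the inverted test and none with the corrected test, while a non-delegated state with the same lock is reported only by the corrected test.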

Comment 3 Dave Wysochanski 2014-10-23 16:49:33 UTC
Created attachment 950025 [details]
WIP testcase for this bug - does not currently work

Comment 4 Dave Wysochanski 2014-10-23 16:51:31 UTC
Created attachment 950026 [details]
Supporting files and WIP testcase for this bug - does not currently work

Comment 5 Dave Wysochanski 2014-10-24 11:56:50 UTC
Well, I have a testcase for this bug that should cause the "Lock reclaim failed!" message to fire from inside nfs4_reclaim_open_state():
pr_warn_ratelimited("NFS: "
		"%s: Lock reclaim "
		"failed!\n", __func__);

But for some reason, the "Lock reclaim failed" message was still not printed.  I wrote a lot of SystemTap (stap) scripts and finally ended up patching an nfs module with printks.  Then I discovered that, amazingly, printk_ratelimited has been broken in RHEL6, apparently since its introduction in 6.1, because this patch is missing:

commit bb1dc0bacb8ddd7ba6a5906c678a5a5a110cf695
Author: Yong Zhang <yong.zhang>
Date:   Tue Apr 6 14:35:02 2010 -0700

    kernel.h: fix wrong usage of __ratelimit()
    
    When __ratelimit() returns 1 this means that we can go ahead.
    
    Signed-off-by: Yong Zhang <yong.zhang>
    Cc: Ingo Molnar <mingo>
    Cc: Joe Perches <joe>
    Signed-off-by: Andrew Morton <akpm>
    Signed-off-by: Linus Torvalds <torvalds>

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 7f07074..9365227 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -426,7 +426,7 @@ static inline char *pack_hex_byte(char *buf, u8 byte)
                .burst = DEFAULT_RATELIMIT_BURST,       \
        };                                              \
                                                        \
-       if (!__ratelimit(&_rs))                         \
+       if (__ratelimit(&_rs))                          \
                printk(fmt, ##__VA_ARGS__);             \


This means that anything printed through a pr_*_ratelimited macro appears only during a burst of messages that should be suppressed, which is the opposite of the intent of ratelimiting!
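A small userspace model makes the inversion concrete. It is a sketch under simplifying assumptions: demo_ratelimit() is a hypothetical limiter covering a single interval with no timekeeping, and it only mirrors the contract quoted in the patch (a return value of 1 means the caller may go ahead). The burst value is passed in by the caller and is purely illustrative.

```c
#include <stdbool.h>

/* Hypothetical, simplified limiter: one interval only, no timekeeping. */
struct demo_ratelimit_state {
    int burst;   /* messages allowed per interval */
    int allowed; /* calls allowed so far in this interval */
};

/* Return 1 when the caller may go ahead and print, 0 when it should not. */
int demo_ratelimit(struct demo_ratelimit_state *rs)
{
    if (rs->allowed < rs->burst) {
        rs->allowed++;
        return 1;
    }
    return 0;
}

/*
 * Count how many of n_calls messages would reach the log.
 * inverted_test models the macro before the fix: if (!__ratelimit(...)).
 */
int demo_messages_emitted(int n_calls, int burst, bool inverted_test)
{
    struct demo_ratelimit_state rs = { burst, 0 };
    int emitted = 0;

    for (int i = 0; i < n_calls; i++) {
        int ok = demo_ratelimit(&rs);

        if (inverted_test ? !ok : ok)
            emitted++;
    }
    return emitted;
}
```

With a burst of 10, a single call emits nothing under the inverted test and one message under the corrected test. Fifteen calls emit 5 and 10 messages respectively. A lone warning is therefore dropped under the inverted test, which is consistent with the test case never showing the message.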

I think the reason no one has noticed is the low usage of ratelimiting: from what I counted, there were only a handful of pr_warn_ratelimited calls, and most were in nfs.

I'll have to open a separate bz for the above patch.

The patch which introduced the ratelimiting went in along with a group of patches to rhel6.1 for nfs:
commit cf2a1c571fe2b88a3954f2f4a2cd35641c4b8977
Author: Steve Dickson <SteveD>
Date:   Mon Nov 15 12:18:33 2010 -0500

    [kernel] kernel.h: add printk_ratelimited and pr_<level>_rl
    
    Message-id: <1289823513-15346-71-git-send-email-steved>
    Patchwork-id: 29304
    O-Subject: [RHEL6.1 PATCH 70/70] kernel.h: add printk_ratelimited and
        pr_<level>_rl
    Bugzilla: 653066
    RH-Acked-by: Prarit Bhargava <prarit>
    RH-Acked-by: J. Bruce Fields <bfields>

Comment 7 Dave Wysochanski 2014-10-24 16:53:09 UTC
Created attachment 950431 [details]
testcase for this bug that now works - requires a kernel which fixes local ratelimiting printk bug https://bugzilla.redhat.com/show_bug.cgi?id=1156428

Comment 8 Dave Wysochanski 2014-10-24 18:35:48 UTC
Created attachment 950476 [details]
testcase for this bug that now works - requires a kernel which fixes local ratelimiting printk bug https://bugzilla.redhat.com/show_bug.cgi?id=1156428

Comment 9 Dave Wysochanski 2014-10-24 18:36:19 UTC
Created attachment 950477 [details]
test log showing test failure on kernel with patch for printk ratelimiting bug

Comment 10 Dave Wysochanski 2014-10-24 18:36:49 UTC
Created attachment 950478 [details]
test log showing test pass on kernel with patch for printk ratelimiting bug plus patches to fix this bug

Comment 12 Dave Wysochanski 2014-10-24 18:45:21 UTC
Created attachment 950479 [details]
test log showing test pass on kernel with patch for printk ratelimiting bug plus patches to fix this bug

Comment 15 RHEL Program Management 2014-11-10 23:11:53 UTC
This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.

Comment 16 Rafael Aquini 2015-01-30 17:11:25 UTC
Patch(es) available on kernel-2.6.32-527.el6

Comment 29 JianHong Yin 2015-05-27 07:42:51 UTC
reproduced at RHEL-6.5, RHEL-6.7-20150506.0 (kernel-2.6.32-558)
  https://beaker.engineering.redhat.com/jobs/965878
verified at RHEL-6.5, RHEL-6.7-20150527.n.0 (kernel-2.6.32-563)
  https://beaker.engineering.redhat.com/jobs/965899

Comment 31 errata-xmlrpc 2015-07-22 08:09:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1272.html