Bug 1115545 - NFS4: remove incorrect "Lock reclaim failed!" warning when delegations are used
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.7
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Dave Wysochanski
QA Contact: Yin.JianHong
Keywords: Patch, TestCaseProvided
Depends On: 1025441 1156428
Blocks: 1075802 1159933
Reported: 2014-07-02 11:00 EDT by Dave Wysochanski
Modified: 2015-07-22 04:09 EDT
CC List: 2 users

See Also:
Fixed In Version: kernel-2.6.32-527.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-07-22 04:09:40 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
WIP testcase for this bug - does not currently work (4.74 KB, application/octet-stream)
2014-10-23 12:49 EDT, Dave Wysochanski

Supporting files and WIP testcase for this bug - does not currently work (6.74 KB, application/x-gzip)
2014-10-23 12:51 EDT, Dave Wysochanski

testcase for this bug that now works - requires a kernel which fixes local ratelimiting printk bug https://bugzilla.redhat.com/show_bug.cgi?id=1156428 (295.46 KB, application/x-gzip)
2014-10-24 12:53 EDT, Dave Wysochanski

testcase for this bug that now works - requires a kernel which fixes local ratelimiting printk bug https://bugzilla.redhat.com/show_bug.cgi?id=1156428 (296.68 KB, application/x-gzip)
2014-10-24 14:35 EDT, Dave Wysochanski

test log showing test failure on kernel with patch for printk ratelimiting bug (4.17 KB, application/octet-stream)
2014-10-24 14:36 EDT, Dave Wysochanski

test log showing test pass on kernel with patch for printk ratelimiting bug plus patches to fix this bug (3.38 KB, application/octet-stream)
2014-10-24 14:36 EDT, Dave Wysochanski

test log showing test pass on kernel with patch for printk ratelimiting bug plus patches to fix this bug (4.19 KB, application/octet-stream)
2014-10-24 14:45 EDT, Dave Wysochanski


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 1118233 | None | None | None | Never
Red Hat Product Errata | RHSA-2015:1272 | normal | SHIPPED_LIVE | Moderate: kernel security, bug fix, and enhancement update | 2015-07-22 07:56:25 EDT

Description Dave Wysochanski 2014-07-02 11:00:51 EDT
Description of problem:
We need the following upstream patch which should be an easy backport.
commit 6686390bab6a0e049fa7040631aee08b35a55293
Author: NeilBrown <neilb@suse.de>
Date:   Mon Aug 12 16:52:47 2013 +1000

    NFS: remove incorrect "Lock reclaim failed!" warning.
    
    After reclaiming state that was lost, the NFS client tries to reclaim
    any locks, and then checks that each one has NFS_LOCK_INITIALIZED set
    (which means that the server has confirmed the lock).
    However if the client holds a delegation, nfs_reclaim_locks() simply aborts
    (or more accurately it called nfs_lock_reclaim() and that returns without
    doing anything).
    
    This is because when a delegation is held, the server doesn't need to
    know about locks.
    
    So if a delegation is held, NFS_LOCK_INITIALIZED is not expected, and
    its absence is certainly not an error.
    
    So don't print the warnings if NFS_DELGATED_STATE is set.


Version-Release number of selected component (if applicable):
2.6.32-488.el6


How reproducible:
Should be relatively easy to repro.  Requires
1. NFS4 server with delegations
2. NFS4 client doing locks
3. Some method for triggering lock reclaim


Steps to Reproduce:
TBD

Actual results:
"Lock reclaim failed" printed in /var/log/messages


Expected results:
No message should be printed since the condition being checked for is not relevant to NFS4 delegations.

Additional info:
Comment 1 Dave Wysochanski 2014-07-02 11:22:21 EDT
Looking at that commit, the logic is wrong in the test.  So there's a second commit needed.

commit 1acd1c301f4faae80f4d2c7bbd9a4553b131c0e3
Author: Jeff Layton <jlayton@redhat.com>
Date:   Thu Oct 31 13:03:04 2013 -0400

    nfs: fix inverted test for delegation in nfs4_reclaim_open_state
...
-                               if (test_bit(NFS_DELEGATED_STATE, &state->flags) != 0) {
+                               if (!test_bit(NFS_DELEGATED_STATE, &state->flags)) {
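
To make the effect of the two commits concrete, here is a minimal user-space sketch of the warning decision. It is not the kernel code: the TOY_* flag values and both function names are hypothetical stand-ins for NFS_DELEGATED_STATE, NFS_LOCK_INITIALIZED and the check in nfs4_reclaim_open_state(), and it models a single lock rather than walking the state's lock list.

```c
#include <stdbool.h>

/* Hypothetical flag bits; the real kernel uses test_bit() on bit numbers. */
enum {
    TOY_DELEGATED_STATE  = 1 << 0, /* stands in for NFS_DELEGATED_STATE */
    TOY_LOCK_INITIALIZED = 1 << 1, /* stands in for NFS_LOCK_INITIALIZED */
};

/*
 * Behaviour with only the first commit applied: the delegation test is
 * inverted, so the lock check runs only when a delegation IS held.
 */
static bool should_warn_inverted(unsigned int state_flags,
                                 unsigned int lock_flags)
{
    if (!(state_flags & TOY_DELEGATED_STATE))
        return false;
    return !(lock_flags & TOY_LOCK_INITIALIZED);
}

/*
 * Behaviour with both commits applied: skip the check when a delegation
 * is held, otherwise warn if the server has not confirmed the lock.
 */
static bool should_warn_fixed(unsigned int state_flags,
                              unsigned int lock_flags)
{
    if (state_flags & TOY_DELEGATED_STATE)
        return false;
    return !(lock_flags & TOY_LOCK_INITIALIZED);
}
```

In this model, a delegated state with an unconfirmed lock (the case this bug is about) warns under should_warn_inverted() and stays silent under should_warn_fixed(), while a non-delegated state with an unconfirmed lock does the opposite.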
Comment 3 Dave Wysochanski 2014-10-23 12:49:33 EDT
Created attachment 950025 [details]
WIP testcase for this bug - does not currently work
Comment 4 Dave Wysochanski 2014-10-23 12:51:31 EDT
Created attachment 950026 [details]
Supporting files and WIP testcase for this bug - does not currently work
Comment 5 Dave Wysochanski 2014-10-24 07:56:50 EDT
Well, I have a testcase for this bug that should cause the "Lock reclaim failed!" warning to fire from inside nfs4_reclaim_open_state():
pr_warn_ratelimited("NFS: "
		"%s: Lock reclaim "
		"failed!\n", __func__);

But for some reason, the "Lock reclaim failed" message was still not printed.  I wrote a lot of stap and finally ended up patching an nfs module with printks.  Then I discovered that, amazingly, printk_ratelimited has been broken in RHEL6, apparently since introduction in 6.1 due to missing this patch:

commit bb1dc0bacb8ddd7ba6a5906c678a5a5a110cf695
Author: Yong Zhang <yong.zhang@windriver.com>
Date:   Tue Apr 6 14:35:02 2010 -0700

    kernel.h: fix wrong usage of __ratelimit()
    
    When __ratelimit() returns 1 this means that we can go ahead.
    
    Signed-off-by: Yong Zhang <yong.zhang@windriver.com>
    Cc: Ingo Molnar <mingo@elte.hu>
    Cc: Joe Perches <joe@perches.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 7f07074..9365227 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -426,7 +426,7 @@ static inline char *pack_hex_byte(char *buf, u8 byte)
                .burst = DEFAULT_RATELIMIT_BURST,       \
        };                                              \
                                                        \
-       if (!__ratelimit(&_rs))                         \
+       if (__ratelimit(&_rs))                          \
                printk(fmt, ##__VA_ARGS__);             \


So what this means is that anything underneath a pr_*_ratelimited macro would only be printed when you get a burst of messages that should be suppressed, which is the opposite of the intent of ratelimiting!
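
A rough user-space model of that inversion, for illustration only. The toy_ratelimit struct and both functions are hypothetical, and the model has no time window (the real __ratelimit() resets its count after an interval); it only keeps the contract from the commit message above, where a return value of 1 means the caller can go ahead.

```c
#include <assert.h>

/* Toy stand-in for the kernel's ratelimit state (no time window). */
struct toy_ratelimit {
    int burst;   /* messages allowed before suppression starts */
    int allowed; /* messages allowed so far */
};

/* Returns 1 when the caller may print, 0 when it should be suppressed. */
static int toy_ratelimit_ok(struct toy_ratelimit *rs)
{
    assert(rs->burst > 0);
    if (rs->allowed < rs->burst) {
        rs->allowed++;
        return 1;
    }
    return 0;
}

/*
 * Count how many of `attempts` messages are actually emitted.
 * inverted != 0 models the broken test:  if (!__ratelimit(&_rs)) printk(...)
 * inverted == 0 models the fixed test:   if (__ratelimit(&_rs))  printk(...)
 */
static int count_emitted(int attempts, int burst, int inverted)
{
    struct toy_ratelimit rs = { .burst = burst, .allowed = 0 };
    int emitted = 0;

    for (int i = 0; i < attempts; i++) {
        int ok = toy_ratelimit_ok(&rs);

        if (inverted ? !ok : ok)
            emitted++;
    }
    return emitted;
}
```

With a burst of 10, a single message is dropped by the inverted test and printed by the fixed one, which matches the testcase seeing no "Lock reclaim failed" output. With 25 messages, the inverted test emits only the 15 that should have been suppressed.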

I think maybe the reason no one has noticed is the low usage of ratelimiting: from what I counted there were only a handful of pr_warn_ratelimited calls, and most were in nfs.

I'll have to open a separate bz for the above patch.

The patch which introduced the ratelimiting went in along with a group of patches to rhel6.1 for nfs:
commit cf2a1c571fe2b88a3954f2f4a2cd35641c4b8977
Author: Steve Dickson <SteveD@redhat.com>
Date:   Mon Nov 15 12:18:33 2010 -0500

    [kernel] kernel.h: add printk_ratelimited and pr_<level>_rl
    
    Message-id: <1289823513-15346-71-git-send-email-steved@redhat.com>
    Patchwork-id: 29304
    O-Subject: [RHEL6.1 PATCH 70/70] kernel.h: add printk_ratelimited and
        pr_<level>_rl
    Bugzilla: 653066
    RH-Acked-by: Prarit Bhargava <prarit@redhat.com>
    RH-Acked-by: J. Bruce Fields <bfields@redhat.com>
Comment 7 Dave Wysochanski 2014-10-24 12:53:09 EDT
Created attachment 950431 [details]
testcase for this bug that now works - requires a kernel which fixes local ratelimiting printk bug https://bugzilla.redhat.com/show_bug.cgi?id=1156428
Comment 8 Dave Wysochanski 2014-10-24 14:35:48 EDT
Created attachment 950476 [details]
testcase for this bug that now works - requires a kernel which fixes local ratelimiting printk bug https://bugzilla.redhat.com/show_bug.cgi?id=1156428
Comment 9 Dave Wysochanski 2014-10-24 14:36:19 EDT
Created attachment 950477 [details]
test log showing test failure on kernel with patch for printk ratelimiting bug
Comment 10 Dave Wysochanski 2014-10-24 14:36:49 EDT
Created attachment 950478 [details]
test log showing test pass on kernel with patch for printk ratelimiting bug plus patches to fix this bug
Comment 12 Dave Wysochanski 2014-10-24 14:45:21 EDT
Created attachment 950479 [details]
test log showing test pass on kernel with patch for printk ratelimiting bug plus patches to fix this bug
Comment 15 RHEL Product and Program Management 2014-11-10 18:11:53 EST
This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.
Comment 16 Rafael Aquini 2015-01-30 12:11:25 EST
Patch(es) available on kernel-2.6.32-527.el6
Comment 29 Yin.JianHong 2015-05-27 03:42:51 EDT
reproduced at RHEL-6.5,RHEL-6.7-20150506.0(kernel-2.6.32-558)
  https://beaker.engineering.redhat.com/jobs/965878
verified at RHEL-6.5,RHEL-6.7-20150527.n.0(kernel-2.6.32-563)
  https://beaker.engineering.redhat.com/jobs/965899
Comment 31 errata-xmlrpc 2015-07-22 04:09:40 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1272.html
