Red Hat Bugzilla – Bug 1115545
NFS4: remove incorrect "Lock reclaim failed!" warning when delegations are used
Last modified: 2015-07-22 04:09:40 EDT
Description of problem:
We need the following upstream patch which should be an easy backport.

commit 6686390bab6a0e049fa7040631aee08b35a55293
Author: NeilBrown <neilb@suse.de>
Date:   Mon Aug 12 16:52:47 2013 +1000

    NFS: remove incorrect "Lock reclaim failed!" warning.

    After reclaiming state that was lost, the NFS client tries to reclaim
    any locks, and then checks that each one has NFS_LOCK_INITIALIZED set
    (which means that the server has confirmed the lock).
    However if the client holds a delegation, nfs_reclaim_locks() simply
    aborts (or more accurately it called nfs_lock_reclaim() and that
    returns without doing anything).

    This is because when a delegation is held, the server doesn't need to
    know about locks.

    So if a delegation is held, NFS_LOCK_INITIALIZED is not expected, and
    its absence is certainly not an error.

    So don't print the warnings if NFS_DELGATED_STATE is set.

Version-Release number of selected component (if applicable):
2.6.32-488.el6

How reproducible:
Should be relatively easy to repro. Requires
1. NFS4 server with delegations
2. NFS4 client doing locks
3. Some method for triggering lock reclaim

Steps to Reproduce:
TBD

Actual results:
"Lock reclaim failed" printed in /var/log/messages

Expected results:
No message should be printed since the condition being checked for is not relevant to NFS4 delegations.

Additional info:
Looking at that commit, the logic is wrong in the test. So there's a second commit needed.

commit 1acd1c301f4faae80f4d2c7bbd9a4553b131c0e3
Author: Jeff Layton <jlayton@redhat.com>
Date:   Thu Oct 31 13:03:04 2013 -0400

    nfs: fix inverted test for delegation in nfs4_reclaim_open_state

...
-	if (test_bit(NFS_DELEGATED_STATE, &state->flags) != 0) {
+	if (!test_bit(NFS_DELEGATED_STATE, &state->flags)) {
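To make the effect of the two commits concrete, here is a minimal userspace sketch of the post-reclaim check. It is only a model under stated assumptions: the MODEL_* flag values, the check_variant enum and count_reclaim_warnings() are hypothetical names invented for illustration, not kernel code. It counts how many "Lock reclaim failed!" warnings each variant of the test would produce for a set of locks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel flag bits; values are illustrative. */
enum {
    MODEL_LOCK_INITIALIZED = 1u << 0, /* models NFS_LOCK_INITIALIZED */
    MODEL_DELEGATED_STATE  = 1u << 1  /* models NFS_DELEGATED_STATE  */
};

/* The three versions of the check discussed in this bug. */
enum check_variant {
    CHECK_ORIGINAL,    /* before 6686390bab6a: always inspect the locks     */
    CHECK_FIRST_PATCH, /* 6686390bab6a alone: inspects only when delegated  */
    CHECK_FIXED        /* with 1acd1c301f4f: inspects only when NOT delegated */
};

/*
 * Count the warnings a given variant would emit after reclaim.
 * A lock triggers a warning when it is inspected and the server has not
 * confirmed it (MODEL_LOCK_INITIALIZED is clear).
 */
static size_t count_reclaim_warnings(enum check_variant variant,
                                     unsigned state_flags,
                                     const unsigned *lock_flags,
                                     size_t nlocks)
{
    bool delegated = (state_flags & MODEL_DELEGATED_STATE) != 0;
    bool inspect_locks;
    size_t warnings = 0;

    assert(lock_flags != NULL || nlocks == 0);

    switch (variant) {
    case CHECK_ORIGINAL:
        inspect_locks = true;
        break;
    case CHECK_FIRST_PATCH:
        inspect_locks = delegated;   /* the inverted test */
        break;
    default:
        inspect_locks = !delegated;  /* the corrected test */
        break;
    }

    if (!inspect_locks)
        return 0;

    for (size_t i = 0; i < nlocks; i++) {
        if (!(lock_flags[i] & MODEL_LOCK_INITIALIZED))
            warnings++;
    }
    return warnings;
}
```

In this model, two unconfirmed locks held under a delegation yield two warnings with the original code and with the first patch alone, and none once the second commit is applied. Without a delegation, the first patch alone suppresses warnings that are genuine, which is why both commits are needed together.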
Created attachment 950025 [details] WIP testcase for this bug - does not currently work
Created attachment 950026 [details] Supporting files and WIP testcase for this bug - does not currently work
Well, I have a testcase for this bug that should cause the "Lock reclaim failed!" warning to fire from inside nfs4_reclaim_open_state():

	pr_warn_ratelimited("NFS: "
			"%s: Lock reclaim "
			"failed!\n", __func__);

But for some reason, the "Lock reclaim failed" message was still not printed. I wrote a lot of stap and finally ended up patching an nfs module with printks. Then I discovered that, amazingly, printk_ratelimited has been broken in RHEL6, apparently since its introduction in 6.1, because this patch is missing:

commit bb1dc0bacb8ddd7ba6a5906c678a5a5a110cf695
Author: Yong Zhang <yong.zhang@windriver.com>
Date:   Tue Apr 6 14:35:02 2010 -0700

    kernel.h: fix wrong usage of __ratelimit()

    When __ratelimit() returns 1 this means that we can go ahead.

    Signed-off-by: Yong Zhang <yong.zhang@windriver.com>
    Cc: Ingo Molnar <mingo@elte.hu>
    Cc: Joe Perches <joe@perches.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 7f07074..9365227 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -426,7 +426,7 @@ static inline char *pack_hex_byte(char *buf, u8 byte)
 		.burst = DEFAULT_RATELIMIT_BURST,	\
 	};						\
 							\
-	if (!__ratelimit(&_rs))				\
+	if (__ratelimit(&_rs))				\
 		printk(fmt, ##__VA_ARGS__);		\

So what this means is that anything underneath a pr_*_ratelimited macro would only be printed when you get a burst of messages that should be suppressed, which is the opposite of the intent of ratelimiting! I think maybe the reason no one has noticed is the low usage of ratelimiting - from what I counted there were only a handful of pr_warn_ratelimited calls, and most were in nfs. I'll have to open a separate bz for the above patch.
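For anyone wanting to see why a single warning never shows up with the inverted test, here is a small userspace sketch of the behaviour. It is a simplified model, not the kernel implementation: struct model_ratelimit, model_ratelimit_ok() and count_emitted() are hypothetical names, and the interval and burst values are illustrative. The only convention taken from the quoted commit is that a return value of 1 means the caller may go ahead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of a ratelimit state; not the kernel struct. */
struct model_ratelimit {
    long interval; /* window length, in abstract ticks           */
    int  burst;    /* calls allowed per window                   */
    long begin;    /* tick at which the current window started   */
    int  allowed;  /* calls allowed so far in the current window */
};

/*
 * Same convention as in the quoted commit message: returns 1 when the
 * caller may go ahead, 0 when the call should be suppressed.
 */
static int model_ratelimit_ok(struct model_ratelimit *rs, long now)
{
    assert(rs != NULL);

    if (now - rs->begin >= rs->interval) {
        /* A new window starts: reset the budget. */
        rs->begin = now;
        rs->allowed = 0;
    }
    if (rs->allowed < rs->burst) {
        rs->allowed++;
        return 1;
    }
    return 0;
}

/*
 * Count how many of n back-to-back messages (all at tick 0) are emitted.
 * inverted != 0 reproduces the broken `if (!__ratelimit(&_rs))` test.
 */
static int count_emitted(int n, int burst, int inverted)
{
    struct model_ratelimit rs = {
        .interval = 5, .burst = burst, .begin = 0, .allowed = 0
    };
    int emitted = 0;

    for (int i = 0; i < n; i++) {
        int ok = model_ratelimit_ok(&rs, 0);

        if (inverted ? !ok : ok)
            emitted++;
    }
    return emitted;
}
```

With a burst of 10 in this model, a single message is emitted by the correct test and dropped by the inverted one. For 15 back-to-back messages, the correct test emits the first 10, while the inverted test emits only the 5 that should have been suppressed. That matches what the testcase ran into: it triggers the warning once, so nothing was ever printed.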
The patch which introduced the ratelimiting went in along with a group of patches to rhel6.1 for nfs:

commit cf2a1c571fe2b88a3954f2f4a2cd35641c4b8977
Author: Steve Dickson <SteveD@redhat.com>
Date:   Mon Nov 15 12:18:33 2010 -0500

    [kernel] kernel.h: add printk_ratelimited and pr_<level>_rl

    Message-id: <1289823513-15346-71-git-send-email-steved@redhat.com>
    Patchwork-id: 29304
    O-Subject: [RHEL6.1 PATCH 70/70] kernel.h: add printk_ratelimited and pr_<level>_rl
    Bugzilla: 653066
    RH-Acked-by: Prarit Bhargava <prarit@redhat.com>
    RH-Acked-by: J. Bruce Fields <bfields@redhat.com>
Created attachment 950431 [details] testcase for this bug that now works - requires a kernel which fixes local ratelimiting printk bug https://bugzilla.redhat.com/show_bug.cgi?id=1156428
Created attachment 950476 [details] testcase for this bug that now works - requires a kernel which fixes local ratelimiting printk bug https://bugzilla.redhat.com/show_bug.cgi?id=1156428
Created attachment 950477 [details] test log showing test failure on kernel with patch for printk ratelimiting bug
Created attachment 950478 [details] test log showing test pass on kernel with patch for printk ratelimiting bug plus patches to fix this bug
Created attachment 950479 [details] test log showing test pass on kernel with patch for printk ratelimiting bug plus patches to fix this bug
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.
Patch(es) available on kernel-2.6.32-527.el6
Reproduced at RHEL-6.5, RHEL-6.7-20150506.0 (kernel-2.6.32-558):
https://beaker.engineering.redhat.com/jobs/965878

Verified at RHEL-6.5, RHEL-6.7-20150527.n.0 (kernel-2.6.32-563):
https://beaker.engineering.redhat.com/jobs/965899
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1272.html