Bug 470074
| Summary: | overlapping nfs locks don't work in gfs/dlm | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | David Teigland <teigland> |
| Component: | kernel | Assignee: | David Teigland <teigland> |
| Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 5.2 | CC: | adam.hough, anthony.porcano, cward, czhang, edamato, jlayton, konradr, lmcilroy, mrappa, pdemauro, staubach, steved, tao |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2009-09-02 08:36:35 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 514700 | | |
| Attachments: | | | |
Description
David Teigland
2008-11-05 16:38:20 UTC
This is a bug in lockd, so reassigning.

Is this reproducible on ext3 or something other than GFS2?

It works fine on ext3.

Ah, OK. ext3 doesn't use the fl_grant callback. It looks like that only gets called from DLM or FUSE. I'll have to dig into this and see what the interface is supposed to be like. Assuming that we expect the grant to reflect coalesced locks, I guess we'll need a new nlm_compare_locks-type function that returns true when the grant has an fl_start and fl_end that completely cover the lock range we're blocking on. We'll also need to restructure nlmsvc_grant_blocked to grant multiple blocks before waking up lockd, and to walk the entire list to look for blocks.

I reported this upstream to Marc Eshel at IBM, who I believe designed and wrote this async callback mechanism between dlm/lockd, and also cc'ed Bruce Fields. IBM is apparently using this between gpfs/lockd.

Created attachment 323948 [details]
patch -- handle fl_grant callbacks with coalesced locks (incomplete)
This patch is a start, but still isn't complete.
It adds a new function that tells us whether a particular lock represents a subset of another byte range lock. It also changes nlmsvc_grant_deferred to walk the entire list of blocks and grant any that match.
The problem here, I think, is that we're granting any block that matches. We may have several blocks over the same byte range, so we need to ensure that we only grant one or we'll be granting the same lock more than once.
I think what we'll have to do is keep a list of blocks that we've granted. When we find one that looks like it's in range of the fl that was passed to the function, we'll walk the list of blocks that we've already granted and make sure the new block doesn't overlap with any of them. Only then will we grant that block.
Documentation on what fl_grant is for and what it's supposed to do seems somewhat spotty. This is an interesting post from Bruce Fields that explains the difference and what it means to get an fl_grant callback: http://lkml.org/lkml/2008/4/15/246

I think the idea in comment #7 will still work, but I'll have to determine the best way to track the locks that have already been granted. I'd prefer not to add a new list head to the nlm_block struct. I suppose we could walk the entire b_flist each time we find a matching lock. That's probably not great for performance if there are a lot of blocks, but it should work...

Created attachment 324034 [details]
possible patch -- handle fl_grant callbacks with coalesced locks
Possible patch, or a starting point for one. It changes nlmsvc_grant_deferred to walk the entire list of blocks and grant any that are entirely covered by the lock range and that don't conflict with locks that have already been granted. This patch seems to fix the reproducer, but I'm not at all confident in it yet; it needs serious regression testing.
One thing I've noticed is that the only caller of fl_grant is DLM and there doesn't seem to be much in the way of documentation. I think we have some leeway in how fl_grant really ought to work. Might it be better to just have DLM do the fl_grant callback with a non-coalesced version of the lock? What's the purpose of the second fl_grant arg?
I'll do a bit more testing, and will then post this patch to linux-nfs as an RFC if it passes. Maybe we can get some of these questions answered.
Created attachment 324164 [details]
respun patch -- try to handle result != 0 better in the callback
Slightly respun patch (and a little cleaner too, I think). I don't think we want to worry about conflicts if the result != 0 here.
I sent this patch to linux-nfs yesterday. Awaiting comment -- I'm hoping Bruce Fields will chime in since he seems to be the de facto expert on fl_grant. I think he's on semi-vacation though, so it may be a little while.
reassigning back to Dave...

The consensus upstream seems to be that the lower filesystem (GFS/DLM in this case) should pass a non-coalesced lock back to lockd. Dave has a small DLM patch that seems to resolve this, but it needs more testing. I'll leave it to him to post it.

Created attachment 327455 [details]
patch to test

This patch solves the problem in my simple tests. It may not be the final version, as the upstream NFS developers still seem to have some degree of doubt about it. Upstream discussion here: http://marc.info/?t=122713067000002&r=1&w=2

Created attachment 327664 [details]
second patch to try
This uses locks_copy_lock() instead of __locks_copy_lock(), since the RHEL5 kernel does not make the latter available.
Note re comment 16: the code originally used locks_copy_lock(), so the new patch doesn't change it. For the record, using the existing locks_copy_lock() is correct; replacing it with __locks_copy_lock() breaks things.

I've done more testing with the upstream kernel and the patch in comment 16, and things seem to work very well.

Do we have a way to verify the patch was included in the rpm build? And do you have a pointer to the rpm so I could try it? Thanks

Using the kernel from comment 22, my test works. Using kernel 2.6.18-128.el5, my test quickly fails and I see:

lockd: grant for unknown block
dlm: dlm_plock_callback: lock granted after lock request failed; dangling lock!

So perhaps the folks testing this weren't running the test kernel they were given? Or perhaps the patch was somehow not applied in the first build?

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Updating PM score.

posted to rhkernel http://post-office.corp.redhat.com/archives/rhkernel-list/2009-February/msg00447.html

in kernel-2.6.18-133.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However, feel free to provide a comment indicating that this fix has been verified.

Adding issue 296597 and changing prio/sev to match.

~~ Attention - RHEL 5.4 Beta Released! ~~

RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here at your earliest convenience.
RHEL 5.4 General Availability release is just around the corner! If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request that it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity. Please do not flip the bug status to VERIFIED. Only post your verification results and, if available, update the Verified field with the appropriate value. Questions can be posted to this bug or to your customer or partner representative.

Verified that the patch for this bug is included in 2.6.18-160.el5, as patch #23601.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html