Bug 470074

Summary: overlapping nfs locks don't work in gfs/dlm
Product: Red Hat Enterprise Linux 5
Reporter: David Teigland <teigland>
Component: kernel
Assignee: David Teigland <teigland>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: high
Priority: high
Version: 5.2
CC: adam.hough, anthony.porcano, cward, czhang, edamato, jlayton, konradr, lmcilroy, mrappa, pdemauro, staubach, steved, tao
Target Milestone: rc
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-09-02 08:36:35 UTC
Bug Blocks: 514700
Attachments:
  patch -- handle fl_grant callbacks with coalesced locks (incomplete)
  possible patch -- handle fl_grant callbacks with coalesced locks
  respun patch -- try to handle result != 0 better in the callback
  patch to test
  second patch to try

Description David Teigland 2008-11-05 16:38:20 UTC
Description of problem:

The following analysis was on 2.6.27, but the same should be true in RHEL5.2
and RHEL5.3 (nfs locking over gfs was added in 5.2 per bug 196318).

process on client
  takes RD lock 0-4 (start 0, len 5)
  takes RD lock 2-6 (start 2, len 5)

which should result in a single RD lock spanning 0-6.  The first request
succeeds, but the second triggers:

lockd: grant for unknown block
dlm: dlm_plock_callback: lock granted after lock request failed; dangling lock!

I added printks to nlmsvc_grant_deferred(), and found the fl is:
  fl pid 0 owner 39c56098 start 0 end 6 type 0

and it's being compared against this on the nlm_blocked list:
  fl pid 0 owner 39c56098 start 2 end 6 type 0

... and failing because nlm_compare_locks compares start and end values.
Looking at fs/dlm/plock.c dlm_plock_callback(), I see that the original
file_lock is saved in xop->flc.   Was the intent to use flc (2-6) in the
nlm callback, since the original fl is apparently modified by the vfs to
represent the final range (0-6)?
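
A minimal client-side reproducer for the sequence above could look like
the following (the mount path is just a placeholder; it needs to be an
NFS mount backed by gfs on the server):

#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int set_read_lock(int fd, off_t start, off_t len)
{
	struct flock fl = {
		.l_type   = F_RDLCK,
		.l_whence = SEEK_SET,
		.l_start  = start,
		.l_len    = len,
	};

	return fcntl(fd, F_SETLKW, &fl);	/* blocking POSIX lock request */
}

int main(void)
{
	int fd = open("/mnt/nfs-on-gfs/testfile", O_RDWR | O_CREAT, 0644);

	if (fd < 0) {
		perror("open");
		return 1;
	}

	if (set_read_lock(fd, 0, 5))	/* RD lock 0-4 */
		perror("first lock");
	if (set_read_lock(fd, 2, 5))	/* RD lock 2-6, overlaps the first */
		perror("second lock");

	pause();	/* hold the locks while the server side is inspected */
	return 0;
}

On a working setup both fcntl() calls return 0 and the server ends up
holding one coalesced 0-6 read lock; on the affected kernels the second
call is where the messages above appear in the server log.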


Comment 2 David Teigland 2008-11-11 17:28:26 UTC
This is a bug in lockd, so reassigning.

Comment 3 Jeff Layton 2008-11-11 18:39:56 UTC
Is this reproducible on ext3 or something other than GFS2?

Comment 4 David Teigland 2008-11-11 18:52:14 UTC
It works fine on ext3.

Comment 5 Jeff Layton 2008-11-12 18:30:38 UTC
Ahh ok. ext3 doesn't use the fl_grant callback. Looks like that only gets called from DLM or FUSE. I'll have to dig into this and see what the interface is supposed to be like.

Assuming that we are expecting the grant to reflect coalesced locks, I guess we'll need a new nlm_compare_locks type function that returns true when the grant has a fl_start and fl_end that completely covers the lock range we're blocking on. We'll also need to restructure nlmsvc_grant_blocked to grant multiple blocks before waking up lockd and to walk the entire list to look for blocks.
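
In rough terms, the covering check would be something like this (a standalone sketch; the struct and helper name are invented for illustration, not a patch against lockd):

#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>

/* stand-in for the fl_start/fl_end fields; end is inclusive */
struct lock_range {
	off_t start;
	off_t end;
};

/* invented helper name: true if the granted range completely
 * contains the range a block is waiting on */
static bool grant_covers_block(const struct lock_range *granted,
			       const struct lock_range *blocked)
{
	return granted->start <= blocked->start &&
	       granted->end >= blocked->end;
}

int main(void)
{
	struct lock_range granted = { .start = 0, .end = 6 };	/* coalesced grant */
	struct lock_range blocked = { .start = 2, .end = 6 };	/* queued block */

	printf("covered: %d\n", grant_covers_block(&granted, &blocked));
	return 0;
}

The asymmetry matters: the grant may cover more than the block because of coalescing, but a block that extends past the grant should not match.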

Comment 6 David Teigland 2008-11-12 19:57:22 UTC
I reported this upstream to Marc Eshel at IBM who I believe designed and wrote
this async callback mechanism between dlm/lockd, and also cc'ed Bruce Fields.
IBM is apparently using this between gpfs/lockd.

Comment 7 Jeff Layton 2008-11-18 19:27:57 UTC
Created attachment 323948 [details]
patch -- handle fl_grant callbacks with coalesced locks (incomplete)

This patch is a start, but still isn't complete.

It adds a new function that tells us whether a particular lock represents a subset of another byte range lock. It also changes nlmsvc_grant_deferred to walk the entire list of blocks and grant any that match.

The problem here, I think, is that we're granting any block that matches. We may have several blocks over the same byte range, so we need to ensure that we only grant one or we'll be granting the same lock more than once.

I think what we'll have to do is keep a list of blocks that we've granted. When we find one that looks like it's in range of the fl that was passed to the function, we'll then walk the list of blocks that we've already granted and make sure that that block doesn't overlap with any of them. Only then will we grant that block.
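
As a standalone illustration of that idea (the structs and names below are invented for the example and do not correspond to the real nlm_block or file_lock layout):

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>

struct range { off_t start, end; };	/* inclusive, like fl_start/fl_end */

static bool covers(struct range g, struct range b)
{
	return g.start <= b.start && g.end >= b.end;
}

static bool overlaps(struct range a, struct range b)
{
	return a.start <= b.end && b.start <= a.end;
}

int main(void)
{
	struct range granted_fl = { 0, 6 };	/* coalesced lock from fl_grant */
	struct range blocks[] = { { 2, 6 }, { 0, 4 }, { 8, 9 } };
	struct range already[8];
	size_t i, j, nalready = 0;

	for (i = 0; i < sizeof(blocks) / sizeof(blocks[0]); i++) {
		bool dup = false;

		if (!covers(granted_fl, blocks[i]))
			continue;	/* grant doesn't cover this block */

		/* skip blocks that overlap a range we already granted, so
		 * one coalesced grant doesn't wake two blocks covering the
		 * same bytes */
		for (j = 0; j < nalready; j++)
			if (overlaps(already[j], blocks[i]))
				dup = true;
		if (dup)
			continue;

		already[nalready++] = blocks[i];
		printf("grant block %lld-%lld\n",
		       (long long)blocks[i].start, (long long)blocks[i].end);
	}
	return 0;
}

In this form only the first matching block per byte range gets granted, which is exactly the duplicate-grant concern described above.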

Comment 8 Jeff Layton 2008-11-18 19:49:13 UTC
Documentation on what fl_grant is for and what it's supposed to do is somewhat spotty it seems. This is an interesting post from Bruce Fields that explains the difference and what it means to get a fl_grant callback:

http://lkml.org/lkml/2008/4/15/246

I think the idea in comment #7 will still work, but I'll have to determine the best way to track the locks that have already been granted. I'd prefer not to have to add a new list head to the nlm_block struct. I suppose we could walk the entire b_flist each time we find a matching lock. Probably not great for performance if there are a lot of blocks, but it should work...

Comment 9 Jeff Layton 2008-11-19 14:15:31 UTC
Created attachment 324034 [details]
possible patch -- handle fl_grant callbacks with coalesced locks

Possible patch or starting point for one. This changes nlmsvc_grant_deferred to walk the entire list of blocks and grant any that are entirely covered by the lock range and that don't conflict with locks that have already been granted. This patch seems to fix the reproducer, but I'm not at all confident with it yet. It needs some serious regression testing.

One thing I've noticed is that the only caller of fl_grant is DLM and there doesn't seem to be much in the way of documentation. I think we have some leeway in how fl_grant really ought to work. Might it be better to just have DLM do the fl_grant callback with a non-coalesced version of the lock? What's the purpose of the second fl_grant arg?

I'll do a bit more testing, and will then post this patch to linux-nfs as a RFC if it passes. Maybe we can get some of these questions answered.

Comment 10 Jeff Layton 2008-11-20 12:15:22 UTC
Created attachment 324164 [details]
respun patch -- try to handle result != 0 better in the callback

Slightly respun patch (and a little cleaner too, I think). I don't think we want to worry about conflicts if the result != 0 here.

I sent this patch to linux-nfs yesterday. Awaiting comment -- I'm hoping Bruce Fields will chime in since he seems to be the de facto expert on fl_grant. I think he's on semi-vacation though, so it may be a little while.

Comment 13 Jeff Layton 2008-12-17 19:50:39 UTC
reassigning back to Dave...

The consensus upstream seems to be that the lower filesystem (GFS/DLM in this case) should be passing a non-coalesced lock back to lockd. Dave has a small DLM patch that seems to resolve this, but it needs more testing. I'll leave it to him to post it.
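
As a rough, self-contained sketch of that idea (the structs and function names here are invented for illustration and are not the DLM patch itself): keep a copy of the range the client asked for before the VFS-level lock call coalesces it, and report that saved copy back through the grant callback.

#include <stdio.h>
#include <sys/types.h>

struct lock_req {
	off_t start, end;	/* inclusive range, like fl_start/fl_end */
};

/* stand-in for posix_lock_file(): merges the request with an existing
 * 0-4 read lock, rewriting the request to the coalesced range */
static void apply_and_coalesce(struct lock_req *req)
{
	if (req->start > 0)
		req->start = 0;		/* pretend we merged with 0-4 */
}

/* stand-in for lockd's fl_grant handler: just shows what it was given */
static void grant_callback(const struct lock_req *req)
{
	printf("granted %lld-%lld\n",
	       (long long)req->start, (long long)req->end);
}

int main(void)
{
	struct lock_req req = { 2, 6 };		/* what the client asked for */
	struct lock_req saved = req;		/* copy taken before coalescing */

	apply_and_coalesce(&req);		/* req is now 0-6 */

	grant_callback(&saved);			/* report 2-6, not 0-6 */
	return 0;
}

With the non-coalesced copy reported back, lockd's existing exact-match comparison in nlm_compare_locks can find the block again.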

Comment 15 David Teigland 2008-12-19 16:07:28 UTC
Created attachment 327455 [details]
patch to test

This patch solves the problem in my simple tests.
It may not be the final version, as the upstream
nfs developers seem to still have some degree of
doubt about it.  Upstream discussion here:
http://marc.info/?t=122713067000002&r=1&w=2

Comment 16 David Teigland 2008-12-22 15:25:50 UTC
Created attachment 327664 [details]
second patch to try

This uses locks_copy_lock() instead of __locks_copy_lock()
since the RHEL5 kernel does not make the latter available.

Comment 17 David Teigland 2008-12-22 15:27:33 UTC
Note re comment 16: the code originally used locks_copy_lock(),
so the new patch doesn't change it.

Comment 19 David Teigland 2009-01-19 22:56:55 UTC
For the record, using the existing locks_copy_lock() is correct;
replacing with __locks_copy_lock() breaks things.

Comment 20 David Teigland 2009-01-19 23:08:03 UTC
I've done more testing with the upstream kernel and the patch in
comment 16 and things seem to work very well.  Do we have a way to
verify the patch was included in the rpm build?  And do you have a
pointer to the rpm so I could try it?  Thanks

Comment 23 David Teigland 2009-01-20 22:03:28 UTC
Using the kernel from comment 22, my test works.
Using kernel 2.6.18-128.el5, my test quickly fails and I see:

lockd: grant for unknown block
dlm: dlm_plock_callback: lock granted after lock request failed; dangling lock!

So, perhaps the folks testing this weren't running the test kernel they were
given?  Or perhaps the patch was somehow not applied in the first build?

Comment 29 RHEL Program Management 2009-02-12 16:59:18 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 30 RHEL Program Management 2009-02-16 15:35:11 UTC
Updating PM score.

Comment 32 Don Zickus 2009-03-04 19:59:50 UTC
in kernel-2.6.18-133.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However, feel free
to provide a comment indicating that this fix has been verified.

Comment 34 Lachlan McIlroy 2009-05-15 01:06:39 UTC
Adding issue 296597 and changing prio/sev to match.

Comment 36 Chris Ward 2009-07-03 18:12:16 UTC
~~ Attention - RHEL 5.4 Beta Released! ~~

RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner!

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Please do not flip the bug status to VERIFIED. Only post your verification results and, if available, update the Verified field with the appropriate value.

Questions can be posted to this bug or your customer or partner representative.

Comment 37 Caspar Zhang 2009-08-03 03:47:19 UTC
Verified that the patch for this bug is included in kernel-2.6.18-160.el5 (patch #23601).

Comment 39 errata-xmlrpc 2009-09-02 08:36:35 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html