Bug 229469 - NFS fcntl locks being released locally but not on server
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 9
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Steve Dickson
QA Contact: Brian Brock
 
Reported: 2007-02-21 12:35 UTC by Michael Young
Modified: 2009-06-10 08:20 UTC
Doc Type: Bug Fix
Last Closed: 2009-06-10 08:20:47 UTC

Attachments
relevant packets from lockd capture (1.76 KB, application/octet-stream)
2007-02-23 16:45 UTC, Michael Young
full packet capture from demo program (8.45 KB, application/octet-stream)
2007-02-23 18:02 UTC, Michael Young
demo program to trigger lock problems (971 bytes, text/x-csrc)
2007-02-23 18:03 UTC, Michael Young
capture of two nfs clients (8.09 KB, application/x-bzip2)
2007-02-25 18:24 UTC, Michael Young
patch to stop the creation of a pointless lock (648 bytes, patch)
2007-03-01 00:29 UTC, Michael Young

Description Michael Young 2007-02-21 12:35:41 UTC
We have a busy Linux client with an NFS mount from a NetApp server, where there
is some contention for fcntl advisory file locks. We are seeing file locks
left on the NetApp server after the lock on the Linux box has been removed,
so future attempts to access that file fail. It seems likely that either
the Linux box isn't asking the NetApp box to remove the lock, or the
NetApp box is failing to remove it. We also see a lot of occurrences of the
message
do_vfs_lock: VFS is out of sync with lock manager!
in /var/log/messages.
This is with kernel 2.6.19-1.2911.fc6 and ONTAP release 7.0.5.
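
For readers unfamiliar with the locking pattern involved, here is a minimal sketch of one lock/unlock cycle. It is an illustration only and not code from the affected application: the function name and the path in the usage note are made up, and Python's fcntl.lockf is used because it wraps the fcntl() record-locking calls.

```python
import fcntl
import os


def with_exclusive_lock(path, work):
    """Take a whole-file exclusive fcntl lock, run work(fd), then unlock."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Without LOCK_NB this blocks until the lock is granted (F_SETLKW).
        fcntl.lockf(fd, fcntl.LOCK_EX)
        try:
            return work(fd)
        finally:
            # On an NFS mount that uses lockd, this release also has to be
            # passed on to the server; that is the step this report is about.
            fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

A caller might use it as with_exclusive_lock("/mnt/nfs/shared.dat", lambda fd: os.write(fd, b"x")), where the path is hypothetical. If the release never reaches the server, the next caller on any client would block or fail even though no process holds the lock any more.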

Comment 1 Michael Young 2007-02-23 16:45:29 UTC
Created attachment 148680
relevant packets from lockd capture

I have done a packet capture of the lockd activity between the two boxes, and
have isolated where the problem occurs. In this case the file involved was
locked from another computer, and released between packets 3 and 4, while the
Linux box is cancelling and retrying the lock every 5 seconds (cancel in packet
4, new lock request in packet 5). The NetApp box answers both requests in
packet 6, but it seems that the Linux box doesn't handle the lock-granted part
of the response correctly and forgets the file is locked. It re-requests the
lock in packet 7, but of course the NetApp box thinks the file is locked and
blocks the request.

Comment 2 Michael Young 2007-02-23 18:02:11 UTC
Created attachment 148692
full packet capture from demo program

I have written a demo program which, when run on two Linux boxes, causes the
stuck lock on the NetApp box (the lock left behind has svid=6).

Comment 3 Michael Young 2007-02-23 18:03:25 UTC
Created attachment 148693
demo program to trigger lock problems

Comment 4 Michael Young 2007-02-23 23:05:28 UTC
Actually, you don't need two Linux NFS clients: running the demo program twice
concurrently on a single NFS client works just as well. The bug is also
repeatable on i686 as well as x86_64, so it probably affects all architectures.
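
The attached demo program is not reproduced in this report. As a rough stand-in, the sketch below shows one way a process might wait for a contended lock in a blocking fcntl call that is interrupted by a timer and then retried. The names (lock_with_retries, LockAttemptTimeout), the 5-second default, and the use of SIGALRM are all assumptions made for illustration; the real attachment may work quite differently.

```python
import fcntl
import signal


class LockAttemptTimeout(Exception):
    """Raised by the SIGALRM handler to abort a blocked lock attempt."""


def _on_alarm(signum, frame):
    raise LockAttemptTimeout()


def lock_with_retries(fd, attempt_seconds=5.0, max_attempts=10):
    """Try a blocking exclusive lock, interrupting and retrying it.

    Returns the number of attempts used, or 0 if every attempt timed out.
    Must be called from the main thread, since it installs a signal handler.
    """
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    try:
        for attempt in range(1, max_attempts + 1):
            signal.setitimer(signal.ITIMER_REAL, attempt_seconds)
            try:
                fcntl.lockf(fd, fcntl.LOCK_EX)  # blocks in fcntl(F_SETLKW)
                return attempt
            except LockAttemptTimeout:
                continue  # the wait was interrupted; ask again
            finally:
                signal.setitimer(signal.ITIMER_REAL, 0)  # cancel the timer
        return 0
    finally:
        if previous is not None:
            signal.signal(signal.SIGALRM, previous)
```

Running two or more copies of something like this against the same file on an NFS mount produces the kind of interrupted and retried blocking lock requests discussed in this report. Whether it triggers the stuck server lock on a given kernel is not something this sketch can promise.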

Comment 5 Michael Young 2007-02-25 00:46:21 UTC
Actually I am now coming to the conclusion that my demo program gives a slightly
different failure mode on the NFS clients from the original problem, because in
the two-client case a lock is created on both client and server, but the svid
isn't recorded correctly, and the unlock command sent to the server has a new svid.
Also, the one-client NFS test has only worked for me once, but I do know that
the two-client problem is the same against a Linux NFS server as a NetApp one.

Comment 6 Steve Dickson 2007-02-25 15:58:28 UTC
Would it be possible to post a bzip2-compressed binary tethereal trace of this svid problem?
Something similar to:
    tethereal -w /tmp/bz229469.pcap host <server> ; bzip2 /tmp/bz229469.pcap

Comment 7 Michael Young 2007-02-25 18:24:43 UTC
Created attachment 148759
capture of two nfs clients

This is a full capture of the bug demo between two NFS clients and the NetApp
server. Packet 119 has the lock with svid=5 and packet 129 has the unlock with
svid=6.

Comment 8 Michael Young 2007-02-25 23:27:55 UTC
This may be obvious, but I have been looking at the nlm_debug output when
running my test example, and counting the get hosts and release hosts suggests
that when the broken lock is granted, there are two gets and two releases, which
would of course mean that the list of lock owners would be cleared, which would
explain the behaviour I have been seeing. However, I haven't yet worked out why
the lock is released twice in this situation.

Comment 9 Michael Young 2007-02-27 17:09:11 UTC
I have spotted something new in my debug attempts. It seems that when the fcntl
F_SETLKW request is blocked on the server and interrupted, a lock is actually
granted on the local machine (which I presume is broken behaviour). A
consequence of this is that if fcntl is retried and the lock is now free on the
NFS server, the local machine reuses the existing local lock and thus it doesn't
call the nlmclnt_locks_copy_lock subroutine, so lockowner->count isn't
increased, and so the call to nlm_put_lockowner (via fl->fl_release_private in
the nlmclnt_proc subroutine) lowers lockowner->count to 0 and frees the record.
As a result, when the unlock command happens, there is no record of the svid for
the lock so a new one is used which of course the server ignores, so the lock on
the server is not removed.
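
The reference-counting sequence described above can be mimicked with a small toy model. This is not kernel code: LockOwner, ToyClient, ToyServer, and the create_local_on_interrupt switch are invented names, the svid numbering is arbitrary, and the model assumes the interrupted request never results in a grant on the server. It only replays the bookkeeping as described in this comment.

```python
import itertools


class LockOwner:
    """Toy stand-in for the client's lockowner record."""

    def __init__(self, svid):
        self.svid = svid   # identifier the server associates with the lock
        self.count = 1     # reference held by the request that looked it up


class ToyServer:
    """Remembers which svid holds the single lock; ignores foreign unlocks."""

    def __init__(self):
        self.holder = None

    def lock(self, svid):
        if self.holder is None:
            self.holder = svid

    def unlock(self, svid):
        if self.holder == svid:
            self.holder = None


class ToyClient:
    def __init__(self, create_local_on_interrupt):
        self.create_local_on_interrupt = create_local_on_interrupt
        self._next_svid = itertools.count(1)
        self.owner = None         # live lockowner record, if any
        self.local_lock = False   # does a local lock exist?
        self.local_ref = None     # owner referenced by the local lock, if any

    def _get_owner(self):
        if self.owner is None:
            self.owner = LockOwner(next(self._next_svid))
        else:
            self.owner.count += 1
        return self.owner

    def _put_owner(self, owner):
        owner.count -= 1
        if owner.count == 0 and self.owner is owner:
            self.owner = None   # record freed; its svid is forgotten

    def lock(self, server, interrupted=False):
        owner = self._get_owner()
        if interrupted:
            if self.create_local_on_interrupt:
                self.local_lock = True   # local-only lock, no owner reference
        else:
            server.lock(owner.svid)
            if not self.local_lock:
                # New local lock: copying it takes an extra owner reference.
                self.local_lock, self.local_ref = True, owner
                owner.count += 1
            # Otherwise the existing local lock is reused, no reference taken.
        self._put_owner(owner)   # request finished: drop its reference

    def unlock(self, server):
        owner = self._get_owner()   # gets a fresh svid if the record is gone
        server.unlock(owner.svid)
        if self.local_ref is not None:
            self._put_owner(self.local_ref)
        self.local_lock, self.local_ref = False, None
        self._put_owner(owner)


def run_scenario(create_local_on_interrupt):
    """Interrupted lock, successful retry, unlock; return the server's holder."""
    server, client = ToyServer(), ToyClient(create_local_on_interrupt)
    client.lock(server, interrupted=True)
    client.lock(server)
    client.unlock(server)
    return server.holder
```

In this model run_scenario(True) leaves the server lock held under the svid used for the retry, because the unlock goes out under a newly allocated svid. run_scenario(False), where the interrupted request leaves no local lock behind, ends with the server lock released.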

Comment 10 Michael Young 2007-03-01 00:29:44 UTC
Created attachment 148974
patch to stop the creation of a pointless lock

I have found the bug, which is in the do_setlk function of fs/nfs/file.c which
creates a local lock supposedly to clean up the lock on the server.
Unfortunately, by this stage the lock has forgotten how to do this anyway, and
if the process tries again for the same lock and succeeds, this local-only lock
is reused, and it still doesn't know how to remove the server lock when the
process does close, hence the lock left on the server. I have attached a patch
to stop the creation of this local-only lock in the case when it can't hope to
remove the remote lock, which seems to fix at least the two-client problem I
was seeing, but it may be that creating the local lock is always a mistake when
there is remote locking.

Comment 11 Steve Dickson 2007-03-06 13:44:37 UTC
What I don't understand is how the lock on the server is
cleaned up. Assuming the lock on the server was created
via the NFS_PROTO(inode)->lock(filp, cmd, fl); call,
by not locally registering the lock, how does the just
created server lock get cleaned up? Note: just because
status is EINTR or ERESTARTSYS does not mean the server
will not create the lock...

BTW, thanks for all your hard work on this... 

Comment 12 Michael Young 2007-03-06 16:08:31 UTC
I think I explained it badly (and was possibly misunderstanding it a bit
myself). Normally a local lock (copied from fl earlier) will have been created
as part of the NFS_PROTO(inode)->lock(filp, cmd, fl); call. The problem is that
by this stage nlmclnt_proc has already started to clean up the lock by running
fl->fl_ops->fl_release_private(fl); and fl->fl_ops = NULL; which means that, if
a local lock hasn't been created by now, then the lockowner count on
fl->fl_u.nfs_fl.owner will already be 0 and the lockowner record associated with
the lock will have been deleted, which means that you have already lost the information that
would have allowed you to delete the remote lock in any case. Thus with the
current code, creating the lock at this point is pointless.

I am not sure whether it is right or not to try to create a local lock at this
point. In most cases the creation of the remote lock really will have failed,
which is what you are telling the program, which might then go off and do other
things before finally closing the file and releasing the local lock it doesn't
know it has, so you are potentially leaving the file locked locally for a long
period of time (though I guess the chances of this are small because most
processes will either try again or give up immediately). Also if you have two
processes competing for a lock on this file you may end up with one getting the
local lock, and the other getting the remote lock (which I think might have been
the cause of the do_vfs_lock errors we were seeing in our messages log).

If a remote lock has succeeded without a local lock being created (and I am not
sure what the course of events that can trigger this is, though it is possible
if a local only lock has already been created previously) then creating a local
lock only makes sense if you stand a chance of removing the remote one, which in
this case means delaying the fl_release_private call until later (I think that
if it exists, it does get called as part of the fcntl cleanup anyway).

Comment 13 Michael Young 2007-03-06 16:14:37 UTC
One side point. FC6 doesn't seem to start the UDP nlockmgr by default (if it
isn't an NFS server), but it seems that our current NetApp box only tries UDP,
not TCP. This seems to make the locking problems we were seeing more common,
presumably because it somehow synchronizes competing locking processes, making
it more likely that a race condition triggers the problem.

Comment 14 Steve Dickson 2007-07-12 18:48:58 UTC
Moving on to F7, since I'm sure this is still a problem...

Comment 15 Dave Miller 2007-12-11 05:00:25 UTC
We're hitting this on RHEL5U1, same setup as the reporter, pretty much. 
Multiple machines sharing the same NFS mount from a NetApp.

Comment 16 Michael Young 2007-12-11 11:06:53 UTC
You could try the patch in Comment #10. We have been running with it applied to
standard Fedora kernels for some time now on the machines which were showing up
the bug, and haven't noticed any NFS issues.
Note: to clear the locks on the NetApp box, running
    lock status -h
on the NetApp box will list the locks, and
    priv set advanced; sm_mon -l yournfsclienthost; priv set
will clear them, though clearly it is a good idea to make sure the client is
idle first.

Comment 17 Bug Zapper 2008-05-14 12:10:13 UTC
This message is a reminder that Fedora 7 is nearing the end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 7. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '7'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 7's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 7 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. If possible, it is recommended that you try the newest available Fedora distribution to see if your bug still exists.

Please read the Release Notes for the newest Fedora distribution to make sure it will meet your needs:
http://docs.fedoraproject.org/release-notes/

The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 18 Michael Young 2008-05-14 13:58:36 UTC
Confirmed that the bug still exists in Fedora 9 (2.6.25-2.fc9.i686.xen), also CentOS
5.1 (2.6.18-53.1.19.el5xen).

Comment 19 Dave Miller 2008-05-15 03:07:10 UTC
Confirmed still present in RHEL 5.1 (2.6.18-53.1.19.el5) as well

Comment 20 Chuck Lever 2008-10-20 15:58:20 UTC
Upstream 2.6.26 contains some patches that address issues with NLM locking that may be similar.  Has anyone tried the most recent publicly released Fedora 9 kernel (2.6.26.5-45.fc9) to see if the problem documented here still exists?

Comment 21 Steve Dickson 2008-10-21 17:02:01 UTC
Here are the commits Chuck is talking about... 


commit 5e7f37a76fa5b604949020b7317962262812b2dd
commit 536ff0f809b0f4d56e1c41e66768d330668e0a55
commit 4a9af59fee0701d9db99bc148d87b8852d6d6dd8
commit dc9d8d048168ff61c458bec06b28996cb90b182a
commit 8ec7ff74448f65ac963e330795d771ab14ec8408
commit 6b4b3a752b3464f2fd9fe2837fb19270c23c1d6b
commit 5f50c0c6d644d6c8180d9079c13c5d9de3adeb34
commit c4d7c402b788b73dc24f1e54a57f89d3dc5eb7bc
commit d11d10cc05c94a32632d6928d15a1034200dd9a5

Comment 22 Michael Young 2008-10-22 08:41:46 UTC
I have not had a chance to examine the new code in enough detail to be sure, but from my testing it seems the behaviour is better but still not perfect. I haven't seen the stuck remote lock reoccur, but processes waiting on a lock don't always get it even after it has been unlocked by the process with the lock.

Comment 23 Chuck Lever 2008-10-23 16:35:43 UTC
I downloaded and tried the reproducer attached to this bugzilla.  I ran it on a pair of Fedora 9 clients I have here with the 2.6.26.5-45.fc9 kernel.  It appeared to work correctly with both an OpenSolaris 2008.5 server and a 2.6.26.5-45.fc9 Linux NFS server.  I have not tried this with a NetApp filer.

Maybe the hung process problem you reported in #c22 is a different bug?

Comment 24 Trond Myklebust 2008-10-27 16:36:13 UTC
Michael,

Have you been running the test with lockd listening on a UDP port?

As you noted in comment #13, NetApp filers only send NLM callbacks over UDP, so
on FC-9, you would need to add something like

        lockd.nlm_udpport=40000

(or some other unused port number) to your grub.conf's kernel boot parameters.
Without this, you are indeed likely to see your test case fail to grab a
contended lock.

Comment 25 Michael Young 2008-10-27 22:10:41 UTC
I was actually seeing this with several processes contending for a lock on a single machine, so I don't expect the lack of a UDP listener would make much difference (with earlier kernels I could reproduce the stuck lock problem with this setup).
The scenario is to run about 5 processes of the demo program at the same time. When the first lock is released, the processes waiting for the lock don't necessarily acquire the now-freed lock. So far I have only managed to reproduce this behaviour on a single-processor machine, which may or may not be significant.
I am afraid I still haven't had a close enough look at the code to work out what is happening.

Comment 26 Trond Myklebust 2008-10-27 22:31:33 UTC
On the contrary, I do expect it to make a difference, since the code in
nlmclnt_unlock() will call posix_lock_file_wait() in order to free the
vfs lock before it notifies the server. As soon as it does so, the 5
processes that are contending for that lock will attempt to place a
blocking lock with the server, and will start waiting for the UDP callback.

Comment 27 Michael Young 2008-10-28 11:38:41 UTC
Yes, you are right (though the Fedora 9 and CentOS/RHEL 5 fix to get lockd listening on UDP is to uncomment the line LOCKD_UDPPORT=32769 in /etc/sysconfig/nfs, and presumably to change the value for security reasons). What seems to have been happening in my tests was that once the lock was released locally, the other processes tried to get a lock and were of course blocked; the NetApp box released the freed lock and granted a new one, but couldn't tell the appropriate process. From then on, as the lock attempt that has unknowingly succeeded times out, the lock gets passed unknowingly to another queuing process, so the lock attempts never succeed.

So yes, provided the Linux box is listening on UDP the locking seems to work correctly now, but otherwise there is still the potential for competing locks not to be granted to a process.
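
One way to check whether a client's lock manager is registered on UDP is to look at the portmapper listing. The helper below is a small sketch that parses text in the usual `rpcinfo -p` column layout (program, version, protocol, port, service) and collects the transports registered for nlockmgr, RPC program 100021. The function name is made up, and the parsing assumes that column layout.

```python
def nlockmgr_transports(rpcinfo_output):
    """Return {protocol: set(ports)} for nlockmgr (program 100021) entries."""
    transports = {}
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Expected columns: program, version, protocol, port, service name.
        # The header line and other programs are skipped by the program check.
        if len(fields) < 4 or fields[0] != "100021":
            continue
        proto, port = fields[2], int(fields[3])
        transports.setdefault(proto, set()).add(port)
    return transports
```

On a client, the input could come from subprocess.run(["rpcinfo", "-p"], capture_output=True, text=True).stdout. If the result has no "udp" entry, callbacks from a server that only uses UDP would have nowhere to go, which matches the behaviour described in this comment.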

Comment 28 csb sysadmin 2008-11-17 22:45:46 UTC
I don't know if what we experienced has anything to do with this problem, but we had a RHEL5 system running 2.6.18-92.1.1 and, after 138 days of uptime, the NFS server stopped being able to complete fcntl64(..., F_GETLK, .. calls or fcntl64(255, F_GETFL) calls in strace. Users whose home directories were on this server could not start firefox, thunderbird, openoffice, many KDE apps, etc., and I couldn't run this test script no matter which of the 35+ other RHEL4 and RHEL5 NFS clients I tried:

#!/bin/sh

(
flock -x 200
uname -a >> allHosts.txt
) 200>> lockFile


It would just hang. Eventually we had to reboot the NFS server; restarting NFS didn't help, and I don't think there is any way to restart [lockd] since it's a kernel-level process.

Comment 29 csb sysadmin 2008-11-17 22:48:49 UTC
and yes the test program would run locally but not over NFS.

Comment 30 Bug Zapper 2009-06-09 22:28:10 UTC
This message is a reminder that Fedora 9 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 9.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '9'.

Comment 31 Michael Young 2009-06-10 08:20:47 UTC
This is working, provided that lockd is listening on UDP

