Red Hat Bugzilla – Bug 229469
NFS fcntl locks being released locally but not on server
Last modified: 2009-06-10 04:20:47 EDT
We have a busy Linux client with an NFS mount from a NetApp server, where there
is some contention for fcntl advisory file locks. We are seeing file locks
left on the NetApp server after the lock on the Linux box has been removed,
so future attempts to access that file fail. It seems likely that either
the Linux box isn't asking for the lock to be removed on the NetApp box, or the
NetApp box is failing to remove it. We also see a lot of occurrences of the message
do_vfs_lock: VFS is out of sync with lock manager!
This is with kernel 2.6.19-1.2911.fc6 and ONTAP release 7.0.5.
Created attachment 148680 [details]
relevant packets from lockd capture
I have done a packet capture of the lockd activity between the two boxes, and
have isolated where the problem occurs. In this case the file involved was
locked from another computer, and released between packets 3 and 4, while the
linux box is cancelling and retrying the lock every 5 seconds (cancel in packet
4, new lock request in packet 5). The NetApp box answers both requests in
packet 6, but it seems that the Linux box doesn't handle the lock-granted part
of the reply correctly and forgets the file is locked. It re-requests the
lock in packet 7, but of course the NetApp box thinks the file is already locked
and blocks the request.
Created attachment 148692 [details]
full packet capture from demo program
I have written a demo program which, when run on two Linux boxes, causes the
stuck lock on the NetApp box (the leftover lock has svid=6).
Created attachment 148693 [details]
demo program to trigger lock problems
Actually, you don't need two linux nfs clients, running the demo program twice
concurrently on a single NFS client works just as well. Also the bug is
repeatable on i686 as well as x86_64 so is probably the case on all architectures.
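The attached demo program is not reproduced in this report. As a rough illustration of the access pattern discussed here (a blocking F_SETLKW request that is interrupted and then retried), a minimal Python sketch might look like the following. The names lock_with_retry and LockTimeout, the retry interval, and the attempt count are assumptions made for illustration and are not taken from the attachment.

```python
import fcntl
import signal


class LockTimeout(Exception):
    """Raised by the timer handler to break out of a blocked F_SETLKW."""


def _on_alarm(signum, frame):
    # Raising here stops Python from silently retrying the interrupted call.
    raise LockTimeout()


def lock_with_retry(fd, interval=5.0, attempts=3):
    """Request a blocking exclusive fcntl lock, interrupting it every `interval` s.

    Returns True once the lock is held, False if every attempt was interrupted.
    Must be called from the main thread, because it installs a signal handler.
    """
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    try:
        for _ in range(attempts):
            signal.setitimer(signal.ITIMER_REAL, interval)
            try:
                # lockf() without LOCK_NB issues fcntl(F_SETLKW) underneath.
                fcntl.lockf(fd, fcntl.LOCK_EX)
                return True
            except (LockTimeout, InterruptedError):
                continue  # interrupted while blocked: try again
            finally:
                signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer
        return False
    finally:
        signal.signal(signal.SIGALRM, old_handler)
```

Running two copies of something like this against the same file on an NFS mount, with one copy holding the lock for longer than the interval, should exercise the interrupted-then-retried path; whether it triggers the stuck lock depends on the kernel in use.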
Actually I am now coming to the conclusion that my demo program gives a slightly
different failure mode on the nfs clients from the original problem, because in
the two client case a lock is created on both client and server, but the svid
isn't recorded correctly, and the unlock command sent to the server has a new svid.
Also, the one client nfs test has only worked for me once, but I do know that
the two client problem is the same against a linux NFS server as a NetApp one.
Would it be possible to post a bzip2 binary tethereal trace of this svid problem?
Something similar to:
tethereal -w /tmp/bz229469.pcap host <server> ; bzip2 /tmp/bz229469.pcap
Created attachment 148759 [details]
capture of two nfs clients
This is a full capture of the bug demo between two NFS clients and the Netapp
server. Packet 119 has the lock with svid=5 and packet 129 has the unlock with
This may be obvious, but I have been looking at the nlm_debug output when
running my test example, and counting the get hosts and release hosts suggests
that when the broken lock is granted, there are two gets and two releases, which
would of course mean that the list of lock owners would be cleared, which would
explain the behaviour I have been seeing. However, I haven't yet worked out why
the lock is released twice in this situation.
I have spotted something new in my debug attempts. It seems that when the fcntl
F_SETLKW request is blocked on the server and interrupted, a lock is actually
granted on the local machine (which I presume is broken behaviour). A
consequence of this is that if fcntl is retried and the lock is now free on the
NFS server, the local machine reuses the existing local lock and thus it doesn't
call the nlmclnt_locks_copy_lock subroutine, so lockowner->count isn't
increased, and so the call to nlm_put_lockowner (via fl->fl_release_private in
the nlmclnt_proc subroutine) lowers lockowner->count to 0 and frees the record.
As a result, when the unlock command happens, there is no record of the svid for
the lock so a new one is used which of course the server ignores, so the lock on
the server is not removed.
Created attachment 148974 [details]
patch to stop the creation of a pointless lock
I have found the bug, which is in the do_setlk function of fs/nfs/file.c which
creates a local lock supposedly to clean up the lock on the server.
Unfortunately, by this stage the lock has forgotten how to do this anyway, and
if the process tries again for the same lock and succeeds, this local only lock
is reused, and it still doesn't know how to remove the server lock when the
process does close, hence the lock left on the server. I have attached a patch
to stop the creation of this local only lock in the case when it can't hope to
remove the remote lock, which seems to fix at least the two client problem I
was seeing, but it may be that creating the local lock is always a mistake when
there is remote locking.
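To make the sequence in the last two comments easier to follow, here is a toy Python model of the reference counting. It is not kernel code: ToyServer, ToyClient, run and the "patched" flag are hypothetical names, and the only things borrowed from the kernel are the roles of nlmclnt_locks_copy_lock (takes an extra lockowner reference) and nlm_put_lockowner (drops one and frees the record at zero), as described above.

```python
import itertools


class ToyServer:
    """Remembers which svid holds the lock; ignores unlocks from other svids."""

    def __init__(self):
        self.holder = None

    def lock(self, svid):
        if self.holder is None:
            self.holder = svid
            return True
        return False  # stands in for "blocked, then interrupted"

    def unlock(self, svid):
        if self.holder == svid:
            self.holder = None


class ToyClient:
    def __init__(self, server, patched=False):
        self.server = server
        self.patched = patched
        self._svids = itertools.count(1)
        self.owner = None        # live lockowner record, or None once freed
        self.local_lock = None   # None, or {"owner": record-or-None}

    def _get_owner(self):
        if self.owner is None:
            self.owner = {"svid": next(self._svids), "count": 0}
        self.owner["count"] += 1
        return self.owner

    def _put_owner(self, owner):
        # Role of nlm_put_lockowner: free the record when the count hits 0.
        owner["count"] -= 1
        if owner["count"] == 0 and self.owner is owner:
            self.owner = None

    def setlkw(self):
        owner = self._get_owner()
        granted = self.server.lock(owner["svid"])
        if granted:
            if self.local_lock is None:
                # Role of nlmclnt_locks_copy_lock: extra reference for the copy.
                owner["count"] += 1
                self.local_lock = {"owner": owner}
            # else: the existing local lock is reused, no reference is taken
        elif not self.patched and self.local_lock is None:
            # Unpatched behaviour: a local-only lock with no owner reference.
            self.local_lock = {"owner": None}
        self._put_owner(owner)   # role of fl_release_private
        return granted

    def unlock(self):
        lock, self.local_lock = self.local_lock, None
        owner = lock["owner"] if lock else None
        if owner is None:
            owner = self._get_owner()   # no record left: a fresh svid is used
        self.server.unlock(owner["svid"])
        self._put_owner(owner)


def run(patched):
    server = ToyServer()
    server.lock(svid=99)          # another client holds the lock
    client = ToyClient(server, patched)
    client.setlkw()               # blocked on the server, then interrupted
    server.unlock(99)             # the other client releases
    client.setlkw()               # the retry is granted
    client.unlock()
    return server.holder          # None means the server lock was released
```

In this model run(patched=False) leaves the server lock held by the svid used for the retry, while run(patched=True) returns None. It only mirrors the description in this report; the real code paths have more states than the model covers.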
What I don't understand is how the lock on the server is
cleaned up. Assuming the lock on the server was created
via the NFS_PROTO(inode)->lock(filp, cmd, fl); call,
if the lock is not registered locally, how does the just
created server lock get cleaned up? Note: just because
status is EINTR or ERESTARTSYS does not mean the server
will not create the lock...
BTW, thanks for all your hard work on this...
I think I explained it badly (and was possibly misunderstanding it a bit
myself). Normally a local lock (copied from fl earlier) will have been created
as part of the NFS_PROTO(inode)->lock(filp, cmd, fl); call. The problem is that
by this stage nlmclnt_proc has already started to clean up the lock by running
fl->fl_ops->fl_release_private(fl); and fl->fl_ops = NULL; which means that, if
a local lock hasn't been created by now, then the lockowner count on
fl->fl_u.nfs_fl.owner will already be 0 and the lockowner record associated to
the lock deleted, which means that you have already lost the information that
would have allowed you to delete the remote lock in any case. Thus with the
current code, creating the lock at this point is pointless.
I am not sure whether it is right or not to try to create a local lock at this
point. In most cases the creation of the remote lock really will have failed,
which is what you are telling the program, which might then go off and do other
things before finally closing the file and releasing the local lock it doesn't
know it has, so you are potentially leaving the file locked locally for a long
period of time (though I guess the chances of this are small because most
processes will either try again or give up immediately). Also if you have two
processes competing for a lock on this file you may end up with one getting the
local lock, and the other getting the remote lock (which I think might have been
the cause of the do_vfs_lock errors we were seeing in our messages log).
If a remote lock has succeeded without a local lock being created (and I am not
sure what the course of events that can trigger this is, though it is possible
if a local only lock has already been created previously) then creating a local
lock only makes sense if you stand a chance of removing the remote one, which in
this case means delaying the fl_release_private call until later (I think that
if it exists, it does get called as part of the fcntl clear up anyway).
One side point. FC6 doesn't seem to start the UDP nlockmgr by default (if it
isn't an NFS server), but it seems that our current NetApp box only tries UDP
not TCP, and this seems to make the locking problems we were seeing more common,
presumably because it somehow synchronized competing locking processes, making
a race condition that triggers the problem more likely.
Moving on to F7, since I'm sure this is still a problem...
We're hitting this on RHEL5U1, same setup as the reporter, pretty much.
Multiple machines sharing the same NFS mount from a NetApp.
You could try the patch in Comment #10. We have been running with it applied to
standard Fedora kernels for some time now on the machines which were showing up
the bug, and haven't noticed any NFS issues.
Note: to clear the locks on the Netapp box, running
lock status -h
on the Netapp box will list the locks, and
priv set advanced; sm_mon -l yournfsclienthost; priv set
will clear them, though clearly it is a good idea to make sure the client is
This message is a reminder that Fedora 7 is nearing the end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 7. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '7'.
Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 7's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 7 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug. If you are unable to change the version, please add a comment here and someone will do it for you.
Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. If possible, it is recommended that you try the newest available Fedora distribution to see if your bug still exists.
Please read the Release Notes for the newest Fedora distribution to make sure it will meet your needs:
The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Confirmed that bug still exists in Fedora 9 (2.6.25-2.fc9.i686.xen), also Centos
Confirmed still present in RHEL 5.1 (2.6.18-53.1.19.el5) as well
Upstream 2.6.26 contains some patches that address issues with NLM locking that may be similar. Has anyone tried the most recent publicly released Fedora 9 kernel (22.214.171.124-45.fc9) to see if the problem documented here still exists?
Here are the commits Chuck is talking about...
I have not had a chance to examine the new code in enough detail to be sure, but from my testing it seems the behaviour is better but still not perfect. I haven't seen the stuck remote lock reoccur, but processes waiting on a lock don't always get it even after it has been unlocked by the process with the lock.
I downloaded and tried the reproducer attached to this bugzilla. I ran it on a pair of Fedora 9 clients I have here with the 126.96.36.199-45.fc9 kernel. It appeared to work correctly with both an OpenSolaris 2008.5 server and a 188.8.131.52-45.fc9 Linux NFS server. I have not tried this with a NetApp filer.
Maybe the hung process problem you reported in #c22 is a different bug?
Have you been running the test with lockd listening on a UDP port?
As you noted in comment #13, NetApp filers only send NLM callbacks over UDP, so
on FC-9, you would need to add something like
(or some other unused port number) to your grub.conf's kernel boot parameters.
Without this, you are indeed likely to see your test case fail to grab a
I was actually seeing this with several processes contending for a lock on a single machine, so I don't expect the lack of a UDP listener would make much difference (with earlier kernels I could reproduce the stuck lock problem with this setup).
The scenario is to run about 5 processes of the demo program at the same time. When the first lock is released the processes waiting for the lock don't necessarily acquire the now-freed lock. So far I have only managed to reproduce this behaviour on a single processor machine which may or may not be significant.
I am afraid I still haven't had a close enough look at the code to work out what is happening.
On the contrary, I do expect it to make a difference, since the code in
nlmclnt_unlock() will call posix_lock_file_wait() in order to free the
vfs lock before it notifies the server. As soon as it does so, the 5
processes that are contending for that lock will attempt to place a
blocking lock with the server, and will start waiting for the UDP callback.
Yes, you are right (though the Fedora 9 and CentOS/RHEL 5 fix to get lockd listening on UDP is to uncomment the line LOCKD_UDPPORT=32769 in /etc/sysconfig/nfs, and presumably to change the value for security reasons). What seems to have been happening in my tests was that once the lock was released locally, the other processes tried to get a lock and were of course blocked. The NetApp box then releases the freed lock and grants a new one, but can't tell the appropriate process. From then on, as each lock attempt that has unknowingly succeeded times out, the lock gets passed unknowingly to another queuing process, so the lock attempts never succeed.
So yes, provided the linux box is listening on UDP the locking seems to work correctly now, but otherwise there is still the potential for competing locks not to be granted to a process.
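One way to check whether lockd is reachable over UDP on a client is to look for nlockmgr (RPC program 100021) in the output of rpcinfo -p. The sketch below only parses such output; the function name nlockmgr_transports and the SAMPLE text are illustrative assumptions, and the port numbers in the sample are made up rather than captured from any machine in this report.

```python
def nlockmgr_transports(rpcinfo_output):
    """Return the set of protocols on which nlockmgr (program 100021) is registered."""
    protos = set()
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Data rows have the columns: program vers proto port service
        if len(fields) >= 4 and fields[0] == "100021":
            protos.add(fields[2])
    return protos


# Illustrative "rpcinfo -p" output for a client with no UDP lockd listener.
SAMPLE = """\
   program vers proto   port  service
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100021    1   tcp  32803  nlockmgr
    100021    3   tcp  32803  nlockmgr
    100021    4   tcp  32803  nlockmgr
"""

if "udp" not in nlockmgr_transports(SAMPLE):
    print("lockd is not registered on UDP; NLM callbacks sent over UDP would be lost")
```

In practice the text would come from running rpcinfo -p against the client instead of from the SAMPLE string.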
I don't know if what we experienced has anything to do with this problem, but we had a RHEL5 system running 2.6.18-92.1.1 and, after 138 days of uptime, the NFS server stopped being able to complete fcntl64(..., F_GETLK, .. calls or fcntl64(255, F_GETFL) calls (as seen in strace). Users whose home directories were on this server could not start firefox, thunderbird, openoffice, many KDE apps, etc., and I couldn't run this test script from any of the 35+ other RHEL4 and RHEL5 NFS clients I tried:
(
flock -x 200
uname -a >> allHosts.txt
) 200>> lockFile
It would just hang. Eventually we had to reboot the NFS server; restarting NFS didn't help, and I don't think there is any way to restart [lockd] since it's a kernel-level process.
And yes, the test program would run locally but not over NFS.
This is working, provided that lockd is listening on UDP