We have a busy linux client doing an NFS mount to a Netapp server, where there is some contention for fcntl advisory file locks, and we are seeing file locks being left on the NetApp server when the lock on the linux box has been removed, and thus future attempts to access that file fail. It seems likely that either the linux box isn't requesting the lock to be removed on the NetApp box, or the NetApp box is failing to remove it. We do also see a lot of occurrences of the message do_vfs_lock: VFS is out of sync with lock manager! in /var/log/messages. This is with kernel 2.6.19-1.2911.fc6 and ONTAP release 7.0.5.
Created attachment 148680 [details] relevant packets from lockd capture I have done a packet capture of the lockd activity between the two boxes, and have isolated where the problem occurs. In this case the file involved was locked from another computer, and released between packets 3 and 4, while the linux box is cancelling and retrying the lock every 5 seconds (cancel in packet 4, new lock request in packet 5). The Netapp box answers both requests in packet 6, but it seems that the linux box doesn't handle the lock granted part of the request correctly and forgets the file is locked, and rerequests the lock in packet 7 but of course the Netapp box thinks the file is locked and blocks the request.
Created attachment 148692 [details] full packet capture from demo program I have written a demo program, which when run on two linux boxes causes the stuck lock on the netapp box (the left lock has svid=6).
Created attachment 148693 [details] demo program to trigger lock problems
Actually, you don't need two linux nfs clients, running the demo program twice concurrently on a single NFS client works just as well. Also the bug is repeatable on i686 as well as x86_64 so is probably the case on all architectures.
Actually I am now coming to the conclusion that my demo program gives a slightly different failure mode on the nfs clients from the original problem, because in the two client case a lock is created on both client and server, but the svid isn't recorded correctly, and the unlock command sent to the server has a new svid. Also, the one client nfs test has only worked for me once, but I do know that the two client problem is the same against a linux NFS server as a NetApp one.
Would it be possible to post a bzip2 binary tethereal trace of this svid problem? Something similar to: tethereal -w /tmp/bz229469.pcap host <server> ; bzip2 /tmp/bz229469.pcap
Created attachment 148759 [details] capture of two nfs clients This is a full capture of the bug demo between two NFS clients and the Netapp server. Packet 119 has the lock with svid=5 and packet 129 has the unlock with svid=6
This may be obvious, but I have been looking at the nlm_debug output when running my test example, and counting the get hosts and release hosts suggests that when the broken lock is granted, there are two gets and two releases, which would of course mean that the list of lock owners would be cleared, which would explain the behaviour I have been seeing. However, I haven't yet worked out why the lock is released twice in this situation.
I have spotted something new in my debug attempts. It seems that when the fcntl F_SETLKW request is blocked on the server and interrupted, a lock is actually granted on the local machine (which I presume is broken behaviour). A consequence of this is that if fcntl is retried and the lock is now free on the NFS server, the local machine reuses the existing local lock and thus it doesn't call the nlmclnt_locks_copy_lock subroutine, so lockowner->count isn't increased, and so the call to nlm_put_lockowner (via fl->fl_release_private in the nlmclnt_proc subroutine) lowers lockowner->count to 0 and frees the record. As a result, when the unlock command happens, there is no record of the svid for the lock so a new one is used which of course the server ignores, so the lock on the server is not removed.
Created attachment 148974 [details] patch to stop the creation of a pointless lock I have found the bug, which is in the do_setlk function of fs/nfs/file.c which creates a local lock supposedly to clean up the lock on the server. Unfortunately, by this stage the lock has forgotten how to do this anyway, and if the process tries again for same the lock and succeeds, this local only lock is reused, and it still doesn't know how to remove the server lock when the process does close, hence the lock left on the server. I have attached a patch to stop the creation of this local only lock in the case when it can't hope to remove the remote lock, which seems to fix at least the two client problem I was seeing, but it may be that creating the local lock is always a mistake when there is remote locking.
What I don't understand is how the lock on the server is cleaned up? Assuming the lock on the server was create via the NFS_PROTO(inode)->lock(filp, cmd, fl); call by not locally registering the lock, how does the just created server lock get cleaned up? Note: just because status is EINTR or ERESTARTSYS done not me the server will not creat the lock... BTW, thanks for all your hard work on this...
I think I explained it badly (and was possibly misunderstanding it a bit myself). Normally a local lock (copied from fl earlier) will have been created as part of the NFS_PROTO(inode)->lock(filp, cmd, fl); call. The problem is that by this stage nlmclnt_proc has already started to clean up the lock by running fl->fl_ops->fl_release_private(fl); and fl->fl_ops = NULL; which means that, if a local lock hasn't been created by now, then the lockowner count on fl->fl_u.nfs_fl.owner will already be 0 and the lockowner record associated to the lock deleted, which means that you have already lost the information that would have allowed you to delete the remote lock in any case. Thus with the current code, creating the lock at this point is pointless. I am not sure whether it is right or not to try to create a local lock at this point. In most cases the creation of the remote lock really will have failed, which is what you are telling the program, which might then go off and do other things before finally closing the file and releasing the local lock it doesn't know it has, so you are potentially leaving the file locked locally for a long period of time (though I guess the chances of this are small because most processes will either try again or give up immediately). Also if you have two processes competing for a lock on this file you may end up with one getting the local lock, and the other getting the remote lock (which I think might have been the cause of the do_vfs_lock errors we were seeing in our messages log). If a remote lock has succeeded without a local lock being created (and I am not sure what the course of events that can trigger this is, though it is possible if a local only lock has already been created previously) then creating a local lock only makes sense if you stand a chance of removing the remote one, which in this case means delaying the fl_release_private call until later (I think that if it exists, it does get called as part of the fcntl clear up anyway).
One side point. FC6 doesn't seem to start the UDP nlockmgr by default (if it isn't an NFS server), but it seems that our current NetApp box only tries UDP not TCP, and this seems to make the locking problems we were seeing more common, presumably because it somehow synchronized competing locking processes, making the chance of a race condition triggering the problem more common.
Moving on to F7, since I'm sure this is still an problem...
We're hitting this on RHEL5U1, same setup as the reporter, pretty much. Multiple machines sharing the same NFS mount from a NetApp.
You could try the patch in Comment #10 . We have been running with it applied to standard Fedora kernels for some time now on the machines which were showing up the bug, and haven't noticed any NFS issues. Note: to clear the locks on the Netapp box, running lock status -h on the Netapp box will list the locks, and priv set advanced; sm_mon -l yournfsclienthost; priv set will clear them, though clearly it is a good idea to make sure the client is idle first.
This message is a reminder that Fedora 7 is nearing the end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 7. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '7'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 7's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 7 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. If possible, it is recommended that you try the newest available Fedora distribution to see if your bug still exists. Please read the Release Notes for the newest Fedora distribution to make sure it will meet your needs: http://docs.fedoraproject.org/release-notes/ The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Confirmed that bug still exists in Fedora 9 (2.6.25-2.fc9.i686.xen), also Centos 5.1 (2.6.18-53.1.19.el5xen).
Confirmed still present in RHEL 5.1 (2.6.18-53.1.19.el5) as well
Upstream 2.6.26 contains some patches that address issues with NLM locking that may be similar. Has anyone tried the most recent publicly released Fedora 9 kernel (2.6.26.5-45.fc9) to see if the problem documented here still exists?
Here are the commits Chuck is talking about... commit 5e7f37a76fa5b604949020b7317962262812b2dd commit 536ff0f809b0f4d56e1c41e66768d330668e0a55 commit 4a9af59fee0701d9db99bc148d87b8852d6d6dd8 commit dc9d8d048168ff61c458bec06b28996cb90b182a commit 8ec7ff74448f65ac963e330795d771ab14ec8408 commit 6b4b3a752b3464f2fd9fe2837fb19270c23c1d6b commit 5f50c0c6d644d6c8180d9079c13c5d9de3adeb34 commit c4d7c402b788b73dc24f1e54a57f89d3dc5eb7bc commit d11d10cc05c94a32632d6928d15a1034200dd9a5 commit 4a9af59fee0701d9db99bc148d87b8852d6d6dd8
I have not had a chance to examine the new code in enough detail to be sure, but from my testing it seems the behaviour is better but still not perfect. I haven't seen the stuck remote lock reoccur, but processes waiting on a lock don't always get it even after it has been unlocked by the process with the lock.
I downloaded and tried the reproducer attached to this bugzilla. I ran it on a pair of Fedora 9 clients I have here with the 2.6.26.5-45.fc9 kernel. It appeared to work correctly with both an OpenSolaris 2008.5 server and a 2.6.26.5-45.fc9 Linux NFS server. I have not tried this with a NetApp filer. Maybe the hung process problem you reported in #c22 is a different bug?
Michael, Have you been running the test with lockd listening on a UDP port? As you noted in comment #13, NetApp filers only send NLM callbacks over UDP, so on FC-9, you would need to add something like lockd.nlm_udpport=40000 (or some other unused port number) to your grub.conf's kernel boot parameters. Without this, you are indeed likely to see your test case fail to grab a contended lock.
I was actually seeing this on with several processes contending for a lock on a single machine, so I don't expect the lack of a UDP listener would make much difference (I could with earlier kernels reproduce the stuck lock problem with this setup). The scenario is to run about 5 processes of the demo program at the same time. When the first lock is released the processes waiting for the lock don't necessarily acquire the now-freed lock. So far I have only managed to reproduce this behaviour on a single processor machine which may or may not be significant. I am afraid I still haven't had a close enough look at the code to work out what is happening.
On the contrary, I do expect it to make a difference, since the code in nlmclnt_unlock() will call posix_lock_file_wait() in order to free the vfs lock before it notifies the server. As soon as it does so, the 5 processes that are contending for that lock will attempt to place a blocking lock with the server, and will start waiting for the UDP callback.
Yes, you are right (though the Fedora 9 and Centos/RHEL 5 fix to get lockd listening on udp is to uncomment the line LOCKD_UDPPORT=32769 in /etc/sysconfig/nfs and presumably to change the value for security reasons). What seems to have been happening in my tests were that once the lock was released locally the other processes tried to get a lock and were of course blocked, the netapp box releases the freed lock grants a new one but can't tell the appropriate process. From then on, as the lock attempt that has unknowingly succeeded times out, the lock gets passed unknowingly to another queuing process, so the lock attempts never succeed. So yes, provided the linux box is listening on UDP the locking seems to work correctly now, but otherwise there is still the potential for competing locks not to be granted to a process.
I don't know if what we experienced has anything to do with thie problem, but we had a RHEL5 system running 2.6.18-92.1.1 and after a 138 day uptime, the NFS server stopped being able to complete fcntl64(..., F_GETLK, .. calls or fcntl64(255, F_GETFL) calls in strace. For users whose home directories were on this server, these users could not start firefox, thunderbird, openoffice, many KDE apps, etc and I couldn't run this test script no matter which of the 35+ other RHEL4 and RHEL5 NFS clients I tried : #!/bin/sh ( flock -x 200 uname -a >> allHosts.txt ) 200>> lockFile It would just hang. Eventually we had to reboot the NFS server, re-starting NFS didn't help, and I don't think there is anyway to re-start [lockd] since it's a kernel level process.
and yes the test program would run locally but not over NFS.
This message is a reminder that Fedora 9 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 9. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '9'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 9's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 9 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
This is working, provided that lockd is listening on UDP