On RHEL 5, when using blocking locks over NFS, we can end up with a lock on the file which is not owned by any client and cannot be released.

I have tested this with kernel 2.6.18-133.el5, which contains the fix from bz 448929. That fix includes the patch
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5f50c0c6d644d6c8180d9079c13c5d9de3adeb34
which was expected to fix the issue on RHEL 5. The test program works fine on the 2.6.27.5-117.fc10.x86_64 kernel.

The problem appears to be similar to the case described here:
http://marc.info/?l=linux-nfs&m=120663578712912&w=2

Steps to Reproduce:
To reproduce, compile and use the attached programs. Two NFS clients mounting the same NFS share are needed, and the test programs have to be run on the two clients against that share. The commands have to be run in the sequence shown in the attached file reproducer_steps. A file named dlvcan2.tab has to be created in the current working directory.

At the end of the reproducer steps, the lockchk process can be cancelled. However, the lock on the file still exists and is never released. The locks held can be checked in /proc/locks on the NFS server. The stale lock can be cleared on the NFS server by running the command 'service nfslock restart'.
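For checking /proc/locks on the NFS server, a small helper along these lines may be convenient. It is only a sketch: it assumes the usual eight-column /proc/locks layout (ordinal, lock type, mode, access, PID, major:minor:inode, start, end, with an optional "->" marker for blocked waiters), and the function names parse_proc_locks and locks_on_inode are made up for illustration.

```python
def parse_proc_locks(text):
    """Parse the contents of /proc/locks into a list of dicts.

    Assumed line layout (as documented for /proc/locks):
        1: POSIX  ADVISORY  WRITE 2345 fd:00:131074 0 EOF
    Blocked waiters carry an extra "->" after the ordinal.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        blocked = len(fields) >= 2 and fields[1] == "->"
        if blocked:
            del fields[1]          # drop the "->" marker
        if len(fields) != 8:
            continue               # skip anything that does not fit the layout
        major, minor, inode = fields[5].split(":")
        entries.append({
            "blocked": blocked,
            "type": fields[2 - 1],     # POSIX or FLOCK
            "mode": fields[2],         # ADVISORY or MANDATORY
            "access": fields[3],       # READ or WRITE
            "pid": int(fields[4]),
            "device": major + ":" + minor,
            "inode": int(inode),       # inode number is printed in decimal
            "start": fields[6],
            "end": fields[7],
        })
    return entries


def locks_on_inode(text, inode):
    """Return only the entries that refer to the given inode number."""
    return [e for e in parse_proc_locks(text) if e["inode"] == inode]
```

On the server this could be used as locks_on_inode(open("/proc/locks").read(), os.stat("dlvcan2.tab").st_ino), with the path adjusted to the exported copy of the file. An entry that is still listed after lockchk has been cancelled on both clients would be the stale lock.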
Created attachment 337531
Sequence in which the test programs need to be run.
Created attachment 337533
tcpdump taken when problem is detected.

vm21: 192.168.122.21
vm22: 192.168.122.22

The following frame numbers show the locking activity leading up to the problem.

335: vm21 to vm11 unlock svid 1
336: vm11 to vm21 unlock granted

368: vm22 to vm11 lock svid 3
370: vm11 to vm22 lock granted

374: vm21 to vm11 lock svid 2
375: vm11 to vm21 lock blocked (because the other client, vm22, is holding the lock)

510: vm22 to vm11 unlock svid 3
511: vm11 to vm22 unlock granted

522: vm21 to vm11 cancel lock svid 2
523: vm21 to vm11 lock svid 3
524: vm11 to vm21 cancel granted
525: vm11 to vm21 lock granted

534: vm21 to vm11 unlock svid 4 <-- In this case, we are not sure why it calls unlock for svid 4.
535: vm11 to vm21 unlock granted

543: vm21 to vm11 lock svid 5
544: vm11 to vm21 lock blocked (not sure why)

Frames 543 and 544 are then repeated with increasing svid numbers.
> 522: vm21 to vm11 cancel lock svid 2
> 523: vm21 to vm11 lock svid 3
> 524: vm11 to vm21 cancel granted
> 525: vm11 to vm21 lock granted
>
> 534: vm21 to vm11 unlock svid 4 <-- In this case, we are not sure why it calls
> unlock for svid 4.
> 535: vm11 to vm21 unlock granted
>
> 543: vm21 to vm11 lock svid 5
> 544: vm11 to vm21 lock blocked ( not sure why )
>

The lock is probably being blocked because svid 3 is holding the lock. It never got released.
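To sanity-check that reading of the capture, the requests from the tcpdump can be replayed against a toy lock table. This is only a sketch under stated assumptions: a single whole-file lock, the server identifying a lock owner by the pair (client, svid), and unlock or cancel requests always being answered with "granted" (which matches frames 336, 524 and 535). The names TRACE and replay are made up for illustration.

```python
# Requests taken from the capture: (request frame, client, operation, svid).
TRACE = [
    (335, "vm21", "unlock", 1),
    (368, "vm22", "lock", 3),
    (374, "vm21", "lock", 2),
    (510, "vm22", "unlock", 3),
    (522, "vm21", "cancel", 2),
    (523, "vm21", "lock", 3),
    (534, "vm21", "unlock", 4),
    (543, "vm21", "lock", 5),
]


def replay(trace):
    """Replay the requests and return (final holder, list of (frame, reply))."""
    holder = None          # (client, svid) currently holding the lock, if any
    replies = []
    for frame, client, op, svid in trace:
        owner = (client, svid)
        if op == "lock":
            if holder is None:
                holder = owner
                reply = "granted"
            else:
                reply = "blocked"
        elif op == "unlock":
            # Assumption: unlock is granted even when this owner holds nothing;
            # the lock is only dropped if the owner actually matches.
            if holder == owner:
                holder = None
            reply = "granted"
        else:
            # cancel of a blocked request; no effect on the held lock
            reply = "granted"
        replies.append((frame, reply))
    return holder, replies
```

In this model the unlock for svid 4 in frame 534 matches nothing, so the lock taken by vm21 with svid 3 in frame 523 stays in place and the request in frame 543 is blocked, which is consistent with what the capture shows.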
Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.
This got closed too soon. This needs to be re-flagged for 5.5.
When the client process receives a signal, nlmclnt_block(), which is waiting for a response from the server, returns -ERESTARTSYS. This is propagated all the way back to do_setlk. There, an if condition causes a lock to be set locally on the client even though the NFS lock has not been set on the server.

For subsequent lock/unlock requests, the unlock function matches the old lock, so the unlock request that is sent is for this old lock. The server returns success for the old lock, which the client interprets as a successful unlock of the new lock. However, the new lock set on the server is never freed.

We thus get into a condition where the server holds a lock on a file which is not claimed by any client. All subsequent lock requests to the server for this file are blocked.

This is fixed by upstream commit c4d7c402b788b73dc24f1e54a57f89d3dc5eb7b

This event sent from IssueTracker by sprabhu
issue 268852
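The sequence described above can be illustrated with a toy model. This is not the kernel code: ToyServer, ToyClient, the lock_if_signalled flag and the "unlock matches the oldest local record" rule are all assumptions chosen only to mimic the behaviour described in this comment, with one whole-file lock and one client.

```python
class ToyServer:
    """Holds at most one lock, identified by the svid that took it."""

    def __init__(self):
        self.holder = None

    def lock(self, svid):
        if self.holder is None:
            self.holder = svid
            return True
        return False

    def unlock(self, svid):
        # Success is reported even if this svid holds nothing.
        if self.holder == svid:
            self.holder = None
        return True


class ToyClient:
    def __init__(self, server, lock_if_signalled):
        self.server = server
        self.lock_if_signalled = lock_if_signalled  # True models the old behaviour
        self.local = []      # svids the client believes it holds, oldest first
        self.next_svid = 1

    def lock(self, interrupted=False):
        svid = self.next_svid
        self.next_svid += 1
        if interrupted:
            # A signal arrived while waiting; the server never granted this one.
            if self.lock_if_signalled:
                self.local.append(svid)   # bogus local lock record
            return False
        if self.server.lock(svid):
            self.local.append(svid)
            return True
        return False

    def unlock(self):
        if not self.local:
            return
        svid = self.local[0]              # assumed: the old record is matched first
        if self.server.unlock(svid):
            self.local.clear()            # client now believes nothing is held


def run(lock_if_signalled):
    """Interrupted lock, then a granted lock, then unlock; return server holder."""
    server = ToyServer()
    client = ToyClient(server, lock_if_signalled)
    client.lock(interrupted=True)
    client.lock()
    client.unlock()
    return server.holder
```

With lock_if_signalled set to True, run() leaves the server holding the second lock while the client holds nothing, which is the orphaned-lock state. With it set to False, the unlock names the lock that was actually granted and the server ends up with no holder.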
Upstream commit c4d7c402b788b73dc24f1e54a57f89d3dc5eb7b has been backported to RHEL 5 kernel version 2.6.18-138.

* Fri Apr 03 2009 Don Zickus <dzickus> [2.6.18-138.el5]
- [nfs] remove bogus lock-if-signalled case (Bryn M. Reeves) [456288]

The reproducer provided was successfully tested against this kernel version.
Reporter has confirmed that the latest kernel doesn't show the problem with the locks.
Closing this as a dup of 456288. Note that the issues reported in the two bugs are very different; however, the same patch fixes both.

*** This bug has been marked as a duplicate of bug 456288 ***