Red Hat Bugzilla – Bug 493337
Problem with blocking locks on RHEL 5
Last modified: 2009-04-24 15:25:11 EDT
On RHEL 5, when using blocking locks, we can end up with a lock on the file which is not owned by any client and cannot be released. I have tested this with kernel 2.6.18-133.el5, which contains the fix from bz 448929. That kernel contains the patch which was expected to fix the issue on RHEL 5.
The test program works fine on 188.8.131.52-117.fc10.x86_64 kernel.
The problem here appears to be similar to the case we see here.
Steps to Reproduce:
To reproduce, please compile and use the attached programs. We will need 2 NFS clients mounting the same NFS share.
The test programs have to be run on 2 different NFS clients over the same NFS share. The commands have to be run in the sequence shown in the attached file reproducer_steps. A file named dlvcan2.tab has to be created in the current working directory.
At the end of the set of reproducer steps, the process lockchk can be cancelled. However, the lock on the file still exists and is never released. The locks held can be checked in /proc/locks on the NFS server. The stale lock can be cleared on the NFS server by running the command 'service nfslock restart'.
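For checking /proc/locks on the server, a small filter can make the stale entry easier to spot. The following is only a sketch: it assumes the usual /proc/locks layout (index, lock class, mode, access, pid, device:inode, start, end), and the names parse_lock_line and locks_on_inode are made up for this example. The inode number of dlvcan2.tab has to be looked up separately, for example with 'ls -i'.

```python
from typing import List, NamedTuple, Optional


class LockEntry(NamedTuple):
    blocked: bool     # True for waiter lines marked with "->"
    lock_class: str   # e.g. "POSIX" or "FLOCK"
    mode: str         # e.g. "ADVISORY"
    access: str       # e.g. "READ" or "WRITE"
    pid: int
    device: str       # "major:minor" as printed by the kernel
    inode: int
    start: str
    end: str


def parse_lock_line(line: str) -> Optional[LockEntry]:
    """Parse one /proc/locks line; return None if it does not look like one."""
    fields = line.split()
    blocked = len(fields) > 1 and fields[1] == "->"
    if blocked:
        del fields[1]
    if len(fields) < 8:
        return None
    # Any extra trailing fields are ignored.
    _, lock_class, mode, access, pid, dev_inode, start, end = fields[:8]
    try:
        major, minor, inode = dev_inode.split(":")
        return LockEntry(blocked, lock_class, mode, access, int(pid),
                         major + ":" + minor, int(inode), start, end)
    except ValueError:
        return None


def locks_on_inode(text: str, inode: int) -> List[LockEntry]:
    """Return every lock entry in the given /proc/locks text for one inode."""
    entries = (parse_lock_line(line) for line in text.splitlines())
    return [e for e in entries if e is not None and e.inode == inode]
```

On the server this could be fed with open("/proc/locks").read(). An entry for the test file that remains after all test programs on both clients have exited would be the unowned lock described above.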
Created attachment 337531 [details]
Sequence in which the test programs need to be run.
Created attachment 337533 [details]
tcpdump taken when problem is detected.
The following frame numbers show the locking activity leading up to the problem.
335: vm21 to vm11 unlock svid 1
336: vm11 to vm21 unlock granted
368: vm22 to vm11 lock svid 3
370: vm11 to vm22 lock granted.
374: vm21 to vm11 lock svid 2
375: vm11 to vm21 lock blocked (due to other client (vm22) holding lock)
510: vm22 to vm11 unlock svid 3
511: vm11 to vm22 unlock granted
522: vm21 to vm11 cancel lock svid 2
523: vm21 to vm11 lock svid 3
524: vm11 to vm21 cancel granted
525: vm11 to vm21 lock granted
534: vm21 to vm11 unlock svid 4 <-- In this case, we are not sure why it calls unlock for svid 4.
535: vm11 to vm21 unlock granted
543: vm21 to vm11 lock svid 5
544: vm11 to vm21 lock blocked ( not sure why )
Frames 543 and 544 are then repeated with an increasing svid each time.
> 522: vm21 to vm11 cancel lock svid 2
> 523: vm21 to vm11 lock svid 3
> 524: vm11 to vm21 cancel granted
> 525: vm11 to vm21 lock granted
> 534: vm21 to vm11 unlock svid 4 <-- In this case, we are not sure why it calls
> unlock for svid 4.
> 535: vm11 to vm21 unlock granted
> 543: vm21 to vm11 lock svid 5
> 544: vm11 to vm21 lock blocked ( not sure why )
The lock is probably being blocked because svid 3 is holding the lock. It never got released.
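To check that reading of the trace, the requests can be replayed against a toy model of the server. This is a sketch under assumptions, not lockd's real logic: it assumes the server identifies a lock owner by the pair (client, svid), tracks a single lock on the file, and answers every unlock and cancel with "granted" whether or not anything was released. It also starts with no lock held and does not model granting to blocked waiters.

```python
from typing import List, Optional, Tuple

Owner = Tuple[str, int]  # (client host, svid)

# Client-to-server requests from the frames listed above:
# (frame number, client, operation, svid)
REQUESTS = [
    (335, "vm21", "unlock", 1),
    (368, "vm22", "lock", 3),
    (374, "vm21", "lock", 2),
    (510, "vm22", "unlock", 3),
    (522, "vm21", "cancel", 2),
    (523, "vm21", "lock", 3),
    (534, "vm21", "unlock", 4),
    (543, "vm21", "lock", 5),
]


def replay(requests) -> Tuple[List[Tuple[int, str]], Optional[Owner]]:
    """Replay requests; return the per-frame replies and the final holder."""
    holder: Optional[Owner] = None
    replies = []
    for frame, client, op, svid in requests:
        owner = (client, svid)
        if op == "lock":
            if holder is None or holder == owner:
                holder = owner
                reply = "granted"
            else:
                reply = "blocked"
        elif op == "unlock":
            # Only the matching owner actually releases the lock, but the
            # reply is "granted" in either case (assumption of this model).
            if holder == owner:
                holder = None
            reply = "granted"
        else:  # cancel of a pending request
            reply = "granted"
        replies.append((frame, reply))
    return replies, holder


if __name__ == "__main__":
    replies, holder = replay(REQUESTS)
    for frame, reply in replies:
        print(frame, reply)
    print("still held by:", holder)
```

Under these assumptions the model gives the same replies as the server frames in the capture and ends with the lock held by (vm21, svid 3). The unlock in frame 534 names svid 4, so it releases nothing, and the lock request in frame 543 is blocked.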
Development Management has reviewed and declined this request. You may appeal
this decision by reopening this request.
This got closed too soon. This needs to be re-flagged for 5.5.
When the client process receives a signal, nlmclnt_block(), which is
waiting for a response from the server, returns -ERESTARTSYS. This is
propagated all the way back to do_setlk. An if condition there causes the
lock to be recorded locally on the client even though the NFS lock was
never set on the server.
For subsequent lock/unlock requests, the unlock function matches the old
lock and the unlock request sent is for this old lock. The server returns
success for the old lock which is interpreted as a successful unlock for
the new lock on the client. However the new lock set on the server is
never freed. We thus get into a condition where the server holds a lock on
a file which is not claimed by any client. All subsequent locks for this
file to the server are blocked.
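The sequence described above can be illustrated with a toy model. This is only a sketch of the paragraph, not the kernel's data structures: the Server and Client classes, the owner strings and the record_on_signal switch are all invented for the example, and the client is assumed to match its oldest local entry when unlocking.

```python
class Server:
    """Holds at most one lock, identified by an owner id."""

    def __init__(self):
        self.holder = None

    def lock(self, owner):
        if self.holder in (None, owner):
            self.holder = owner
            return True
        return False  # blocked

    def unlock(self, owner):
        if self.holder == owner:
            self.holder = None
        return True  # success is reported even if nothing was released


class Client:
    def __init__(self, server, record_on_signal):
        self.server = server
        self.record_on_signal = record_on_signal
        self.local = []  # owners the client believes hold the lock, oldest first

    def lock_interrupted(self, owner):
        # The request was blocked on the server and a signal ended the wait.
        # The old behaviour records the lock locally anyway.
        if self.record_on_signal:
            self.local.append(owner)

    def lock(self, owner):
        granted = self.server.lock(owner)
        if granted:
            self.local.append(owner)
        return granted

    def unlock(self):
        if self.local:
            # The oldest local entry is matched, so a stale one wins.
            self.server.unlock(self.local[0])
            self.local.clear()  # the client now considers the file unlocked


def run_scenario(record_on_signal):
    server = Server()
    client = Client(server, record_on_signal)
    server.lock("other-client")       # another client holds the lock
    client.lock_interrupted("old")    # our blocking request gets a signal
    server.unlock("other-client")     # the other client releases
    client.lock("new")                # our next request is granted
    client.unlock()                   # ...and then "released"
    return server.holder


if __name__ == "__main__":
    print("old behaviour, server holder:", run_scenario(True))
    print("without the if condition, server holder:", run_scenario(False))
```

With record_on_signal set, the unlock names the stale owner and the server is left holding the lock for "new". With it cleared, the lock is released as expected.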
This is fixed by upstream commit c4d7c402b788b73dc24f1e54a57f89d3dc5eb7b
This event sent from IssueTracker by sprabhu
Upstream commit c4d7c402b788b73dc24f1e54a57f89d3dc5eb7b has been backported to RHEL 5 kernel version 2.6.18-138.
* Fri Apr 03 2009 Don Zickus <email@example.com> [2.6.18-138.el5]
- [nfs] remove bogus lock-if-signalled case (Bryn M. Reeves)
The reproducer provided was successfully tested against this kernel version.
Reporter has confirmed that the latest kernel doesn't show the problem with the locks.
Closing this as dup of 456288.
Note that the issues reported in the two bugs are very different; however, the same patch fixes both.
*** This bug has been marked as a duplicate of bug 456288 ***