+++ This bug was initially created as a clone of Bug #199586 +++

Description of problem:
The default chkconfig line in the nfslock init script starts it long before lockd is up. This causes clients to try to recover their locks too early. Here's a sample network trace, that shows PROGRAM_NOT_AVAILABLE after the client attempted to recover locks on a reboot:

163.999543 172.16.57.30 -> 172.16.57.138 Portmap V2 GETPORT Call STAT(100024) V:1 UDP
164.000380 172.16.57.138 -> 172.16.57.30 Portmap V2 GETPORT Reply (Call In 73) Port:32768
164.000832 172.16.57.30 -> 172.16.57.138 STAT V1 NOTIFY Call
164.003574 172.16.57.138 -> 172.16.57.30 STAT V1 NOTIFY Reply (Call In 75)
164.004416 172.16.57.138 -> 172.16.57.30 TCP 32866 > sunrpc [SYN] Seq=0 Len=0 MSS=1460 TSV=407443 TSER=0 WS=0
164.004700 172.16.57.30 -> 172.16.57.138 TCP sunrpc > 32866 [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1460 TSV=4294700213 TSER=407443 WS=2
164.004765 172.16.57.138 -> 172.16.57.30 TCP 32866 > sunrpc [ACK] Seq=1 Ack=1 Win=5840 Len=0 TSV=407443 TSER=4294700213
164.004947 172.16.57.138 -> 172.16.57.30 Portmap V2 GETPORT Call NLM(100021) V:1 TCP
164.005222 172.16.57.30 -> 172.16.57.138 TCP sunrpc > 32866 [ACK] Seq=1 Ack=61 Win=5792 Len=0 TSV=4294700214 TSER=407443
164.005832 172.16.57.30 -> 172.16.57.138 Portmap V2 GETPORT Reply (Call In 80) PROGRAM_NOT_AVAILABLE

Changing nfslock.init chkconfig line to this:

# chkconfig: 345 61 19

seems to fix the problem.

Opening this for RHEL4, since that's where I originally noticed the problem, but it looks like FC has the same issue.

-- Additional comment from jlayton on 2006-07-20 12:27 EST --
Going ahead and adding this to the 4.5 proposed list. Should be a pretty trivial fix and bad lock recovery can cause data corruption. I've not seen any customer complaints about this particular problem yet, but with the work happening on lock recovery, it's probably just a matter of time.
Created attachment 132769
trivial patch to init script

A solution is to make the nfslock script run after the nfs script. This trivial patch changes the chkconfig line so that this happens by default.
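For anyone checking the ordering by hand, here is a minimal sketch of comparing start priorities taken from chkconfig header lines. A chkconfig header has the form "# chkconfig: <runlevels> <start priority> <stop priority>", and a higher start priority means the script starts later. The helper names start_priority and starts_after are made up for this illustration, and the nfs header value used in the usage lines is an assumption, not something quoted from the nfs init script.

```shell
#!/bin/sh
# Sketch only: start_priority and starts_after are hypothetical helpers.

# Print the start priority from a header line such as "# chkconfig: 345 61 19".
# Assumes the "# chkconfig:" form with a single space after the "#".
start_priority() {
    printf '%s\n' "$1" | awk '/^# chkconfig:/ { print $4 }'
}

# Succeed if the first header line starts strictly after the second one.
starts_after() {
    [ "$(start_priority "$1")" -gt "$(start_priority "$2")" ]
}

nfslock_line='# chkconfig: 345 61 19'   # value proposed in this bug
nfs_line='# chkconfig: - 60 20'         # ASSUMED value for the nfs script

if starts_after "$nfslock_line" "$nfs_line"; then
    echo "nfslock starts after nfs"
else
    echo "nfslock starts before or together with nfs"
fi
```

The same check can be run against the real scripts by feeding in the output of grep '^# chkconfig:' on each init script instead of the hard-coded strings.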
The client should continue to retry when trying to reclaim a lock, regardless of the error that was returned. If the client does not continue to retry, it's a bug in the client, imho.
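To illustrate the retry behaviour described above, here is a rough sketch of a loop that keeps retrying an operation whatever error it returns, up to a limit. It is only a shell analogy, not the kernel client's reclaim code; retry_until_ok and its arguments are hypothetical names, and the attempt limit and delay are arbitrary.

```shell
#!/bin/sh
# Sketch only: retry_until_ok is a hypothetical helper, not part of nfs-utils.

# Usage: retry_until_ok MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, ignoring which error it failed with.
# Returns 0 on the first success, 1 if every attempt failed.
retry_until_ok() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 1
}
```

With this shape, a transient failure such as PROGRAM_NOT_AVAILABLE during server startup only delays the operation instead of aborting it, provided the service comes up before the attempts run out.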
Based on the date this bug was created, it appears to have been reported against rawhide during the development of a Fedora release that is no longer maintained. In order to refocus our efforts as a project we are flagging all of the open bugs for releases which are no longer maintained. If this bug remains in NEEDINFO thirty (30) days from now, we will automatically close it. If you can reproduce this bug in a maintained Fedora version (7, 8, or rawhide), please change this bug to the respective version and change the status to ASSIGNED. (If you're unable to change the bug's version or status, add a comment to the bug and someone will change it for you.) Thanks for your help, and we apologize again that we haven't handled these issues to this point. The process we're following is outlined here: http://fedoraproject.org/wiki/BugZappers/F9CleanUp We will be following the process here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this doesn't happen again.
This bug has been in NEEDINFO for more than 30 days since feedback was first requested. As a result we are closing it. If you can reproduce this bug in the future against a maintained Fedora version please feel free to reopen it against that version. The process we're following is outlined here: http://fedoraproject.org/wiki/BugZappers/F9CleanUp
This one slipped through the cracks. I'll have a look at it again when I get the chance and see if this is a bug in the client like Steve suggests...
Changing version to '9' as part of upcoming Fedora 9 GA. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Ok, looks like Steve was right on this, and this seems to work properly on rawhide (at least). Closing as NOTABUG.