Red Hat Bugzilla – Bug 199586
nfslock script starts statd before lockd is up so lock recovery fails
Last modified: 2014-06-18 03:35:31 EDT
Description of problem:
The default chkconfig line in the nfslock init script starts it long before
lockd is up. This causes clients to try to recover their locks too early. Here's
a sample network trace that shows PROGRAM_NOT_AVAILABLE after the client
attempted to recover locks after a reboot:
163.999543 172.16.57.30 -> 172.16.57.138 Portmap V2 GETPORT Call STAT(100024)
164.000380 172.16.57.138 -> 172.16.57.30 Portmap V2 GETPORT Reply (Call In 73)
164.000832 172.16.57.30 -> 172.16.57.138 STAT V1 NOTIFY Call
164.003574 172.16.57.138 -> 172.16.57.30 STAT V1 NOTIFY Reply (Call In 75)
164.004416 172.16.57.138 -> 172.16.57.30 TCP 32866 > sunrpc [SYN] Seq=0 Len=0 MSS=1460 TSV=407443 TSER=0 WS=0
164.004700 172.16.57.30 -> 172.16.57.138 TCP sunrpc > 32866 [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1460 TSV=4294700213 TSER=407443 WS=2
164.004765 172.16.57.138 -> 172.16.57.30 TCP 32866 > sunrpc [ACK] Seq=1 Ack=1 Win=5840 Len=0 TSV=407443 TSER=4294700213
164.004947 172.16.57.138 -> 172.16.57.30 Portmap V2 GETPORT Call NLM(100021) V:1 TCP
164.005222 172.16.57.30 -> 172.16.57.138 TCP sunrpc > 32866 [ACK] Seq=1 Ack=61 Win=5792 Len=0 TSV=4294700214 TSER=407443
164.005832 172.16.57.30 -> 172.16.57.138 Portmap V2 GETPORT Reply (Call In 80)
Changing nfslock.init chkconfig line to this:
# chkconfig: 345 61 19
seems to fix the problem. Opening this for RHEL4, since that's where I
originally noticed the problem, but it looks like FC has the same issue.
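For illustration, here is a minimal sketch of how the start priority in the chkconfig header decides boot ordering. The header format is "# chkconfig: <runlevels> <start priority> <stop priority>", and scripts with a lower start priority run first. The helper name start_priority is hypothetical, and the nfs header used below (start priority 60) is an assumed value for the example only; check the actual nfs init script on the affected release.

```shell
#!/bin/sh
# Print the start priority from the "# chkconfig:" header of an init script
# whose text arrives on stdin. (Hypothetical helper, not part of any package.)
start_priority() {
    awk '/^#[ \t]*chkconfig:/ {
        # Strip the "# chkconfig:" prefix so the fields are:
        #   $1 = runlevels, $2 = start priority, $3 = stop priority
        sub(/^#[ \t]*chkconfig:[ \t]*/, "")
        print $2
        exit
    }'
}

# ASSUMPTION: the nfs header below is an illustrative value, not taken from
# this report. The nfslock header is the one proposed in this report.
nfs_prio=$(printf '%s\n' '# chkconfig: - 60 20' | start_priority)
nfslock_prio=$(printf '%s\n' '# chkconfig: 345 61 19' | start_priority)

if [ "$nfslock_prio" -gt "$nfs_prio" ]; then
    echo "nfslock (S$nfslock_prio) starts after nfs (S$nfs_prio)"
else
    echo "nfslock (S$nfslock_prio) starts before nfs (S$nfs_prio)"
fi
```

On a real system the same function could be fed the init scripts themselves, for example start_priority < /etc/init.d/nfslock.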
Created attachment 132827
trivial patch to init script
Trivial patch that seems to fix the problem.
What confuses me is that nfslock no longer brings lockd up (or down). The kernel
does that when the server is started or the client mounts a fs, so I'm not sure
how or why this fix works...
Right -- lockd is now started by the 'nfs' script, so the only thing the nfslock
script now does is start statd. The fix here is just to make sure that nfslock
runs after the nfs script at boot time.
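One way to confirm the ordering problem on a test box is to check whether NLM (program 100021) is registered with the portmapper at the moment statd (program 100024) sends its notifications. The following is only a sketch: rpc_registered is a hypothetical helper, and the sample listing with its port numbers is made up to resemble "rpcinfo -p" output from a server where statd is up but lockd is not.

```shell
#!/bin/sh
# Succeed if the portmapper listing on stdin has an entry for the RPC program
# number given as $1. Input is expected in the "rpcinfo -p" column layout:
#   program vers proto   port  service
# (Hypothetical helper for illustration.)
rpc_registered() {
    awk -v prog="$1" '$1 == prog { found = 1 } END { exit !found }'
}

# ASSUMPTION: made-up listing; statd (100024) is registered, lockd is not.
sample='   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  32768  status
    100024    1   tcp  32769  status'

if printf '%s\n' "$sample" | rpc_registered 100021; then
    echo "NLM (100021) is registered"
else
    echo "NLM (100021) is not registered yet"
fi
```

Against a live server, the listing would come from rpcinfo -p <server> piped into rpc_registered 100021.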
Another (maybe better?) fix might be to do away with the nfslock script
altogether and just have statd started by the 'nfs' script. Let me know if you
think that's the way to go.
Created attachment 132873
patch to move statd startup and shutdown into nfs.init
Something like this patch might actually be a better way to go (though I've
not tested this patch as of yet).
This moves the rpc.statd startup and shutdown into nfs.init. With something
like this we can probably just remove nfslock.init from the package.
Alternately, we may just want to do this in the devel and/or FC trees, and just
go with the chkconfig change for the existing RHEL releases.
I'd be OK either way...
Moving the starting of rpc.statd into the nfs init script would mean
the nfs server would have to be started every time the system
booted (since statd is also needed by the client), which
is not the right thing to do... imho...
It seems to me that maybe nfslock should always bring up
lockd so it's started at the same time rpc.statd is... Maybe by
doing a 'modprobe lockd ' could cause the server to come
Also, maybe the bug could be tied with
since it may be related when it comes to failovers...
Good point, I hadn't considered the client-side use of nfslock.
Would there be any harm to simply making nfslock start later here? It seems like
that would take care of the server-side case.
Also, I'm not clear on what the effect would be on the server in starting lockd
up before mountd/exportfs, etc. I've not picked through the code enough to know
if server-side lockd would allow the client to reclaim a lock on a filesystem
that's not yet exported.
All that said, 146773 does look like a thornier problem. I'll go ahead and make
this BZ dependent on that one. Any fix for this would be affected by that case
anyhow, and we can just try to be cognizant of this problem as well to make sure
that it gets addressed.
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update
> Would there be any harm to simply making nfslock start later here?
I believe so... statd has to be up and running before the netfs
initscript runs; otherwise the client locking side would break...
After further review, this is not a server bug... If the client stops trying to
recover its locks just because the server has not come up (yet), then
that is a client bug, because the client should *never* stop trying to
recover its locks...