Red Hat Bugzilla – Bug 492671
NFS lock recovery on server reboot fails
Last modified: 2010-10-27 12:05:29 EDT
+++ This bug was initially created as a clone of Bug #492669 +++
Description of problem:
When an NFS server boots, it starts the rpc.statd service before the kernel lockd service is up and running. If clients held locks when the server rebooted, they will try to recover their locks while the server lockd service is not yet available and will fail.
This problem in fact lies in both the client and the server:
- Client should keep retrying until service is available;
- Server should not advertise its status change before NLM service is up.
This bugzilla is about the second part. The problem can easily be worked around by starting the nfslock service from rc.local. This bugzilla is a request to change the service start ordering on boot so that nfs (which loads the lockd module) starts before nfslock (which starts rpc.statd).
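The requested ordering can be checked mechanically. The following is only a sketch: it assumes SysV-style start links (S, a two-digit priority, then the service name) that init runs in lexical order, and the priorities and the helper names start_order and starts_before are made up for illustration, not taken from any shipped init script.

```python
import re

# SysV start links look like "S14nfslock": "S", two-digit priority, name.
START_LINK = re.compile(r"^S(\d{2})(.+)$")


def start_order(link_names):
    """Return service names in the order init would start them."""
    parsed = []
    for link in link_names:
        match = START_LINK.match(link)
        if match:  # skip K* (kill) links and anything else
            parsed.append((link, match.group(2)))
    # init runs the start links in lexical order of the full link name
    return [name for _, name in sorted(parsed)]


def starts_before(link_names, first, second):
    """True if service `first` is started before service `second`."""
    order = start_order(link_names)
    return order.index(first) < order.index(second)


# Hypothetical runlevel contents; the priorities are illustrative only.
problem = ["K20example", "S13rpcbind", "S14nfslock", "S30nfs"]
wanted = ["K20example", "S13rpcbind", "S30nfs", "S31nfslock"]

print(starts_before(problem, "nfs", "nfslock"))  # False
print(starts_before(wanted, "nfs", "nfslock"))   # True
```

On a real server the list would come from something like os.listdir() on the runlevel directory instead of the hard-coded samples.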
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Set up NFS client and server
2. Start tcpdump capture in client
3. Run program that locks file
4. Reboot server
5. After the server comes up, stop the capture and check that, after the NSM NOTIFY call, the client tried to re-issue the lock and received a PROGRAM NOT AVAILABLE error from rpcbind at the server.
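For step 3, any program that takes a POSIX lock on a file under the NFS mount and keeps it across the server reboot will do. A minimal sketch, assuming an NFSv3 mount where such locks are handled through NLM; the function names lock_file and hold_lock are illustrative:

```python
import fcntl
import os
import time


def lock_file(path):
    """Open path and take an exclusive POSIX lock on the whole file."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Blocks until the lock is granted.
        fcntl.lockf(fd, fcntl.LOCK_EX)
    except OSError:
        os.close(fd)
        raise
    return fd


def hold_lock(path, seconds):
    """Hold the lock for `seconds`, long enough to reboot the server."""
    fd = lock_file(path)
    try:
        time.sleep(seconds)
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN)
        os.close(fd)
```

Calling hold_lock("/mnt/nfs/lockfile", 600) on the client, with a path of your own under the NFS mount (this one is hypothetical), keeps the lock held while the server is rebooted in step 4.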
Actual results:
Client can't recover lock.
Expected results:
Client recovers lock.
Additional info:
The NSM and NLM specifications are vague, and yes, NFS clients SHOULD retry the lock operation a few times with some wait period in between retries (vague eh?). Still, ensuring NSM is started _after_ NLM at the server leads to faster and more reliable lock recovery, regardless of what the clients do.
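The "retry a few times with some wait period" idea can be modelled in user space as below. This is only an illustration of the retry pattern, not how the kernel lockd client behaves. It assumes text in the tabular format printed by rpcinfo -p, where NLM appears as program 100021 (nlockmgr); the attempts and delay defaults are arbitrary, and fetch is a caller-supplied callable that returns the current listing.

```python
import time

NLM_PROGRAM = 100021  # ONC RPC program number of NLM (nlockmgr)


def nlm_registered(rpcinfo_output):
    """Return True if `rpcinfo -p`-style text lists the NLM program."""
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Data rows start with the numeric program number; the header does not.
        if fields and fields[0].isdigit() and int(fields[0]) == NLM_PROGRAM:
            return True
    return False


def wait_for_nlm(fetch, attempts=5, delay=2.0, sleep=time.sleep):
    """Poll fetch() until NLM is registered, waiting `delay` s between tries."""
    for attempt in range(attempts):
        if nlm_registered(fetch()):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Canned listings standing in for a server that registers NLM late.
BEFORE = """   program vers proto   port  service
    100000    2   tcp    111  portmapper
    100024    1   udp  32765  status
"""
AFTER = BEFORE + "    100021    1   udp  32768  nlockmgr\n"

listings = iter([BEFORE, BEFORE, AFTER])
print(wait_for_nlm(lambda: next(listings), delay=0))  # True
```

With the start order fixed at the server, the first poll would already see nlockmgr registered, which is the faster recovery path argued for above.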