492671 – NFS lock recovery on server reboot fails

Summary: NFS lock recovery on server reboot fails

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	nfs-utils
Sub Component:
Version:	4.8
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Steve Dickson
QA Contact:	yanfu,wang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2009-03-27 22:44 UTC by Fabio Olive Leite
Modified:	2010-10-27 16:05 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:	492669
Environment:
Last Closed:	2010-10-27 16:05:29 UTC
Target Upstream Version:
Embargoed:

Description Fabio Olive Leite 2009-03-27 22:44:56 UTC

+++ This bug was initially created as a clone of Bug #492669 +++

Description of problem:

When an NFS server boots, it starts the rpc.statd service before the kernel lockd service is up and running. On startup, rpc.statd announces the reboot to its clients (NSM NOTIFY), so clients that held locks when the server rebooted try to recover them while the server's lockd service is not yet available, and the recovery fails.

The problem is in fact on both the client and the server:
- The client should keep retrying until the service is available;
- The server should not advertise its status change before the NLM service is up.
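
The client-side half (keep retrying until the service is available) could look roughly like the sketch below. It is only an illustration of the retry idea, not the kernel's actual behaviour: `retry_until_available`, its parameters, and the `attempt` callback are hypothetical names, and the retry count and delay are arbitrary.

```python
import time


def retry_until_available(attempt, retries=5, delay=2.0, sleep=time.sleep):
    """Call `attempt()` until it returns True or `retries` tries are used up.

    `attempt` is a hypothetical callback standing in for "re-issue the lock
    request"; it should return True on success and False while the server's
    lock service is still unavailable.
    """
    for i in range(retries):
        if attempt():
            return True
        # Wait before the next try, but not after the final failure.
        if i < retries - 1:
            sleep(delay)
    return False
```

The `sleep` parameter is injectable only so the loop can be exercised without real waiting.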

This bugzilla is about the second part. The problem can easily be worked around by starting the nfslock service from rc.local. This bugzilla is a request to change the service start order at boot so that nfs (which loads the lockd module) starts before nfslock (which starts rpc.statd).
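
To see which of the two services starts first on a given system, one could compare the SysV start-link priorities in the runlevel directory. The following is a minimal sketch, assuming the usual `S<NN><service>` link naming; the helper names and the sample priorities are illustrative only and were not taken from an actual 4.8 installation.

```python
import re

# SysV start links look like "S60nfs": "S", a two-digit priority, the service name.
START_LINK = re.compile(r"^S(\d{2})(.+)$")


def start_priorities(link_names):
    """Map service name -> start priority for every start link in the list."""
    priorities = {}
    for name in link_names:
        match = START_LINK.match(name)
        if match:
            priorities[match.group(2)] = int(match.group(1))
    return priorities


def starts_before(link_names, first, second):
    """Return True if `first` has a lower start priority than `second`."""
    priorities = start_priorities(link_names)
    return priorities[first] < priorities[second]


# Illustrative names only; on a real system use os.listdir("/etc/rc3.d").
sample = ["S13portmap", "S14nfslock", "S60nfs", "K20nfs"]
print(starts_before(sample, "nfs", "nfslock"))
```

With the sample list, nfs does not start before nfslock, which is the ordering this report asks to reverse. Links with equal priorities are not handled by this sketch.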

Version-Release number of selected component (if applicable):

4.8 packages.

How reproducible:

Always.

Steps to Reproduce:
1. Set up an NFS client and server
2. Start a tcpdump capture on the client
3. Run a program that locks a file on the NFS mount
4. Reboot the server
5. After the server comes up, stop the capture and check that, after the NSM NOTIFY call, the client tried to re-issue the lock and received a PROGRAM NOT AVAILABLE error from rpcbind at the server.
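
For step 3, any program that takes a POSIX lock on a file in the NFS mount and keeps it across the server reboot will do. A minimal sketch follows; it assumes a Python 3 interpreter on the client, and `hold_lock` and the path in the usage note are hypothetical, not the program used by the reporter.

```python
import fcntl
import os
import time


def hold_lock(path, seconds):
    """Take an exclusive POSIX lock on `path` and hold it for `seconds`."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Blocks until the whole-file exclusive lock is granted.
        fcntl.lockf(fd, fcntl.LOCK_EX)
        time.sleep(seconds)
        fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

For example, `hold_lock("/mnt/nfs/lockfile", 600)` (the mount point is an assumption) would hold the lock for ten minutes, long enough to reboot the server in step 4.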

Actual results:

Client can't recover lock.

Expected results:

Client recovers lock.

Additional info:

The NSM and NLM specifications are vague, and yes, NFS clients SHOULD retry the lock operation a few times with some wait period in between retries (vague, eh?). Still, ensuring NSM is started _after_ NLM at the server leads to faster and more reliable lock recovery, regardless of what the clients do.
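
One way to confirm the ordering on the server is to look at what is registered with the portmapper at the moment rpc.statd is running. The sketch below parses text in the format printed by `rpcinfo -p` and checks for the NLM program number (100021, nlockmgr) alongside NSM (100024, status). The function names and the sample output are illustrative assumptions, not a capture from an affected server.

```python
NLM_PROGRAM = 100021  # nlockmgr
NSM_PROGRAM = 100024  # status


def registered_programs(rpcinfo_output):
    """Return the set of RPC program numbers listed in `rpcinfo -p` style text."""
    programs = set()
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Skip the header and blank lines; data lines start with a number.
        if fields and fields[0].isdigit():
            programs.add(int(fields[0]))
    return programs


def nlm_ready(rpcinfo_output):
    """Return True if the NLM (lockd) program is registered."""
    return NLM_PROGRAM in registered_programs(rpcinfo_output)


# Illustrative output: statd is registered, lockd is not yet.
sample_before = """\
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  32768  status
    100024    1   tcp  32768  status
"""
print(nlm_ready(sample_before))
```

In the sample, status is registered while nlockmgr is missing, which corresponds to the window in which a client's lock reclaim would be rejected.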
