Bug 63162 - client's first nfs mount goes stale after manual relocation
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 2.1
Classification: Red Hat
Component: clumanager
Version: 2.1
Hardware: i386 Linux
Priority: medium  Severity: medium
Assigned To: Lon Hohberger
Depends On: 70115 70132
Blocks: 63033
Reported: 2002-04-10 15:15 EDT by Mike McLean
Modified: 2008-05-01 11:38 EDT

Doc Type: Bug Fix
Last Closed: 2002-10-08 11:33:12 EDT

Description Mike McLean 2002-04-10 15:15:30 EDT
* Pensacola-re0409.0AS  (and others)
* clumanager-1.0.9-1

Description of Problem:
I'm getting 'permission denied' errors (is this due to stale nfs file handles?)
on nfs clients in certain situations.  Specifically, I see the error after a
manual service relocation.  However, if I manually remount from the client, I do
not see the same error on subsequent relocations.  I've seen this with several
different clients.

How Reproducible:
very

Steps to Reproduce:
1. start an nfs export service in clumanager
2. mount nfs export on client
3. manually relocate service
4. try to get a directory listing of the nfs export
Comment 1 Mike McLean 2002-04-22 19:29:52 EDT
This problem seems to be (anti-) related to the persistent nfs daemons (bug
#63178).  The permission denied error appears on relocation or failover while
the client is still on its "first" mount of the nfs share.

It seems that there is some negotiation between client and nfs server that is
lost when shifting from one server to another. Once the client has mounted from
both servers, everything is fine, but until then the client's mount only
functions when the original node is active.  In particular, if you relocate
twice (back to same machine) the mountpoint works again.

When the error occurs, running 'showmount' on the server that controls the nfs
service will not show the client in question.  Once the client unmounts and
remounts, it will appear in showmount on both servers at all times and failovers
will go smoothly.
Comment 2 Lon Hohberger 2002-04-25 15:37:36 EDT
Client entries being present in /var/lib/nfs/rmtab seems to be key for correct
NFS fail-over and proper server recovery on reboot (in a non-cluster).  For
instance (assume cluc and clud are the cluster nodes; with the service starting
on cluc):

linda: mount -t nfs clubsvc5:/mnt/nfs0/dir1 /mnt/nfs
linda: ls /mnt/nfs
cluc: cluadmin -- service relocate nfs0
linda: ls /mnt/nfs
 - ESTALE
cluc: cluadmin -- service relocate nfs0
linda: ls /mnt/nfs
cluc: cluadmin -- service relocate nfs0
linda: ls /mnt/nfs
 - ESTALE
cluc: cp /dev/null /var/lib/nfs/rmtab
cluc: cluadmin -- service relocate nfs0
linda: ls /mnt/nfs
 - ESTALE

Ok, so, now the service is on cluc and we've received ESTALE.  If we copy the
old contents of /var/lib/nfs/rmtab from cluc to clud, and relocate the service
(to clud), we'll succeed with our 'ls'.  Similarly, if we restore
/var/lib/nfs/rmtab on cluc and relocate the service back to cluc, 'ls' will
again succeed.

Needless to say, this provides an interesting problem.  Either we can keep the
files in sync (somehow), or fix it so that the client retries in some manner.
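The keep-the-files-in-sync option can be sketched as a simple union of the two nodes' rmtab copies. This is a hypothetical illustration, not the eventual fix: rmtab holds one client:/path line per active mount (nfs-utils of this era may append a hex use count), and the filenames below are stand-ins for /var/lib/nfs/rmtab as copied from each node.

```shell
# Hypothetical sketch: union the rmtab copies from both cluster nodes so
# that client entries survive a relocation.  Sample entries follow the
# hosts in the transcript above; filenames are stand-ins for
# /var/lib/nfs/rmtab as copied from cluc and clud.
printf 'linda:/mnt/nfs0/dir1\n' > rmtab.cluc
printf 'linda:/mnt/nfs0/dir1\nother:/mnt/nfs0/dir2\n' > rmtab.clud
sort -u rmtab.cluc rmtab.clud > rmtab.merged   # 2 unique entries
```

A real deployment would also have to propagate updates as rpc.mountd adds and removes entries, which is why the comment calls this an interesting problem rather than a solution.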

Comment 3 Tim Burke 2002-04-25 18:45:40 EDT
Further testing reveals that this problem is only exhibited for wildcard (and
probably netgroup) exports.

For example, suppose the nfs client name is client1.  If client1 is explicitly
itemized when the service is created, then the relocation will be transparent.
But if the service is wildcard exported (e.g. *), then in response to the
relocation the client will encounter either ESTALE or EPERM (depending on the
version of nfs-utils running on the client).  In this case, the only way for
the client to resume operation is to remount the nfs filesystem.

This boils down to state maintained on the nfs server side.  In the case of the
relocate, this state isn't there.  To comprehensively address this problem would
require that the cluster infrastructure get involved with keeping the rmtab
state consistent across cluster members.  This change could entail modifications
to nfs-utils.  It's sufficiently broad that it's really not appropriate for the
small window we have for the pensacola release.

I propose that we release note along these lines:

- For transparent relocation/failover you need to explicitly itemize the set of
authorized clients.
- If you use netgroups or wildcards, in response to a relocation/failover, in
certain circumstances it will be necessary to manually remount the directory on
the nfs client systems.
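For illustration, the two cases in the proposed release note correspond to export entries like these (hypothetical paths and hostnames, in the exports(5) format):

```
# Explicit client: relocation/failover is transparent
/mnt/nfs0/dir1  client1(rw,sync)

# Wildcard export: after a relocation the client may see ESTALE or EPERM
# until it manually remounts
/mnt/nfs0/dir1  *(rw,sync)
```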
Comment 4 Lon Hohberger 2002-04-26 10:29:20 EDT
I've tested with netgroups and the behavior is the same.

Summary:
- Exports to explicit hosts/IPs don't need entries in /var/lib/nfs/rmtab
- Exports to host/IP wildcards and netgroups require entries in /var/lib/nfs/rmtab.
Comment 5 Lon Hohberger 2002-05-01 14:40:03 EDT
This is how it works:

exportfs reads rmtab when run, sending entries in rmtab which match the current
export pathname/wildcard up to the kernel.  It also sends the export
pathname/wildcard to rpc.mountd for authentication.

rpc.mountd places entries in rmtab when a mount request is authenticated
successfully and removes entries when umount requests are received.

This little bit of state is used in the event of a server reboot so that clients
can transparently continue working (after a delay, of course).

Unfortunately, if this list isn't present on the other node (or if it has
different entries), the nfs clients will receive ESTALE.
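The rmtab scan that exportfs performs can be illustrated with a one-liner. This is a simplified sketch using made-up sample data; real exportfs also matches wildcards and netgroups, which this does not.

```shell
# Simplified sketch of exportfs's rmtab scan: print the clients whose
# rmtab entry matches a given export path.  Sample entries are
# hypothetical; the real file is /var/lib/nfs/rmtab.
printf 'linda:/mnt/nfs0/dir1\nother:/mnt/nfs0/dir2\n' > rmtab.sample
awk -F: -v p="/mnt/nfs0/dir1" '$2 == p { print $1 }' rmtab.sample
# prints "linda"
```

If the matching entry is missing on the node that takes over the service, there is nothing to send up to the kernel, and the client gets ESTALE exactly as described above.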
Comment 6 Lon Hohberger 2002-05-13 13:24:21 EDT
Fix in pool.  Awaiting more testing from different developers before closing.
