Red Hat Bugzilla – Bug 133215
clients nfs mount goes stale after nfs service restart
Last modified: 2009-04-16 16:15:25 EDT
Description of problem:
I'm getting stale nfs file handles on nfs clients in certain
situations. Specifically, I see the error after a
manual service _restart_ or by stopping it for config changes and
restarting it after a while. However, if I manually remount from the
client, everything works fine again.
It only affects clients that are addressed via netgroups.
Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux ES release 3 (Taroon Update 3)
How reproducible:
easy (very frequently)
Steps to Reproduce:
1. Start an NFS export service in clumanager with a netgroup as the client.
2. Mount the NFS export on a client.
3. Manually stop the NFS service.
4. Manually start the NFS service.
5. Try to access the export on the client.

Actual results:
Stale NFS file handles.

Expected results:
Access to the NFS mount.
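The steps above can be sketched from the shell; the service name "nfssvc", the mount paths, and the use of clusvcadm in place of the GUI cluster tool are all assumptions, not details from this report:

```shell
#!/bin/sh
# Reproduction sketch (hypothetical service name "nfssvc"; clusvcadm
# stands in for the GUI cluster tool used in this report).
SVC=nfssvc
if command -v clusvcadm >/dev/null 2>&1; then
  clusvcadm -d "$SVC"   # step 3: disable the clustered NFS service
  clusvcadm -e "$SVC"   # step 4: enable it again
  STATUS="service $SVC restarted"
else
  STATUS="clusvcadm not available; run this on a cluster member"
fi
echo "$STATUS"
# On the client, access then fails until a manual remount:
#   ls /mnt/export                   -> "Stale NFS file handle"
#   umount /mnt/export && mount server:/export /mnt/export
```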
1.2.16-1 is in the Cluster Suite channel on RHN (!)
Steps 3 and 4: Do you mean "service nfs stop / service nfs start", or
using the cluster tools to restart it?
> 1.2.16-1 is in the Cluster Suite channel on RHN (!)
I know, but I have a working system, and because of this error
I cannot restart the cluster! :-(
>Steps 3 and 4: Do you mean "service nfs stop / service nfs start",
>or using the cluster tools to restart it?
I'm using the cluster tool to manage the cluster-services.
The cluster tool gives the possibility to enable/disable and restart
services. So steps 3/4 should be:
3. disable nfs-service by cluster-tool
4. enable nfs-service by cluster-tool
or restart nfs-service by cluster-tool
I'm sorry for the misunderstanding.
Not really a misunderstanding; just want to have everything clear.
WRT to upgrading, you can do it in a 'rolling' fashion; details are in
the errata advisories. This should minimize downtime (because you
don't have to take the whole cluster offline to do it; just one node
at a time).
Few more questions:
(1) Is autofs (automount) used in conjunction with the clients? If
so, what is the mount timeout?
(2) How would you characterize the clients receiving ESTALE (e.g. all
netgroup members / some netgroup members / all clients [inside and
outside of the netgroup] / some clients [random, not specific to the
netgroup])?
(3) What are the entries in /var/lib/nfs/rmtab and
<service-mountpoint>/.clumanager/rmtab?
(4) When the service is running, there should be a copy of "clurmtabd"
running for each export path (but not per client). Is this the case?
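Question (4) can be checked quickly on the node running the service; counting the daemons with pgrep is an assumption, not a procedure from this report:

```shell
# Count running clurmtabd daemons; per the comment above there should
# be one per exported path, not one per client.
NPROC=$(pgrep -x clurmtabd 2>/dev/null | wc -l)
echo "clurmtabd processes: $NPROC (expect one per export path)"
```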
>WRT to upgrading, you can do it in a 'rolling' fashion; details are
>in the errata advisories. This should minimize downtime (because you
>don't have to take the whole cluster offline to do it; just one node
>at a time).
sure, but I want to solve the problem with the stale nfs handles
before I have downtime ;-)
1) Nope, there is no use of autofs.
2) All netgroup members - except those which are separately listed with
special mount/export options.
3) The clients are listed separately; /var/lib/nfs/rmtab contains the same
entries as <service-mountpoint>/.clumanager/rmtab.
Is that what you wanted to know?
4) That's the case!
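Answer (3) can be verified with a small helper; the service mount point in the example is a placeholder, not a real path from this report:

```shell
# rmtab_in_sync: succeed when the system rmtab and clumanager's
# per-service copy have identical contents.
rmtab_in_sync() {
  diff -q "$1" "$2" >/dev/null
}
# Example (paths are placeholders):
#   rmtab_in_sync /var/lib/nfs/rmtab /mnt/service/.clumanager/rmtab \
#     && echo "rmtab copies match" || echo "rmtab copies differ"
```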
I have just one more question:
Why isn't it possible to make changes to the exports and reload the
service? The only way to do it is to take the service down, make the
changes, and bring it online again. For a few additional export
entries, this is excessive.
Ok, so it's just the netgroup clients. That should make it easier.
The answer to your question lies in the way services are defined.
They're more or less monolithic with lots of properties as opposed to
modeled as a tree of separate entities combined in a group.
This is a known architectural limitation. It should go away in the
future (next major release of RHCS).
Hmmmmm... Did you remove any NFS clients from the service?
> Hmmmmm... Did you remove any NFS clients from the service?
What do you mean? No, I didn't remove any clients such that they could
no longer mount the exports ;-)
There's no need to make any changes to the service, if it goes down
and up, the netgroup-clients get stale nfs-handles.
I'm not sure how it behaves when relocating the service - I remember
that worked fine.
Thanks for the information. A similar problem occurs (apparently)
with wildcards, and yet another with many individual exports.
I have thus far been unable to reproduce any of the above.
Could you attach your cluster.xml (you can change your IPs/hostnames
if you're paranoid about it, but please don't change anything else)?
Created attachment 104357 [details]
Are you using the YP server to serve netgroups to the cluster?
More specifically, are netgroups from your clustered YP service being
used by your clustered NFS services?
Not really - it's just a YP slave for our network. But yes, the YP
server for the cluster is the slave.
Is there anything wrong?
It should be fine; we are just collecting as much data as we can so we
can try to figure out what's wrong. Thanks for your patience.
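One piece of data worth collecting on the YP side is whether the node itself resolves the netgroup; "nfsclients" is an assumed netgroup name, not one from this report:

```shell
# Resolve the (assumed) netgroup through YP; fall back gracefully when
# ypmatch is not installed or no YP domain is bound.
MEMBERS=$(ypmatch nfsclients netgroup 2>/dev/null) \
  || MEMBERS="(lookup failed or ypmatch unavailable)"
echo "netgroup nfsclients: $MEMBERS"
```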
Created attachment 104412 [details]
Should fix problem
Thanks for the patch.
I will include it in the new clumanager before updating and let you
know the results (this will take some time).
Thanks again for the patch -
it seems to be working correctly now.