Bug 133215 - clients nfs mount goes stale after nfs service restart
Summary: clients nfs mount goes stale after nfs service restart
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: clumanager
Version: 3
Hardware: i686
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Lon Hohberger
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2004-09-22 14:34 UTC by Gregor Pardella
Modified: 2009-04-16 20:15 UTC
CC: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-11-09 10:40:54 UTC
Embargoed:


Attachments (Terms of Use)
cluster.xml (5.91 KB, text/plain), 2004-09-27 09:02 UTC, Gregor Pardella
Should fix problem (1.44 KB, patch), 2004-09-28 00:09 UTC, Lon Hohberger

Description Gregor Pardella 2004-09-22 14:34:37 UTC
Description of problem:

I'm getting stale NFS file handles on NFS clients in certain
situations.  Specifically, I see the error after a
manual service _restart_, or after stopping it for config changes and
restarting it a while later. However, if I manually remount from the
client, everything works fine again.
It only affects clients that are addressed via netgroups.


Version-Release number of selected component (if applicable):
clumanager-1.2.3-1
Red Hat Enterprise Linux ES release 3 (Taroon Update 3)

How reproducible:
easy (very frequently)

Steps to Reproduce:
1. Start an NFS export service in clumanager with a netgroup as the
   NFS client.
2. Mount the NFS export on a client.
3. Manually stop the NFS service.
4. Manually start the NFS service.
5. Try to access the export from the client.
  
Actual results:
stale NFS file handles on the client

Expected results:
normal access to the NFS mount
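Step 5 of the reproduction fails with ESTALE on the client. A minimal sketch of the client-side probe, assuming a hypothetical mountpoint /mnt/nfsdata; since a real stale handle requires a live NFS mount against a restarted service, the failing access is simulated here with a function:

```shell
# Sketch of step 5: probing the mount from the client. A stale handle
# makes any access fail with ESTALE ("Stale NFS file handle"). The
# mountpoint is hypothetical and the failing access is simulated;
# on a real client, replace probe_mount with: ls "$1"
probe_mount() {
    echo "ls: $1: Stale NFS file handle" >&2
    return 116   # ESTALE errno on Linux
}

mountpoint=/mnt/nfsdata   # hypothetical export path
if probe_mount "$mountpoint" 2>/dev/null; then
    status="mount ok"
else
    status="stale handle (remount needed)"
fi
echo "$status"
```

As the reporter notes, a manual remount from the client clears the condition until the next service restart.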

Additional info:

Comment 1 Lon Hohberger 2004-09-22 14:47:29 UTC
1.2.16-1 is in the Cluster Suite channel on RHN (!)

Steps 3 and 4: Do you mean "service nfs stop / service nfs start", or
using the cluster tools to restart it?


Comment 3 Gregor Pardella 2004-09-22 14:59:07 UTC
> 1.2.16-1 is in the Cluster Suite channel on RHN (!)
I know, but I have a working system, and because of this error
I cannot restart the cluster! :-(

>Steps 3 and 4: Do you mean "service nfs stop / service nfs start", 
>or using the cluster tools to restart it?

I'm using the cluster tool to manage the cluster services.
The cluster tool makes it possible to enable/disable and restart
a service.
So steps 3/4 should be:
3. disable the NFS service via the cluster tool
4. enable the NFS service via the cluster tool,
   or restart the NFS service via the cluster tool

I'm sorry for the misunderstanding.

Comment 4 Lon Hohberger 2004-09-22 16:55:48 UTC
Not really a misunderstanding; just want to have everything clear.

WRT upgrading, you can do it in a 'rolling' fashion; details are in
the errata advisories.  This should minimize downtime (because you
don't have to take the whole cluster offline to do it; just one node
at a time).

A few more questions:

(1) Is autofs (automount) used in conjunction with the clients?  If
so, what is the mount timeout?

(2) How would you characterize the clients receiving ESTALE (e.g.
all netgroup members/some netgroup members/all clients [inside and
outside of netgroup]/some clients [random, not specific to netgroup])?

(3) What are the entries in /var/lib/nfs/rmtab and
<service-mountpoint>/.clumanager/rmtab?

(4) When the service is running, there should be a copy of "clurmtabd"
running for each export path (but not per client).  Is this the case?
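Questions (3) and (4) can be sanity-checked quickly. A minimal sketch, assuming sample rmtab entries and a hypothetical mountpoint /mnt/nfsdata (both illustrative, not real data):

```shell
# Sketch for questions (3) and (4): in a healthy service, the system
# rmtab and clumanager's per-service copy should hold the same entries.
# The entries and paths below are illustrative samples.
sys_rmtab=$(mktemp)
clu_rmtab=$(mktemp)
cat > "$sys_rmtab" <<'EOF'
client1.example.com:/mnt/nfsdata:0x00000001
client2.example.com:/mnt/nfsdata:0x00000001
EOF
cp "$sys_rmtab" "$clu_rmtab"   # stand-in for the .clumanager copy

if diff -q "$sys_rmtab" "$clu_rmtab" > /dev/null; then
    status="rmtab copies match"
else
    status="rmtab copies differ"
fi
echo "$status"
rm -f "$sys_rmtab" "$clu_rmtab"
```

On a live node, compare /var/lib/nfs/rmtab against <service-mountpoint>/.clumanager/rmtab directly; for question (4), `pgrep -l clurmtabd` should list one clurmtabd process per export path.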


Comment 5 Gregor Pardella 2004-09-23 14:24:34 UTC
>WRT to upgrading, you can do it in a 'rolling' fashion; details are   
>in the errata advisories.  This should minimize downtime (because you
>don't have to take the whole cluster offline to do it; just one node
>at a time).

sure, but I want to solve the stale NFS handle problem
before I take any downtime ;-)

Answers:
1) Nope, there is no use of autofs.
2) All netgroup members, except those which are separately listed with
   special mount/export options.
3) The clients are listed separately; /var/lib/nfs/rmtab contains the
   same entries as <service-mountpoint>/.clumanager/rmtab.
   Is that what you wanted to know?
4) That's the case!


I have just one more question:
Why isn't it possible to make changes to the exports and reload the
NFS service?
The only way to do it is to take down the service, make the changes,
and then bring it online again. But for a few additional entries in
the exports this is excessive.


Comment 6 Lon Hohberger 2004-09-23 15:20:56 UTC
Ok, so it's just the netgroup clients.  That should make it easier.

The answer to your question lies in the way services are defined. 
They're more or less monolithic with lots of properties as opposed to
modeled as a tree of separate entities combined in a group.

This is a known architectural limitation.  It should go away in the
future (next major release of RHCS).

Comment 7 Lon Hohberger 2004-09-23 15:59:07 UTC
Hmmmmm... Did you remove any NFS clients from the service?



Comment 8 Gregor Pardella 2004-09-24 09:18:45 UTC
> Hmmmmm... Did you remove any NFS clients from the service?

What do you mean? No, I didn't remove any clients, so it's not that
they couldn't mount the exports ;-)

There's no need to make any changes to the service; if it goes down
and comes back up, the netgroup clients get stale NFS handles.
I'm not sure how it behaves when relocating the service - I remember
that worked well.



Comment 10 Lon Hohberger 2004-09-24 19:40:27 UTC
Thanks for the information.  A similar problem occurs (apparently)
with wildcards, and yet another with many individual exports.

I have thus far been unable to reproduce any of the above.

Could you attach your cluster.xml (you can change your IPs/hostnames
if you're paranoid about it, but please don't change anything else)?

Comment 11 Gregor Pardella 2004-09-27 09:02:19 UTC
Created attachment 104357 [details]
cluster.xml

Comment 12 Lon Hohberger 2004-09-27 15:03:21 UTC
Are you using the YP server to serve netgroups to the cluster?

More specifically, are netgroups from your clustered YP service being
used by your clustered NFS services?


Comment 13 Gregor Pardella 2004-09-27 15:24:10 UTC
Not really - it's just a YP slave for our network. But sure,
the YP server for the cluster is the slave.

Is there anything wrong?

Comment 14 Lon Hohberger 2004-09-27 16:03:27 UTC
It should be fine; we are just collecting as much data as we can so we
can try to figure out what's wrong.  Thanks for your patience.





Comment 15 Lon Hohberger 2004-09-28 00:09:38 UTC
Created attachment 104412 [details]
Should fix problem

Comment 16 Gregor Pardella 2004-09-29 11:39:58 UTC
thanks for the patch.
I will include it in the new clumanager before updating, and will
inform you of the results (this will take some time).



Comment 17 Gregor Pardella 2004-10-08 16:08:50 UTC
thanks again for the patch -
it seems to be working correctly now.



