Bug 114388

Summary: Stale file handles under failover conditions
Product: [Retired] Red Hat Cluster Suite
Component: clumanager
Version: 3
Reporter: Need Real Name <cjk>
Assignee: Lon Hohberger <lhh>
CC: cluster-maint, riel, tao, vanhoof
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2004-11-09 17:39:37 UTC
Bug Depends On: 117433    
Bug Blocks: 131576    
Attachments: "This oughtta fix it." (attachment 104411)

Description Need Real Name 2004-01-27 15:47:24 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; H010818)

Description of problem:
On a 4-node cluster running RHEL3 AS (base and Update 1), NFS clients 
get stale file handles when failing over or relocating NFS services.

Version-Release number of selected component (if applicable):
kernel-smp-2.4.21-4.ELsmp and kernel-smp-2.4.21-9.ELsmp

How reproducible:
Always

Steps to Reproduce:
1. mount -t nfs X.X.X.X:/opt /opt
2. watch ls -l /opt
3. clusvcadm -r nfsservice

Actual Results:  Stale file handles on the client.

Expected Results:  I expect to see the files after a short (1 second) 
pause.

Additional info:

Under AS 2.1 this worked flawlessly on this system. As I understood 
it, the kernel was patched to allow this behavior in 2.1, but that 
patch seems to be gone. Could this be a side effect of the RPC 
anomalies in the original RHEL 3 kernel?

Comment 2 Lon Hohberger 2004-02-06 18:37:41 UTC
Don't forget to open a ticket w/ Red Hat support, if you have not
already done so.



Comment 3 Lon Hohberger 2004-02-06 18:42:33 UTC
Could you check:

/var/lib/nfs/rmtab [from initial startup host]

and:
<device_mount_point>/.clumanager/rmtab

They should both contain the entry for your client machine.
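
For example, an illustrative check (substitute your client's address
or hostname for <client>):

    grep <client> /var/lib/nfs/rmtab
    grep <client> <device_mount_point>/.clumanager/rmtab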


Comment 4 Need Real Name 2004-02-18 18:31:56 UTC
I've checked on a box that has the clients listed by name rather 
than a * in the client ID field, and the names are indeed the same. 
This cluster, however, DOES work; it started working when I specified 
the names instead of a * for the exports. Now that it all works, I 
added a * in addition to the already-named machines, and I see the IP 
addresses of the client machines; it seems to work for those clients 
that get access via the *.

On a cluster that had just the *, the entries did not match.

Unfortunately I cannot revert the cluster back to the non-working 
state, since it has been deemed critical to the project it supports.

Any insight as to why just having a * would cause such problems?

Comment 5 Lon Hohberger 2004-02-24 16:29:34 UTC
Not off the top of my head.  I'll take a quick look at the code.


Comment 6 Lon Hohberger 2004-02-24 16:49:22 UTC
The code looks correct (actually, it hasn't changed much since RHEL 2.1).

The changes WRT NFS behavior are pretty basic: instead of
synchronizing the [client:export:refcount] triples over the network
(not scalable), we store them on the cluster services' mount points.

These [client:export:refcount] triples are synchronized with
/var/lib/nfs/rmtab immediately following mounting the file system (and
before new requests are allowed to happen).
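
For illustration, a synchronized entry in either file looks roughly
like this (hypothetical client and export; the third field is the hex
reference count):

    10.0.0.25:/opt:0x00000001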

If the kernel patch didn't exist, you'd probably see "clunfsops:
Invalid argument" errors in the system logs.

Is there anything that might be removing .clumanager from your mount
points?
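
A quick illustrative check that the directory survives a relocation
(substitute the service's mount point):

    ls -la <device_mount_point>/.clumanager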

Comment 7 Lon Hohberger 2004-03-04 16:42:25 UTC
Related kernel panic may explain the problem.

Comment 8 Gary Lerhaupt 2004-03-24 19:31:11 UTC
Does anybody know if this issue has been resolved by RHEL 3 Update 2?

Comment 9 Lon Hohberger 2004-04-05 17:23:10 UTC
No, it hasn't; this bug depends on 117433.

Comment 10 Need Real Name 2004-05-30 03:40:31 UTC
What exactly does bug #117433 cover?

I've been trying to figure this out for months now. Here are the 
symptoms...

When using a * for client access on a single export, things work like 
a charm. As soon as I add additional exports, I get stale file 
handles on all but the last export listed in the configuration.

If I name all clients, for all exports, the stale file handle problem 
goes away. However, since failing the service over causes an 
"exportfs -u" to be called, followed by a "sleep 1", for each client 
under each export, a 100+ client system can take several minutes (5 
to 6) to fail over 5 exports.
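
Rough arithmetic: 5 exports x 100+ clients means 500+ individual 
"exportfs -u" calls during a relocation, so even a fraction of a 
second per call adds up to several minutes.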

Attention (or even acknowledgment) to this problem would be greatly 
appreciated....

Cheers.

Comment 11 Need Real Name 2004-05-31 04:33:16 UTC
Oops, "sleep 1" was from something else... The rest holds true...

Cheers

Comment 12 Lon Hohberger 2004-06-01 13:45:31 UTC
#117433 covers a panic in the NFS server ioctls on Red Hat Enterprise
Linux v3.  The specific ioctl in question is used during service
start, stop, failover, and relocation.

Given that the clumanager code handling NFS exports hasn't changed
much (except as noted above) since RHEL v2.1, that panic is the prime
suspect as the cause of the behavior you are witnessing.

As a side note, you might try this:
# cludb -p clurmtabd%pollinterval 1

This alters the time between poll checks of /var/lib/nfs/rmtab, but is
not likely to help given the symptoms.  It tends to help in
environments where mount/umount frequency is high, such as using
autofs to mount clumanager exports on clients.


Comment 13 Lon Hohberger 2004-06-01 14:09:57 UTC
Additionally, please file a ticket with Red Hat Support:

https://www.redhat.com/apps/support/


Comment 14 Lon Hohberger 2004-08-27 17:02:12 UTC
Does this happen on kernel 2.4.21-15.0.4.EL and clumanager 1.2.16?


Comment 15 Lon Hohberger 2004-09-24 13:19:22 UTC
Ok, given the lack of response to the previous question, are netgroups
involved in your cluster configuration?


Comment 16 Need Real Name 2004-09-24 16:13:10 UTC
Ok, given the lack of support for this problem in the past, please 
can the comments...

Now, that said, I've not been able to test the config on our systems, 
since we had to reconfigure everything to use a single export instead 
of the original config. Once I get my lab hardware in, I can test 
this issue in more depth.

To answer your question, no, netgroups are not in use.

My turn to can the attitude. I appreciate you still paying attention 
to this....

thanks

Comment 17 Lon Hohberger 2004-09-24 18:58:36 UTC
First and foremost, I sincerely apologize for both the tone of my
previous comment and the lack of support.

At the moment, I am attempting to characterize this all as accurately
as possible, as I have had extreme difficulty reproducing it (read: I
have not been able to... ever :( ), which leads me to believe it's a
timing issue of some sort... We'll see.

The good news is that I am getting more data from other sources as
well via Red Hat Support, so I should be able to narrow it down enough
sometime in the near future to arrive at a general solution.

I would just like to be as precise as I can, so please forgive me if
this seems redundant:

(1) Did all of the clients specified by the wildcard (*) go stale
after a failover, or random ones?

(2) Approximately how many clients are there in total (ie, physical
machines accessing the export)?  If you can give me an exact number,
that's even better.

Thank you in advance.

Comment 18 Need Real Name 2004-09-27 12:38:57 UTC
Lon,

thanks for looking into this. I'm somewhat reassured that this 
behavior is not normal, but at the same time very confused, as we
are not doing anything off the wall at all. I'll try to be clearer 
in the explanation of the problem....

The conditions are these...

1: RHEL 3 U2
2: MSA1000 SAN (shared storage)
3: Qlogic 2340 FC
4: 2 Nodes RHAS3.0
5: 80+ clients mounting multiple NFS points

When a single NFS export exists and the clients field is a "*",
it works great, with a roughly 1-second failover. 

If I add any additional exports using a "*", all but the last
export exporting to "*" go stale during failover. That is to say, if 
I have the equivalent of this...

export1  *(rw,no_root_squash)
export2  *(rw,no_root_squash)

(only config'd with the cluster manager and NOT with /etc/exports)

then export1 clients will go stale and export2 clients will be fine.
If I have something like this...

export1  client1, client2, client3(rw,no_root_squash)
export2  *(rw,no_root_squash)

then after failover, all will be fine....

If I add any more mount points in either case, then the behavior of 
the mounts follows accordingly. I.e., more named mounts work fine, 
and more "*" exports still fail, except the last one listed.

Now, the way to make it work _every_time_ is to specifically name 
every client machine in the config. For 5 or 6 mount points that is a 
major pain in the A**, even with the import function. It also causes 
failovers to take 5+ minutes, which is unacceptable for our needs.

So, to specifically answer your questions above....

(1) Did all of the clients specified by the wildcard (*) go stale
after a failover, or random ones?

A1: As explained above, not all of them, but not random either. It was 
consistently the last export listed in the config tool that did not 
go stale. All others did go stale.

(2) Approximately how many clients are there in total (ie, physical
machines accessing the export)?  If you can give me an exact number,
that's even better.

A2: Approximately 80 clients.

I can run some tests if they are not too disruptive to our developers 
but I don't have access to 80+ machines anymore. 

Thanks....

Comment 20 Lon Hohberger 2004-09-27 21:35:20 UTC
Bingo.

Comment 21 Lon Hohberger 2004-09-27 23:59:35 UTC
Created attachment 104411 [details]
This oughtta fix it.

Comment 22 Need Real Name 2004-09-28 14:36:16 UTC
Hmm, my previous comment didn't take. Let's just say I was more than 
pleased. :)

I tested the patch against RHAS 3 U2 and so far it works like a charm. 
I replaced the clumanager package with the patched version, then, 
after restarting clumanager, mounted 3 exports on a machine that is 
not known to the cluster and did a clusvcadm -r nfs. A few seconds 
later the new machine could see the mounts just fine, with no stale 
file handles anywhere!
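
Roughly, the test amounted to this (hypothetical server address and 
mount points; the mounts run on the new client, the relocation on a 
cluster member):

    client# mount -t nfs 10.0.0.1:/export1 /mnt/e1
    client# mount -t nfs 10.0.0.1:/export2 /mnt/e2
    client# mount -t nfs 10.0.0.1:/export3 /mnt/e3
    node#   clusvcadm -r nfs
    client# ls -l /mnt/e1 /mnt/e2 /mnt/e3     (no stale file handles)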

I'll continue to beat on this and make sure all is good and well in 
the world again and follow up here.

Thanks again.

Comment 23 Lon Hohberger 2004-09-28 15:17:20 UTC
Couple of notes re: ESTALE and clurmtabd:

You might want to reduce your poll interval to 1 if you are going to
be doing many fast mounts/umounts during power-cycling or other
stress tests:

    cludb -p clurmtabd%pollinterval 1

Because there's no actual synchronization between rpc.mountd and
clurmtabd, there's no way to ensure that all mount lines get
propagated in the event of a failure.

So, clurmtabd polls every 4 seconds by default.  In autofs
environments, or environments which have lots of mounts/umounts
coming in very often, using a '1' decreases the chance of getting an
ESTALE in the event of a failure (or simulated failure).

Manual service relocation or restarts should not have this window, as
clurmtabd synchronizes its export lists prior to exiting during the
service's stop phase.


Comment 25 Derek Anderson 2004-11-09 17:39:37 UTC
The fix worked for me too.  Ready to ship with RHEL3-U4:

clumanager-1.2.22-2

Comment 26 John Flanagan 2004-12-21 03:40:12 UTC
An advisory has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-491.html


Comment 27 Lon Hohberger 2007-12-21 15:10:26 UTC
Fixing product name.  Clumanager on RHEL3 was part of RHCS3, not RHEL3.