Bug 114388 - Stale file handles under failover conditions

| Field | Value |
| --- | --- |
| Product | [Retired] Red Hat Cluster Suite |
| Component | clumanager |
| Version | 3 |
| Hardware | i686 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | medium |
| Reporter | Need Real Name <cjk> |
| Assignee | Lon Hohberger <lhh> |
| CC | cluster-maint, riel, tao, vanhoof |
| Doc Type | Bug Fix |
| Last Closed | 2004-11-09 17:39:37 UTC |
| Bug Depends On | 117433 |
| Bug Blocks | 131576 |
Description (Need Real Name, 2004-01-27 15:47:24 UTC)
Don't forget to open a ticket w/ Red Hat support, if you have not already done so. Could you check:

    /var/lib/nfs/rmtab                        [on the initial startup host]
    <device_mount_point>/.clumanager/rmtab

They should both contain the entry for your client machine.

I've checked on a box that has the clients listed by name rather than having a * in the client id place, and the names are indeed the same. This cluster, however, DOES work. It started working when I specified the names instead of a * for the exports. Now that it all works, I put a star in addition to those already-named machines, and I see the IP address of the client machines; it seems to work for those clients that get access via the *. On a cluster that had just the *, the entries did not match. Unfortunately, I cannot revert the cluster back to the non-working state, since it has been deemed critical to the project that it supports. Any insight as to why just having a * would cause such problems?

Not off the top of my head. I'll look at the code real quick.

The code looks correct (actually, it hasn't changed much since RHEL 2.1). The changes WRT NFS behavior are pretty basic: instead of synchronizing the [client:export:refcount] triples over the network (not scalable), we store them on the cluster services' mount points. These [client:export:refcount] triples are synchronized with /var/lib/nfs/rmtab immediately after mounting the file system (and before new requests are allowed to happen). If the kernel patch didn't exist, you'd probably see "clunfsops: Invalid argument" errors in the system logs. Is there anything that might be removing .clumanager on your mount points?

A related kernel panic may explain the problem.

Does anybody know if this issue has been resolved by RHEL 3 Update 2?

No, it hasn't; this bug depends on bug 117433.

What exactly does bug #117433 cover? I've been trying to figure this out for months now. Here are the symptoms: when using a * for client access on a single export, things work like a charm. As soon as I add additional exports, I get stale file handles on all but the last export listed in the configuration. If I name all clients for all exports, the stale file handle problem goes away. However, since failing the service over causes an "exportfs -u" to be called, followed by a "sleep 1", for each client under each export, a 100+ client system can take several minutes (5 to 6) to fail over 5 exports. Attention (or even acknowledgment) to this problem would be greatly appreciated.... Cheers.

Oops, the "sleep 1" was from something else... The rest holds true... Cheers.

Bug #117433 covers a panic in the NFS server ioctls on Red Hat Enterprise Linux v3. The specific ioctl in question is used during service start, stop, failover, and relocation. Given that the clumanager code handling NFS exports hasn't changed much (except as noted above) since RHEL v2.1, this makes that panic the prime suspect as the cause of the behavior you are witnessing. As a side note, you might try this:

    # cludb -p clurmtabd%pollinterval 1

This alters the time between poll checks of /var/lib/nfs/rmtab, but it is not likely to help given the symptoms. It tends to help in environments where mount/umount frequency is high, such as using autofs to mount clumanager exports on clients. Additionally, please file a ticket with Red Hat Support: https://www.redhat.com/apps/support/

Does this happen on kernel 2.4.21-15.0.4.EL and clumanager 1.2.16?

Ok, given the lack of response to the previous question: are netgroups involved in your cluster configuration?
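The rmtab consistency check suggested at the top of this thread can be scripted. A minimal sketch, assuming the service's shared file system is mounted at /mnt/nfsdata and a client address of 192.168.1.50 (both are illustrative placeholders; the two rmtab paths and the client:export:refcount format come from the comments above):

    #!/bin/sh
    # Placeholders: substitute your service's mount point and a real client.
    MNT=/mnt/nfsdata
    CLIENT=192.168.1.50

    # Entries in rmtab take the form client:export:refcount. The same
    # client entries should appear in both files after service startup;
    # a mismatch is the symptom discussed in this bug.
    echo "=== /var/lib/nfs/rmtab ==="
    grep "^$CLIENT:" /var/lib/nfs/rmtab

    echo "=== $MNT/.clumanager/rmtab ==="
    grep "^$CLIENT:" "$MNT/.clumanager/rmtab"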
Ok, given the lack of support for this problem in the past, please can the comments... Now, that said, I've not been able to test the config on our systems, since we had to reconfigure everything to use a single export instead of the original config. Once I get my lab hardware in, I can test this issue further, more in depth. To answer your question: no, netgroups are not in use. My turn to can the attitude. I appreciate you still paying attention to this.... thanks.

First and foremost, I sincerely apologize for both the tone of my previous comment and the lack of support. At the moment, I am attempting to characterize this all as accurately as possible, as I have had extreme difficulty reproducing it (read: I have not been able to... ever :( ), which leads me to believe it's a timing issue of some sort... We'll see. The good news is that I am getting more data from other sources as well via Red Hat Support, so I should be able to narrow it down enough sometime in the near future to arrive at a general solution. I would just like to be as precise as I can, so please forgive me if this seems redundant:

(1) Did all of the clients specified by the wildcard (*) go stale after a failover, or random ones?
(2) Approximately how many clients are there in total (i.e., physical machines accessing the export)? If you can give me an exact number, that's even better.

Thank you in advance.

Lon, thanks for looking into this. I'm somewhat reassured that this behavior is not normal, but at the same time very confused, as we are not doing anything off the wall at all. I'll try to be clearer in the explanation of the problem. The conditions are these:

1. RHEL 3 U2
2. MSA1000 SAN (shared storage)
3. QLogic 2340 FC
4. 2 nodes, RHEL AS 3.0
5. 80+ clients mounting multiple NFS points

When a single NFS export exists and the clients field is a "*", it works great, with a ~1-second failover. If I add any additional exports using a "*", all but the last-listed export exporting to "*" go stale during failover. That is to say, if I have the equivalent of this (configured only with the cluster manager and NOT in /etc/exports):

    export1 *(rw,no_root_squash)
    export2 *(rw,no_root_squash)

then export1 clients will go stale and export2 clients will be fine. If I have something like this:

    export1 client1,client2,client3(rw,no_root_squash)
    export2 *(rw,no_root_squash)

then after failover, all will be fine. If I add any more mount points in either case, the behavior of the mounts follows accordingly: more named mounts work fine, more "*" mounts still fail, except the last one listed.

Now, the way to make it work _every_time_ is to specifically name every client machine in the config. For 5 or 6 mount points, that is a major pain in the A**, even with the import function. It also causes failovers to take 5+ minutes, which is unacceptable for our needs.

So, to specifically answer your questions above:

(1) Did all of the clients specified by the wildcard (*) go stale after a failover, or random ones?
A1: As explained above, not all of them, but not random either. It was consistently the last export listed in the config tool that did not go stale. All others did go stale.

(2) Approximately how many clients are there in total (i.e., physical machines accessing the export)? If you can give me an exact number, that's even better.
A2: Approximately 80 clients. I can run some tests if they are not too disruptive to our developers, but I don't have access to 80+ machines anymore.

Thanks....

> then after failover

Bingo.
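The symptom description above lends itself to a simple client-side reproduction. A rough sketch, assuming a cluster NFS service named "nfs" reachable at a virtual host "clusterip" with two wildcard exports (the host and mount-point names are illustrative; the clusvcadm invocation matches the one used later in this thread):

    # On a client NOT named in the cluster config (i.e., matched only by "*"):
    mkdir -p /mnt/e1 /mnt/e2
    mount clusterip:/export1 /mnt/e1
    mount clusterip:/export2 /mnt/e2

    # On a cluster member, relocate the NFS service to the other node:
    clusvcadm -r nfs

    # Back on the client: with the bug present, every wildcard export
    # except the last one listed in the configuration returns ESTALE.
    ls /mnt/e1    # "Stale NFS file handle" before the fix
    ls /mnt/e2    # OK (last-listed export)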
Created attachment 104411 (patch)
This oughtta fix it.
Hmm, my previous comment didn't take. Let's just say I was more than pleased. :) I tested the patch against RHEL AS 3 U2, and so far it works like a charm. I replaced the clumanager package with the patched version; then, after restarting clumanager, I mounted 3 exports on a machine that is not known to the cluster and did a clusvcadm -r nfs. A few seconds later, the new machine could see the mounts just fine: no stale file handles anywhere! I'll continue to beat on this, make sure all is good and well in the world again, and follow up here. Thanks again.

A couple of notes re: ESTALE and clurmtabd. You might want to reduce your poll interval to 1 if you are going to be doing many fast mounts/umounts while power-cycling or running other stress tests:

    cludb -p clurmtabd%pollinterval 1

Because there's no actual synchronization between rpc.mountd and clurmtabd, there's no way to ensure that all mount lines get propagated in the event of a failure. So, clurmtabd polls every 4 seconds by default. In autofs environments, or environments which have lots of mounts/umounts coming in very often, using a '1' decreases the chance of getting an ESTALE in the event of a failure (or simulated failure). Manual service relocations or restarts should not have this window, as clurmtabd synchronizes its export lists prior to exiting during the service's stop phase.

The fix worked for me too.

Ready to ship with RHEL3-U4: clumanager-1.2.22-2.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2004-491.html

Fixing product name: clumanager on RHEL 3 was part of RHCS 3, not RHEL 3.
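A quick way to check whether an installed node already carries the fix. The fixed version, clumanager-1.2.22-2 (RHEL3 U4, erratum RHBA-2004:491), comes from the comments above; the rpm invocations are standard:

    # The fix shipped in clumanager-1.2.22-2; this version or later
    # should contain it.
    rpm -q clumanager

    # The package changelog may reference the erratum or the
    # stale-file-handle fix.
    rpm -q --changelog clumanager | head -20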