Bug 174638
| Field | Value |
|---|---|
| Summary | GFS file system mounted with -o sync and exported over NFS gets data errors on failover |
| Product | [Retired] Red Hat Cluster Suite |
| Reporter | Henry Harris <henry.harris> |
| Component | gfs |
| Assignee | Robert Peterson <rpeterso> |
| Status | CLOSED DUPLICATE |
| QA Contact | GFS Bugs <gfs-bugs> |
| Severity | medium |
| Priority | medium |
| Version | 4 |
| CC | rkenna |
| Hardware | All |
| OS | Linux |
| Doc Type | Bug Fix |
| Last Closed | 2006-02-14 17:20:28 UTC |
Description
Henry Harris 2005-11-30 23:24:09 UTC

There was an osync bug that Kevin fixed earlier. If you can't reproduce this with the latest code, you might try pulling that fix out; it's possible that this is another side effect of that bug.

I'd like to get some more information that will help me recreate this scenario in our lab. I was unable to recreate this problem using a three-node cluster, i686 hardware, and commands similar to this from my NFS client:

    dt of=/mnt/bob/test.dt bs=512 passes=1 limit=1G log=/var/log/bob.dt.log

In every test I tried, my NFS server failed over to the next server successfully, and no errors were logged either by the dt tool or by the systems in the cluster. In each case, dt kept writing happily and my file size kept growing. When my primary NFS server came back, it took over NFS serving again, also without any errors reported by dt. By the way, my cluster is using the latest Cluster Suite RPMs, including GFS-kernel-2.6.9-47.1.i686.rpm.

I would like the following information:

1. A list of all RPMs on the server nodes and client nodes (i.e. the output from `rpm -qa`). Please put it in an attachment; one copy is okay if the nodes are all basically the same. I'm especially interested in `rpm -q GFS-kernel`, but I'd like to see the rest too.
2. The exact dt command used to recreate the problem.
3. A brief description of the hardware involved with this problem (for example: Pentium 4 CPU 2.40GHz, 512MB RAM, Brocade fencing, etc.).
4. A brief description of the environment in which the problem occurred. For example: is dt writing a new file, or is it overwriting an existing file? Is it copying from the -o sync mount to the non-sync mount, or from the non-sync mount to the -o sync mount? Are there multiple dt's running on multiple files? Is the NFS server (or client) under a heavy workload when the failure occurs? And so on.
5. A description of how the primary NFS server node was brought down to create the failure. (Did you use the /sbin/reboot command, and with what parameters? Did you pull the plug? Did you get a kernel panic? Did you tell your power switch to cut the power? Did you do a really long kernel-only operation like insmod?)
6. Anything else you think might help me recreate the problem in our lab.

Thanks.

Bob Peterson

I'm fairly certain this bug is the same as bz 178057. The use of the -o sync option probably just changes the timing slightly but does not eliminate the problem. The I/O errors reported by the dt tool are probably the result of the NFS3ERR_ACCES described in comment 20 of bz 178057.

*** This bug has been marked as a duplicate of 178057 ***
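For anyone reproducing a report like this, the node-level diagnostics requested in the comment above (package list, GFS-kernel version, rough hardware summary) can be gathered with a short shell sketch such as the one below. The output directory and file names are placeholders of my own, not anything specified in the bug:

```shell
#!/bin/sh
# Sketch: collect per-node diagnostics to attach to the bug report.
# OUTDIR and the file names are illustrative placeholders.
OUTDIR="${OUTDIR:-/tmp/bz174638-info}"
NODE=$(uname -n)
mkdir -p "$OUTDIR"

# Full package list (one file per node; attach to the bug).
if command -v rpm >/dev/null 2>&1; then
    rpm -qa | sort > "$OUTDIR/rpm-qa.$NODE.txt"
    rpm -q GFS-kernel > "$OUTDIR/GFS-kernel.$NODE.txt" 2>&1 || true
fi

# Rough hardware description: CPU model and total memory.
grep -m1 'model name' /proc/cpuinfo > "$OUTDIR/cpu.$NODE.txt" 2>/dev/null || true
grep MemTotal /proc/meminfo >> "$OUTDIR/cpu.$NODE.txt" 2>/dev/null || true

echo "diagnostics written to $OUTDIR"
```

Run it once on each server and client node; the dt command line, the failure environment, and the way the primary server was brought down still need to be described by hand.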