Red Hat Bugzilla – Bug 433694
GFS2 concurrency/caching issues when using ATA Over Ethernet and a local device
Last modified: 2009-05-27 23:39:28 EDT
Description of problem:
I create a cluster with 2 nodes. Node A shares a block device over AoE (using
vblade) and mounts the gfs2 filesystem on that device. Node B mounts the
block device exported by Node A. Filesystem changes (metadata and/or file
content) are not reflected promptly when made on one node; A and B can see
completely different views of the fs.
Version-Release number of selected component (if applicable):
aoe driver in kernel v22 (2.6.18-53) also v32 (2.6.21)
Steps to Reproduce:
1. Create the cluster as described.
2. Open a console to Node B. Mount the fs and tail a file.
3. Open a console to Node A. Mount the fs and append to the file.
Actual results:
No output on Node B.
Expected results:
Lines appended in step 3 should echo on Node B.
Both nodes are HP BL20 G2 blades running Centos 5. The block device is a
partition on a RAID 1 on the onboard RAID controller. Node A
mounts /dev/cciss/c0d0p3. Node B mounts /dev/etherd/e1.1
What happens when both nodes mount /dev/etherd/e1.1?
Node A cannot mount /dev/etherd/e1.1 because AoE cannot connect to a device
that is on the same machine that is sharing it.
If I bring in a 3rd server and have it play AoE host, with Nodes A and B in a
2-node cluster, everything appears fine.
GFS2 in RHEL 5.1 is rather old and may well have bugs in it. I'd suggest using
something more up to date. Ideally, waiting for 5.2 should see those problems
solved. If you are just evaluating GFS2, I'd suggest using Fedora rawhide.
Also, the problem you are seeing might well be related to the way in which
you are exporting the block device. It depends upon the caching in the AoE
software, and I don't know enough about it to be sure that it will not reply to
the client before the data it has received from the remote end is on disk.
Unless writes are acknowledged only once the data is on disk, it will not be
possible to use GFS2 in this way.
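The failure mode described above can be illustrated with a toy model. Everything below is hypothetical (the class and function names are not AoE or vblade APIs): it only sketches why an exporter that acknowledges a write while the data is still in its cache lets a second node, reading the backing device directly, observe stale contents until a flush.

```python
class Disk:
    """Shared backing store that both nodes ultimately read."""
    def __init__(self):
        self.blocks = {}


class WritebackExport:
    """Hypothetical exporter that ACKs writes before they reach the disk."""
    def __init__(self, disk):
        self.disk = disk
        self.cache = {}

    def write(self, block, data):
        self.cache[block] = data              # ACK here: data NOT yet on disk
        return "ack"

    def flush(self):
        self.disk.blocks.update(self.cache)   # data reaches stable storage
        self.cache.clear()


def remote_read(disk, block):
    """A second node reading the backing device directly, bypassing the
    exporter's cache (as Node A does with /dev/cciss/c0d0p3)."""
    return disk.blocks.get(block)


disk = Disk()
exporter = WritebackExport(disk)
exporter.write(7, b"appended line")   # write is ACKed to the writer
stale = remote_read(disk, 7)          # the other node sees nothing yet
exporter.flush()                      # only now does the data hit disk
fresh = remote_read(disk, 7)          # the other node finally sees it
```

GFS2's locking assumes that once a node's write is acknowledged, every other node reading the storage sees it; the window between `write` and `flush` above is exactly where the two views diverge.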
It seems like caching in AoE to me. I submitted the bug report under the
direction of folks in the GFS2 IRC channel to bring it to attention.
Ok, I'm going to close this one on the basis that it's not a problem in GFS2,
since it's down to caching in the virtual device that you are using. You might
want to open a feature request against the AoE software to disable caching, but
that's best done upstream.
Also, this configuration of exporting a local device and mounting it locally
and remotely at the same time is not supported (we assume equal access to the
storage), and it will cause problems with fencing, which does not expect access
to the storage to disappear from the cluster just because one node got fenced.
So it's certainly not recommended as a course of action.