Red Hat Bugzilla – Bug 433694
GFS2 concurrency/caching issues when using ATA Over Ethernet and a local device
Last modified: 2009-05-27 23:39:28 EDT
Description of problem:
I create a cluster with 2 nodes. Node A shares a block device over AoE (using
vblade) and mounts the gfs2 filesystem on that device. Node B mounts the
block device exported by Node A. Filesystem changes (metadata and/or file
content) are not reflected promptly when made on one node; A and B can see
completely different views of the fs.
Version-Release number of selected component (if applicable):
aoe driver in kernel v22 (2.6.18-53) also v32 (2.6.21)
Steps to Reproduce:
1. Create the cluster as described.
2. Open a console to Node B. Mount the fs and tail a file.
3. Open a console to Node A. Mount the fs and append to the file.
Actual results:
No output on Node B.
Expected results:
Lines appended in step 3 should echo on Node B.
Both nodes are HP BL20 G2 blades running Centos 5. The block device is a
partition on a RAID 1 on the onboard RAID controller. Node A
mounts /dev/cciss/c0d0p3. Node B mounts /dev/etherd/e1.1
What happens when both nodes mount /dev/etherd/e1.1?
Node A cannot mount /dev/etherd/e1.1 because AoE cannot connect to a device
that is on the same machine that is sharing it.
If I bring in a 3rd server and have it play AoE host, with Nodes A and B in a
2-node cluster, everything appears fine.
GFS2 in RHEL 5.1 is rather old and may well have bugs in it. I'd suggest using
something more up to date. Ideally, waiting for 5.2 should see those problems
solved. If you are just evaluating GFS2, I'd suggest using Fedora rawhide.
Also, the problem you are seeing might well be related to the way in which
you are exporting the block device. It depends upon the caching in the AoE
software, and I don't know enough about it to be sure that it will not reply to
the client before the data it has received from the remote end is on disk.
Unless writes are acknowledged only once the data is on disk, it will not be
possible to use GFS2 in this way.
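The failure mode described above can be illustrated with a toy model. Everything below is hypothetical (the class and function names are not AoE or vblade APIs): it only sketches why an exporter that acknowledges a write while the data is still in its cache lets a second node, reading the backing device directly, observe stale contents until a flush.

```python
class Disk:
    """Shared backing store that both nodes ultimately read."""
    def __init__(self):
        self.blocks = {}


class WritebackExport:
    """Hypothetical exporter that ACKs writes before they reach the disk."""
    def __init__(self, disk):
        self.disk = disk
        self.cache = {}

    def write(self, block, data):
        self.cache[block] = data              # ACK here: data NOT yet on disk
        return "ack"

    def flush(self):
        self.disk.blocks.update(self.cache)   # data reaches stable storage
        self.cache.clear()


def remote_read(disk, block):
    """A second node reading the backing device directly, bypassing the
    exporter's cache (as Node A does with /dev/cciss/c0d0p3)."""
    return disk.blocks.get(block)


disk = Disk()
exporter = WritebackExport(disk)
exporter.write(7, b"appended line")   # write is ACKed to the writer
stale = remote_read(disk, 7)          # the other node sees nothing yet
exporter.flush()                      # only now does the data hit disk
fresh = remote_read(disk, 7)          # the other node finally sees it
```

GFS2's locking assumes that once a node's write is acknowledged, every other node reading the storage sees it; the window between `write` and `flush` above is exactly where the two views diverge.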
It seems like caching in AoE to me. I submitted the bug report under the
direction of folks in the GFS2 IRC channel to bring it to attention.
Ok, I'm going to close this one on the basis that it's not a problem in GFS2,
since it's down to caching in the virtual device that you are using. You might
want to open a feature request against the AoE software to disable caching, but
that's best done upstream.
Also, this configuration of exporting a local device and mounting it locally
and remotely at the same time is not supported (we assume equal access to the
storage), and it will cause problems with fencing, which does not expect access
to the storage to disappear from the cluster just because one node got fenced.
So it's certainly not recommended as a course of action.