Bug 433694 - GFS2 concurrency/caching issues when using ATA Over Ethernet and a local device
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.1
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Steve Whitehouse
QA Contact: GFS Bugs
Depends On:
Reported: 2008-02-20 21:34 UTC by Dan Kelly
Modified: 2009-05-28 03:39 UTC

Clone Of:
Last Closed: 2008-03-03 09:09:57 UTC

Description Dan Kelly 2008-02-20 21:34:02 UTC
Description of problem:
I create a cluster with 2 nodes.  Node A shares a block device over AoE (using
vblade) and mounts the gfs2 filesystem on that device.  Node B mounts the
block device which was exported by Node A.  Filesystem changes (metadata
and/or file content) made on one node are not immediately visible on the
other.  A and B are able to see completely different views of the fs.

Version-Release number of selected component (if applicable):
gfs2-tools 0.1.38-1.el5
aoe driver in kernel v22 (2.6.18-53) also v32 (2.6.21)
vblade v14

How reproducible:

Steps to Reproduce:
1.  Create the cluster as described
2.  Open a console to Node B.  Mount the fs and tail a file
3.  Open a console to Node A.  Mount the fs and append to the file
Actual results:
No output on Node B

Expected results:
Lines appended in step 3 should echo on Node B
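
The reproduction steps could be scripted roughly as follows. This is only a
minimal sketch: the helper names append_line and read_new are made up for
illustration, and the file path passed to them would have to be a file on
the shared gfs2 mount (no mount point is given in this report). Node A would
call append_line, while Node B would call read_new in a loop, much like
tail -f.

```python
import os


def append_line(path, text):
    """Append one line and force it out of this node's page cache (Node A)."""
    with open(path, "a", newline="\n") as f:
        f.write(text + "\n")
        f.flush()
        os.fsync(f.fileno())


def read_new(path, offset):
    """Return (text added since offset, new offset), like one tail -f poll (Node B)."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    return data.decode(errors="replace"), offset + len(data)
```

With coherent shared storage, each read_new call on Node B should return
whatever Node A appended since the previous call. In the setup reported
here it would keep returning an empty string.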

Additional info:

Comment 1 Dan Kelly 2008-02-20 22:00:27 UTC
Both nodes are HP BL20 G2 blades running CentOS 5.  The block device is a
partition on a RAID 1 volume on the onboard RAID controller.  Node A
mounts /dev/cciss/c0d0p3.  Node B mounts /dev/etherd/e1.1

Comment 2 Nate Straz 2008-02-20 22:11:52 UTC
What happens when both nodes mount /dev/etherd/e1.1?

Comment 3 Dan Kelly 2008-02-20 22:22:51 UTC
Node A cannot mount /dev/etherd/e1.1 because AoE cannot connect to a device 
that is on the same machine that is sharing it.

If I bring in a 3rd server to act as the AoE host, with Nodes A and B forming
a 2 node cluster, it appears to work fine.

Comment 4 Steve Whitehouse 2008-02-21 16:41:09 UTC
GFS2 in RHEL 5.1 is rather old and may well have bugs in it. I'd suggest using
something more up to date. Ideally, waiting for 5.2 should see those problems
solved. If you are just evaluating GFS2, then I'd suggest using Fedora rawhide.

Also, the problem that you are seeing might well be related to the way in which
you are exporting the block device. It depends upon the caching in the AoE
software, and I don't know enough about it to be sure that it will not reply to
the client before the data it has received from the remote end is on disk.
Unless that is guaranteed, it will not be possible to use GFS2 in this way.
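
To make the caching concern concrete, here is a toy Python model. It is only
a sketch under stated assumptions: the Disk and CachingExporter classes are
hypothetical and do not reflect how vblade or the aoe driver are actually
implemented. It assumes an exporter that keeps its own block cache and
acknowledges writes before flushing them, while the local node reads and
writes the disk directly.

```python
class Disk:
    """Backing store shared by both access paths: block number -> bytes."""

    def __init__(self):
        self.blocks = {}

    def read(self, n):
        return self.blocks.get(n, b"")

    def write(self, n, data):
        self.blocks[n] = data


class CachingExporter:
    """Hypothetical exporter with a write-back block cache (an assumption)."""

    def __init__(self, disk):
        self.disk = disk
        self.cache = {}
        self.dirty = set()

    def read(self, n):
        # Serve from the cache if present; never re-check the disk.
        if n not in self.cache:
            self.cache[n] = self.disk.read(n)
        return self.cache[n]

    def write(self, n, data):
        # Acknowledge immediately; the data is not on disk yet.
        self.cache[n] = data
        self.dirty.add(n)
        return "ack"

    def flush(self):
        for n in sorted(self.dirty):
            self.disk.write(n, self.cache[n])
        self.dirty.clear()


def demo():
    disk = Disk()
    disk.write(0, b"old")
    exporter = CachingExporter(disk)
    results = {}
    results["b_first_read"] = exporter.read(0)    # Node B reads via the exporter
    disk.write(0, b"old+new")                     # Node A appends directly on disk
    results["b_second_read"] = exporter.read(0)   # Node B still gets the cached block
    exporter.write(1, b"from B")                  # Node B writes, gets an ack
    results["a_before_flush"] = disk.read(1)      # Node A reads the disk directly
    exporter.flush()
    results["a_after_flush"] = disk.read(1)
    return results
```

In this model Node B keeps seeing the old block after Node A has changed it,
and Node A cannot see Node B's acknowledged write until the exporter flushes.
Either effect would be enough to give the two nodes different views of the
filesystem.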

Comment 5 Dan Kelly 2008-02-21 17:06:38 UTC
It seems like caching in AOE to me.  I submitted the bug report under the
direction of folks in the GFS2 IRC channel to bring it to attention.

Comment 6 Steve Whitehouse 2008-03-03 09:06:50 UTC
Ok, I'm going to close this one on the basis that it's not a problem in GFS2,
since it's down to caching in the virtual device that you are using. You might
want to open a feature request against the AoE software to disable caching, but
that's best done upstream.

Also, this configuration of exporting a local device and mounting it locally and
remotely at the same time is not supported (we assume equal access to the
storage), and it will cause problems with fencing, which does not expect access
to the storage to disappear from the cluster just because one node got fenced.
So it's certainly not recommended as a course of action.
