Summary: GFS2 concurrency/caching issues when using ATA over Ethernet and a local device
Product: Red Hat Enterprise Linux 5
Reporter: Dan Kelly <dan>
Component: kernel
Assignee: Steve Whitehouse <swhiteho>
Status: CLOSED NOTABUG
QA Contact: GFS Bugs <gfs-bugs>
Doc Type: Bug Fix
Last Closed: 2008-03-03 09:09:57 UTC
Description Dan Kelly 2008-02-20 21:34:02 UTC
Description of problem:
I create a cluster with 2 nodes. Node A shares a block device over AoE (using vblade) and mounts the GFS2 filesystem on that device. Node B mounts the block device exported by Node A. Filesystem changes (metadata and/or file content) are not reflected promptly on the other node; A and B can see completely different views of the fs.

Version-Release number of selected component (if applicable):
gfs2-tools 0.1.38-1.el5
aoe driver in kernel v22 (2.6.18-53), also v32 (2.6.21)
vblade v14

How reproducible: frequent

Steps to Reproduce:
1. Create the cluster as described.
2. Open a console on Node B. Mount the fs and tail a file.
3. Open a console on Node A. Mount the fs and append to the file.

Actual results: No output on Node B.

Expected results: Lines appended in step 3 should echo on Node B.

Additional info:
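The reproduction above can be sketched as a command transcript. This is a non-runnable sketch, not the reporter's exact commands: the network interface (eth0), mount point (/mnt/gfs2), and the shelf/slot numbers (1.1, inferred from /dev/etherd/e1.1) are assumptions; the device paths come from comment 1.

```shell
# Node A: export the local RAID partition over AoE with vblade
# (assumed interface eth0; shelf 1, slot 1 inferred from /dev/etherd/e1.1)
vblade 1 1 eth0 /dev/cciss/c0d0p3 &

# Node A: mount the same device locally and append to a file
mount -t gfs2 /dev/cciss/c0d0p3 /mnt/gfs2   # /mnt/gfs2 is an assumed mount point
echo "hello from node A" >> /mnt/gfs2/testfile

# Node B: mount the AoE-exported device and watch the file
mount -t gfs2 /dev/etherd/e1.1 /mnt/gfs2
tail -f /mnt/gfs2/testfile   # expected: lines from Node A appear; observed: no output
```

Note that Node A accesses the storage through /dev/cciss/c0d0p3 directly while Node B goes through the AoE target, which is the asymmetry discussed in the comments below.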
Comment 1 Dan Kelly 2008-02-20 22:00:27 UTC
Both nodes are HP BL20 G2 blades running CentOS 5. The block device is a partition on a RAID 1 array on the onboard RAID controller. Node A mounts /dev/cciss/c0d0p3; Node B mounts /dev/etherd/e1.1.
Comment 2 Nate Straz 2008-02-20 22:11:52 UTC
What happens when both nodes mount /dev/etherd/e1.1?
Comment 3 Dan Kelly 2008-02-20 22:22:51 UTC
Node A cannot mount /dev/etherd/e1.1 because AoE cannot connect to a device that is exported from the same machine. If I bring in a 3rd server and have it act as the AoE host, with Nodes A and B forming a 2-node cluster, everything appears fine.
Comment 4 Steve Whitehouse 2008-02-21 16:41:09 UTC
GFS2 in RHEL 5.1 is rather old and may well have bugs in it. I'd suggest using something more up to date; ideally, waiting for 5.2 should see those problems solved. If you are just evaluating GFS2, then I'd suggest using Fedora rawhide. Also, the problem you are seeing might well be related to the way in which you are exporting the block device. It depends on the caching in the AoE software, and I don't know enough about it to be sure that it will not reply to the client before the data it has received from the remote end is on disk. Unless that is guaranteed, it will not be possible to use GFS2 in this way.
Comment 5 Dan Kelly 2008-02-21 17:06:38 UTC
It seems like caching in AoE to me. I submitted the bug report at the direction of folks in the GFS2 IRC channel to bring it to attention.
Comment 6 Steve Whitehouse 2008-03-03 09:06:50 UTC
Ok, I'm going to close this one on the basis that it's not a problem in GFS2, since it's down to caching in the virtual device that you are using. You might want to open a feature request against the AoE software to disable caching, but that's best done upstream. Also, this configuration of exporting a local device and mounting it locally and remotely at the same time is not supported (we assume equal access to the storage), and it will cause problems with fencing, which does not expect access to the storage to disappear from the cluster just because one node got fenced. So it's certainly not recommended as a course of action.