Bug 1158120

Summary: Data corruption due to lack of cache revalidation on open
Product: [Community] GlusterFS
Component: md-cache
Version: mainline
Status: CLOSED WORKSFORME
Severity: high
Priority: unspecified
Reporter: Philip Spencer <pspencer>
Assignee: bugs <bugs>
CC: atumball, bugs, lmohanty, ndevos, otira.fuu, pgurusid, pkarampu, rgowdapp, smohan, vbellur
Keywords: Patch, Triaged
Hardware: All
OS: Linux
Whiteboard: gluster-stale-cache, GLUSTERFS_METADATA_INCONSISTENCY
Fixed In Version: glusterfs-6.1
Doc Type: Bug Fix
Type: Bug
Clones: 1191100 (view as bug list)
Bug Depends On: 1191100
Last Closed: 2019-04-30 10:00:41 UTC
Attachments:
- Patch to check cache validity when opening or locking a file
- Patch to check cache validity when opening or locking a file (corrected)

Description Philip Spencer 2014-10-28 15:55:14 UTC
Created attachment 951437 [details]
Patch to check cache validity when opening or locking a file

We've started to see data corruption happening when processes on different hosts are accessing the same SQLite database, coordinating their efforts through proper locking.

We eventually found the following simple scenario that reproduces the problem. Here $D is a directory mounted via a native glusterfs mount, and $OTHER_HOST is another host that also has $D mounted, can be ssh'd to without a password, and is reachable quickly (in under 0.3 seconds or so).

$ echo init > $D/f; cat $D/f; ssh $OTHER_HOST "echo second > $D/f"; cat $D/f
init
init

If one does a directory listing before the final cat, then one gets the expected result:

$ echo init > $D/f; cat $D/f; ssh $OTHER_HOST "echo second > $D/f"; ls $D > /dev/null; cat $D/f
init
second

Similarly, if one omits the first cat, or modifies the file again after it, one gets the expected result:

$ echo init > $D/f; ssh $OTHER_HOST "echo second > $D/f"; cat $D/f
second

There seem to be several different bugs contributing to this problem and I will post separate bug reports and patches for each of them.

This bug report concerns the case where the default volume options are used on mount. fopen-keep-cache is then in effect by default, so fuse does not flush the page cache populated by the first "cat $D/f" before the second "cat $D/f" runs, and nothing else has told fuse to invalidate it. That is why stale data is displayed.
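
For reference, the underlying FUSE mechanism is the keep_cache flag in the open reply (FOPEN_KEEP_CACHE on the wire). A minimal low-level libfuse sketch, illustrative only and not GlusterFS's actual fuse-bridge code:

    /* Illustrative sketch of the FUSE mechanism behind fopen-keep-cache;
     * a real filesystem decides keep_cache per open. */
    #define FUSE_USE_VERSION 30
    #include <fuse_lowlevel.h>

    static void sketch_open (fuse_req_t req, fuse_ino_t ino,
                             struct fuse_file_info *fi)
    {
            /* keep_cache = 1 asks the kernel to retain any cached pages
             * for this inode across the open instead of discarding them.
             * Stale pages then survive until the filesystem explicitly
             * invalidates them, e.g. with
             * fuse_lowlevel_notify_inval_inode(). */
            fi->keep_cache = 1;
            fuse_reply_open (req, fi);
    }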

When one does an "ls $D" prior to the second cat, then code in md-cache.c invalidates the cache for every file within the directory whose contents have changed. This then gets passed up to fuse-bridge to tell fuse to flush its page cache.
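
Sketching that check (with hypothetical helper names, not the actual md-cache code): the cached iatt is compared with the fresh one, and a mismatch triggers the invalidation that fuse-bridge forwards to the kernel.

    /* Sketch of the revalidation idea; struct iatt is the real GlusterFS
     * attribute type, but invalidate_page_cache() is a hypothetical
     * stand-in for the md-cache -> fuse-bridge invalidation path. */
    static gf_boolean_t
    attrs_changed (struct iatt *cached, struct iatt *fresh)
    {
            return (cached->ia_mtime      != fresh->ia_mtime      ||
                    cached->ia_mtime_nsec != fresh->ia_mtime_nsec ||
                    cached->ia_size       != fresh->ia_size);
    }

    static void
    maybe_invalidate (inode_t *inode, struct iatt *cached, struct iatt *fresh)
    {
            if (attrs_changed (cached, fresh))
                    invalidate_page_cache (inode); /* hypothetical helper */
    }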

However, no such check is done when opening a file. I think it should be, and the same check should also be done when obtaining a lock on part of a file that is already open.

The attached patch is what we are now using to avoid this problem. It may not be implemented in the best way: I was not sure how to access the iatt data for the file on an open, so after a successful open callback the patch winds an fstat fop down the stack and, on return, checks the iatt values from that. It would be better for the open fop to return the iatt value directly, the way other fops do; the storage backend presumably stats the file anyway when responding to an open call, and if it returned that stat information there would be no performance hit in using it. As it stands, the patch incurs a slight performance penalty.
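
In outline, the approach looks roughly like this (a simplified sketch of what is described above, not the attachment itself; cached_iatt_of() and maybe_invalidate() are placeholder helpers):

    /* Sketch only: after a successful open, wind an fstat and compare the
     * fresh iatt against the cache in its callback. */
    static int32_t
    sketch_fstat_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
                      int32_t op_ret, int32_t op_errno, struct iatt *buf,
                      dict_t *xdata)
    {
            fd_t *fd = frame->local; /* a real patch would use a local struct */

            if (op_ret == 0)
                    maybe_invalidate (fd->inode, cached_iatt_of (fd->inode), buf);

            /* the open itself already succeeded, so unwind it as a success */
            STACK_UNWIND_STRICT (open, frame, 0, 0, fd, NULL);
            return 0;
    }

    static int32_t
    sketch_open_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
                     int32_t op_ret, int32_t op_errno, fd_t *fd, dict_t *xdata)
    {
            if (op_ret == 0) {
                    frame->local = fd;
                    /* this extra round trip is the slight performance
                     * penalty mentioned above */
                    STACK_WIND (frame, sketch_fstat_cbk, FIRST_CHILD (this),
                                FIRST_CHILD (this)->fops->fstat, fd, NULL);
                    return 0;
            }
            STACK_UNWIND_STRICT (open, frame, op_ret, op_errno, fd, xdata);
            return 0;
    }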

Even with this patch, the behaviour is still not always entirely correct due to other bugs which I will report separately.

The patch does a similar thing for lock calls.

The patch is against 3.6.0beta3.

Comment 1 Philip Spencer 2014-10-28 20:36:24 UTC
Created attachment 951526 [details]
Patch to check cache validity when opening or locking a file (corrected)

Oops ... a pretty fundamental typo in the first patch: it omitted the "op_errno != 0" test that skips the iatt check when op_errno is nonzero. This would result in locks appearing to succeed (returning 0) when in fact they had failed!
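
In other words, the corrected lock callback bails out before the iatt check whenever the lock call itself failed. A sketch, not the literal diff:

    static int32_t
    sketch_lk_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
                   int32_t op_ret, int32_t op_errno, struct gf_flock *lock,
                   dict_t *xdata)
    {
            if (op_ret != 0 || op_errno != 0) {
                    /* the guard the first patch forgot: propagate the
                     * failure instead of masking it with a later fstat */
                    STACK_UNWIND_STRICT (lk, frame, op_ret, op_errno, lock,
                                         xdata);
                    return 0;
            }
            /* otherwise wind the fstat and do the iatt check, as for open */
            return 0;
    }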

Revised patch is attached.

Comment 2 krishnan parthasarathi 2014-11-04 12:44:49 UTC
Philip,

Thanks for opening this issue and posting a patch.  We use Gerrit (review.gluster.org) as our collaborative code review tool. It would be helpful if you could submit your patches there (Ref: http://www.gluster.org/community/documentation/index.php/Simplified_dev_workflow).

Comment 3 Amar Tumballi 2017-08-10 13:04:16 UTC
*** Bug 1191100 has been marked as a duplicate of this bug. ***

Comment 4 Vijay Bellur 2018-11-19 08:01:29 UTC
Raghavendra, Poornima - can you please comment on the validity of this patch now? Thanks!

Comment 5 Yaniv Kaul 2019-04-27 19:34:50 UTC
Status?

Comment 6 Amar Tumballi 2019-04-30 10:00:41 UTC
This is no longer relevant, as we made multiple fixes in our caching layers and fixed many issues with hosting DB workloads on GlusterFS. Closing as WORKSFORME on the GlusterFS-6.1 release.