Bug 833564 - fuse invalidates local file cache on every file open
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: fuse
Unspecified Linux
high Severity unspecified
Assigned To: Brian Foster
Depends On:
Reported: 2012-06-19 15:18 EDT by Brian Foster
Modified: 2013-09-23 18:36 EDT
CC List: 6 users

Doc Type: Bug Fix
Last Closed: 2013-09-23 18:36:12 EDT
Type: Bug

Description Brian Foster 2012-06-19 15:18:06 EDT
Description of problem:

By default, fuse invalidates the page cache for an inode on every file open, so file data that is still valid locally has to be fetched again from gluster. This is inefficient, particularly for read-only or read-mostly workloads.
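To make the mechanism concrete, here is a simplified Python model (not kernel code) of the open-time check: unless the filesystem's reply to an open sets the FOPEN_KEEP_CACHE flag, the kernel drops the inode's cached pages. The flag's bit value is the one defined in <linux/fuse.h>; `ModelInode` and `finish_open` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Bit value of FOPEN_KEEP_CACHE in <linux/fuse.h>, copied as a plain constant.
FOPEN_KEEP_CACHE = 1 << 1


@dataclass
class ModelInode:
    """Hypothetical stand-in for one inode's share of the page cache."""
    cached_pages: int = 0


def finish_open(inode: ModelInode, open_flags: int) -> None:
    """Simplified model of the open-time decision: unless the open reply
    carries FOPEN_KEEP_CACHE, every cached page of the inode is dropped."""
    if not open_flags & FOPEN_KEEP_CACHE:
        inode.cached_pages = 0


if __name__ == "__main__":
    # 262144 pages of 4 KiB is a 1 GiB file, like the one in the steps below.
    default = ModelInode(cached_pages=262144)
    keep = ModelInode(cached_pages=262144)
    finish_open(default, open_flags=0)
    finish_open(keep, open_flags=FOPEN_KEEP_CACHE)
    print(default.cached_pages, keep.cached_pages)  # prints "0 262144"
```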

Version-Release number of selected component (if applicable): 3.3

How reproducible: 100%

Steps to Reproduce:
1. Create a largish file on a glusterfs volume that is still small enough to fit into the local page cache (e.g., 1GB).
2. Repeatedly cat the file. On a single VM, this takes a few seconds for each complete file read.
3. Alternatively, observe the cached memory drop and repopulate on each read (via free or top).
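The timing part of these steps can also be scripted. A minimal Python sketch, assuming the test file already exists on the glusterfs mount: it reopens the file on every pass (the open is what triggers the invalidation) and reports the wall-clock time of each full read. The script name and mount path in the usage comment are hypothetical.

```python
import sys
import time
from typing import List


def time_repeated_reads(path: str, runs: int = 10, chunk: int = 1 << 20) -> List[float]:
    """Read the whole file `runs` times, reopening it on every pass so each
    iteration goes through the open path that triggers the invalidation."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # buffering=0 gives a raw FileIO; read() returns b"" at end of file.
        with open(path, "rb", buffering=0) as f:
            while f.read(chunk):
                pass
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 read_timer.py /mnt/glusterfs/test_file
    for i, t in enumerate(time_repeated_reads(sys.argv[1]), start=1):
        print(f"read {i}: {t:.3f}s")
```

With the default mount, every line should show roughly the cold-read time; with effective local caching, only the first should.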

Expected results:

Repeated reads should ideally be served from the local page cache. This reduces the total read time to the order of milliseconds and eliminates the read requests that would otherwise be passed down into gluster and over the network.
Comment 2 Brian Foster 2012-06-19 15:40:55 EDT
The fuse kernel module provides the FOPEN_KEEP_CACHE flag to bypass the invalidation on open. I've prototyped integration of this flag into mount/fuse via the 'fopen-keep-cache' mount option (or the glusterfs '--fopen-keep-cache' command line option). This change includes extra validation of locally cached inode attributes against newly received attributes to detect remote changes. The end result is that unconditional local cache invalidations are replaced with conditional ones, issued only when we know the remote side has been modified.
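The compare-then-invalidate idea can be sketched as follows. This is not the mount/fuse or md-cache code from the patch; it assumes that size and mtime are the attributes compared and that the `invalidate` callback stands in for the fuse inode-invalidation notification. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Attrs:
    """Hypothetical subset of inode attributes used to detect remote changes."""
    size: int
    mtime_sec: int
    mtime_nsec: int


def remote_changed(cached: Optional[Attrs], fresh: Attrs) -> bool:
    """True when freshly received attributes differ from the cached copy."""
    if cached is None:
        # First sighting of this inode: nothing cached to compare against.
        return False
    return cached != fresh  # dataclass equality compares every field


def on_attrs_received(cache: Dict[int, Attrs], ino: int, fresh: Attrs,
                      invalidate: Callable[[int], None]) -> None:
    """Invalidate the inode's cached data only on a detected change,
    then remember the new attributes for the next comparison."""
    if remote_changed(cache.get(ino), fresh):
        invalidate(ino)
    cache[ino] = fresh
```

Unchanged attributes leave the page cache alone; only a size or mtime mismatch costs a re-read.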

I ran a 16-thread kernel compile job that is read-only with respect to gluster (i.e., object files are written to local storage) against a single-brick volume to test the effects of improved local caching. The gluster brick is an XFS-formatted ramdisk. The results, in terms of time to complete (m:ss), are as follows:

- gluster NFS: 1:47
- Default glusterfs graph: 7:53
- No client-side cache xlators: 9:32
- No client-side cache xlators, fopen-keep-cache enabled: 6:01
- No client-side cache xlators, fopen-keep-cache enabled, plus fuse hacks to disable atime* invalidations: 5:19

* - FUSE appears to unconditionally invalidate cached attributes on read operations to pick up atime changes. This assumes the user cares about tracking atime in the first place. Disabling these invalidations has a further positive effect on this test, but this is something we'll have to try to address in fuse...
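Converting the m:ss figures above to seconds makes the relative gains easier to compare. This small helper only re-expresses the reported numbers; the labels are abbreviated from the list.

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'm:ss' duration such as '9:32' to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def reduction_pct(baseline: str, candidate: str) -> float:
    """Percentage of the baseline time saved by the candidate configuration."""
    base = to_seconds(baseline)
    return 100.0 * (base - to_seconds(candidate)) / base


# Times to complete, copied from the results list above.
RESULTS = {
    "gluster NFS": "1:47",
    "default glusterfs graph": "7:53",
    "no client-side cache xlators": "9:32",
    "no cache xlators + fopen-keep-cache": "6:01",
    "fopen-keep-cache + atime hacks": "5:19",
}

if __name__ == "__main__":
    base = RESULTS["no client-side cache xlators"]
    for name, mmss in RESULTS.items():
        print(f"{name:38s} {to_seconds(mmss):4d}s  "
              f"{reduction_pct(base, mmss):5.1f}% less time")
```

Against the 572s no-cache-xlator baseline, fopen-keep-cache alone (361s) is about a 37% reduction.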
Comment 3 Brian Foster 2012-07-13 11:09:24 EDT
The proposed fix has been posted for review:
http://review.gluster.com/3584
Comment 4 Vijay Bellur 2012-07-13 12:46:17 EDT
CHANGE: http://review.gluster.com/3584 (fuse/md-cache: add support for the 'fopen-keep-cache' mount option) merged in master by Anand Avati (avati@redhat.com)
Comment 5 Csaba Henk 2012-07-17 10:05:59 EDT
(In reply to comment #3)
> The proposed fix has been posted for review:
> http://review.gluster.com/3584

Brian, the patch is very comprehensive; I see just one minor issue with it: if the kernel only supports a FUSE protocol version older than 7.12, the invalidation functionality is not available and the invalidate callback silently becomes a no-op. AFAICS, with --fopen-keep-cache, invalidation is not just a hint to the kernel about disposable memory; correct operation relies on it. So it would be better to fail if --fopen-keep-cache is used with such a kernel.
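Csaba's suggestion amounts to a mount-time guard. A minimal sketch, assuming the negotiated kernel protocol version is available as a (major, minor) tuple from the FUSE INIT handshake; 7.12 is the version cited above, and the exception and function names are hypothetical.

```python
from typing import Tuple

# FUSE kernel protocol 7.12 introduced the inode/entry invalidation
# notifications that fopen-keep-cache relies on.
MIN_INVAL_NOTIFY: Tuple[int, int] = (7, 12)


class UnsupportedKernelError(RuntimeError):
    """Hypothetical error raised when fopen-keep-cache cannot be honoured."""


def check_keep_cache(kernel_proto: Tuple[int, int], fopen_keep_cache: bool) -> None:
    """Refuse fopen-keep-cache on kernels whose invalidation callback would
    silently be a no-op, instead of serving possibly stale cached data."""
    if fopen_keep_cache and kernel_proto < MIN_INVAL_NOTIFY:
        major, minor = kernel_proto
        raise UnsupportedKernelError(
            f"fopen-keep-cache requires FUSE >= 7.12 invalidation support; "
            f"kernel negotiated {major}.{minor}")
```

Failing loudly here trades a mount error for the silent staleness Csaba describes; without the option, older kernels keep working as before.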
Comment 6 Brian Foster 2012-07-17 18:15:14 EDT
Hmm, yes, --fopen-keep-cache depends on the fuse invalidation functionality. I'll look into fixing that up. Thanks for the review, Csaba.
Comment 7 Brian Foster 2012-07-18 09:40:28 EDT
Fix posted to address Csaba's comment: http://review.gluster.com/3690
Comment 8 Vijay Bellur 2012-07-18 13:52:31 EDT
CHANGE: http://review.gluster.com/3690 (mount/fuse: check for fuse inval notify support when fopen-keep-cache enabled) merged in master by Anand Avati (avati@redhat.com)
Comment 9 spandura 2013-08-27 08:21:54 EDT
Verified the fix on the build:
glusterfs built on Aug 26 2013 09:03:20

Test Case:
1. Create a 1 x 2 replicate volume and start it.

2. Create a fuse mount.

3. Create a 512MB file from the mount point: dd if=/dev/urandom of=./test_file bs=1M count=512

4. On the client node perform the following:

a. Monitor the cached memory dropping and repopulating on every read.
Execute: free -m -s 1

b. Record the time taken for each read.
From the mount point, execute: for i in `seq 1 10`; do time cat ./test_file > /dev/null ; done

Repeat the above test case for the following scenarios:
Scenario 1: No options while creating the fuse mount.

Scenario 2: Use the "fopen-keep-cache" mount option while creating the fuse mount.

Expected Result:-
Scenario 1. Reads after the first one take roughly the same time as the first read.

Scenario 2. Reads after the first one should take much less time than the first read when the fopen-keep-cache mount option is set.

Actual Result:-
Scenario 1 :- 
root@darrel [Aug-27-2013-14:34:26] >for i in `seq  1 10`; do time cat ./test_file > /dev/null ; sleep 1 ; done

real	0m1.113s
user	0m0.006s
sys	0m0.338s

real	0m1.503s
user	0m0.013s
sys	0m0.534s

real	0m1.123s
user	0m0.002s
sys	0m0.414s

real	0m1.445s
user	0m0.004s
sys	0m0.535s

real	0m1.055s
user	0m0.006s
sys	0m0.338s

real	0m1.194s
user	0m0.004s
sys	0m0.361s

real	0m1.223s
user	0m0.008s
sys	0m0.443s

real	0m1.052s
user	0m0.007s
sys	0m0.371s

real	0m1.207s
user	0m0.006s
sys	0m0.400s

real	0m1.064s
user	0m0.007s
sys	0m0.392s

Scenario 2:- 
root@darrel [Aug-27-2013-15:01:14] >for i in `seq  1 10`; do time cat ./test_file > /dev/null ; sleep 1 ; done

real	0m1.196s
user	0m0.005s
sys	0m0.347s

real	0m0.179s
user	0m0.002s
sys	0m0.173s

real	0m0.148s
user	0m0.003s
sys	0m0.141s

real	0m0.143s
user	0m0.000s
sys	0m0.139s

real	0m0.146s
user	0m0.002s
sys	0m0.140s

real	0m0.147s
user	0m0.000s
sys	0m0.141s

real	0m0.144s
user	0m0.001s
sys	0m0.139s

real	0m0.147s
user	0m0.001s
sys	0m0.141s

real	0m0.147s
user	0m0.000s
sys	0m0.143s

real	0m0.146s
user	0m0.001s
sys	0m0.141s
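Averaging the warm reads (runs 2 to 10, skipping the cold first read) from the two transcripts above quantifies the difference. The values are copied from the output; nothing new is measured.

```python
from statistics import mean

# "real" times in seconds, copied from the Scenario 1 and Scenario 2 runs above.
SCENARIO_1 = [1.113, 1.503, 1.123, 1.445, 1.055, 1.194, 1.223, 1.052, 1.207, 1.064]
SCENARIO_2 = [1.196, 0.179, 0.148, 0.143, 0.146, 0.147, 0.144, 0.147, 0.147, 0.146]

if __name__ == "__main__":
    # Skip the first (cold) read in each scenario and average the rest.
    warm_default = mean(SCENARIO_1[1:])
    warm_keep = mean(SCENARIO_2[1:])
    print(f"default mount, warm reads:    {warm_default:.3f}s")
    print(f"fopen-keep-cache, warm reads: {warm_keep:.3f}s")
    print(f"speedup: {warm_default / warm_keep:.1f}x")
```

With these inputs the warm-read average drops from about 1.21s to about 0.15s, roughly an 8x speedup, while the cold first read is about the same in both scenarios.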

Bug is fixed. Moving it to Verified state.
Comment 10 Scott Haines 2013-09-23 18:36:12 EDT
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

