Red Hat Bugzilla – Bug 833564
fuse invalidates local file cache on every file open
Last modified: 2013-09-23 18:36:12 EDT
Description of problem:
By default, fuse invalidates the page cache for an inode on every file open. This is generally inefficient, particularly on read-only or read-mostly workloads.
Version-Release number of selected component (if applicable): 3.3
How reproducible: 100%
Steps to Reproduce:
1. Create a largish file on a glusterfs volume, but small enough to fit into the local page cache (e.g., 1GB).
2. Repeatedly cat the file. On a single VM, this takes a few seconds for each complete file read.
3. Alternatively, observe the cached memory drop and repopulate on each read (via free or top).
Expected results:
Repeated reads should ideally be served from the local page cache. This reduces the total read time to the order of milliseconds and eliminates further read requests being passed down into gluster and over the network.
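For anyone who wants per-pass numbers rather than eyeballing `time cat`, here is a minimal sketch of a timing loop. It assumes a POSIX shell and GNU date (for `%N`). The function name `time_reads` and the example path are hypothetical and not part of glusterfs.

```shell
# time_reads FILE [PASSES]
# Reads FILE sequentially PASSES times (default 5) and prints one line
# per pass: "<pass number> <elapsed milliseconds>".
# Runs in a subshell so its variables do not leak into the caller.
time_reads() (
    file=$1
    passes=${2:-5}
    i=1
    while [ "$i" -le "$passes" ]; do
        start=$(date +%s%N)            # nanoseconds since the epoch (GNU date)
        cat -- "$file" > /dev/null
        end=$(date +%s%N)
        echo "$i $(( (end - start) / 1000000 ))"
        i=$((i + 1))
    done
)

# Example (hypothetical path on a glusterfs fuse mount):
#   time_reads /mnt/glusterfs/test_file 10
```

If the page cache is being kept across opens, passes after the first should report much smaller values than the first pass.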
The fuse kernel module provides the FOPEN_KEEP_CACHE flag to bypass the invalidation on open. I've prototyped integration of this flag into mount/fuse via the 'fopen-keep-cache' mount option (or the glusterfs '--fopen-keep-cache' command line option). This change includes extra validation of locally cached inode attributes against newly received attributes to detect remote changes. The end result is that unconditional local cache invalidations are replaced by invalidations that happen only when validation shows the remote side has been modified.
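To illustrate the "validate, then invalidate only on change" idea, here is a rough user-space analogy in shell. It is not the glusterfs code: the patch does its comparison inside mount/fuse and md-cache, and this report does not say which attributes are compared. Using size and mtime is an assumption, and the helper names `attr_sig` and `needs_invalidate` are hypothetical. Assumes GNU stat.

```shell
# attr_sig FILE
# Print a "size:mtime" signature for FILE (GNU stat format string).
# Size and mtime are assumed stand-ins for "cached inode attributes".
attr_sig() {
    stat -c '%s:%Y' -- "$1"
}

# needs_invalidate CACHED_SIG FILE
# Succeeds (exit status 0) only if FILE's current attributes differ from
# the previously cached signature, i.e. the cache should be dropped.
needs_invalidate() {
    [ "$1" != "$(attr_sig "$2")" ]
}

# Example:
#   sig=$(attr_sig "$f")
#   ... later, when fresh attributes arrive ...
#   if needs_invalidate "$sig" "$f"; then
#       echo "attributes changed: invalidate cached pages"
#   else
#       echo "attributes unchanged: keep cached pages"
#   fi
```

The point of the design is that the common case (nothing changed remotely) costs an attribute comparison instead of dropping and refetching the file data.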
I have run a 16-thread read-only (i.e., object files to local storage) kernel compile job against a single-brick volume to test the effects of improved local caching. The gluster brick is an XFS-formatted ramdisk. The results, in terms of time to complete, are as follows:
- gluster NFS: 1:47
- Default glusterfs graph: 7:53
- No client-side cache xlators: 9:32
- No client-side cache xlators, fopen-keep-cache enabled: 6:01
- As above, plus fuse hacks to disable atime* invalidations: 5:19
* - FUSE appears to unconditionally invalidate cached attributes on read operations to pick up atime changes. This assumes the user cares to track atime in the first place. Disabling these invalidations has a further positive effect on this test, but this is something we'll have to try and address in fuse...
The proposed fix has been posted for review:
CHANGE: http://review.gluster.com/3584 (fuse/md-cache: add support for the 'fopen-keep-cache' mount option) merged in master by Anand Avati (firstname.lastname@example.org)
(In reply to comment #3)
> The proposed fix has been posted for review:
Brian, the patch is very comprehensive; I see just one minor issue with it: if the kernel features FUSE < 7.12, the invalidation functionality is not available and the invalidate callback will silently become a no-op. AFAICS, with --fopen-keep-cache, invalidation is not just a hint to the kernel about disposable memory; correct operation relies on it. So it would be better to fail if --fopen-keep-cache is used with such a kernel.
Hmm, yes --fopen-keep-cache depends on the fuse invalidation functionality. I'll look into fixing that up. Thanks for the review Csaba.
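The check being asked for boils down to a protocol version comparison against 7.12. A minimal sketch of that comparison follows. The function names are hypothetical, and how the negotiated major/minor version is obtained is left out; in the real fix it happens inside mount/fuse, not in a script.

```shell
# fuse_proto_at_least HAVE_MAJOR HAVE_MINOR WANT_MAJOR WANT_MINOR
# Succeeds if version HAVE is greater than or equal to version WANT.
fuse_proto_at_least() {
    [ "$1" -gt "$3" ] || { [ "$1" -eq "$3" ] && [ "$2" -ge "$4" ]; }
}

# check_keep_cache MAJOR MINOR
# Fails loudly instead of silently running without invalidation support
# when fopen-keep-cache is requested on FUSE < 7.12.
check_keep_cache() {
    if fuse_proto_at_least "$1" "$2" 7 12; then
        echo "ok: invalidation notifications available"
    else
        echo "error: fopen-keep-cache requires FUSE protocol >= 7.12" >&2
        return 1
    fi
}

# Example:
#   check_keep_cache 7 8     # fails with an error on stderr
#   check_keep_cache 7 13    # prints the "ok" line
```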
Fix posted to address Csaba's comment: http://review.gluster.com/3690
CHANGE: http://review.gluster.com/3690 (mount/fuse: check for fuse inval notify support when fopen-keep-cache enabled) merged in master by Anand Avati (email@example.com)
Verified the fix on the build:
glusterfs 22.214.171.124rhs built on Aug 26 2013 09:03:20
1. Create a 1 x 2 replicate volume. Start the volume.
2. Create a fuse mount.
3. Create a 512MB file from mount point : dd if=/dev/urandom of=./test_file bs=1M count=512
4. On the client node perform the following:
a. Record the cache memory drop and repopulate on every read.
Execute : free -m -s 1
b. Record the time taken in each read
From mount point execute : for i in `seq 1 10`; do time cat ./test_file > /dev/null ; done
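As an alternative to watching `free -m -s 1` by hand, the cached figure can be sampled around a single read. This is a sketch only: it assumes a Linux client where /proc/meminfo has a "Cached:" line, and the helper names `cached_kb` and `report_cache_after_read` are hypothetical.

```shell
# cached_kb [MEMINFO_FILE]
# Print the page cache size in kB from the "Cached:" line.
# Defaults to /proc/meminfo; the optional argument is for offline use.
cached_kb() {
    awk '$1 == "Cached:" { print $2 }' "${1:-/proc/meminfo}"
}

# report_cache_after_read FILE
# Print the Cached value before and after one sequential read of FILE.
report_cache_after_read() (
    before=$(cached_kb)
    cat -- "$1" > /dev/null
    after=$(cached_kb)
    echo "Cached before: ${before} kB, after: ${after} kB"
)

# Example, run from the mount point:
#   report_cache_after_read ./test_file
```

Other activity on the client also moves the Cached value, so treat a single sample as indicative rather than exact.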
Repeat the above test case for the following scenarios:
Scenario 1: No options while creating the fuse mount.
Scenario 2: Use the "fopen-keep-cache" mount option while creating the fuse mount.
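For reference, the two mounts would look roughly like the commands printed below. The server name, volume name and mount point are placeholders, not the values used in this verification. The helper `mount_cmd` is hypothetical and only prints the command rather than running it.

```shell
# mount_cmd SERVER VOLUME MOUNTPOINT [OPTIONS]
# Print (do not execute) the glusterfs fuse mount command line.
mount_cmd() {
    if [ -n "${4:-}" ]; then
        echo "mount -t glusterfs -o $4 $1:/$2 $3"
    else
        echo "mount -t glusterfs $1:/$2 $3"
    fi
}

# Scenario 1: default mount (placeholder names)
mount_cmd server1 testvol /mnt/glusterfs
# Scenario 2: mount with fopen-keep-cache (placeholder names)
mount_cmd server1 testvol /mnt/glusterfs fopen-keep-cache
```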
Scenario 1: Each read after the first takes roughly as long as the first read.
Scenario 2: With the fopen-keep-cache mount option set, each read after the first should take far less time than the first read.
Scenario 1 :-
root@darrel [Aug-27-2013-14:34:26] >for i in `seq 1 10`; do time cat ./test_file > /dev/null ; sleep 1 ; done
root@darrel [Aug-27-2013-15:01:14] >for i in `seq 1 10`; do time cat ./test_file > /dev/null ; sleep 1 ; done
Bug is fixed. Moving it to Verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.