Description of problem:

Possibly a GFS bug, possibly an FSCache/cachefilesd bug. When mounting a Gluster volume over NFS with FS-Cache enabled:

mount -t nfs -o vers=3,fsc,noacl,noatime <gNFSd host>:/<path> <mount dir>

/etc/cachefilesd.conf:
dir /data/fscache
tag default_cache
brun 10%
bcull 7%
bstop 3%
frun 10%
fcull 7%
fstop 3%

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Compile the kernel with FS-Cache + CacheFiles support:
   CONFIG_FSCACHE=m
   CONFIG_FSCACHE_STATS=y
   CONFIG_CACHEFILES=m
   CONFIG_NFS_FSCACHE=y
2. Configure /etc/cachefilesd.conf like so:
   dir /data/fscache    <---- In our setup this is an SSD
   tag default_cache
   brun 10%
   bcull 7%
   bstop 3%
   frun 10%
   fcull 7%
   fstop 3%
3. Mount a Gluster volume like so:
   mount -t nfs -o vers=3,fsc,noacl,noatime <gNFSd host>:/<path> /mnt/mycachedmount
4. Write some data to the mount:
   for i in {1..20}; do dd if=/dev/urandom of=/mnt/mycachedmount/testfile$i bs=1M count=100; done
5. For good measure, umount/remount the mount to drop any caches.
6. Read the data back from the mount:
   for f in /mnt/mycachedmount/*; do echo $f; dd if=$f of=/dev/null bs=1M; done
   Repeat the read loop a few times to observe the results.

Actual results:
Inconsistent read speeds are observed, and tcpdump shows many NFS read requests going over the wire.

Expected results:
After the first read loop, no read requests should be observed on the wire and read throughput should be close to that of the local cache device (/data/fscache in this example).

Additional info:
I can only speculate that this behavior is caused by gNFSd sending back inconsistent atime (maybe mtime?) responses which are consumed by FS-Cache, causing false "misses" in the cache.
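To help confirm whether slow passes in a reproduction correspond to genuine cache misses, the FS-Cache and NFS client counters can be snapshotted around one read pass. A rough sketch (assumes CONFIG_FSCACHE_STATS=y from step 1 and the mount point from step 3; the exact counter names in /proc/fs/fscache/stats vary a bit between kernel versions):

grep -E 'Pages|Retrvls' /proc/fs/fscache/stats > /tmp/fscache.before
nfsstat -c > /tmp/nfs.before

for f in /mnt/mycachedmount/*; do dd if=$f of=/dev/null bs=1M; done

grep -E 'Pages|Retrvls' /proc/fs/fscache/stats > /tmp/fscache.after
nfsstat -c > /tmp/nfs.after

# If reads are served from the cache, the NFS "read" call count should barely
# move while the FS-Cache retrieval counters increase.
diff /tmp/nfs.before /tmp/nfs.after
diff /tmp/fscache.before /tmp/fscache.after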
Here's a paste of an actual run from our hosts (filenames redacted):

100+1 records in
100+1 records out
1004991432 bytes (1.0 GB) copied, 0.323953 s, 3.1 GB/s   <---- Expected speeds from local SSD
100+1 records in
100+1 records out
1005020921 bytes (1.0 GB) copied, 0.181589 s, 5.5 GB/s
100+1 records in
100+1 records out
1004987708 bytes (1.0 GB) copied, 0.19944 s, 5.0 GB/s
100+1 records in
100+1 records out
1004889584 bytes (1.0 GB) copied, 0.219334 s, 4.6 GB/s
100+1 records in
100+1 records out
1004882070 bytes (1.0 GB) copied, 36.7292 s, 27.4 MB/s   <---- Read requests being sent to gNFSd slowing things down
100+1 records in
100+1 records out
1004833208 bytes (1.0 GB) copied, 0.266268 s, 3.8 GB/s
100+1 records in
100+1 records out
1004874470 bytes (1.0 GB) copied, 41.6095 s, 24.2 MB/s
13+1 records in
13+1 records out
133198502 bytes (133 MB) copied, 5.34731 s, 24.9 MB/s
100+1 records in
100+1 records out
*SNIP*
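The slow passes can also be cross-checked against actual wire traffic by capturing NFS packets while re-running the read loop. A sketch, assuming the default NFS port 2049 and eth0 as a placeholder interface name:

# Capture NFS traffic while re-reading files that should already be cached.
tcpdump -i eth0 -n -w /tmp/nfs-reads.pcap 'tcp port 2049' &
TCPDUMP_PID=$!
for f in /mnt/mycachedmount/*; do dd if=$f of=/dev/null bs=1M; done
kill $TCPDUMP_PID
# A warm cache should produce essentially no READ calls in the capture.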
GlusterFS 3.7.0 has been released (http://www.gluster.org/pipermail/gluster-users/2015-May/021901.html), and the Gluster project maintains N-2 supported releases. The last two releases before 3.7 are still maintained; at the moment these are 3.6 and 3.5. This bug has been filed against the 3.4 release and will not get fixed in a 3.4 version any more.

Please verify whether newer versions are affected by the reported problem. If that is the case, update the bug with a note, and update the version if you can. In case updating the version is not possible, leave a comment in this bug report with the version you tested, and set the "Need additional information the selected bugs from" field below the comment box to "bugs".

If there is no response by the end of the month, this bug will get closed automatically.
It's not clear that this is actually a gNFSd bug; it's quite likely an FSCache bug, so could we get the FSCache maintainer to take a look as well? I'll see if it repros on 3.6/3.7, but I highly suspect it will.
GlusterFS 3.4.x has reached end-of-life. If this bug still exists in a later release, please reopen it and change the version, or open a new bug.
Fairly certain this is not fixed in 3.6/3.7.
I looked into FS-Cache a while ago, and if I remember correctly there are different issues when using gNFS. The attribute times used by FS-Cache for invalidation can differ when reading from different subvolumes/bricks:

- Precision of times: atime/mtime is set in seconds (?) but returned in nanosecond (?) precision. The least significant part of the [am]time differs.
- ctime cannot be set and can get modified by Gluster internal services. There is an idea to maintain the ctime in an extended attribute (bug 1318493).
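If someone wants to check whether this is what is biting them, comparing the full-precision timestamps of the same file on each brick, and as seen by the NFS client, should show whether the nanosecond parts disagree. A sketch; the brick hostnames and paths below are placeholders:

# Compare full-precision mtime/ctime of the same file on each brick.
for brick in server1:/bricks/brick1 server2:/bricks/brick2; do
    host=${brick%%:*}
    path=${brick#*:}
    echo "== $brick"
    ssh "$host" stat -c 'mtime=%y ctime=%z %n' "$path/testfile1"
done

# And what the NFS client reports for the same file:
stat -c 'mtime=%y ctime=%z %n' /mnt/mycachedmount/testfile1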
There has been no activity on this bug for a long time. We have either fixed it already, or it is mostly not critical anymore. Please re-open the bug if the issue is still burning for you, or if you want to take it to closure with fixes.