During a discussion on the phone the other day, it was mentioned that we should not be trying to invalidate the cache after a write call completes. With a cursory look, I think we'll likely need this patch backported for RHEL4 (commitid is from Trond's nfs-2.6 git tree):

commit 70ca88521fc7bee8ef0fc22033a439d4b9a2c70d
Author: Trond Myklebust <Trond.Myklebust>
Date:   Sun Sep 30 15:21:24 2007 -0400

    NFS: Fake up 'wcc' attributes to prevent cache invalidation after write

    NFSv2 and v4 don't offer weak cache consistency attributes on WRITE
    calls. In NFSv3, returning wcc data is optional. In all cases, we want
    to prevent the client from invalidating our cached data whenever
    ->write_done() attempts to update the inode attributes.

    Signed-off-by: Trond Myklebust <Trond.Myklebust>

We'll probably need to come up with some sort of reproducer for this so that we can quantify whether this actually helps things.
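To make the idea in the commit message concrete, here is a toy Python model of the decision it describes. It is only a sketch and not the kernel code: the names `Attrs`, `CachedInode`, `apply_write_reply` and `fake_pre_attrs` are made up for illustration, and the real client tracks more attributes than mtime and size. The assumption is that the client keeps its cached data only when the pre-op attributes in the WRITE reply match what it has cached, and treats missing pre-op attributes as a reason to invalidate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attrs:
    mtime: int  # seconds, as the server reports it
    size: int


@dataclass
class CachedInode:
    attrs: Attrs
    data_valid: bool = True


def apply_write_reply(inode: CachedInode, pre: Optional[Attrs], post: Attrs) -> None:
    """Toy post-WRITE attribute update (hypothetical, not the kernel routine).

    pre is None when the server returned no wcc pre-op attributes.
    """
    # Without pre-op attrs, or with pre-op attrs that differ from the cache,
    # the client cannot tell whether someone else changed the file.
    if pre is None or pre != inode.attrs:
        inode.data_valid = False
    inode.attrs = post


def fake_pre_attrs(inode: CachedInode) -> Attrs:
    """Fake up pre-op attrs from the cache, so our own write never invalidates it."""
    return Attrs(inode.attrs.mtime, inode.attrs.size)


if __name__ == "__main__":
    post = Attrs(mtime=101, size=8192)

    unpatched = CachedInode(Attrs(mtime=100, size=4096))
    apply_write_reply(unpatched, None, post)
    print("no wcc data:     data_valid =", unpatched.data_valid)

    patched = CachedInode(Attrs(mtime=100, size=4096))
    apply_write_reply(patched, fake_pre_attrs(patched), post)
    print("faked wcc data:  data_valid =", patched.data_valid)
```

In this model the only difference between the two cases is where the pre-op attributes come from, which is why a reproducer should look at whether reads after a write go back to the server.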
For RHEL5, it looks like the above patch went in as part of Peter's big performance update in bug 321111.
I suppose we can start by trying this test. Presumably the first read should be as fast as the "reread" if the patch is working correctly. Sniffing traffic should also tell us (and maybe looking at nfsstat -c).

# /opt/iozone/bin/iozone -azc -f /mnt/test/testfile -s 64k -r 64 -i 0 -i 1
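For a quick check without iozone, something like the following Python sketch could approximate the same write/read/reread comparison. It is an assumption-laden stand-in, not an iozone replacement: the function names are made up, the 64k file and record sizes simply mirror the command above, and timings for a file this small will be noisy, so it is best run many times or alongside a packet capture.

```python
import os
import time


def time_read(path, record_size=64 * 1024):
    """Open the file, read it to EOF in record_size chunks, close it.

    Returns (bytes_read, elapsed_seconds); open and close are included in
    the timing.
    """
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(record_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


def write_then_read_twice(path, size=64 * 1024, record_size=64 * 1024):
    """Write `size` bytes, then time a first read and a reread of the file."""
    with open(path, "wb") as f:
        f.write(b"\0" * size)
        f.flush()
        os.fsync(f.fileno())  # make sure the data has gone to the server
    nbytes, first = time_read(path, record_size)
    _, reread = time_read(path, record_size)
    return nbytes, first, reread


if __name__ == "__main__":
    # The path is the one from the iozone command; adjust for your mount.
    nbytes, first, reread = write_then_read_twice("/mnt/test/testfile")
    print(f"{nbytes} bytes: first read {first:.6f}s, reread {reread:.6f}s")
```

If the cache survives the write, the first read and the reread should take roughly the same time; a consistently slower first read suggests the data was fetched from the server again.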
Created attachment 296373 patch1 -- NFSv4: Add post-op attributes to NFSv4 write and commit callbacks.
Created attachment 296374 patch2 -- NFS: Clean up inode metadata updates
Created attachment 296376 patch3 -- NFS: Fake up 'wcc' attributes to prevent cache invalidation after write
I've attached a set of 3 experimental patches to implement what Peter suggested the other day in our meeting. This seems to correct the iozone slowdowns in the testing I've done. I'm building a new set of test kernels now and will post them on my people page once they're done...
I've put this on 4.8 proposed, but I'm not opposed to considering this for 4.7 if it's deemed important enough. This of course presumes no regressions show up in testing.

That said, it seems like this might cause a few more problems with cache consistency. Consider:

1. client writes to file and has up to date cache
2. client writes to file again and doesn't invalidate cache since we've faked up the wcc preattrs
3. local timestamp is set to mtime of file
4. server races in with a write from local process or another client within the same second
5. client doesn't realize the file has changed

...before this patchset, the client would have probably invalidated the cache after the second write. Given the other possible races due to coarse mtime granularity, this probably isn't a huge issue, but it's something we should keep in mind.
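The race above comes down to comparing timestamps that only have one-second resolution. A minimal Python illustration, assuming the server reports mtime in whole seconds and the client revalidates by comparing mtime alone (the function name and the specific times are made up for the example):

```python
NS_PER_SEC = 1_000_000_000


def client_sees_change(cached_mtime_s: int, server_mtime_ns: int) -> bool:
    """Hypothetical revalidation check with one-second mtime granularity.

    The server truncates its mtime to whole seconds before reporting it, so
    the client only notices a change if the second differs.
    """
    return server_mtime_ns // NS_PER_SEC != cached_mtime_s


# The client's second write lands at t = 100.2s; with faked pre-op attrs the
# cache is kept and the client records mtime 100.
cached_mtime = (100 * NS_PER_SEC + 200_000_000) // NS_PER_SEC

# Another writer changes the file at t = 100.7s, within the same second.
racing_write = 100 * NS_PER_SEC + 700_000_000

# For comparison, a write that lands in the next second, at t = 101.1s.
later_write = 101 * NS_PER_SEC + 100_000_000

if __name__ == "__main__":
    print("same-second write noticed:", client_sees_change(cached_mtime, racing_write))
    print("next-second write noticed:", client_sees_change(cached_mtime, later_write))
```

The same-second write goes unnoticed in this model, which is the window described in the steps above; it closes as soon as a later change moves mtime into a new second.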
I've put some patches up on my people page with this patchset: http://people.redhat.com/jlayton/
Steven, Would it be possible for your customer to test kernel-2.6.9-68.16.EL.jtltest.31 someplace non-critical and let us know if this helps them at all? Note that this kernel *also* contains a patch to fix a lockd race that can seriously affect performance as well, so if it does help we'll still likely need to have them verify whether it's this patch that actually helps...
Created attachment 296811 patch1 -- NFSv4: Add post-op attributes to NFSv4 write and commit callbacks

Fixed patch1 -- the original one had a bad merge of nops changes for write and commit. So much of this file looks alike that it confuses the patch program and it ends up merging changes into the wrong place. Building a new test kernel now. The problem should only have affected NFSv4, so it's doubtful this will make any difference on NFSv2/3.
Updating PM score.
*** This bug has been marked as a duplicate of bug 427385 ***