Description of problem:
RHEL nfs clients don't honor the nsec portion of file timestamp fields. As a result, a RHEL nfs client can miss file updates and remain out of sync.

Version-Release number of selected component (if applicable):
RHEL 3 (also reported under RHL 8, likely present in AS2.1)

How reproducible:
Customer reports that it's load/timing related. Will post a test case if/when available.

A Windows client updates a file via multiple write operations in under 1 second on a netapp filer (file size remains constant), while the rhel nfs client reads the same file. (Yes, it has been observed that file locking would correct this problem. The issue is that once out of sync, the linux nfs client *stays* out of sync.)

Steps to Reproduce:
1. Update file multiple times in under 1 second
2. Trigger linux nfs client read between write operations

Actual results:
Linux nfs client gets out of sync with the nfs server and stays out of sync. Subsequently touching the file (from a third nfs client) brings the rhel nfs client's cache back into sync.

Expected results:
Due to asynchronous read/write operations, it is expected that the linux nfs client will occasionally be out of sync for a short period. It should not remain out of sync, though.

Additional info:
Escalated per Riel's request.
This was originally reported in IT 108088
This shows the same symptoms as reported in BZ 108088. The customer reports that it's reproducible with v2 as well as v3.
A first cut at honoring sub-second mtime updates. (Now if I could reproduce the problem I could test this...)

--- fs/nfs/inode.c.orig	2004-01-19 17:00:41.000000000 -0800
+++ fs/nfs/inode.c	2004-01-20 00:52:30.000000000 -0800
@@ -1101,6 +1101,8 @@
 	/* Ugh... */
 	if (cdif == 0 && fattr->size > NFS_CACHE_ISIZE(inode))
 		goto out_valid;
+	if (fattr->mtime > NFS_CACHE_MTIME(inode))
+		goto out_valid;
 	return -1;
 out_valid:
 	return 0;
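To illustrate why the nsec field matters for the reported scenario, here is a minimal user-space sketch, not kernel code. The `struct ts` type and the helpers `newer_sec_only`, `newer_full` and `same_second_example` are hypothetical names made up for this example; how the kernel actually packs mtime in `fattr->mtime` and `NFS_CACHE_MTIME(inode)` is not shown in this report and is not assumed here. The sketch only shows that a seconds-only comparison cannot see a second write that lands within the same second, while a (sec, nsec) comparison can.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical timestamp type for illustration only;
 * not the kernel's representation of an NFS mtime. */
struct ts {
    int64_t  sec;   /* whole seconds */
    uint32_t nsec;  /* nanoseconds within the second, 0..999999999 */
};

/* Seconds-only comparison: blind to updates within the same second. */
static bool newer_sec_only(struct ts server, struct ts cached)
{
    return server.sec > cached.sec;
}

/* Full comparison: seconds first, nanoseconds as the tie-breaker. */
static bool newer_full(struct ts server, struct ts cached)
{
    if (server.sec != cached.sec)
        return server.sec > cached.sec;
    return server.nsec > cached.nsec;
}

/* Two writes in the same second, file size unchanged (values are made up). */
static void same_second_example(void)
{
    struct ts cached = { 1074589950, 100000000 };  /* attributes seen by the client */
    struct ts server = { 1074589950, 900000000 };  /* attributes after a later write */

    assert(!newer_sec_only(server, cached));  /* update goes unnoticed */
    assert(newer_full(server, cached));       /* update is detected */
}
```

Calling `same_second_example()` runs both checks. Whether the one-line patch above achieves the second behavior depends on whether the compared mtime values carry the sub-second part, which still needs to be confirmed against a reproducer.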
Does this increase the overall operation count when running something like the connectathon test suite? If so, by what percentage?
Well, the patch in comment #3 has no effect on the traffic for the simple reason that it's never executed, at least when running the connectathon test suite. So it's not clear to me that this patch will help at all with the client seeing changes on the server on a more timely basis. The main issue is that a file is being updated on the server and the clients are not noticing it in a timely manner. Looking over all that has been said, I don't see where changing the default cache timeouts (i.e. acregmax, acdirmax or actimeo) has been tried to solve this problem. The defaults range anywhere from 60 to 30 seconds; knocking them down to, say, 30 or 20 seconds would help the issue....
*** Bug 156307 has been marked as a duplicate of this bug. ***
Undid dup from bug 156307, since that was against RHEL2.1. Also removed this one from U5 blocker list, since U5 is now closed.