Description of problem:
If a file is modified on an NFS server in a directory that is exported to NFS clients, the clients may receive numerous ESTALE errors when they attempt to access that file using cached file handles. The number of ESTALE errors received and reported to user space appears to be related to how many directories above that file have been removed.

Version-Release number of selected component (if applicable):
all

How reproducible:
always

Steps to Reproduce:
1) NFS mount a share from the server on a client. Let's say we mount server:/tmp on the client at /mnt/tmp. You may want to mount it with a large timeo value to ensure that you get ESTALE on open from the cached file handles.
2) On the server, in /tmp, create the following file:
    a/b/c
3) Tar the contents of directory a recursively, preserving timestamps on the file. I use the following command, run from the /tmp directory on the server:
    tar -c --file ./test.tar a
4) On the client run:
    cd /mnt/tmp
    cat a/b/c
This should get the file handle for c into the NFS cache.
5) On the server run:
    rm -rf a
    tar -x -v --file ./test.tar
This recreates the file tree on the server and makes the cached file handles on the client stale.
6) On the client run:
    cat a/b/c
If you are running without the patch that I posted, you will get an ESTALE error. If you capture the connection with tcpdump, you will find 3 ESTALE errors returned: two in lookup responses and one in a getattr response.

Actual results:
ESTALE errors are returned to the calling user-space application.

Expected results:
Arguably, since ESTALE is largely a transient error on network file systems, the operation should be silently retried.

Additional info:
Created attachment 105200
patch to retry fs operations that result in ESTALE errors

Patch to prevent transient ESTALE errors from being reported to user space by retrying them.
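For anyone who cannot run a patched kernel, the same retry idea can be approximated in the application. The following is only a user-space sketch of the concept, not the attached kernel patch: the helper names retry_on_estale and read_file and the max_retries parameter are made up for illustration, and the default of 3 retries is an arbitrary choice, not a value taken from the patch.

```python
import errno


def retry_on_estale(operation, max_retries=3):
    """Call operation() and retry it when it fails with ESTALE.

    max_retries is a hypothetical knob: the number of extra attempts
    made after the first failure before the error is passed on.
    """
    attempt = 0
    while True:
        try:
            return operation()
        except OSError as exc:
            # Only ESTALE is treated as transient; anything else,
            # or running out of retries, is reported to the caller.
            if exc.errno != errno.ESTALE or attempt >= max_retries:
                raise
            attempt += 1


def read_file(path):
    # Each attempt opens the file by path again instead of reusing
    # an already open descriptor.
    with open(path, "rb") as f:
        return f.read()
```

With the mount from the steps above, usage would look like retry_on_estale(lambda: read_file("/mnt/tmp/a/b/c")). Whether a retry succeeds still depends on the client dropping its stale cached handle between attempts, so this is a workaround to experiment with rather than a guaranteed fix.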
A fix for this problem has just been committed to the RHEL3 U5 patch pool this evening (in kernel version 2.4.21-25.1.EL).
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2005-294.html