Red Hat Bugzilla – Bug 157266
RHEL 4 needs ESTALE pathname resolution logic
Last modified: 2010-10-21 22:59:55 EDT
Description of problem:
A patch to retry pathname resolution when an ESTALE error occurs is now in the
2.6 mainline (2.6.12-rc1 or rc2). This patch should be backported to RHEL 4's kernel.
Steps to Reproduce:
1. mount any NFS server
2. "ls" a directory on that mount
3. go to the server and remove the directory, then recreate it
4. "ls" again on the client
Actual Results: ls: '.': Stale file handle
Expected Results: 'ls' should have returned the contents of the new directory.
I would provide the changeset information for the specific 2.6 patch, but we
aren't using BK (BitKeeper) anymore.
Was the patch posted to a list, or is it on a website somewhere?
This should be fixed in the RHEL4-U2 kernel.
It still happens with kernel 2.6.9-34.EL.
I tested this on two RHEL4 U4 NFS clients with a NetApp filer for the NFS server
and this bug is still present in U4 (kernel 2.6.9-42.EL).
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update release.
I inserted a dump_stack() in __nfs_revalidate_inode when the getattr fails with
an ESTALE. Here's the result:
[<e064196d>] __nfs_revalidate_inode+0x1d3/0x42e [nfs]
[<e06411fc>] nfs_getattr+0x45/0x69 [nfs]
[<e06411b7>] nfs_getattr+0x0/0x69 [nfs]
...so we apparently need to handle the ESTALE gracefully somewhere along this path.
Confirmed that the problem exists in kernel.org 2.6.15 kernels as well. Looking now
to see if it has ever actually been fixed.
Also exists in 2.6.18-rc7, and in Linville's wireless-dev git tree (which seems
to be ~2.6.19-rc4 for VFS/NFS stuff).
Created attachment 140933
patch to retry lookup if the stat call fails
Here's what I think is happening.
The first lookup occurs and populates the dcache/icache. The directory is then
removed and recreated on the server, which changes its filehandle. The client
then makes a stat() call, which ends up in vfs_stat. vfs_stat uses
user_path_walk() to populate the nameidata struct, but because the attribute
cache hasn't timed out, it presumes that the information it already has is good.
vfs_stat then calls vfs_getattr, which calls nfs_getattr, which forces an
on-the-wire lookup because the filesystem is mounted with atimes enabled. This
generates the ESTALE and invalidates the inode.
The attached patch is one way to fix it. If user_path_walk() succeeds but the
next call returns -ESTALE, then we can presume the above has occurred and force
a retry of the lookup. If that also fails with -ESTALE, then we return that
error.
We'd probably also want to fix up vfs_lstat (and maybe some other
functions; I'll trawl through them).
Another possibility might be to set the LOOKUP_REVAL flag in the original
lookup to force a true on-the-wire revalidation, but that would probably mean
more over-the-wire (otw) operations overall.
Peter, what do you think about this approach?
Something like this will probably be required, but it will need
careful analysis and checking to ensure that it cannot loop
indefinitely.
We can't set LOOKUP_REVAL on the original lookup. The performance cost
would be too high.
The system calls which need to be checked are all of those
which take pathnames as arguments.
Created attachment 140934
updated patch, fix vfs_lstat the same way
This patch fixes vfs_lstat in the same way. Offhand, I don't see any other calls
that would need to be fixed this way.
Also, my description of the problem was a little off. nfs_getattr doesn't force
an otw lookup; rather, it forces an otw getattr call.
These patches assume that it's not possible for user_path_walk to repeatedly
succeed while vfs_getattr repeatedly returns ESTALE. Such a situation would
obviously cause an infinite loop, but I don't see how that could happen.
_All_ system calls which take pathnames as an argument _must_ be
fixed in this fashion.
The trick is to ensure that the lookup process eventually either
fails with ENOENT or keeps purging stale cache entries until the
ESTALE is detected while validating the attributes of the root of
the file system. At that point, ENOENT should be returned
instead of ESTALE.
Without this trick, if the root of a mount point goes stale,
then the client will loop forever.
The code also needs to take the current working directory of the process
into account. If the current working directory becomes stale,
then a relative pathname lookup may also cause an infinite loop if this
situation is not handled correctly.
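The termination argument can be sketched as follows. This is a hypothetical model, not the __link_path_walk code: a resolved path is an array of cached entries whose first element is the starting directory (the root for an absolute path, the cwd for a relative one). After an ESTALE, recovery walks back toward the start until an entry validates, then redoes the lookups forward; if the start itself is stale there is nothing to re-walk from, so it returns ENOENT instead of ESTALE.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model: chain[0] is the starting directory (fs root or cwd).
 * server_fh == 0 means the object no longer exists on the server. */
struct sim_entry {
    unsigned long cached_fh;   /* handle held in the client's cache */
    unsigned long server_fh;   /* handle the server currently uses (0 = gone) */
};

/* An otw GETATTR against the cached handle. */
static int sim_getattr(const struct sim_entry *e)
{
    return e->cached_fh == e->server_fh ? 0 : -ESTALE;
}

/* An otw LOOKUP of this entry by name in its (already valid) parent:
 * refreshes the cached handle, or fails if the name is gone. */
static int sim_relookup(struct sim_entry *e)
{
    if (e->server_fh == 0)
        return -ENOENT;
    e->cached_fh = e->server_fh;
    return 0;
}

/* Recover after the last entry returned ESTALE. Both loops run at most n
 * times, so this always terminates, even when chain[0] is stale. */
static int sim_recover(struct sim_entry *chain, size_t n)
{
    assert(n > 0);
    size_t good = n;            /* entries [0, good) are known valid */
    while (good > 0 && sim_getattr(&chain[good - 1]) == -ESTALE)
        good--;
    if (good == 0)
        return -ENOENT;         /* the root or cwd itself went stale */
    for (size_t i = good; i < n; i++) {
        int err = sim_relookup(&chain[i]);
        if (err)
            return err;
    }
    return 0;
}
```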
Ok, this issue is quite a bit more complex than I had originally thought. :-)
Am I wrong that most of the code you describe is already present in
__link_path_walk? I thought that it revalidates dentries all the way back to
the root of the fs when it hits an ESTALE (via the link_path_walk wrapper).
Though, I'm not certain how it handles the situation when the root of the fs is
stale. If that is already handled, then we'd just need to patch up the syscalls
à la the patch above, correct?
Here's a list of syscalls from a recent git kernel that seem to take pathnames
as arguments. I built this list quickly, so it might be incomplete.
Some of these are just wrappers around common functions, so the amount of work
might not be so bad. Still, with so many, perhaps it would be better to do this
Created attachment 141049
patch against upstream, fix vfs_stat, vfs_lstat and sys_readlinkat
Here's a slightly updated patch. Since the same problem exists upstream, it's
probably best to fix it there first. This patches up vfs_stat,
vfs_lstat and sys_readlinkat (since they all live in fs/stat.c).
Basically, I'm doing the same thing as before, but also making sure that
LOOKUP_REVAL is set when we redo the lookup on an ESTALE. My thinking is that
the proper place for the ESTALE handling (if it doesn't already exist there)
is in the pathname resolution, so fixing up the syscalls to simply re-enter that
code and force revalidation is probably what we want.
Thank you for constructing that list of system calls which may need
to be modified!
The attached patch is a little buggy but seems to be along the right lines.
We need to ensure that LOOKUP_REVAL does the right thing so that the
pathname lookup portion works right. We also need to ensure that it
is not possible for the kernel to loop forever if it is not possible
Created attachment 141073
new patch for vfs_stat, vfs_lstat and sys_readlinkat
Doh! Yeah, I see where I missed the label in the lstat call. I've not tested
any of the newer patches, so these shouldn't be taken too literally, more
This patch adds the missing label, and also adds a retried flag. If we retry
the lookup and still end up in the same codepath, then we won't retry again
(and hence won't loop forever). Is this more what you meant by ensuring
that we don't loop forever?
Also, I checked, and the only place that seems to check the LOOKUP_REVAL flag
no other filesystems seem to care about it. Not sure if that would be a big deal
I am mostly concerned about the pathname lookups which don't actually
require any over the wire lookups, like looking up "." or "".
*** Bug 401551 has been marked as a duplicate of this bug. ***
Updating PM score.
Development Management has reviewed and declined this request. You may appeal
this decision by reopening this request.