Bug 157266
Summary: | RHEL 4 needs ESTALE pathname resolution logic | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Chuck Lever <cel> |
Component: | kernel | Assignee: | Peter Staubach <staubach> |
Status: | CLOSED WONTFIX | QA Contact: | Brian Brock <bbrock> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 4.0 | CC: | bnocera, dmair, fleite, hgarcia, jlayton, jmbastia, jplans, steved, syeghiay, tao, xdl-redhat-bugzilla |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2009-01-27 16:49:28 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 176344, 198694, 391511, 409961, 461297 | ||
Attachments: |
Description
Chuck Lever
2005-05-09 22:43:05 UTC
Was the patch post to a list or is it on a website somewhere? http://client.linux-nfs.org/Linux-2.6.x/2.6.11/linux-2.6.11-34-stale_retry.dif http://client.linux-nfs.org/Linux-2.6.x/2.6.11/linux-2.6.11-35-stale2.dif This should be fixed in RHEL4-U2 kernel Still happens with kernel 2.6.9-34.EL I tested this on two RHEL4 U4 NFS clients with a NetApp filer for the NFS server and this bug is still present in U4 (kernel 2.6.9-42.EL). This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release. This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release. This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release. I inserted a dump_stack() in __nfs_revalidate_inode when the getattr fails with an ESTALE. Here's the result: [<e064196d>] __nfs_revalidate_inode+0x1d3/0x42e [nfs] [<c018557b>] dput+0x33/0x423 [<c017c752>] link_path_walk+0x90/0xb9 [<c01d39a2>] selinux_inode_getattr+0x48/0x50 [<e06411fc>] nfs_getattr+0x45/0x69 [nfs] [<e06411b7>] nfs_getattr+0x0/0x69 [nfs] [<c0176ca5>] vfs_getattr+0x35/0x88 [<c0176d20>] vfs_stat+0x28/0x3a [<c0134124>] do_sigaction+0x3e8/0x491 [<c0177315>] sys_stat64+0xf/0x23 [<c013462e>] sys_rt_sigaction+0x8f/0xf2 [<c011d7e5>] do_page_fault+0x0/0x4dc [<c031a39b>] error_code+0x2f/0x38 [<c0319903>] syscall_call+0x7/0xb ...so we apparently need to handle the ESTALE gracefully somewhere along here. Confirmed that problem exists in kernel.org 2.6.15 kernels as well. Looking now to see if it's ever actually been fixed. Also exists in 2.6.18-rc7, and in Linville's wireless-dev git tree (which seems to be ~2.6.19-rc4 for VFS/NFS stuff). Created attachment 140933 [details]
patch to retry lookup if the stat call fails
Here's what I think is happening.
The first lookup occurs and populates the dcache/icache. The dir is then
removed and recreated on the server, which changes the filehandle. The client
then makes a stat() call which ends up in vfs_stat. vfs_stat uses
user_path_walk() to populate the nameidata struct, but because the attribute
cache hasn't timed out, it presumes that the info it has is already good.
vfs_stat then calls vfs_getattr, which calls nfs_getattr, which forces an on
the wire lookup because the filesystem is mounted with atimes enabled. This
generates the ESTALE, and invalidates the inode.
The attached patch is one way to fix it. If user_path_walk() works, but the
next call ends up with a -ESTALE, then we can presume the above has occurred
and force a retry on the lookup. If that also fails with an -ESTALE then we'd
return with that error.
We'd probably also want to fix up vfs_lstat too (and maybe some other
functions, I'll troll through them).
Another possibility might be to set the LOOKUP_REVAL flag in the original
lookup to force a true on-the-wire revalidation, but that would probably mean
more otw operations overall.
Peter, what do you think about this approach?
It seems like something like this will be required, but this will require some careful analysis and checking to ensure that it will not loop indefinitely. We can't set LOOKUP_REVAL on the original lookup. The performance cost would be too high. The set of system calls which need to be checked is the entire set which take pathnames as their arguments. Created attachment 140934 [details]
updated patch, fix vfs_lstat the same way
This patch fixes vfs_lstat in the same way. I don't see any other calls that
would need to be fixed this way right offhand.
Also, my description of the problem was a little off. nfs_getattr doesn't force
an otw lookup, rather it forces an otw getattr call.
These patches assume that it's not possible for user_path_walk to repeatedly
succeed but to repeatedly get an ESTALE from the vfs_getattr call. Such a
situation would make an infinite loop, obviously, but I don't see how that
could occur.
_All_ system calls which take pathnames as an argument _must_ be fixed in this fashion. The trick is to ensure that the lookup process eventually either fails with ENOENT or continues to purge stale caches until the ESTALE is discovered by validating the attributes on the root of the file system. Then, the error, ENOENT, should be returned instead of ESTALE. Without this trick, if the root of a mount point goes stale, then the client will loop forever. The code also needs to take the current working directory of the process into account as well. If the current working directory becomes stale, then a relative pathname lookup may also cause an infinite loop if this situation is handled correctly. Ok, this issue is a quite a bit more complex than I had originally thought. :-) Am I wrong that most of the code you describe is already present in __link_path_walk? I thought that it will revalidate dentries all the way back to the root of the fs when it hit an ESTALE (via the link_path_walk wrapper). Though, I'm not certain how it handles the situation when the root of the fs is itself stale. If that is already handled, then we'd just need to patch up the syscalls a'la the patch above, correct? Here's a list of syscalls from a recent git kernel that I identified that seem to take pathnames as args. I built this list quickly, so it might be incomplete. Some of these are just wrappers around common functions so the amount of work might not be so bad. Still, with so many, perhaps it would be better to do this in stages? sys_truncate sys_stat sys_statfs sys_statfs64 sys_lstat sys_newstat sys_newlstat sys_stat64 sys_lstat64 sys_truncate64 sys_setxattr sys_lsetxattr sys_getxattr sys_lgetxattr sys_listxattr sys_llistxattr sys_removexattr sys_lremovexattr sys_pivot_root sys_chroot sys_mknod sys_link sys_symlink sys_unlink sys_rename sys_chmod sys_creat sys_open sys_access sys_chown sys_lchown sys_chown16 sys_lchown16 sys_getcwd sys_mkdir sys_chdir sys_rmdir sys_swapon sys_swapoff sys_uselib sys_mknodat sys_mkdirat sys_unlinkat sys_symlinkat sys_linkat sys_renameat sys_futimesat sys_faccessat sys_fchmodat sys_fchownat sys_openat sys_newfstatat sys_fstatat64 sys_readlinkat compat_sys_futimesat compat_sys_newfstatat compat_sys_openat kernel_execve Created attachment 141049 [details]
patch against upstream, fix vfs_stat, vfs_lstat and sys_readlinkat
Here's a slightly updated patch. Since the same problem exists upstream, it's
probably best to work on fixing it there first. This patches up vfs_stat,
vfs_lstat and vfs_readlinkat (since they all exist in fs/stat.c).
Basically, I'm doing the same thing as before, but also making sure that
LOOKUP_REVAL is set when we redo the lookup on an ESTALE. My thinking is that
the proper place to put the ESTALE handling (if it doesn't already exist there)
is in the pathname resolution, so fixing up the syscalls to just reenter that
code and force revalidation is probably what we want.
Thoughts?
Thank you for the constructing that list of system calls which may need to be modified! The attached patch is a little buggy but seems along the right lines. We need to ensure that LOOKUP_REVAL does the right thing so that the pathname lookup portion works right. We also need to ensure that it is not possible for the kernel to loop forever if it is not possible to recover. Created attachment 141073 [details]
new patch for vfs_stat, vfs_lstat and sys_readlinkat
Doh! Yeah, I see where I missed the label in the lstat call. I've not tested
any of the newer patches, so these shouldn't be taken too literally, more
RFC's.
This patch adds the missing label, and also adds a retried flag. If we retry
the lookup and still end up in the same codepath, then we won't retry again
(and hence won't loop forever). Is this what you were more meaning by ensuring
that we don't loop forever?
Also, I checked and the only place that seems to check the LOOKUP_REVAL flag
is:
nfs_lookup_verify_inode
no other filesystems seem to care about it. Not sure if that would be big deal
or not...
I am mostly concerned about the pathname lookups which don't actually require any over the wire lookups, like looking up "." or "". This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release. *** Bug 401551 has been marked as a duplicate of this bug. *** Updating PM score. Development Management has reviewed and declined this request. You may appeal this decision by reopening this request. |