Bug 401551

Summary: NFS dentries can contain stale file IDs
Product: Red Hat Enterprise Linux 4
Reporter: Fabio Olive Leite <fleite>
Component: kernel
Assignee: Jeff Layton <jlayton>
Status: CLOSED DUPLICATE
QA Contact: Martin Jenner <mjenner>
Severity: high
Priority: high
Version: 4.6
CC: chet.burgess, k.georgiou, rrajaram, sfolkwil, sputhenp, staubach, steved, tao
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2008-01-09 12:48:01 EST
Attachments:
Patch that adds extra validity checks to nfs_readdir_lookup (flags: none)

Description Fabio Olive Leite 2007-11-27 13:17:27 EST
In RHEL 4 Update 6, nfs_readdir_lookup can return dentries that contain stale
NFS file IDs, which causes stat() and open() calls on those entries to fail.

This is related to upstream commit ef75c7974b383769ae5741cf930b8aa4dcaef395.

I'll attach a patch that applies cleanly to 2.6.9-67.EL and fixes the problem
(tested at a customer site). It contains some debug printks that can safely be
removed.
Comment 1 Fabio Olive Leite 2007-11-27 13:17:27 EST
Created attachment 270011
Patch that adds extra validity checks to nfs_readdir_lookup.
Comment 3 Jeff Layton 2007-12-03 10:27:06 EST
Patch looks reasonable and is upstream. Do we have a way to reliably reproduce
this? As a side note, this might be the bug behind bug 327591. We don't have a
reproducer for that one yet, though, so it's hard to know for sure.
Comment 4 Jeff Layton 2007-12-05 15:37:26 EST
I've gone ahead and added this patch to my test kernels. When you get a
reproducer, it would be good to verify that this patch actually fixes it.
Comment 5 Jeff Layton 2007-12-05 15:37:48 EST
Test kernels are at:

http://people.redhat.com/jlayton/
Comment 9 Jeff Layton 2008-01-02 11:47:14 EST
I've tested this reproducer on a kernel without this patch and then on one with
it. I don't see any difference in behavior. I still see an ENOENT error ~1/3 to
1/2 of the time.

IIRC, when I looked at this problem before, we concluded that the issue was the
timestamp granularity of the server's filesystem. I don't think it's possible to
fix that, short of moving to a different server-side filesystem.

Can you clarify the configuration of your client and server? What kernel is
running on the client, and what sort of local filesystem is the server using?
Comment 10 Sam Folk-Williams 2008-01-09 09:53:38 EST
Do we have any indication whether RHEL 5 is also affected by this?
Comment 11 Peter Staubach 2008-01-09 10:09:13 EST
I would guess that RHEL-5 is vulnerable.

I suspect that this problem is really a duplicate of bug 231143. The problem
isn't that the client can cache metadata that later becomes stale; the problem
is that the client doesn't properly recover when it detects the stale
metadata.