Bug 130165

Summary: nfs mounts "disappear" randomly
Product: [Fedora] Fedora
Reporter: Shahms E. King <shahms>
Component: kernel
Assignee: Dave Jones <davej>
Status: CLOSED NEXTRELEASE
Severity: medium
Priority: medium
Version: 2
CC: nhorman, pfrields, rdieter, redhat, steven, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2005-04-16 04:50:30 UTC

Description Shahms E. King 2004-08-17 17:03:42 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7)
Gecko/20040808 Firefox/0.9.3

Description of problem:
NFS mounts exported by one Fedora Core 2 machine and mounted on another
Fedora Core 2 machine, both running the same versions of the respective
software, occasionally "disappear".  The mount point vanishes and any
access returns "no such file or directory", with no errors in the
logfiles on either machine.

Occasionally (but not always) there will be an "nfs_statfs: error = 2"
in the client logs.

Logging in to the NFS server and keeping a shell open in the affected
directory solves the problem on the client.  This happens with both NFSv2
and NFSv3, and with every combination of mount and export options I tried.
NFSv4 is not an option because of the bug where all user ids are
mapped to "root:bin".  (There is an entry for this in Bugzilla, but I
could not find it just now.)

Version-Release number of selected component (if applicable):
nfs-utils-1.0.6-22 kernel-2.6.7-1.494.2.2

How reproducible:
Sometimes

Steps to Reproduce:
1. Mount a remote directory over NFS.
2. Log in and use it like normal.
3. Eventually, every access to the mount point will return "no such
file or directory"
    

Actual Results:  Files that are present return "no such file or directory"

Expected Results:  The files that exist on the NFS server should
appear on the NFS client.

Additional info:

Comment 1 Rex Dieter 2004-08-17 17:21:11 UTC
What OS(es) are the NFS clients?

Comment 2 Shahms E. King 2004-08-17 17:24:53 UTC
Both client and server are Fedora Core 2, both completely updated.

Comment 3 Shahms E. King 2004-08-17 17:31:24 UTC
More information:

If the client is an updated FC1 machine, the error gets reported as
"Stale NFS File Handle", but the same workaround solves the problem.

Comment 4 Neil Horman 2004-08-17 17:38:27 UTC
If you use the touch command on the server in the directory that you
are trying to access via nfs (i.e. run touch .) in the affected
directory, can the clients see it again?

Are you by any chance running a program on your server that might
reset the modification time on various directories of your server
(rsync with the -a option is an example of this)?
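
For anyone scripting the check above, here is a minimal Python sketch of
what "touch ." does to a directory.  The function name touch_directory is
hypothetical; it relies only on os.stat and os.utime from the standard
library, and it is assumed to run on the NFS server inside the affected
directory.

```python
import os


def touch_directory(path="."):
    """Set atime and mtime of `path` to now, like `touch path`.

    Returns (mtime_before, mtime_after) so the change can be logged.
    """
    before = os.stat(path).st_mtime
    os.utime(path, None)  # None means "use the current time"
    after = os.stat(path).st_mtime
    return before, after


if __name__ == "__main__":
    old, new = touch_directory(".")
    print("mtime before:", old, "after:", new)
```

Recording both values makes it easier to tell later whether something
else has set the modification time back.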

Comment 5 Shahms E. King 2004-08-17 17:51:04 UTC
Yes, running touch on the affected directory allows clients to see it
again briefly.  There are some scripts running which may reset
modification times, but they aren't running in the affected
directories.  Additionally, the mtime is unchanged between touching
the directory and the next time it fails.

Comment 6 Neil Horman 2004-08-17 18:13:02 UTC
Unchanged modification times are commonly what lead to these problems.
If the contents of the directory were updated, but the modification
time was set to a previous value, the nfs client can be "fooled" into
thinking its cached file handles are still good.  A similar problem
was fixed for RHEL3 in BZ 113636.  What version of NFS are you using,
2, 3 or 4?  I believe the RHEL3 problem did not occur if NFS v2 was used.
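
The mechanism described above can be illustrated with a toy model in
Python.  This is not the kernel's NFS client code; ToyServerDir and
ToyDirCache are hypothetical names, and the only assumption is the one
stated in the comment: cached handles are trusted for as long as the
directory's modification time looks unchanged.

```python
class ToyServerDir:
    """Stand-in for a directory on the server: entries plus an mtime."""

    def __init__(self):
        self.mtime = 100
        self.entries = {"data": "fh-1"}

    def replace(self, name, handle, reset_mtime_to=None):
        """Change an entry; optionally set mtime back to an older value."""
        self.entries[name] = handle
        if reset_mtime_to is None:
            self.mtime += 1
        else:
            self.mtime = reset_mtime_to

    def touch(self):
        """Bump the mtime without changing any entry."""
        self.mtime += 1


class ToyDirCache:
    """Stand-in for a client that revalidates only when mtime changes."""

    def __init__(self, server):
        self.server = server
        self.cached_mtime = None
        self.cached_handles = {}

    def lookup(self, name):
        if self.server.mtime != self.cached_mtime:
            # mtime moved: refresh the cached handles from the server
            self.cached_handles = dict(self.server.entries)
            self.cached_mtime = self.server.mtime
        return self.cached_handles.get(name)


server = ToyServerDir()
client = ToyDirCache(server)
first = client.lookup("data")                    # fresh handle
server.replace("data", "fh-2", reset_mtime_to=100)
stale = client.lookup("data")                    # mtime looks unchanged
server.touch()
fresh = client.lookup("data")                    # touch forces a refresh
print(first, stale, fresh)
```

In this model the second lookup still returns the old handle because
the mtime was put back, and the touch is what makes the client refresh,
which matches the workaround reported in this bug.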

Comment 7 Harry Waye 2004-08-18 11:21:45 UTC
With kernel 2.6.5-1.358 (the original kernel) the file ownerships are
correct with NFSv4; any later kernel has the root:bin problem.

Comment 8 Neil Horman 2004-08-18 11:33:58 UTC
So I take it then that you were using NFSv3 when the stale file handle
issues came up?

Comment 9 Shahms E. King 2004-08-18 14:18:14 UTC
Assuming the nfs(5) man page is correct that v2 is the default, then
it occurs with both NFSv2 and NFSv3.  It happens both with nfsvers=3
and whatever the default is.

Comment 10 Neil Horman 2004-08-18 14:19:48 UTC
IIRC, v3 is actually the current default.  Can you try explicitly
mounting with V2?
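
Since the default is in doubt, one way to see which version a mount
actually ended up with is to read the options the kernel reports in
/proc/mounts.  The Python sketch below is a hedged example: the function
name nfs_versions is hypothetical, and the option spellings it looks for
("v2"/"v3" and "vers="/"nfsvers=") are assumptions that should be
checked against the real output on the kernel in question.

```python
def nfs_versions(mounts_text):
    """Map mount point -> NFS version string (or None if not shown).

    `mounts_text` is text in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        version = None
        for opt in fields[3].split(","):
            if opt in ("v2", "v3", "v4"):
                # assumed spelling on some kernels
                version = opt[1:]
            elif opt.startswith(("vers=", "nfsvers=")):
                # assumed spelling on other kernels
                version = opt.split("=", 1)[1]
        result[fields[1]] = version
    return result


if __name__ == "__main__":
    with open("/proc/mounts") as handle:
        for mount_point, version in nfs_versions(handle.read()).items():
            print(mount_point, version)
```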

Comment 11 Shahms E. King 2004-08-18 15:51:29 UTC
Even when explicitly mounting as V2, the problem persists; only the error
message changes to "Stale NFS file handle."

Comment 12 Hugh Caley 2004-11-21 18:36:09 UTC
I may be having a similar problem.  kernel 2.6.9-1.3_FC2smp, Nexsan
atabeast carved up with LVM2 and formatted with reiserfs, then shared
with nfs.  Processes will start complaining about stale file handles;
generally the filesystems can be umounted and remounted manually.  In at
least one case a user did an 'ls' on a directory 2 times and received a
"stale" error, then on the third try the filesystem came back.  Happens
both to manual mounts and autofs mounts.  I'm thinking something screwy in
lockd?

Not sure how to go about documenting this; it happens maybe once a
day.  Logging?  What is available?
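
One low-tech way to document a once-a-day failure like this is to probe
the mount point periodically from the client and log the errno that
comes back.  The following Python sketch is only a suggestion; probe and
watch are hypothetical names, and the path and interval are placeholders
to adjust.

```python
import errno
import os
import time


def probe(path):
    """Return None if `path` can be stat'ed and listed, else the errno name."""
    try:
        os.stat(path)
        os.listdir(path)
    except OSError as exc:
        # e.g. "ENOENT" or "ESTALE"; fall back to the number if unknown
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None


def watch(path, interval=60, rounds=None, log=print):
    """Probe `path` every `interval` seconds and log a timestamped result.

    Runs forever when `rounds` is None.
    """
    count = 0
    while rounds is None or count < rounds:
        result = probe(path)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        log(f"{stamp} {path} {result or 'ok'}")
        time.sleep(interval)
        count += 1


if __name__ == "__main__":
    watch("/mnt/data", interval=60)  # placeholder mount point
```

The timestamps can then be matched against server-side activity such as
cron jobs or rsync runs.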

Comment 13 Dave Jones 2005-01-17 08:38:26 UTC
Are you still having this problem with the 2.6.10 updates?


Comment 14 Shahms E. King 2005-01-18 16:31:55 UTC
I'm going to tentatively say it has been fixed.
I haven't experienced any problems today ;-)

Given the random nature of the problem, I can't quite give an unreserved "this
bug is fixed", but so far, it appears to be.

Comment 15 Dave Jones 2005-04-16 04:50:30 UTC
Fedora Core 2 has now reached end of life, and no further updates will be
provided by Red Hat.  The Fedora legacy project will be producing further kernel
updates for security problems only.

If this bug has not been fixed in the latest Fedora Core 2 update kernel, please
try to reproduce it under Fedora Core 3, and reopen if necessary, changing the
product version accordingly.

Thank you.