Bug 127868

Summary: FC3T1: Mount Fails with "mount: Stale NFS file handle" message
Product: [Fedora] Fedora Reporter: Thomas J. Baker <tjb>
Component: nfs-utils Assignee: Steve Dickson <steved>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: rawhide CC: bugzilla
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2004-10-12 19:53:42 UTC Type: ---
Attachments:
Description                                                        Flags
ethereal dump                                                      none
ethereal dump taken from server with just server client traffic    none
second ethereal dump from server with just client server traffic   none

Description Thomas J. Baker 2004-07-14 20:33:10 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6)
Gecko/20040510 Galeon/1.3.16

Description of problem:
I just installed FC3T1 and mounting an nfs directory from an FC2
server fails:

[root@doolittle /]# mkdir /xxx
[root@doolittle /]# mount wintermute:/home/tjb /xxx
mount: Stale NFS file handle
[root@doolittle /]#

Nothing is logged on the client but the FC2 server logs say this:

Jul 14 15:50:27 wintermute rpc.mountd: authenticated mount request from doolittle.sr.unh.edu:842 for /home/tjb (/home)

I left SELinux at the default, which I believe is enforcing, but
/var/log/messages isn't filled with AVC messages about the failure,
just the stale NFS handle messages.

The FC3T1 system can mount NFS directories from an RHEL3 machine and
from a system running rawhide. (Along the same lines, the system
running rawhide can mount some, but not all, of the FC2 system's exports.
The ones that fail do so with the "Stale NFS file handle" error message.
This started happening right after I upgraded it from FC2 to rawhide.)


Version-Release number of selected component (if applicable):
kernel-2.6.7-1.478, nfs-utils-1.0.6-30

How reproducible:
Sometimes

Steps to Reproduce:
1. install fc3t1
2. try to mount an exported directory from an fc2 system

Actual Results:  Fails with stale nfs handle.

Expected Results:  works.

Additional info:

Comment 1 Steve Dickson 2004-07-14 21:41:46 UTC
Is it possible to get an ethereal trace of the estale?

Also, there were some recent mounting issues that seem to
have been cleared up in the 480 kernel...

Does it help to boot with selinux=0?

Comment 2 Thomas J. Baker 2004-07-16 14:03:53 UTC
Created attachment 101965 [details]
ethereal dump

selinux=0 doesn't help, kernel 2.6.7-1.488 doesn't help. Here is the ethereal
dump. I'll be trying 492 kernel today.

Comment 3 Thomas J. Baker 2004-07-16 19:35:23 UTC
This looks like it could be due to a strange interaction with how the
filesystem is exported from the server. If the FC3T1 host is not
listed explicitly and instead is part of a netgroup, it fails with the
Stale NFS handle. If it is listed explicitly, it works as expected.
I'm trying to isolate it down to a specific case for easier debugging.

Comment 4 Thomas J. Baker 2004-07-20 18:29:16 UTC
It's related to how the file systems are exported. If the FC3T1 client
is listed explicitly in the exports file for a given filesystem, NFS
mounting seems to work fine. If the file system is exported with the
client as part of a netgroup, it gives the stale NFS handle problem. If
I list the FC3T1 client explicitly, do an NFS mount, then remove the
explicit listing from the server's exports (leaving the netgroup
listing, which should still match the client) and run exportfs again,
the client behaves as if the server has disappeared completely:

nfs: server wintermute.sr.unh.edu not responding, still trying

It's really quite a strange bug. 

Another oddity I've run into is that between two FC3T1 systems,
wildcard matching in the exports file doesn't work. If I export a file
system to *.unh.edu and try to mount it, I get:
mount doolittle:/space /xxx
mount: doolittle:/space failed, reason given by server: Permission denied
[tjb@katratzi tjb]# showmount -e doolittle
Export list for doolittle:
/home  @rcc_linux
/space *.unh.edu
[tjb@katratzi tjb]#

I don't know if they're related or not.
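For reference, exports(5) documents that the wildcard characters `*` and `?` in a hostname pattern also match the dots in a domain name, so `*.unh.edu` is meant to cover every host under unh.edu, subdomains included. The sketch below approximates that documented rule with Python's `fnmatch.fnmatchcase`; this is an assumption for illustration only (rpc.mountd uses its own C matcher, and case-insensitive comparison is assumed here). Under that rule katratzi.sr.unh.edu should match, so the "Permission denied" above is not what the pattern itself predicts.

```python
from fnmatch import fnmatchcase


def wildcard_matches(pattern: str, fqdn: str) -> bool:
    """Approximate exports(5) hostname wildcard matching.

    Assumption: '*' and '?' act like shell wildcards that may also match
    '.', and host names are compared case-insensitively.
    """
    return fnmatchcase(fqdn.lower(), pattern.lower())


if __name__ == "__main__":
    for host in ("katratzi.sr.unh.edu", "doolittle.sr.unh.edu", "www.example.org"):
        print(host, wildcard_matches("*.unh.edu", host))
```

This prints True for the two unh.edu hosts and False for www.example.org.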

Comment 5 Steve Dickson 2004-07-30 14:03:48 UTC
Yes.. I agree with the strangeness... The "Permission denied"
could have to do with rpc.idmapd messing things up...
Make sure you have the latest nfs-utils (1.0.6-22, I think).

WRT the estale, is there anything in /var/log/messages on
why the server is failing the fsinfo?

Also, can you post the exports file that works and
the one that does not?

Comment 6 Thomas J. Baker 2004-08-02 18:07:00 UTC
Here is the exports file:

/home                           @rcc_linux(rw) \
                                katratzi(rw,no_root_squash) \
                                doolittle(rw,no_root_squash)
 
/space                          katratzi(rw,no_root_squash) \
                                doolittle(rw,no_root_squash)
 
/space/tmp                      *.unh.edu(ro,insecure,all_squash)
 
/space/ftp/redhat               *.unh.edu(ro,insecure,all_squash) \
                                132.177.0.0/255.255.0.0(ro,insecure,all_squash)
 
/temp/music                     @rcc_linux(ro)
 
/temp/games                     @rcc_linux(ro)

Both katratzi and doolittle are members of yp netgroup file rcc_linux
yet only exports that list them explicitly work. I've got the latest
nfs utils (nfs-utils-1.0.6-30). The only thing the server logs is this:

Aug  2 14:05:28 wintermute rpc.mountd: authenticated mount request from doolittle.sr.unh.edu:683 for /space/ftp/redhat/rcc (/space/ftp/redhat)
Aug  2 14:05:28 wintermute rpc.mountd: authenticated mount request from doolittle.sr.unh.edu:690 for /space/ftp/redhat/rcc (/space/ftp/redhat)

The client logs this:

Aug  2 14:05:15 doolittle kernel: SELinux: initialized (dev 0:1a, type nfs), uses genfs_contexts
Aug  2 14:05:28 doolittle automount[7283]: >> mount: Stale NFS file handle
Aug  2 14:05:28 doolittle automount[7283]: mount(nfs): nfs: mount failure redhat-mirror:/space/ftp/redhat/rcc on /net/redhat/rcc
Aug  2 14:05:28 doolittle automount[7283]: failed to mount /net/redhat/rcc
Aug  2 14:05:28 doolittle automount[7285]: >> mount: Stale NFS file handle
Aug  2 14:05:28 doolittle automount[7285]: mount(nfs): nfs: mount failure redhat-mirror:/space/ftp/redhat/rcc on /net/redhat/rcc
Aug  2 14:05:28 doolittle automount[7285]: failed to mount /net/redhat/rcc

I'm now running the 2.6.7-1.499 kernel.


Comment 7 Thomas J. Baker 2004-08-02 18:08:51 UTC
That should read 'are members of the yp netgroup rcc_linux'.
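The exports file in Comment 6 mixes explicit hosts, the @rcc_linux netgroup, a wildcard, and a network. A rough sketch for listing which client specifications each export actually carries, assuming exports(5) syntax with backslash line continuations and `#` comments. `parse_exports` and `kind` are hypothetical helpers, not part of nfs-utils, and netgroup membership is not resolved (that would require a NIS lookup).

```python
import re
from typing import List, Tuple

# Hypothetical helper regex: "client(options)", where "(options)" may be absent.
CLIENT_RE = re.compile(r"^(?P<client>[^()\s]*)(?:\((?P<opts>[^)]*)\))?$")


def parse_exports(text: str) -> List[Tuple[str, str, str]]:
    """Return (path, client, options) triples from exports(5)-style text."""
    # Join a line ending in a backslash (plus optional trailing blanks) to the next.
    joined = re.sub(r"\\[ \t]*\n", " ", text)
    entries: List[Tuple[str, str, str]] = []
    for line in joined.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        path, *clients = line.split()
        for token in clients:
            m = CLIENT_RE.match(token)
            if m:
                entries.append((path, m.group("client"), m.group("opts") or ""))
    return entries


def kind(client: str) -> str:
    """Classify a client specification with a rough heuristic."""
    if not client:
        return "anyone"  # "(opts)" with no host: all hosts, per exports(5)
    if client.startswith("@"):
        return "netgroup"
    if "/" in client:
        return "network"
    if any(ch in client for ch in "*?["):
        return "wildcard"
    return "host"


if __name__ == "__main__":
    EXPORTS = """\
/home       @rcc_linux(rw) \\
            katratzi(rw,no_root_squash) \\
            doolittle(rw,no_root_squash)
/temp/games @rcc_linux(ro)
"""
    for path, client, opts in parse_exports(EXPORTS):
        print(f"{path:12} {client:12} {kind(client):9} {opts}")
```

Applied to the full Comment 6 file, only /home and /space list katratzi and doolittle as explicit hosts; every other export reaches them through a wildcard, a network, or the netgroup, which matches the report that only explicitly listed exports work.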

Comment 8 Steve Dickson 2004-08-10 14:15:22 UTC
Just curious... if you re-export (i.e. exportfs -arv) the
filesystems, does the problem go away?

Comment 9 Thomas J. Baker 2004-08-12 11:50:27 UTC
No, I get the same stale nfs handle error message on the FC3T1 client.

Comment 10 Steve Dickson 2004-08-13 15:31:37 UTC
Just for grins... add fsid=0 to the options of one of your exports and
see what happens...

Comment 11 Thomas J. Baker 2004-08-13 18:07:48 UTC
I added fsid=0 to the /temp/games export, ran exportfs -arv, and when
I tried to automount it from the FC3T1 client, the mount hung. The cd
command that triggered the automount hangs, a df from another window
hangs, and I can't even log in as a normal user, though root works. I
looked at /etc/mtab and it doesn't include the mount for /temp/games,
so at least we know that the mount itself is hanging. Eventually, the
'NFS server not responding' message is logged, which explains why my
other login attempts fail, since my home directory is automounted from
the same server.

Client didn't log anything and server logged a normal mount request:

Aug 13 13:31:41 wintermute rpc.mountd: authenticated mount request from katratzi.sr.unh.edu:919 for /temp/games (/temp/games)

To make it really interesting, I tried to log into my other FC3T1 test
system and it can't nfs mount my home directory either. Server logs
normally:

Aug 13 13:56:55 wintermute rpc.mountd: authenticated mount request from doolittle.sr.unh.edu:927 for /home/tjb (/home)

BUT two other RHEL3U2 systems mount my home directory fine. After
rebooting the first FC3T1 client, I tried mounting my home directory
again and it hung again. It's as if, once I modify the exports file and
re-export, my FC3T1 systems can't talk to the NFS server anymore, yet
RHEL3 and FC2 seem fine.

BTW, kernels are 517 on the first and 515 on the second FC3T1 systems.


Comment 12 Steve Dickson 2004-08-13 18:26:59 UTC
can you try the mount by hand (i.e. not using autofs) while
running an ethereal trace... then post the trace....

Comment 13 Thomas J. Baker 2004-08-13 18:59:09 UTC
Created attachment 102711 [details]
ethereal dump taken from server with just server client traffic

Comment 14 Thomas J. Baker 2004-08-13 19:44:47 UTC
Created attachment 102716 [details]
second ethereal dump from server with just client server traffic

I rebooted the server and, without any other changes, the FC3T1 system came back
(nfs server OK) from the hung mount request but gave a "Permission denied". A
second mount attempt gave the stale NFS handle, and a third was captured in the
included dump. It may or may not provide more info, but I thought it couldn't
hurt.

Comment 15 Steve Dickson 2004-08-28 15:04:41 UTC
hmm... it sure seems like it's a server export problem...
If you simplify your exports to something like: 
/home  *(rw,sync,fsid=0) 

does that work?

Comment 16 Thomas J. Baker 2004-09-28 18:19:34 UTC
I updated the server to FC3T2 and the nfs problem persists. Another
problem is that if I ever run 'exportfs -av' on the server, the FC3T2
clients hang with "NFS server not responding" until I reboot the
server. It makes testing changes rather tedious. Anyway, I exported a
directory as you requested and that doesn't work either:

Client side:

[root@katratzi tjb]# mount wintermute:/raid /xxx
mount: wintermute:/raid failed, reason given by server: Permission denied
[root@katratzi tjb]# showmount -e wintermute
Export list for wintermute:
/raid       *
/temp/games @rcc_linux
/space/ftp  132.177.0.0/255.255.0.0
/home       @rcc_linux,katratzi.sr.unh.edu
[root@katratzi tjb]# mount wintermute:/raid /xxx
mount: Stale NFS file handle
[root@katratzi tjb]# 


Server side:

Sep 28 13:54:52 wintermute rpc.mountd: authenticated mount request from katratzi.sr.unh.edu:676 for /raid (/raid)

I have no firewall, selinux disabled, and right now, due to another
potential bug (#133906), tcp wrappers completely turned off too.

Comment 17 Carlo Wood 2004-10-08 22:19:04 UTC
This looks very much like a problem I have too.
But as far as I know, I am not using FC3T2, heh.

I am not really into testing fedora - I just run
'apt-get update; apt-get upgrade' sometimes.
Last time I did that I suddenly couldn't mount
anymore and had to downgrade nfs-tools.

I have a VERY simple setup (and simple exports file).
Doesn't get much simpler than this.  In other words,
it seems to ME that fedora's public, current, non-test
version has a totally broken NFS (under certain circumstances
I presume).

Is there anything I can do to help to fix this?


Comment 18 Need Real Name 2004-10-09 18:16:37 UTC
It seems that the failure is in mountd.c (rpc.mountd), when the function
cache_get_filehandle tries to open /proc/fs/nfs[d]/filehandle and
fails (I have 2.6.9-rc1-mm4 and don't have this file there...)

I'm doing some more research.

Comment 19 Need Real Name 2004-10-09 18:20:33 UTC
Got it: add these lines to /etc/fstab (on the NFS server):
nfsd                    /proc/fs/nfsd           nfsd    defaults 0 0 
sunrpc                  /var/lib/nfs/rpc_pipefs rpc_pipefs defaults 0 0

See also bug 125345.
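Comment 18 traces the ESTALE to rpc.mountd failing to open /proc/fs/nfs[d]/filehandle, and Comment 19 fixes it by mounting the nfsd and rpc_pipefs filesystems. A minimal sketch for checking a server, assuming a Linux /proc/mounts with the usual whitespace-separated `device mountpoint fstype ...` fields; the helper names are hypothetical, and the script only reports state, it does not mount anything.

```python
import os
from typing import Dict, List


def mounted_types(mounts_text: str) -> Dict[str, List[str]]:
    """Map filesystem type -> mount points, from /proc/mounts-style text."""
    result: Dict[str, List[str]] = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:  # device, mount point, fs type, options, ...
            result.setdefault(fields[2], []).append(fields[1])
    return result


def missing_nfs_pseudo_fs(mounts_text: str) -> List[str]:
    """Return which of the filesystem types from Comment 19 are not mounted."""
    types = mounted_types(mounts_text)
    return [fs for fs in ("nfsd", "rpc_pipefs") if fs not in types]


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            missing = missing_nfs_pseudo_fs(f.read())
        print("not mounted:", ", ".join(missing) or "none")
    except OSError as exc:  # e.g. not a Linux system
        print("cannot read /proc/mounts:", exc)
    # The control file rpc.mountd reportedly failed to open (Comment 18).
    for path in ("/proc/fs/nfsd/filehandle", "/proc/fs/nfs/filehandle"):
        print(path, "present" if os.path.exists(path) else "absent")
```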


Comment 20 Thomas J. Baker 2004-10-12 19:32:04 UTC
Whatever was in the last bunch of updates (10/12/2004) seems to have
fixed this problem. Since nfs-utils wasn't updated, I can only assume
it's the kernel?