Bug 127868
Summary: | FC3T1: Mount Fails with "mount: Stale NFS file handle" message
---|---
Product: | [Fedora] Fedora
Reporter: | Thomas J. Baker <tjb>
Component: | nfs-utils
Assignee: | Steve Dickson <steved>
Status: | CLOSED CURRENTRELEASE
Severity: | medium
Priority: | medium
Version: | rawhide
CC: | bugzilla
Hardware: | i686
OS: | Linux
Doc Type: | Bug Fix
Last Closed: | 2004-10-12 19:53:42 UTC

Description
Thomas J. Baker
2004-07-14 20:33:10 UTC
Is it possible to get an ethereal trace of the estale? Also there were some recent mounting issues that seem to be cleared up in the 480 kernel... Does it help to boot with selinux=0?

Created attachment 101965
ethereal dump
selinux=0 doesn't help, kernel 2.6.7-1.488 doesn't help. Here is the ethereal
dump. I'll be trying 492 kernel today.
This looks like it could be due to a strange interaction with how the filesystem is exported from the server. If the FC3T1 host is not listed explicitly and instead is part of a netgroup, it fails with the Stale NFS handle. If it is listed explicitly, it works as expected. I'm trying to isolate it down to a specific case for easier debugging.

It's related to how the file systems are exported. If the FC3T1 client is listed explicitly in the exports file for a given filesystem, nfs mounting seems to work fine. If the file system is exported with the client as part of a netgroup it gives the stale nfs handle problem. If I have the FC3T1 client explicitly listed, do an nfs mount, and then remove the explicit listing from the server exports, leaving the netgroup listing which should still match the client, exportfs again, the client then behaves like the server has disappeared completely:

    nfs: server wintermute.sr.unh.edu not responding, still trying

It's really quite a strange bug.

Another oddity I've run into is that between two FC3T1 systems, wildcard matching in the export file doesn't work. If I have a file system exported to *.unh.edu, and try to mount it, it gives

    mount doolittle:/space /xxx
    mount: doolittle:/space failed, reason given by server: Permission denied
    [tjb@katratzi tjb]# showmount -e doolittle
    Export list for doolittle:
    /home  @rcc_linux
    /space *.unh.edu
    [tjb@katratzi tjb]#

I don't know if they're related or not.

Yes.. I agree with the strangeness... The "Permission denied" could have to do with rpc.idmapd messing things up... make sure you have the latest nfs-utils (1.0.6-22 i think). WRT the estale, is there anything in /var/log/messages on why the server is failing the fsinfo? Also can you post the exports tab that works and the one that does not work?
Here is the exports file:

    /home              @rcc_linux(rw) \
                       katratzi(rw,no_root_squash) \
                       doolittle(rw,no_root_squash)
    /space             katratzi(rw,no_root_squash) \
                       doolittle(rw,no_root_squash)
    /space/tmp         *.unh.edu(ro,insecure,all_squash)
    /space/ftp/redhat  *.unh.edu(ro,insecure,all_squash) \
                       132.177.0.0/255.255.0.0(ro,insecure,all_squash)
    /temp/music        @rcc_linux(ro)
    /temp/games        @rcc_linux(ro)

Both katratzi and doolittle are members of yp netgroup file rcc_linux yet only exports that list them explicitly work. I've got the latest nfs utils (nfs-utils-1.0.6-30). The only thing the server logs is this:

    Aug 2 14:05:28 wintermute rpc.mountd: authenticated mount request from doolittle.sr.unh.edu:683 for /space/ftp/redhat/rcc (/space/ftp/redhat)
    Aug 2 14:05:28 wintermute rpc.mountd: authenticated mount request from doolittle.sr.unh.edu:690 for /space/ftp/redhat/rcc (/space/ftp/redhat)

The client logs this:

    Aug 2 14:05:15 doolittle kernel: SELinux: initialized (dev 0:1a, type nfs), uses genfs_contexts
    Aug 2 14:05:28 doolittle automount[7283]: >> mount: Stale NFS file handle
    Aug 2 14:05:28 doolittle automount[7283]: mount(nfs): nfs: mount failure redhat-mirror:/space/ftp/redhat/rcc on /net/redhat/rcc
    Aug 2 14:05:28 doolittle automount[7283]: failed to mount /net/redhat/rcc
    Aug 2 14:05:28 doolittle automount[7285]: >> mount: Stale NFS file handle
    Aug 2 14:05:28 doolittle automount[7285]: mount(nfs): nfs: mount failure redhat-mirror:/space/ftp/redhat/rcc on /net/redhat/rcc
    Aug 2 14:05:28 doolittle automount[7285]: failed to mount /net/redhat/rcc

I'm now running the 2.6.7-1.499 kernel.

That should read 'are members of the yp netgroup rcc_linux'.

Just curious... if you re-export (i.e. exportfs -arv) the filesystems, does the problem go away?

No, I get the same stale nfs handle error message on the FC3T1 client.

Just for grins.... add fsid=0 to one of your exports options and see what happens...
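Since the exports quoted above mix netgroup, wildcard, network, and explicit host entries, and only the explicit ones are reported to work, it can help to list which kind of client entry each export relies on. The following is a minimal sketch, not part of nfs-utils: `classify_exports` is a hypothetical helper name, and the classification rules (leading `@` for a netgroup, `*` or `?` for a wildcard, a `/` for a network) are a simplified reading of the exports syntax.

```shell
# classify_exports FILE
# Hypothetical helper: prints "path kind client" for every client entry
# in an exports-style file, joining backslash-continued lines first.
classify_exports() {
    awk '
        {
            line = line $0
            if (sub(/\\$/, " ", line)) next     # trailing backslash: keep joining
            n = split(line, f, " ")
            line = ""
            if (n < 2 || f[1] ~ /^#/) next      # skip blanks, comments, bare paths
            for (i = 2; i <= n; i++) {
                client = f[i]
                sub(/\(.*/, "", client)         # strip the "(options)" suffix
                if (client ~ /^@/)         kind = "netgroup"
                else if (client ~ /[*?]/)  kind = "wildcard"
                else if (client ~ /\//)    kind = "network"
                else                       kind = "host"
                print f[1], kind, client
            }
        }' "$1"
}
```

Running `classify_exports /etc/exports | grep netgroup` on the server would then show the exports that depend on netgroup matching, which are the ones failing in this report.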
I added the fsid=0 to the /temp/games export, ran exportfs -arv, and when I tried to automount it from the FC3T1 client, the mount hangs. The cd command that triggered the automount hangs, a df from another window hangs, and I can't even log in as a normal user though root works. I looked at the /etc/mtab and it doesn't include the mount for /temp/games so at least we know that the mount is hanging. Eventually, the 'nfs server not responding' message is logged, which explains why my other login attempts fail, as my home directory is automounted from the same server. Client didn't log anything and server logged a normal mount request:

    Aug 13 13:31:41 wintermute rpc.mountd: authenticated mount request from katratzi.sr.unh.edu:919 for /temp/games (/temp/games)

To make it really interesting, I tried to log into my other FC3T1 test system and it can't nfs mount my home directory either. Server logs normally:

    Aug 13 13:56:55 wintermute rpc.mountd: authenticated mount request from doolittle.sr.unh.edu:927 for /home/tjb (/home)

BUT two other RHEL3U2 systems mount my home directory fine. After rebooting the first FC3T1 client, I tried mounting my home directory again and it hung again. It's as if, once I modify the exports file and re-export, my FC3T1 systems can't talk to the nfs server anymore, yet RHEL3 and FC2 seem fine. BTW, kernels are 517 on the first and 515 on the second FC3T1 systems.

Can you try the mount by hand (i.e. not using autofs) while running an ethereal trace... then post the trace....

Created attachment 102711
ethereal dump taken from server with just server client traffic
Created attachment 102716
second ethereal dump from server with just client server traffic
I rebooted the server and without any other changes, the fc3t1 system came back
(nfs server OK) from the hung mount request but gave a permissions denied. A
second mount attempt gave the stale nfs handle and a third was captured in the
included dump. It may or may not provide more info but I thought it couldn't
hurt.
hmm... it sure seems like it's a server export problem... If you simplify your exports to something like:

    /home *(rw,sync,fsid=0)

does that work?

I updated the server to FC3T2 and the nfs problem persists. Another problem is that if I ever run 'exportfs -av' on the server, the FC3T2 clients hang with "NFS server not responding" until I reboot the server. It makes testing changes rather tedious. Anyway, I exported a directory as you requested and that doesn't work either:

Client side:

    [root@katratzi tjb]# mount wintermute:/raid /xxx
    mount: wintermute:/raid failed, reason given by server: Permission denied
    [root@katratzi tjb]# showmount -e wintermute
    Export list for wintermute:
    /raid       *
    /temp/games @rcc_linux
    /space/ftp  132.177.0.0/255.255.0.0
    /home       @rcc_linux,katratzi.sr.unh.edu
    [root@katratzi tjb]# mount wintermute:/raid /xxx
    mount: Stale NFS file handle
    [root@katratzi tjb]#

Server side:

    Sep 28 13:54:52 wintermute rpc.mountd: authenticated mount request from katratzi.sr.unh.edu:676 for /raid (/raid)

I have no firewall, selinux disabled, and right now, due to another potential bug (#133906), tcp wrappers completely turned off too.

This looks very much like a problem I have too. But as far as I know, I am not using FC3T2, heh. I am not really into testing fedora - I just run 'apt-get update; apt-get upgrade' sometimes. Last time I did that I suddenly couldn't mount anymore and had to downgrade nfs-tools. I have a VERY simple setup (and simple exports file). Doesn't get much simpler than this. In other words, it seems to ME that fedora's public, current, non-test version has a totally broken NFS (under certain circumstances I presume). Is there anything I can do to help to fix this?

It seems that the failure is in mountd.c (rpc.mountd) when the function cache_get_filehandle tries to open /proc/fs/nfs[d]/filehandle and fails (I have 2.6.9-rc1-mm4 and don't have this file there...). I'm doing some more research.
Got it - add these lines to /etc/fstab (on NFS server):

    nfsd    /proc/fs/nfsd            nfsd        defaults 0 0
    sunrpc  /var/lib/nfs/rpc_pipefs  rpc_pipefs  defaults 0 0

See also bug 125345.

Whatever was in the last bunch of updates (10/12/2004) seems to have fixed this problem. Since nfs-utils wasn't updated, I can only assume it's the kernel?
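To confirm whether a server is affected by the missing mounts described above, one can look for the nfsd and rpc_pipefs filesystem types in the kernel's mount table. This is a minimal sketch under stated assumptions: `check_nfsd_mounts` is a hypothetical helper name, and it assumes the table is in the usual /proc/mounts layout with the filesystem type in the third column.

```shell
# check_nfsd_mounts [MOUNTS_FILE]
# Hypothetical helper: reports whether the nfsd and rpc_pipefs filesystem
# types appear in a /proc/mounts-style table (type is the third field).
# Returns 0 only if both are present.
check_nfsd_mounts() {
    mounts=${1:-/proc/mounts}
    status=0
    for fstype in nfsd rpc_pipefs; do
        if awk -v t="$fstype" '$3 == t { found = 1 } END { exit !found }' "$mounts"; then
            echo "$fstype: mounted"
        else
            echo "$fstype: NOT mounted"
            status=1
        fi
    done
    return $status
}
```

If nfsd is reported as not mounted, the one-off equivalent of the first fstab line is `mount -t nfsd nfsd /proc/fs/nfsd`, run as root on the server.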