Bug 496121 - NFS server reboot results in "Stale NFS file handle"
Status: CLOSED DUPLICATE of bug 461043
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: nfs-utils
Version: 4.7
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Steve Dickson
QA Contact: BaseOS QE
Reported: 2009-04-16 14:32 EDT by johnschmidt4
Modified: 2012-08-28 02:32 EDT
Doc Type: Bug Fix
Clone Of: 473396
Last Closed: 2009-07-28 15:20:53 EDT
Description johnschmidt4 2009-04-16 14:32:48 EDT
+++ This bug was initially created as a clone of Bug #473396 +++

Description of problem:

I use FC10 (2.6.27.5-117.fc10.i686.PAE) as an NFS file server with various Fedora clients, all of which behave identically. Whenever I reboot the NFS server, all clients get a "Stale NFS file handle" error when accessing the NFS share.

I use UDP (not TCP) for NFS, and I set "fsid" to ensure that the filesystem ID does not change. I'm using NFSv3.
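As a rough illustration of the "fsid" point above, the following sketch lists exports that carry no fsid= option. The function name and the sample paths are hypothetical, and it assumes one export per line with no backslash continuations.

```shell
#!/bin/sh
# Print the export path of every entry in an exports file that has no fsid= option.
# Assumption: one export per line (no backslash continuations).
exports_missing_fsid() {
    exports_file="${1:-/etc/exports}"
    awk '
        /^[ \t]*(#|$)/ { next }      # skip comments and blank lines
        !/fsid=/       { print $1 }  # first field is the exported path
    ' "$exports_file"
}
```

For example, given a file containing `/home 192.168.0.0/24(rw,sync,fsid=1)` and `/srv/data *(ro,sync)`, only the second path would be reported.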

I found that executing "exportfs -f" on the freshly rebooted file server works around the problem.

Using Wireshark I found that:
 1. the fsid is indeed identical across reboots
    and matches the fsid written in /etc/exports
 2. I'm using UDP instead of TCP
 3. After reboot of the NFS server the client ("ls /home") asks for:
      GETATTR, FH: 0x83951db5 ... and gets a REPLY of NFS3ERR_STALE
      ACCESS,  FH: 0x83951db5 ... and gets a REPLY of NFS3ERR_STALE
 4. After "exportfs -f" the client asks for:
      ACCESS,  FH: 0x83951db5 ... and gets a REPLY of NFS3_OK
      GETATTR, FH: 0x83951db5 ... and gets a REPLY of NFS3_OK

This means that the file handle is unchanged during the whole process. It is valid at first, becomes stale after the server reboots, and is valid again after "exportfs -f".

With Fedora 1 (2.4.31) I did not get this behaviour.

I have built a test server, so I can reproduce the problem easily and test any fixes or ideas.

Does anyone know what is wrong?

Version-Release number of selected component (if applicable):
kernel-2.6.27.5-117.fc10.i686.PAE
nfs-utils-1.1.4-1.fc10

How reproducible:
 always

Steps to Reproduce:
1. Export something via NFS
2. mount on client
3. reboot NFS server
4. watch "Stale NFS file handle"
5. execute "exportfs -f" on server
6. watch NFS working again.
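
Steps 4 and 6 can be checked from a client with a small probe. This is only a sketch: the function names are hypothetical, and it assumes the error text contains either "Stale NFS file handle" (as in this report) or the shorter "Stale file handle" wording printed by newer systems.

```shell
#!/bin/sh
# Succeed if the given error text reports a stale NFS file handle.
is_stale_error() {
    printf '%s\n' "$1" | grep -Eq 'Stale (NFS )?file handle'
}

# List a directory and report whether the attempt failed with a stale handle.
probe_mount() {
    mount_point="$1"
    # Capture only stderr; the listing itself is discarded.
    if err=$(LC_ALL=C ls "$mount_point" 2>&1 >/dev/null); then
        echo "ok: $mount_point"
    elif is_stale_error "$err"; then
        echo "stale: $mount_point"
    else
        echo "error: $mount_point: $err"
    fi
}
```

Running `probe_mount /home` on a client before and after "exportfs -f" on the server should show the state change described in the steps.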
  
Actual results:
Got "Stale NFS file handle"


Expected results:
No "Stale NFS file handle"


Additional info:

--- Additional comment from t.bubeck@reinform.de on 2008-12-01 05:57:45 EDT ---

Created an attachment (id=325213)
Patch to solve problem by changing order in /etc/init.d/nfs

This patch fixes the problem by changing the start order in /etc/init.d/nfs.

Why? Before the patch the start order in /etc/init.d/nfs was:
  1. exportfs -r
  2. modprobe nfsd
     This also does (see modprobe.conf.dist): mount -t nfsd nfsd /proc/fs/nfsd

The patch changes the order to:
  1. modprobe nfsd 
     This also does (see modprobe.conf.dist): mount -t nfsd nfsd /proc/fs/nfsd
  2. exportfs -r
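
The patched order can be sketched as a small shell function. This is not the actual init script: the function name and the RUN variable (set RUN=echo for a dry run) are hypothetical, and the command paths are the ones that appear in the RHEL 4 script quoted later in this report.

```shell
#!/bin/sh
# Load nfsd first, then re-export.
# Set RUN=echo to print the commands instead of running them.
start_nfs_exports() {
    # Loading the module also mounts /proc/fs/nfsd through the
    # install rule in modprobe.conf.dist.
    ${RUN:-} /sbin/modprobe nfsd || return 1
    # exportfs now finds /proc/fs/nfsd already in place.
    ${RUN:-} /usr/sbin/exportfs -r
}
```

Stopping when modprobe fails is a choice made for this sketch; it keeps exportfs from running before the nfsd filesystem is available.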

As described in "man exportfs", there are two modes, "legacy" and "new", selected by the existence of /proc/fs/nfsd. The unpatched version of /etc/init.d/nfs executes exportfs in legacy mode and therefore feeds /var/lib/nfs/rmtab into the kernel. After "mount -t nfsd ..." the "new" mode is used. The previously fed rmtab entries then interfere, because "legacy" and "new" mode are mixed. This causes the problems described in the initial post.
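
To see which mode a given moment would fall into, one can look for an nfsd filesystem in the mounts table. This is an approximation written for illustration: the function name is hypothetical, and exportfs itself may use a different test to pick its mode.

```shell
#!/bin/sh
# Print "new" if an nfsd filesystem appears in the mounts table, else "legacy".
# The third field of each /proc/mounts line is the filesystem type.
exportfs_mode_guess() {
    mounts_file="${1:-/proc/mounts}"
    if awk '$3 == "nfsd" { found = 1 } END { exit !found }' "$mounts_file"; then
        echo new
    else
        echo legacy
    fi
}
```

Calling `exportfs_mode_guess` right before "exportfs -r" in the unpatched start order would be expected to print "legacy", since the nfsd filesystem is not mounted yet at that point.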

This mix can be cleared by issuing "exportfs -f" as a workaround. That is not a good solution, though, because by then all NFS clients have already received a "Stale NFS file handle" error, even though they reconnect after "exportfs -f".

With the patch applied, "exportfs -r" already finds /proc/fs/nfsd and therefore uses the "new" mode, which is used throughout the rest of Fedora. No odd mix is created and everything goes smoothly.

--- Additional comment from updates@fedoraproject.org on 2008-12-01 11:19:33 EDT ---

nfs-utils-1.1.4-2.fc10 has been submitted as an update for Fedora 10.
http://admin.fedoraproject.org/updates/nfs-utils-1.1.4-2.fc10

--- Additional comment from updates@fedoraproject.org on 2008-12-01 11:23:51 EDT ---

nfs-utils-1.1.2-7.fc9 has been submitted as an update for Fedora 9.
http://admin.fedoraproject.org/updates/nfs-utils-1.1.2-7.fc9

--- Additional comment from steved@redhat.com on 2008-12-01 13:00:13 EDT ---

Till,

Thank you for your detailed analysis, and yes, I think it's accurate
to say the nfsd module should be loaded before the exports
are created.

If possible, could you give the 1.1.4-2 version of nfs-utils a try?
It can be found at
    http://koji.fedoraproject.org/koji/buildinfo?buildID=72546

If it works, I'll push it out to the rest of the community... tia..

--- Additional comment from updates@fedoraproject.org on 2008-12-02 20:25:13 EDT ---

nfs-utils-1.1.4-2.fc10 has been pushed to the Fedora 10 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update nfs-utils'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F10/FEDORA-2008-10642

--- Additional comment from updates@fedoraproject.org on 2008-12-02 20:27:56 EDT ---

nfs-utils-1.1.2-7.fc9 has been pushed to the Fedora 9 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing-newkey update nfs-utils'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F9/FEDORA-2008-10681

--- Additional comment from t.bubeck@reinform.de on 2008-12-04 03:52:47 EDT ---

I tried nfs-utils-1.1.4-2.fc10.i386.rpm and it works correctly.
Thanks for the fast fix!

--- Additional comment from updates@fedoraproject.org on 2008-12-06 23:26:11 EDT ---

nfs-utils-1.1.2-7.fc9 has been pushed to the Fedora 9 stable repository.  If problems still persist, please make note of it in this bug report.

--- Additional comment from updates@fedoraproject.org on 2008-12-06 23:31:47 EDT ---

nfs-utils-1.1.4-2.fc10 has been pushed to the Fedora 10 stable repository.  If problems still persist, please make note of it in this bug report.

--- Additional comment from johnschmidt4@gmail.com on 2009-04-16 13:14:43 EDT ---

This is also present on RHEL 4.7 under these conditions.

root@a2a10 ~ rpm -qa | grep nfs
nfs-utils-1.0.6-87.EL4
nfs-utils-lib-1.0.6-8.z1
root@a2a10 ~ uname -a
Linux a2a10 2.6.9-78.0.8.ELsmp #1 SMP Wed Nov 5 07:10:44 EST 2008 i686 i686 i386 GNU/Linux
root@a2a10 ~ cat /etc/redhat-release 
Red Hat Enterprise Linux ES release 4 (Nahant Update 7)

--- Additional comment from staubach@redhat.com on 2009-04-16 14:18:39 EDT ---

Would you mind creating a bugzilla for RHEL-4 for this, please?
Comment 2 Jeff Layton 2009-07-28 15:18:43 EDT
Now that I look closer, it looks like this is already fixed in RHEL4:

        [ "$NFSD_MODULE" != "noload" ] && {
                [ -x /sbin/modprobe ] && /sbin/modprobe nfsd
        }
        action $"Starting NFS services: " /usr/sbin/exportfs -r

...and /etc/modprobe.conf.dist has this:

install nfsd /sbin/modprobe --first-time --ignore-install nfsd && { /bin/mount -t nfsd nfsd /proc/fs/nfsd > /dev/null 2>&1 || :; }

...so when the modprobe occurs, the nfsd filesystem will get mounted up (assuming they don't have it set up to "noload").
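
A quick way to confirm that a system has such an install rule is to search the modprobe configuration for it. This is a sketch only: the function name is hypothetical, and the pattern assumes the rule is written on a single line as quoted above.

```shell
#!/bin/sh
# Succeed if the given modprobe configuration contains an "install nfsd"
# rule that also mounts the nfsd filesystem.
has_nfsd_mount_rule() {
    conf_file="${1:-/etc/modprobe.conf.dist}"
    grep -Eq '^[[:space:]]*install[[:space:]]+nfsd[[:space:]].*mount -t nfsd' "$conf_file"
}
```

For example, `has_nfsd_mount_rule && echo "nfsd is mounted on module load"` would print the message on a system configured like the one in this comment.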

This is on nfs-utils-1.0.6-90.EL4.x86_64. Looks like it went in with the bugfix for 461043.
Comment 3 Jeff Layton 2009-07-28 15:20:53 EDT
Closing this as a duplicate of 461043 since it looks like this was fixed at the same time as that bug.

*** This bug has been marked as a duplicate of bug 461043 ***
