Bug 7483

Summary: knfsd stops functioning.
Product: [Retired] Red Hat Linux
Reporter: Rex Dieter <rdieter>
Component: nfs-utils
Assignee: Michael K. Johnson <johnsonm>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Version: 7.0
CC: a.j.every, bartschies, dch, johnb, kirk.erickson, mapatw=bugs, ung
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2003-01-25 00:56:30 UTC

Description Rex Dieter 1999-12-01 16:55:04 UTC
I thought it was a fluke the first time, but this has now occurred twice on our
Linux NFS server.

I export a couple of items from our server, namely two directories /Users
and /var/spool/mail.  Most of our clients are using the AMD automounter
(from am-utils) to mount these directories.  After a time, the nfs clients
lose their mounts and report "stale NFS handle" errors.  On the server,
the /var/log/messages logfile gets bombarded with entries of the form:
Nov xx xx:xx:xx server kernel: nfsd Security: /// bad export.

Only rebooting the server or stopping and restarting the nfs service
(via /etc/rc.d/init.d/nfs stop and /etc/rc.d/init.d/nfs start) restores
normal operation.
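
The workaround above can be sketched as a small shell helper. This is only an illustration: the function name count_bad_exports is hypothetical, the pattern is taken from the log entry quoted above, and the log path is assumed to be /var/log/messages as in this report.

```shell
#!/bin/sh
# Hypothetical helper: print how many "bad export" messages from nfsd
# appear in the given log file. The pattern mirrors the entry quoted above.
count_bad_exports() {
  grep -c 'nfsd Security: .* bad export' "$1"
}

# Example use (commented out so that sourcing this file has no side effects);
# the restart commands are the ones given in the report:
# if [ "$(count_bad_exports /var/log/messages)" -gt 0 ]; then
#   /etc/rc.d/init.d/nfs stop
#   /etc/rc.d/init.d/nfs start
# fi
```

Counting the messages first avoids restarting the service when the log shows no sign of the problem.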

Comment 1 johnb 1999-12-06 08:03:59 UTC
We were experiencing this problem for months.  It frequently occurred
whenever the NFS server was experiencing high load (during backups,
for example.)  I couldn't find a solution anywhere, so I finally bit
the bullet and commented out the piece of kernel code which was
triggering the errors.  We haven't had any NFS problems since.

Maybe the folks at Red Hat have a less risky solution?

Comment 2 Rex Dieter 1999-12-07 16:54:59 UTC
As per johnb's comments:

Can you give a few more details about your mention of "commenting out the
piece of kernel code" that was triggering the errors?

Comment 3 mapatw=bugs 2000-04-04 16:11:59 UTC
We may have a very similar problem.  We export our home directories from a
Red Hat 6.1 server running kernel 2.2.12-20.  After a period of usage, sometimes
as much as a day :-), the system becomes overrun with stale file handles.  We are
using knfsd-1.4.7 and have recently tried the latest stable kernel (2.2.14).
None of this has fixed the problem.

This causes us approximately one hour of downtime every two days and seems to be
related to load.  We do not have an environment where people are grossly
sharing files etc., so we cannot understand why so many stale file handles exist.

We are having massive problems with this, and if we can't find a work-around
soon we will have to shift all our home filespace back across to our slower
Solaris server.  I don't really want this extra work.

Comment 4 Rex Dieter 2000-04-04 20:12:59 UTC
Our problems have almost completely gone away since we:
1.  started using far fewer non-Linux clients (in our case, NeXTSTEP)
2.  reconfigured NIS and /etc/nsswitch.conf to NOT use NIS for hostname lookups
3.  upgraded to kernel-2.2.14-1.3.0 (it was once available at rawhide).  I
would say without hesitation that an upgrade from 2.2.12-20 is absolutely
essential.  I haven't upgraded further simply because we've had problem-free
uptimes of 1-2 months.  (If it ain't broke...)

One problem remains: rpc.mountd DOES still occasionally die (once every ~2 weeks),
preventing any new mounts.  I think this is related to hostname lookup problems
(our campus DNS servers crash semi-often).  I wrote a little /etc/cron.hourly
script to check whether rpc.mountd is running, and to relaunch it if necessary:

------ /etc/cron.hourly/rpc.mountd -------- snip ------
#!/bin/sh

. /etc/rc.d/init.d/functions

prog=rpc.mountd
pid=`pidof $prog`

# Only do the check if the nfs subsystem is activated
if [ -f /var/lock/subsys/nfs ]; then
  # An empty pid means rpc.mountd is not running
  if [ -z "$pid" ]; then
    date
    echo -n "$prog dead... restarting:"
    daemon /usr/sbin/rpc.mountd --no-nfs-version 3
    # End the status line started by "echo -n" above
    echo
  fi
fi

-------- /etc/cron.hourly/rpc.mountd ------- snip ------
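
Item 2 in the list above (taking NIS out of hostname lookups) can be checked with a sketch like the following. The function name hosts_uses_nis is hypothetical, and the sketch assumes the usual nsswitch.conf layout in which a single line beginning with "hosts:" lists the lookup sources in order.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the "hosts:" line of an
# nsswitch.conf-style file lists nis or nisplus as a lookup source.
hosts_uses_nis() {
  grep -E '^[[:space:]]*hosts:' "$1" |
    grep -Eq '(^|[[:space:]:])nis(plus)?([[:space:]]|$)'
}

# Example use (commented out so that sourcing this file has no side effects):
# if hosts_uses_nis /etc/nsswitch.conf; then
#   echo "hostname lookups still go through NIS"
# fi
```

A hosts line such as "hosts: files dns" makes the check fail, which is the state described in item 2.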

Comment 5 Cristian Gafton 2000-08-09 02:35:59 UTC
Assigned to johnsonm.

Comment 6 Stephen John Smoogen 2003-01-25 00:56:30 UTC
Bug 7483 is closed because the problem seems to have been fixed by the major
changes to the kernel, NFS, and other utilities between 7.0 and 7.3.  Our servers
seem to be similarly set up, with 200 NFS clients, and have no stale handle problems.