Bug 7483 - knfsd stops functioning.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: nfs-utils
Version: 7.0
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Michael K. Johnson
Reported: 1999-12-01 16:55 UTC by Rex Dieter
Modified: 2008-05-01 15:37 UTC (History)

Doc Type: Bug Fix
Last Closed: 2003-01-25 00:56:30 UTC

Description Rex Dieter 1999-12-01 16:55:04 UTC
I thought it a fluke once, but this has now occurred twice on our Linux
NFS server.

I export a couple of items from our server, namely two directories /Users
and /var/spool/mail.  Most of our clients are using the AMD automounter
(from am-utils) to mount these directories.  After a time, the nfs clients
lose their mounts and report "stale NFS handle" errors.  On the server,
the /var/log/messages logfile gets bombarded with entries of the form:
Nov xx xx:xx:xx server kernel: nfsd Security: /// bad export.

Only rebooting the server or stopping and restarting the nfs service
(via /etc/rc.d/init.d/nfs stop and /etc/rc.d/init.d/nfs start) restores
normal operation.
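
To gauge how often the server logs this message, a small helper can count the matching entries. This is only a minimal sketch: the function name count_bad_export is made up for illustration, and it assumes the log lines contain the literal text "nfsd Security:" followed by "bad export", as in the entry quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of nfs-utils): count the
# "nfsd Security: ... bad export" messages in a syslog-style file.
count_bad_export() {
  # $1: log file to scan; defaults to /var/log/messages
  logfile="${1:-/var/log/messages}"
  # grep -c prints the number of matching lines (0 if there are none)
  grep -c 'nfsd Security: .*bad export' "$logfile"
}
```

Running count_bad_export /var/log/messages before and after a restart of the nfs service would show whether the flood of messages has stopped.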

Comment 1 johnb 1999-12-06 08:03:59 UTC
We were experiencing this problem for months.  It frequently occurred
whenever the NFS server was under high load (during backups,
for example).  I couldn't find a solution anywhere, so I finally bit
the bullet and commented out the piece of kernel code which was
triggering the errors.  We haven't had any NFS problems since.

Maybe the folks at RedHat have a less risky solution?

Comment 2 Rex Dieter 1999-12-07 16:54:59 UTC
As per johnb's comments:

Can you give a few more details about your mention of "commenting out the
piece of kernel code triggering the errors"?

Comment 3 mapatw=bugs 2000-04-04 16:11:59 UTC
We may have a very similar problem.  We export our home directories from a
Red Hat 6.1 system running kernel 2.2.12-20.  After a period of usage,
sometimes as much as a day :-) the system becomes overrun with stale file
handles.  We are using knfsd-1.4.7 and have recently tried the latest stable
kernel (2.2.14).  None of this has improved the problem.

This causes us approx one hour of downtime every two days and seems to be
related to load.  We do not have an environment where people are heavily
sharing files etc., so we cannot understand why so many stale file handles exist.

We are having massive problems with this and if we can't find a work-around
soon we will have to shift all our home filespace back across to our slower
solaris server.  I don't really want this extra work.

Comment 4 Rex Dieter 2000-04-04 20:12:59 UTC
Our problems have almost completely gone away since:
1.  we started using far fewer non-Linux clients (in our case, NeXTSTEP)
2.  we reconfigured NIS and /etc/nsswitch.conf to NOT use NIS for hostname lookups
3.  we upgraded to kernel-2.2.14-1.3.0 (it was once available in Rawhide).  I
wouldn't hesitate to say that an upgrade from 2.2.12-20 is absolutely
essential.  I haven't upgraded further simply because we've had problem-free
uptimes of 1-2 months.  (If it ain't broke...)
4.  rpc.mountd DOES still occasionally die (once every ~2 weeks), preventing
any new mounts.  I think this is related to hostname lookup problems (our
campus DNS servers crash fairly often).  I wrote a little /etc/cron.hourly script
to check whether rpc.mountd is running, and to relaunch it if necessary:

------ /etc/cron.hourly/rpc.mountd -------- snip ------
#!/bin/sh
# Hourly check: relaunch rpc.mountd if it is no longer running.

# Provides the daemon() helper used below
. /etc/rc.d/init.d/functions

prog=rpc.mountd
pid=$(pidof "$prog")

# Only do check if nfs subsystem is activated
if [ -f /var/lock/subsys/nfs ]; then
  # An empty pid means no rpc.mountd process was found
  if [ -z "$pid" ]; then
    date
    echo -n "$prog dead... restarting:"
    daemon /usr/sbin/rpc.mountd --no-nfs-version 3
    echo
  fi
fi

-------- /etc/cron.hourly/rpc.mountd ------- snip ------
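
For item 2 above, one way to confirm that hostname lookups no longer go through NIS is to inspect the hosts: line of /etc/nsswitch.conf. The following is a rough sketch, not something taken from this report: the function name hosts_uses_nis is hypothetical, and it assumes the usual nsswitch.conf layout of one "database: source ..." entry per line with # starting a comment.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if the "hosts:" line of an
# nsswitch.conf-style file lists nis or nisplus as a source, else non-zero.
hosts_uses_nis() {
  # $1: file to inspect; defaults to /etc/nsswitch.conf
  conf="${1:-/etc/nsswitch.conf}"
  # Select the uncommented hosts: line, drop any trailing comment,
  # then look for nis/nisplus as a whole word.
  grep '^[[:space:]]*hosts:' "$conf" \
    | sed 's/#.*//' \
    | grep -Eq '(^|[[:space:]:])nis(plus)?([[:space:]]|$)'
}
```

For example, hosts_uses_nis && echo "hostname lookups still consult NIS" prints the warning only when the hosts: entry still names NIS; an entry such as "hosts: files dns" would pass silently.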

Comment 5 Cristian Gafton 2000-08-09 02:35:59 UTC
assigned to johnsonm

Comment 6 Stephen John Smoogen 2003-01-25 00:56:30 UTC
Bug 7483 is closed because the problem seems to have been fixed by the major
changes in the kernel, nfs, and other utilities between 7.0 and 7.3. Our servers
are similarly set up, with 200 nfs clients, and have no stale handle problems.

