1735 – nfs server crashes with high usage from solaris

Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	nfs-server
Version:	5.2
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Michael K. Johnson

Reported:	1999-03-24 11:05 UTC by Karl Berry
Modified:	2008-05-01 15:37 UTC
CC List:	1 user
Last Closed:	2002-12-15 04:11:54 UTC

Description Karl Berry 1999-03-24 11:05:45 UTC

The NFS server shipped with 5.2 crashes regularly under high
stress.  By crash I mean the rpc.nfsd process dies (I have
seen the rpc.mountd process die also, but usually it is just
nfsd).  Then the NFS client starts getting `is a directory'
or `not a file' errors on plain files, clearly because the
NFS reads are failing.

I have seen nothing in the system logs to indicate what is
going on.  I've tried running nfsd in debugging mode, but
haven't caught it yet.

Unfortunately I have no recipe guaranteed to reproduce it.
I've seen it happen most often when I run a find on a deep
NFS-mounted partition, or a cvs update on an NFS-mounted
partition, or some other operation that is beating on the
filesystem.
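
A minimal sketch of the kind of workload described above, for anyone trying to reproduce the crash. It is not the reporter's script: the function name hammer_tree and the read size are assumptions made for illustration, and nothing here is guaranteed to trigger the bug. It walks a tree the way find does, stats and partially reads every file, and counts errors by exception type, so client-side `is a directory' failures would show up as IsADirectoryError.

```python
import os
from collections import Counter


def hammer_tree(root, read_bytes=65536):
    """Walk `root` like find, then stat and partially read every file.

    Returns (files_ok, errors), where `errors` counts OSError subclasses
    by name (for example IsADirectoryError or FileNotFoundError).
    `hammer_tree` and `read_bytes` are hypothetical names for this sketch.
    """
    files_ok = 0
    errors = Counter()

    def on_walk_error(exc):
        # os.walk reports directory-listing failures through this callback.
        errors[type(exc).__name__] += 1

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                os.stat(path)
                with open(path, "rb") as fh:
                    fh.read(read_bytes)
                files_ok += 1
            except OSError as exc:
                # Record the failure and keep going; the point is sustained load.
                errors[type(exc).__name__] += 1
    return files_ok, errors
```

Calling hammer_tree on the client's NFS mount point in a loop, from several processes at once, would approximate the load; the mount point itself depends on the site.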

In our present environment, the clients are Solaris 2.7
sparcs and the server is a Dell PowerEdge running Red Hat
5.2. I've had NFS problems with Linux since the beginning,
though, including when the clients were also Linux/x86
machines.

I realize this is not an ideal bug report, but I sure hope
the problem can be found.  As it is, we're forced to move
our home directories to a Solaris host :(.

Thanks.

Comment 1 Michael K. Johnson 1999-04-10 01:59:59 UTC

This is not an ideal response, but I can give you some useful
information, at least...

I am told (I have no first-hand knowledge) that there are some
bugs in the Solaris client NFS implementation which interact
badly with Linux and for which fixes are available from Sun.
I do not know if the bugs I was told about affect only the
2.2.x kernel-based nfsd or whether they affect the user-level
nfsd as well.

The latest 2.2.x kernels have kernel-based NFS and benefit from
a Connectathon session in which NFS interoperability was stressed
and the Linux NFS server was improved.  Therefore, you can expect
improvements here in the future.

Comment 2 sxw 1999-10-11 17:28:59 UTC

Just a comment to say that I'm seeing similar problems with
Red Hat 5.2 and Solaris 2.5.1 with all recommended patches applied.

rpc.nfsd dies with a SIGSEGV, in the middle of the glibc RPC handling
code. It's unclear at the moment whether the problem is the rpc.nfsd
code polluting something that the libc code needs (or providing bad
parameters), or whether libc is reacting badly to something received
across the network.

Snooping the packets going into our NFS server hasn't yet revealed
anything peculiar at the times that the crashes occur.

I'm going to continue investigating this, as using 6.0 isn't an
option for us at present, and if I find anything I'll update this
report.

Comment 3 ray 2000-03-13 17:22:59 UTC

I am running Solaris 2.6 as the client and Red Hat 6.0 as the NFS server.  All I
have to do is copy a large file to the server and rpc.nfsd dies.  When I try to
restart it I get nfssvc: Address already.  I have to actually reboot the server
to fix everything.

Comment 4 Alan Cox 2002-12-15 04:11:54 UTC

unfsd was retired; the bug was eventually fixed, however.