The NFS server shipped with 5.2 crashes regularly under high stress. By "crash" I mean that the rpc.nfsd process dies (I have seen the rpc.mountd process die also, but usually it is just nfsd). The NFS client then starts getting `is a directory' or `not a file' errors on plain files, clearly because the NFS reads are failing. I have seen nothing in the system logs to indicate what is going on. I've tried running nfsd in debugging mode, but haven't caught it yet. Unfortunately I have no recipe guaranteed to reproduce it. I've seen it happen most often when I do a find on a deep NFS-mounted partition, a cvs update on an NFS-mounted partition, or some other operation that beats on the filesystem. In our present environment, the clients are Solaris 2.7 SPARC machines and the server is a Dell PowerEdge running Red Hat 5.2. I've had NFS problems with Linux since the beginning, though, including when the clients were also Linux/x86 machines. I realize this is not an ideal bug report, but I sure hope the problem can be found. As it is, we're forced to move our home directories to a Solaris host :(. Thanks.
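For anyone trying to reproduce this, the kind of load described above (walking a deep tree and reading every file) could be approximated with something like the sketch below. It is not the reporter's procedure and is not guaranteed to trigger the crash; the function name `stress_tree` is made up for illustration. It only tallies the errno of every failed stat or read, so that errors such as EISDIR on plain files would show up in the result.

```python
import errno
import os
import stat
from collections import Counter


def stress_tree(root, chunk_size=64 * 1024):
    """Walk root, read every regular file in full, and tally failures.

    Returns (files_read, errors), where errors maps an errno name such
    as "EISDIR" or "ESTALE" to the number of times it was seen.
    stress_tree is a hypothetical helper, not part of any NFS tooling.
    """
    files_read = 0
    errors = Counter()

    def note(exc):
        # Record the symbolic errno name, falling back to the raw value.
        errors[errno.errorcode.get(exc.errno, str(exc.errno))] += 1

    # onerror makes os.walk report directory-listing failures instead
    # of silently skipping them.
    for dirpath, _dirnames, filenames in os.walk(root, onerror=note):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat first so FIFOs, sockets and symlinks are skipped
                # and stat failures are counted rather than hidden.
                if not stat.S_ISREG(os.lstat(path).st_mode):
                    continue
                with open(path, "rb") as fh:
                    while fh.read(chunk_size):
                        pass
                files_read += 1
            except OSError as exc:
                note(exc)
    return files_read, dict(errors)
```

Calling `stress_tree` on the client's mount point (whatever path that is locally) in a loop, while watching for rpc.nfsd on the server, would show how many files were read on each pass and which errors the client saw once the server process went away.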
This is not an ideal response, but I can give you some useful information, at least. I am told (I have no first-hand knowledge) that there are some bugs in the Solaris NFS client implementation that interact badly with Linux and for which fixes are available from Sun. I do not know whether the bugs I was told about affect only the 2.2.x kernel-based nfsd or the user-level nfsd as well. The latest 2.2.x kernels have a kernel-based NFS server and benefit from a Connectathon session in which NFS interoperation was stressed and the NFS server on Linux was improved. You can therefore expect improvements here in the future.
Just a comment to say that I'm seeing similar problems with Red Hat 5.2 and Solaris 2.5.1 with all recommended patches applied. rpc.nfsd dies with a SIGSEGV in the middle of the glibc RPC handling code. It's unclear at the moment whether the problem is the rpc.nfsd code corrupting something that the libc code needs (or passing it bad parameters), or whether libc is reacting badly to something received across the network. Snooping the packets going into our NFS server hasn't yet revealed anything peculiar at the times the crashes occur. I'm going to continue investigating this, as using 6.0 isn't an option for us at present, and if I find anything I'll update this report.
I am running Solaris 2.6 as the client and Red Hat 6.0 as the NFS server. All I have to do is copy a large file to the server and rpc.nfsd dies. When I try to restart it, I get nfssvc: Address already. I have to actually reboot the server to fix everything.
unfsd has been retired. The bug was eventually fixed, however.