Bug 88982
Summary: | RHL 9 NFS-Server Bug | ||
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | Stephen Argo <arganad> |
Component: | nfs-utils | Assignee: | Steve Dickson <steved> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Ben Levenson <benl> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 9 | CC: | david.b.kohrn, feleus, k.georgiou, ldd, wtogami |
Hardware: | athlon | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2004-08-11 11:38:17 UTC | Type: | --- |
Description
Stephen Argo
2003-04-15 23:08:22 UTC
I have experienced similar problems that are consistent with the NFS server on RH 9 being buggy. I spent most of the day yesterday trying to do an NFS install of RH 9 (from a server already running RH 9); each NFS install failed at random points during the package installation. Sometimes anaconda would complain that a package was missing, sometimes it would terminate with signal 11. I checked the media and it was fine. The hardware I was using is also usually stable. After following many dead-end hypotheses (I probably tried a dozen NFS installs), I tried an FTP installation and it worked on the first try. Note that I did an NFS install of RH 9 using an RH 8 NFS server earlier with the same CD and much of the same hardware, and that worked fine. Someone at RH might want to mark bug #89050 as a duplicate of this or vice versa, by the way.

I, too, experience problems with the RH9 nfs server. When trying to write from a client on an nfs-mounted directory, the client will hang and after some time produce the "nfs server not responding" error message. Read-only shares work OK, since I installed 2 machines via a RH9-based nfs server without a hitch. I set everything up identically to my workplace environment, which runs flawlessly on 100+ RH8 computers (which I admin). It seems that kernel versions don't matter at all, since I tried this with kernels 2.4.18-14, 2.4.20-8, 2.4.20-18.9, 2.4.20-19.9 and 2.4.20-20.1.2013.nptl (on the server) with no difference in behaviour.

I can state with some certainty that the nfs-utils package is the source of the problem. The RH9 versions 1.0.1-2.9 and 1.0.1-3.9 and the rawhide version 1.0.3-4.1 all exhibit this problem, while the RH8 version (1.0.1-2) does not. Using the RH8 version is of course an acceptable workaround, but things still smell fishy and need fixing...

Whoa, spank me for talking too early! It now seems that the nfs service provided by nfs-utils 1.0.1-2 is just as unusable as the others, but for a different (set of) reasons: the initial success was achieved by not using some normally running services on that particular machine, as I was regressing back to a former 'known good' state. My guess is now the following: on a (relatively) pristine desktop-type RH9 install, nfs runs normally from what I can see. The difference between the working nfs server and the dysfunctional one is a set of services: the latter also runs dhcpd, iptables, squid, smb, ups, postfix, yppasswdd and ypserv. When I tested this machine with success, the client wasn't using its services at all (these came from my working server instead). Could it be that one of these services interferes with nfs? FWIW, I reverted all software to initial RH9 versions on this non-functioning nfs server.

I am in the process of porting a multi-cpu application from a single IRIX supercomputer to several dual-Xeon boxes. I had planned to share the executables and data files over NFS, so I wouldn't need a copy on each node. The NFS "file not found" errors that happen frequently would seem to make that an untenable solution if my users are going to have a usable system when I'm done. I am running multiple hyperthreaded dual-Xeon machines, using the SMP kernel. I have changed the clock granularity in my kernel (HZ in the make file) to 960 to allow better real-time scheduling. I believe that the severity of this bug should be changed to "high", as NFS is an essential part of most non-trivial UNIX programming environments. I can't even build my applications reliably, since my code base is NFS-mounted from my configuration management server.

Would it be possible to post an ethereal trace (i.e. ethereal -w /tmp/data) of this issue?

I have not seen anything like this lately, so I'm going to close this as fixed in the current release.
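
For anyone revisiting the write hang and the intermittent "file not found" errors reported above, a small probe can make the failure easier to reproduce while a trace is being captured. The following is only a minimal sketch, not something from the original reporters: the function name `write_and_verify` and the `nfs-probe-` file prefix are made up for illustration, and the directory passed on the command line is assumed to be the NFS-mounted path under test. Note that on a hard-mounted share a hung server will simply block the script, and the read-back may be served from the client cache.

```python
import hashlib
import os
import sys
import tempfile
import time


def write_and_verify(directory, size_bytes=1024 * 1024):
    """Write random bytes to a new file in `directory`, read them back, compare digests.

    Returns (matched, elapsed_seconds). The file is removed afterwards.
    """
    payload = os.urandom(size_bytes)
    expected = hashlib.sha256(payload).hexdigest()
    start = time.monotonic()
    # Hypothetical prefix, chosen only to make leftover probe files easy to spot.
    fd, path = tempfile.mkstemp(dir=directory, prefix="nfs-probe-")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to push the data out of local buffers
        with open(path, "rb") as handle:
            actual = hashlib.sha256(handle.read()).hexdigest()
    finally:
        os.unlink(path)
    return actual == expected, time.monotonic() - start


if __name__ == "__main__":
    # Assumption: the first argument is the NFS-mounted directory to exercise.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    for attempt in range(10):
        matched, elapsed = write_and_verify(target)
        print(f"attempt {attempt}: matched={matched} elapsed={elapsed:.3f}s")
```

Running it against the mounted directory while the capture requested above is active would show which attempt, if any, stalls or raises an error, which should make the relevant packets easier to find in the trace.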