Red Hat Bugzilla – Bug 65410
RH 7.3/ Solaris NFS problem using nfsvers=3
Last modified: 2007-04-18 12:42:44 EDT
From Bugzilla Helper:
User-Agent: Mozilla/4.78 [en] (X11; U; Linux 2.4.18-6mdk i686)
Description of problem:
On a Dell 2450 RH 7.3 machine running kernel-smp-2.4.18-4, I have
an NFS mount from Solaris 8 machine. For large transfers, the linux
server locks up if the mount uses nfsvers=3. If I force the mount to
nfsvers=2, the machine does not lock up.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. mount -o nfsvers=3 solaris8:/export/dk1 /mnt/test
2. cp valhalla-i386-disc1.iso /mnt/test
3. ** Machine locks up **
4. /etc/init.d/network restart
5. umount /mnt/test
6. mount -o nfsvers=2 solaris8:/export/dk1 /mnt/test
7. cp valhalla-i386-disc1.iso /mnt/test
8. ** COPY WORKS! **
This is a serious problem. It seems to be related to bugs 64921, 64984, 65016.
The other bugs suggest a variety of work-arounds. I need to know what really
broke/changed and what the correct workaround is (I've got NT-based servers
running NFS that don't do NFS V2 well).
Seems to be three variables suggested:
1. Change to NFS V2
2. Set the Read/Write block size
3. Set TCP on NFS V3.
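
For reference, the three variables above map onto mount options roughly as sketched below. This only prints the candidate commands rather than running them. The server export and mount point come from the reproduction steps; the 8192-byte block size is an assumed example value, not a setting recommended anywhere in the related bugs.

```shell
#!/bin/sh
# Sketch only: prints candidate mount commands instead of running them.
# SERVER_EXPORT and MOUNT_POINT are taken from the reproduction steps.
# The 8192-byte block size is an assumed example value.
SERVER_EXPORT="solaris8:/export/dk1"
MOUNT_POINT="/mnt/test"

# nfs_mount_cmd OPTIONS: print the mount command for one option set.
nfs_mount_cmd() {
    printf 'mount -o %s %s %s\n' "$1" "$SERVER_EXPORT" "$MOUNT_POINT"
}

nfs_mount_cmd "nfsvers=2"                        # 1. fall back to NFS V2
nfs_mount_cmd "nfsvers=3,rsize=8192,wsize=8192"  # 2. V3 with explicit block sizes
nfs_mount_cmd "nfsvers=3,tcp"                    # 3. V3 over TCP instead of UDP
```

Trying one option set at a time (umount between attempts) should show which variable actually avoids the lockup.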
Background: I had to upgrade from RH7.2 due to a bug in KDE. KDE was creating
a file in the user's home dir with a colon in the name, which is illegal when the user's
home is on an NT server (mounted via NFS). RH7.3 deals with this issue
with KDE 3.0, but becomes a show-stopper due to the unexpected major problem
Come on folks! This is EXACTLY the kind of situation my upper management
uses for "Linux/Red Hat isn't ready for prime time" arguments.
I have noticed the exact same problem:
NFS server: Solaris 8
NFS client: Redhat 7.3
On the freshly installed Redhat 7.3 system, copying a large file from the local disk to an NFS mounted
volume would hang the system. Upgrading the Redhat kernel to 2.4.18-4 produced no changes in
Setting "nfsvers=2" in /etc/fstab fixes the problem when copying a large file. However, periodically
(usually after some period of inactivity on the Redhat system), typing a simple command (such as
'ls' while the current directory is on an NFS mounted filesystem) hangs the system in the
same way, resulting in a series of these messages:
nfs server <hostname> not responding, timed out
This is occurring on a brand new Dell Precision 530MT.
I have similar problems since upgrading to RH7.3.
1) A Linux box is running an NFS server and exports a directory. This directory has a Word 2000 document. The
directory is mounted by a Win2K box. From Office 2000, I attempt to open the document. Word hangs and
must be killed in the Task Manager. If I first COPY the file from the NFS directory to a local one on the Windows
box, then Word opens the file just fine. The NFS client on the Windows box is the standard Microsoft Unix tools
add-on. This behavior is very consistent.
2) I have various files from various machines that get dropped into an NFS directory. One of the machines
processes these files. Once per minute, a cron job wakes up, walks through the directory, copies the files
locally (I had to do this extra step because of the same problem as above with word), then processes them.
Two problems happen:
A) Files get placed in the NFS directory, then mysteriously disappear. I know this because of the logs kept on
the various servers. Also, the files are serial numbered. I can see that the files were placed onto the NFS
server and I can also see gaps in the serial numbers on the processing machine's logs. All of these machines
are RH Linux. Most are RH7.3. The only straggler in the bunch is the processing machine which is RH6.0 with
security patches. When the NFS server is replaced with an RS/6000 running AIX4.2, everything works great.
This seems to imply it is an issue with the NFS server, not the client.
B) The NFS directory structure does not update properly. The processing machine (Perl script) reads in the
directory listing, then one at a time, copies a file locally, processes it and deletes it from the NFS server. If there
are any problems, it sends me an email. I get emails telling me that various files cannot be found. What is
happening is that the file gets processed and deleted. A minute later, the cron job starts again, looks through
the NFS directory and gets some of the SAME FILENAMES again. When it goes to process these, they have
been deleted a minute prior, so there is an error. I verified in the logs that the "bad" files did indeed get
processed one minute prior. This problem goes away if the NFS server is changed from Linux to an IBM
RS/6000 running AIX4.2.
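
The gaps described in (A) can be listed mechanically once the serial numbers have been pulled out of the logs. The following is a minimal sketch, assuming one integer serial number per line on stdin. The function name find_gaps and the sample numbers are made up for illustration; the real log format is not shown in this report.

```shell
#!/bin/sh
# find_gaps: read one integer serial number per line on stdin and
# print each missing number, or each missing range as "first-last".
find_gaps() {
    sort -n | awk '
        NR > 1 && $1 > prev + 1 {
            if ($1 - prev == 2) print prev + 1
            else print (prev + 1) "-" ($1 - 1)
        }
        { prev = $1 }'
}

# Made-up sample: 103, 104 and 107 never reached the processing machine.
printf '%s\n' 101 102 105 106 108 | find_gaps
```

Comparing this output against the senders' logs narrows down when each file went missing.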
I suspect that both of the above problems are related. It appears to be a problem with how the NFS directory
structure is managed. Locally, the directory is fine, but to a client, it is screwed up.
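
Until the root cause is found, the processing loop can at least tolerate stale directory entries instead of sending an error email for each one. The sketch below is in shell, not the Perl script described above, and the function name process_dir and its two arguments are hypothetical. It works around the symptom in (B) only; it does nothing for the disappearing files in (A).

```shell
#!/bin/sh
# process_dir NFS_DIR WORK_DIR: copy each file locally, then delete it
# from the NFS directory. Entries that are listed but can no longer be
# read (stale listing) are skipped quietly instead of raising an error.
process_dir() {
    dir=$1
    work=$2
    for f in "$dir"/*; do
        [ -f "$f" ] || continue                     # stale entry or empty directory
        cp "$f" "$work/" 2>/dev/null || continue    # vanished between listing and copy
        rm -f "$f"
        echo "processed $(basename "$f")"
    done
}
```

It would be called from the cron job as, for example, process_dir /mnt/incoming /var/tmp/work, where both paths are placeholders.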
The filesystem on the Linux NFS server is ext3.
These problems are holding me back from installing Linux at several client sites. I cannot afford to have an
NFS solution that is not 100%. I will have to go with a Novell or (ick) ms solution if this is not resolved.
all -- if TCP solves your problem, you may have network problems that are
preventing NFS over UDP from working. a packet trace would show exactly
what is happening.
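
One way to collect such a trace on the client is sketched below. It prints the tcpdump command rather than running it. The interface name eth0, the output path, and the 256-byte snapshot length are assumptions; the server name is the one from the original report. Filtering on the host alone, rather than on a port, keeps fragmented UDP packets in the capture.

```shell
#!/bin/sh
# Sketch only: prints a tcpdump command for capturing NFS traffic.
# IFACE, CAPTURE_FILE and the snapshot length are assumed values.
IFACE="eth0"
NFS_SERVER="solaris8"
CAPTURE_FILE="/tmp/nfs-trace.pcap"

# nfs_trace_cmd: print the capture command for the settings above.
nfs_trace_cmd() {
    printf 'tcpdump -i %s -s 256 -w %s host %s\n' \
        "$IFACE" "$CAPTURE_FILE" "$NFS_SERVER"
}

nfs_trace_cmd
```

Start the capture as root, reproduce the hang, then stop it and attach the capture file to the bug.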
redhat@steeleware -- your problem may be due to close-to-open cache
inconsistency. which version of the kernel runs on your systems?
do some of your problems vanish if you upgrade to 2.4.19 on the clients?
i'm sorry to say i don't have any experience with the Win2K NFS client.
Adding information on completely separate problems to existing bug entries
ensures that the new problems probably won't be handled. Please don't do that.
In any event, this bug is stale since it has not been updated with the effects
of using newer kernel errata (-5 or newer).
This issue seems to be fixed in later kernels.