From Bugzilla Helper:
User-Agent: Mozilla/4.78 [en] (X11; U; Linux 2.4.18-6mdk i686)

Description of problem:
On a Dell 2450 running RH 7.3 with kernel-smp-2.4.18-4, I have an NFS mount from a Solaris 8 machine. For large transfers, the Linux server locks up if the mount uses nfsvers=3. If I force the mount to nfsvers=2, the machine does not lock up.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. mount -o nfsvers=3 solaris8:/export/dk1 /mnt/test
2. cp valhalla-i386-disc1.iso /mnt/test
3. ** Machine locks up **
4. /etc/init.d/network restart
5. umount /mnt/test
6. mount -o nfsvers=2 solaris8:/export/dk1 /mnt/test
7. cp valhalla-i386-disc1.iso /mnt/test
8. ** COPY WORKS! **

Additional info:
This is a serious problem. It seems to be related to bugs 64921, 64984, 65016, and 65069. The other bugs suggest a variety of workarounds. I need to know what really broke or changed and what the correct workaround is (I've got NT-based servers running NFS that don't do NFS V2 well). Three variables seem to be suggested:
1. Change to NFS V2
2. Set the read/write block size
3. Use TCP with NFS V3
Background: I had to upgrade from RH7.2 due to a bug in KDE. KDE was creating a file in the user's home dir with a colon in the name, which is illegal when the user's home is on an NT server (mounted via NFS). RH7.3 deals with this issue with KDE 3.0, but becomes a show-stopper due to the unexpected major problem with NFS. Come on folks! This is EXACTLY the kind of situation my upper management uses for "Linux/Red Hat isn't ready for prime time" arguments.
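For reference, the three suggested workarounds can be sketched as mount commands. This is a dry run that only prints the commands; it reuses the export and mount point from the original report. The block size of 8192 is an assumed value for illustration, not one confirmed in this bug, and whether any of these options avoids the lockup on a given kernel is untested here.

```shell
#!/bin/sh
# Dry run: print one mount command per suggested workaround instead of
# running it. Export and mount point are taken from the original report.
EXPORT="solaris8:/export/dk1"
MNT="/mnt/test"

print_mount() {
    # $1 = comma-separated NFS mount options
    echo "mount -t nfs -o $1 $EXPORT $MNT"
}

# 1. Fall back to NFS V2.
print_mount "nfsvers=2"
# 2. Stay on V3 but set the read/write block size (8192 is an assumption).
print_mount "nfsvers=3,rsize=8192,wsize=8192"
# 3. Stay on V3 but use TCP instead of the default UDP transport.
print_mount "nfsvers=3,tcp"
```

Trying one option at a time, with an umount in between, should make it clearer which variable actually matters.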
I have noticed the exact same problem:
NFS server: Solaris 8
NFS client: Redhat 7.3
On the freshly installed Redhat 7.3 system, copying a large file from the local disk to an NFS mounted volume would hang the system. Upgrading the Redhat kernel to 2.4.18-4 did not change the outcome. Setting "nfsvers=2" in /etc/fstab fixes the problem when copying a large file. However, periodically (usually after some period of inactivity on the Redhat system), typing a simple command (like 'ls' while the current directory is on an NFS mounted filesystem) will hang the system in the same way, resulting in a series of these messages:
nfs server <hostname> not responding, timed out
This is occurring on a brand new Dell Precision 530MT.
I have had similar problems since upgrading to RH7.3.

1) A Linux box is running an NFS server and exports a directory. This directory contains a Word 2000 document. The directory is mounted by a Win 2K box. From Office 2000, I attempt to open the document. Word hangs and must be killed in the Task Manager. If I first COPY the file from the NFS directory to a local one on the Windows box, then Word opens the file just fine. The NFS client on the Windows box is the standard Microsoft Unix tools add-on. This behavior is very consistent.

2) I have various files from various machines that get dropped into an NFS directory. One of the machines processes these files. Once per minute, a cron job wakes up, walks through the directory, copies the files locally (I had to add this extra step because of the same problem as above with Word), then processes them. Two problems happen:

A) Files get placed in the NFS directory, then mysteriously disappear. I know this because of the logs kept on the various servers. Also, the files are serial numbered. I can see that the files were placed onto the NFS server and I can also see gaps in the serial numbers in the processing machine's logs. All of these machines are RH Linux. Most are RH7.3. The only straggler in the bunch is the processing machine, which is RH6.0 with security patches. When the NFS server is replaced with an RS/6000 running AIX4.2, everything works great. This seems to imply it is an issue with the NFS server, not the client.

B) The NFS directory structure does not update properly. The processing machine (Perl script) reads in the directory listing, then, one at a time, copies a file locally, processes it and deletes it from the NFS server. If there are any problems, it sends me an email. I get emails telling me that various files cannot be found. What is happening is that a file gets processed and deleted. A minute later, the cron job starts again, looks through the NFS directory and gets some of the SAME FILENAMES again. When it goes to process these, they have already been deleted a minute prior, so there is an error. I verified in the logs that the "bad" files did indeed get processed one minute prior. This problem goes away if the NFS server is changed from Linux to an IBM RS/6000 running AIX4.2.

I suspect that both of the above problems are related. It appears to be a problem with how the NFS directory structure is managed. Locally, the directory is fine, but to a client, it is screwed up. The filesystem on the Linux NFS server is ext3.

These problems are holding me back from installing Linux at several client sites. I cannot afford to have an NFS solution that is not 100%. I will have to go with a Novell or (ick) MS solution if this is not resolved.
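As a stopgap for problem B, the processing job could tolerate names that show up in a stale listing but no longer exist. The following is a minimal sketch in shell rather than the reporter's Perl script, which is not shown in this bug. The function name and both directory arguments are hypothetical. It does not fix the underlying directory inconsistency; it only keeps already-deleted files from being reported as errors.

```shell
#!/bin/sh
# Process each entry from a (possibly stale) directory listing, skipping
# names that no longer exist instead of treating them as errors.
process_dir() {
    src=$1    # NFS-mounted drop directory (hypothetical path)
    work=$2   # local working directory (hypothetical path)
    for f in "$src"/*; do
        # A cached listing may still name files deleted a minute ago.
        [ -f "$f" ] || continue
        name=$(basename "$f")
        # Copy locally first; delete from the server only if the copy worked.
        if cp "$f" "$work/$name"; then
            rm -f "$f"
            echo "processed $name"
        else
            echo "skipped $name (copy failed)" >&2
        fi
    done
}
```

A cron job would call it as, for example, `process_dir /mnt/drop /var/tmp/work`, where both paths are placeholders.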
all -- if TCP solves your problem, you may have network problems that are preventing NFS over UDP from working. a packet trace would show exactly what is happening. redhat@steeleware -- your problem may be due to close-to-open cache inconsistency. which version of the kernel runs on your systems? do some of your problems vanish if you upgrade to 2.4.19 on the clients? i'm sorry to say i don't have any experience with the Win2K NFS client.
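One way to collect the packet trace mentioned above is with tcpdump on the client. The sketch below only prints the command so it can be reviewed before being run as root. The interface name eth0 and the output path are assumptions; solaris8 is the server name from the original report. The filter matches on the host alone, not on the NFS port, so that fragments of large UDP packets are captured as well.

```shell
#!/bin/sh
# Build and print a tcpdump command line for capturing NFS traffic.
trace_cmd() {
    # $1 = network interface, $2 = NFS server host, $3 = capture file
    # -s 0 requests full packets; -w writes raw packets to a file.
    echo "tcpdump -i $1 -s 0 -w $3 host $2"
}

# eth0 and /tmp/nfs-trace.pcap are assumed values for illustration.
trace_cmd eth0 solaris8 /tmp/nfs-trace.pcap
```

Start the capture, reproduce the hang with the large copy, then stop tcpdump and attach the capture file to the bug.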
Adding information on completely separate problems to existing bug entries ensures that the new problems probably won't be handled. Please don't do that. In any event, this bug is stale since it has not been updated with the effects of using newer kernel errata (-5 or newer).
This issue seems to be fixed in later kernels.