Description of problem:
server: RH9 kernel-2.4.20-18.9, nfs-utils-1.0.1-3.9
clients: RH8.0 kernel 2.4.20-18.8, fileutils-4.1.9-11
When clients cp large (>30MB) files within the NFS share, they get an input/output error.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. export a share (rw,sync)
2. mount to a client
3. cp a large file within the share from client
Actual Results: sometimes the file is copied OK, but most of the time cp reports an
input/output error and only part of the file is copied.
Expected Results: the whole file should be copied
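A small helper can make the reproduction above repeatable. This is a sketch, not a tool from the report: the function name `copy_within_share` is hypothetical, and it assumes the symptom surfaces in user space as `OSError` with `errno.EIO`, which is how Python reports "Input/output error".

```python
import errno
import filecmp
import shutil
import sys


def copy_within_share(src: str, dst: str) -> bool:
    """Copy src to dst on the same mount; True only if dst matches src byte for byte."""
    try:
        shutil.copyfile(src, dst)
    except OSError as exc:
        if exc.errno == errno.EIO:
            # The symptom in this report: "Input/output error" partway through,
            # leaving a truncated dst behind.
            print(f"EIO while copying {src}: {exc}", file=sys.stderr)
            return False
        raise  # Anything else (missing file, permissions) is a different problem.
    # shallow=False compares file contents instead of trusting os.stat() metadata.
    return filecmp.cmp(src, dst, shallow=False)
```

For example, `copy_within_share("/mnt/nfs/bigfile", "/mnt/nfs/bigfile.copy")`, where `/mnt/nfs` is a hypothetical mount point. Because the copy only fails some of the time, running it in a loop gives a clearer picture than a single attempt.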
Are you mounting with the NFS 'soft' option? Check out man 5 nfs.
There seems to be something in kernel-2.4.20 that causes more minor timeouts.
Maybe it's a feature - better timeout reporting or something.
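To answer the "soft" question on a client, one can look at the options of each NFS entry in `/proc/mounts` (fields: device, mount point, filesystem type, options, dump, pass). The sketch below is a hypothetical helper that takes the file's text as a string so it can be tested without a live mount.

```python
from typing import List


def soft_nfs_mounts(mounts_text: str) -> List[str]:
    """Return mount points of NFS mounts whose option list contains 'soft'."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # Skip blank or malformed lines.
        mountpoint, fstype, options = fields[1], fields[2], fields[3].split(",")
        # Exact type match avoids false hits such as the 'nfsd' pseudo-filesystem.
        if fstype in ("nfs", "nfs4") and "soft" in options:
            result.append(mountpoint)
    return result
```

Usage: `with open("/proc/mounts") as f: print(soft_nfs_mounts(f.read()))`. An empty list means no NFS mount on that client is currently using `soft`.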
Another suggested workaround was to raise 'retrans' to a higher level to deal
with it, say 20 (from 3).
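Why raising `retrans` helps can be estimated with a simplified model of the backoff described in nfs(5): `timeo` is in tenths of a second, each retransmission doubles the wait, no single wait exceeds 60 seconds, and a soft mount gives up after `retrans` retransmissions. The exact accounting differs between kernel versions, so treat this as an approximation. The function name and the `timeo=7` value are illustrative assumptions.

```python
def soft_mount_give_up_seconds(timeo_tenths: int, retrans: int, cap_tenths: int = 600) -> float:
    """Approximate seconds spent waiting before a soft mount reports a major timeout.

    Simplified model: one original send plus `retrans` retransmissions, the wait
    doubling each time and capped at cap_tenths (60 s by default).
    Arithmetic is done in integer tenths to avoid floating-point drift.
    """
    wait = timeo_tenths
    total = 0
    for _ in range(retrans + 1):
        total += min(wait, cap_tenths)
        wait *= 2
    return total / 10
```

Under this model, `timeo=7` with `retrans=3` gives up after about 10.5 s of waiting (0.7 + 1.4 + 2.8 + 5.6), while `retrans=20` stretches that to roughly 929 s, which is far more tolerant of a briefly congested network.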
Steve, do you have any insight into the cause of the I/O errors? I don't argue with
NOTABUG, 'cause it's a kernel bug if anything, but there might be some default mount
options that would be appropriate.
Yes... Soft mounts are generally the reason for I/O errors.
With soft mounts, the client's requests to the server are only
retried a limited number of times, which means that on a busy network (especially
with UDP), packets are dropped or, more likely, delayed long
enough that the client times out and gives up. On normal mounts
(i.e. hard mounts), the request is retried until it succeeds (which generally
works), but with soft mounts it is not.
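The difference between the two retry policies can be shown with a toy model. This is not kernel code: `send` is a hypothetical transport that returns reply bytes, or `None` when the request timed out, and `nfs_call` is an illustrative name.

```python
import errno
from typing import Callable, Optional


def nfs_call(send: Callable[[], Optional[bytes]], soft: bool, retrans: int = 3) -> bytes:
    """Toy model of hard vs soft NFS mount retry semantics."""
    timeouts = 0
    while True:
        reply = send()
        if reply is not None:
            return reply
        timeouts += 1
        if soft and timeouts > retrans:
            # Major timeout on a soft mount: give up and surface EIO to the application.
            raise OSError(errno.EIO, "Input/output error")
        # Hard mount (or soft mount with retries left): retransmit and keep waiting.
```

With a transport that drops the first five replies, the hard policy eventually returns the data, while the soft policy with `retrans=3` raises EIO after four sends. That matches the partial copies described in this report.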
Right, but do you know if something changed in kernel 2.4.20 to significantly increase the soft timeouts?
The issue at hand was that after applying the update that installed 2.4.20, soft mounts that
were fine under the previous kernel started getting lots of I/O errors.
It's been a while, but as I recall it, you could take the machine, boot into the previous
kernel (2.4.18?) and do all the NFS with soft mounts you wanted and it would be fine (on a
Rebooting into 2.4.20 and repeating the tests showed lots of I/O errors.
> Right, but do you know if something changed in kernel 2.4.20 to
> significantly increase the soft timeouts?
No, not that I'm aware of... but I know there was a lot of work done
in the 2.4.21 (RHEL3/FC1) kernels on congestion control (actually,
I'm pretty sure we increased the timeout a bit to deal with 64-bit
machines), so you might want to try one of those kernels, since
RH9 is (at this point) an unsupported release...