Red Hat Bugzilla – Bug 90159
fwrite() returns EIO on fast NFS-client writing to slow server
Last modified: 2007-04-18 12:53:27 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4a) Gecko/20030501
Description of problem:
Every program that writes a large amount of data from my fast desktop machine to
an NFS filesystem on my slower server gets an "Input/output error". The above
thread seems to address the same problem.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
e.g.: vcdxrip -b sfu3x06.bin
sfu3x06.bin @ NFS filesystem
Actual Results: vcdxrip -b sfu3x06.bin
INFO: extracting avseq01.mpg... (start lsn 450 (+357889))
fwrite(): Input/output error
I just tried the recommended patch from the above URL and it doesn't work for me.
Using UDP or TCP?
Is the server also running kernel-2.4.20-x?
Does an ethereal trace show anything interesting?
Anything in /var/log/messages?
The server is running 2.4.18-17.8.0 and I had no problems while the client was
using the same version. I use the following NFS settings:
/var/log/messages on the NFS client shows the following kernel errors:
May 5 22:43:03 legolas kernel: nfs: server filesrv not responding, timed out
The server does not recognize the problems at all.
I attach a gzipped ethereal trace taken during the problem.
Created attachment 91506 [details]
gziped ethereal trace
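For anyone comparing clients, the "not responding, timed out" message quoted above can be counted per server with a few lines of Python. This is only a rough sketch: the function name `count_nfs_timeouts` is made up for illustration, and the pattern assumes the exact message wording shown in this report (other kernel versions may word it differently).

```python
import re
from collections import Counter

# Matches the kernel message quoted in this report, e.g.
# "May 5 22:43:03 legolas kernel: nfs: server filesrv not responding, timed out"
# Assumption: other kernels may phrase this message differently.
TIMEOUT_RE = re.compile(r"kernel: nfs: server (\S+) not responding, timed out")


def count_nfs_timeouts(lines):
    """Return a Counter mapping NFS server name -> number of timeout messages."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Example with the line from this report. For a real check, pass an open
# log file instead (e.g. /var/log/messages on the client).
sample = [
    "May 5 22:43:03 legolas kernel: nfs: server filesrv not responding, timed out",
]
for server, n in count_nfs_timeouts(sample).most_common():
    print(f"{server}: {n} timeout(s)")
```

A count that climbs only while large writes are in progress would be consistent with the behaviour described here, though it does not by itself prove where the fault lies.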
Well.. From a quick look at the trace it looks
like a client generated error... :(
What is the vcdxrip -b sfu3x06.bin command
and what is it doing?
How is the filesystem being mounted (i.e., with what mount options)?
Also what type of machine are you using? x86, ia64....
The vcdxrip command is extracting the MPEG-stream of a cdimage file. I used it
only to produce a big file (> 700MB) on the NFS partition. The program has failed
ever since I upgraded the client machine to Red Hat 9 (kernel 2.4.20-9). Client
and server are ix86 (32 Bit) machines.
I tried to write from the 2.4.20-9 machine (client) to a sparc-sun4u-solaris2.6
and a mips-sgi-irix6.5 machine and it worked without any problems.
I get the EIO only between the 2.4.18-17.8.0 kernel (server) and the 2.4.20-9
(client). I use the same mount options for all mounts:
Things are even worse! I just tried the current kernel (2.4.20-9.i586) on the
server side and got the same error. So even if both machines are running
Red Hat 9 you can expect to be trapped with a slow NFS server serving a fast
client. For verification use e.g. a <= 500MHz machine as server against a >=2GHz
I started seeing similar EIO errors on my NFS client after upgrading it to the
latest kernel, Red Hat Linux 8.0 kernel 2.4.20-13.8. The NFS server in this
case though is running HP-UX 11.0. NFS has worked solidly for a long time prior
to this latest kernel, and nothing on the server side changed. In fact I have
many other NFS clients using that same server/filesystem, including other HP,
AIX, Linux, and even Windows boxes and none of them have any problems except
this linux client running with the new 2.4.20 kernel.
I have run strace on several processes (such as ld and even gtar), which
all seem to fail with an EIO, usually during a write(2) system call, but I've
also seen a couple of others, such as stat, fail too, all with EIO. This error
usually seems to occur after a significant amount of I/O is performed in a short
amount of time. It is not a space or a quota issue, and I've even seen a write
of just 8 bytes fail.
BTW, it's using NFS v3 over IPv4 TCP. Using the following mount options:
More info on my RHL8.0 client... The previous configuration which worked
without errors was kernel version 2.4.18-27.8.0-i686 with all Red Hat errata
installed just prior to the issue of the 2.4.20 kernel. This is a
single-processor 1.4GHz Pentium-3, 1GB memory. There are 92 separate NFS
filesystems mounted on this client box, all with the same options. Am using
NIS, but not any automounting. All networking components are 100Mbps ethernet.
This NFS client is used as a "server" box as part of a compile farm, so there
is very little video graphics workload. The NFS server (HP L2000 w/ 4x440MHz
PARISC processors) sees and reports no errors of any kind.
Also the EIO errors occur sporadically, and never at the same file offset or
even the same file. And errors are not always at the end of the file either;
I've seen an lseek(2) back into the middle of a file followed by a write(2) which
then failed with EIO.
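To reproduce this without vcdxrip or a full build, a small probe that writes a large file in chunks and reports where the first error appears may help. The following is a minimal sketch, not a definitive test: the function name `write_until_error` and the chunk size are arbitrary choices, and any path passed to it is up to the tester. Because NFS clients buffer writes, an error may only surface at fsync or close, so the sketch checks those too.

```python
import os


def write_until_error(path, total_bytes, chunk_size=64 * 1024):
    """Write total_bytes of zeros to path in chunks.

    Returns (bytes_accepted_by_write, errno_or_None). The errno is the first
    error seen from write, fsync, or close (for example errno.EIO).
    """
    chunk = b"\0" * chunk_size
    written = 0
    err = None
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            # os.write may accept fewer bytes than requested; count what it took.
            written += os.write(fd, chunk[:n])
        # Buffered NFS writes can fail late, so force them out before closing.
        os.fsync(fd)
    except OSError as exc:
        err = exc.errno
    try:
        os.close(fd)
    except OSError as exc:
        if err is None:
            err = exc.errno
    return written, err


# Hypothetical usage on an NFS mount (the path is an assumption):
#   written, err = write_until_error("/mnt/nfs/probe.bin", 700 * 1024 * 1024)
#   print(written, os.strerror(err) if err else "ok")
```

Running the same probe against a local disk and against the NFS mount gives a quick comparison; the reported byte count also shows whether the failure offset really varies from run to run.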
Still occurring on latest RH kernel 2.4.20-18.8.
I noticed in the kernel.org ChangeLog for 2.4.21 the following entry...is this
Summary of changes from v2.4.21-pre5 to v2.4.21-pre6
Trond Myklebust <firstname.lastname@example.org>:
o Fix misleading EIO on NFS client
o Fix unbalanced kunmap() in NFS symlink code
I encounter many "Input/output error"s, normally during nightly updates
on clients (using the same NFS share).
But compiling and linking 4 small C modules fails quite often too
(on an NFS-mounted home).
Here, the server is nearly the fastest machine.
The server and most of the clients run Red Hat 7.3 (some clients Red Hat 9),
all supplied with the latest patches; kernel RPMs:
2.4.20-18.7, 2.4.20-18.7smp and 2.4.20-18.9
I found an interim solution for me: I now mount the NFS filesystems hard
instead of soft. In that case the kernel handles the failures rather than the
application. But I still think there is a problem, and I would prefer soft mounting.
If using hard mounts takes care of the problem, then I would
argue that this is not a bug. Soft mount will always return
I/O errors on busy networks and should be used sparingly (or
not at all imho)....
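Since the hard/soft distinction turned out to matter here, a quick way to see which NFS filesystems on a client are soft-mounted is to scan the mount table. A minimal sketch follows, assuming the /proc/mounts column layout (device, mount point, type, options) and that the options column lists "soft" for soft mounts. The function name `soft_nfs_mounts` and the sample entries are made up for illustration; they are not the mount options used by anyone in this report.

```python
def soft_nfs_mounts(mount_table_text):
    """Return mount points of NFS filesystems whose options include 'soft'.

    Expects text in the /proc/mounts layout:
        device mountpoint fstype options dump pass
    """
    soft = []
    for line in mount_table_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type in ("nfs", "nfs4") and "soft" in options.split(","):
            soft.append(mount_point)
    return soft


# Made-up sample entries (hypothetical servers, paths, and options).
sample = """\
filesrv:/export/home /home nfs rw,soft,intr 0 0
filesrv:/export/data /data nfs rw,hard,intr 0 0
/dev/hda1 / ext3 rw 0 0
"""
print(soft_nfs_mounts(sample))

# On a real client, the table could be read with:
#   with open("/proc/mounts") as f:
#       print(soft_nfs_mounts(f.read()))
```

Any mount point this reports is one where a server timeout can be handed back to the application as EIO instead of being retried by the kernel.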