90159 – fwrite() returns EIO on fast NFS-client writing to slow server

Summary: fwrite() returns EIO on fast NFS-client writing to slow server

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	kernel
Sub Component:
Version:	9
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Steve Dickson
QA Contact:	Brian Brock
Docs Contact:
URL:	http://www.ussg.iu.edu/hypermail/linu...
Whiteboard:
Depends On:
Blocks:

Reported:	2003-05-03 22:29 UTC by Joerg Lehrke
Modified:	2007-04-18 16:53 UTC
CC List:	4 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2004-08-11 11:35:37 UTC
Embargoed:

Attachments
gziped ethereal trace (48.92 KB, application/octet-stream) 2003-05-05 21:32 UTC, Joerg Lehrke	no flags

Description Joerg Lehrke 2003-05-03 22:29:31 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4a) Gecko/20030501

Description of problem:
Every program that writes a large amount of data from my fast desktop machine to
an NFS filesystem on my slower server gets an "Input/output error". The thread
linked in the URL field above seems to address the same problem.

Version-Release number of selected component (if applicable):
kernel-2.4.20-x

How reproducible:
Always

Steps to Reproduce:
e.g.: vcdxrip -b sfu3x06.bin
(with sfu3x06.bin on the NFS filesystem)

Actual Results:  vcdxrip -b sfu3x06.bin
   INFO: extracting avseq01.mpg... (start lsn 450 (+357889))
fwrite(): Input/output error
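The failing write can be reproduced without vcdxrip. Below is a minimal Python sketch (the function names are hypothetical) that writes a large file in fixed-size chunks and reports which step failed. On NFS an error can be reported by write(2), by fsync(2), or only at close(2), because the client caches dirty pages and flushes them to the server later, so the sketch checks all three.

```python
import errno
import os


def write_large_file(path, total_bytes, chunk_size=4096):
    """Write total_bytes of zeros to path in chunk_size pieces.

    Returns (bytes_written, failing_step, errno_value). failing_step is
    None and errno_value is 0 when every write, the fsync and the close
    succeed; otherwise it names the step ("write", "fsync" or "close")
    where the OSError was raised.
    """
    chunk = b"\0" * chunk_size
    written = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    step = "write"
    try:
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            # os.write may write fewer bytes than asked; count what it reports.
            written += os.write(fd, chunk[:n])
        step = "fsync"
        # Push cached dirty pages to the server; deferred NFS errors surface here.
        os.fsync(fd)
    except OSError as exc:
        try:
            os.close(fd)
        except OSError:
            pass  # keep the first error, it is the one worth reporting
        return written, step, exc.errno
    try:
        os.close(fd)
    except OSError as exc:
        return written, "close", exc.errno
    return written, None, 0


def describe(result):
    """Turn a write_large_file() result into a short human-readable message."""
    written, step, err = result
    if step is None:
        return f"ok: {written} bytes written"
    return f"{errno.errorcode.get(err, err)} during {step} after {written} bytes"
```

For example, `describe(write_large_file("/mnt/nfs/big.bin", 700 * 1024 * 1024))`, with a hypothetical mount point, returns either an "ok" message or a string naming the errno (such as EIO) and the step where it appeared.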


Additional info:

Comment 1 Joerg Lehrke 2003-05-03 23:12:11 UTC

I just tried the recommended patch from the above URL and it doesn't work for me.

Comment 2 Steve Dickson 2003-05-05 10:56:03 UTC

Using UDP or TCP?
Is the server also running kernel-2.4.20-x?
Does an ethereal trace show anything interesting?
Anything in /var/log/messages?

Comment 3 Joerg Lehrke 2003-05-05 21:02:07 UTC

The server is running 2.4.18-17.8.0, and I had no problems while the client was
using the same version. I use the following NFS settings:
nfs rw,v3,rsize=4096,wsize=4096,soft,intr,udp,lock
/var/log/messages on the NFS client shows the following kernel errors:
May  5 22:43:03 legolas kernel: nfs: server filesrv not responding, timed out
The server does not report any problems at all.
I am attaching a gzipped ethereal trace taken during the problem.
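The "timed out" suffix in that log line is what the Linux client prints when a soft mount gives up on a request; a hard mount logs "still trying" and keeps retrying. To see which NFS mounts on a client use soft, one can scan /proc/mounts. A minimal Python sketch, assuming the usual whitespace-separated /proc/mounts layout (device, mount point, type, options, dump, pass) and ignoring the octal escapes used for spaces in paths; the function name and the sample entries in the test are hypothetical.

```python
def soft_nfs_mounts(mounts_text):
    """Return (mount_point, options) pairs for NFS mounts that use 'soft'.

    mounts_text is in /proc/mounts format: one mount per line with the
    fields device, mount point, fs type, options, dump, pass.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type not in ("nfs", "nfs4"):
            continue
        opts = options.split(",")
        if "soft" in opts:
            found.append((mount_point, opts))
    return found
```

Passing the contents of /proc/mounts lists the affected mount points together with their options, so a soft,udp combination like the one above stands out.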

Comment 4 Joerg Lehrke 2003-05-05 21:32:23 UTC

Created attachment 91506
gziped ethereal trace

Comment 5 Steve Dickson 2003-05-06 20:19:16 UTC

Well... from a quick look at the trace, it looks
like a client-generated error... :(

What is the vcdxrip -b sfu3x06.bin command,
and what does it do?

Comment 6 Steve Dickson 2003-05-06 20:56:35 UTC

How are the filesystems being mounted (i.e., with what mount options)?
Also, what type of machine are you using? x86, ia64, ...?

Comment 7 Joerg Lehrke 2003-05-07 20:33:36 UTC

The vcdxrip command extracts the MPEG stream from a CD image file. I used it
only to produce a big file (> 700MB) on the NFS partition. The program has failed
ever since I upgraded the client machine to Red Hat 9 (kernel 2.4.20-9). Client
and server are ix86 (32-bit) machines.
I tried writing from the 2.4.20-9 machine (client) to a sparc-sun4u-solaris2.6
and a mips-sgi-irix6.5 machine, and it worked without any problems.
I get the EIO only between the 2.4.18-17.8.0 kernel (server) and the 2.4.20-9
kernel (client). I use the same mount options for all mounts:
rw,v3,rsize=4096,wsize=4096,soft,intr,udp,lock

Comment 8 Joerg Lehrke 2003-05-11 18:45:05 UTC

Things are even worse! I just tried the current kernel (2.4.20-9.i586) on the
server side and got the same error. So even if both machines are running
Red Hat 9, you can expect to hit this problem whenever a slow NFS server serves a
fast client. To reproduce, use e.g. a <= 500MHz machine as the server and a
>= 2GHz client.

Comment 9 Deron Meranda 2003-05-19 16:29:08 UTC

I started seeing similar EIO errors on my NFS client after upgrading it to the
latest kernel, Red Hat Linux 8.0 kernel 2.4.20-13.8. The NFS server in this
case, though, is running HP-UX 11.0. NFS has worked solidly for a long time prior
to this latest kernel, and nothing on the server side changed. In fact, I have
many other NFS clients using that same server/filesystem, including other HP,
AIX, Linux, and even Windows boxes, and none of them has any problems except
this Linux client running the new 2.4.20 kernel.

I have run strace on several processes (such as ld and even gtar), which
all seem to fail with EIO, usually during a write(2) system call, but I've
also seen a couple of other calls, such as stat, fail too, all with EIO. The error
usually seems to occur after a significant amount of I/O is performed in a short
amount of time. It is not a space or quota issue, and I've even seen a write
of just 8 bytes fail.

BTW, it's using NFS v3 over TCP/IPv4 with the following mount options:
   rw,nodev,nosuid,soft,intr,bg,posix,rsize=8192,wsize=8192

Comment 10 Deron Meranda 2003-05-19 16:55:09 UTC

More info on my RHL8.0 client... The previous configuration, which worked
without errors, was kernel version 2.4.18-27.8.0-i686 with all Red Hat errata
installed just prior to the release of the 2.4.20 kernel. This is a
single-processor 1.4GHz Pentium-3 with 1GB of memory. There are 92 separate NFS
filesystems mounted on this client box, all with the same options. I am using
NIS, but no automounting. All networking components are 100Mbps Ethernet.
This NFS client is used as a "server" box as part of a compile farm, so there
is very little video graphics workload. The NFS server (HP L2000 w/ 4x440MHz
PA-RISC processors) sees and reports no errors of any kind.

Also, the EIO errors occur sporadically, never at the same file offset or
even in the same file. And the errors are not always at the end of the file either;
I've seen an lseek(2) back into the middle of a file followed by a write(2), which
then failed with EIO.

Comment 11 Deron Meranda 2003-06-13 20:32:27 UTC

Still occurring on the latest RH kernel 2.4.20-18.8.

I noticed the following entry in the kernel.org ChangeLog for 2.4.21... is it
possibly related?

  Summary of changes from v2.4.21-pre5 to v2.4.21-pre6
  ...
  Trond Myklebust <trond.myklebust.no>:
     o Fix misleading EIO on NFS client
     o Fix unbalanced kunmap() in NFS symlink code

Comment 12 Herbert Gasiorowski 2003-07-11 09:56:08 UTC

I encounter many "Input/output error"s, usually during nightly updates
on clients (all using the same NFS share).

But compiling and linking 4 small C modules fails quite often too
(on an NFS-mounted home directory).

Here, the server is nearly the fastest machine.

The server and most of the clients run Red Hat 7.3 (some clients Red Hat 9),
all supplied with the latest patches; kernel RPMs:
2.4.20-18.7, 2.4.20-18.7smp and 2.4.20-18.9

Comment 13 Joerg Lehrke 2003-07-12 08:38:09 UTC

I found an interim solution for me: I mounted the NFS filesystems hard instead of
soft. In that case the kernel keeps retrying failed requests instead of returning
an error to the application. But I still think there is a problem, and I would
prefer soft mounting.
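The workaround above comes down to replacing soft with hard in the mount options (hard is also the NFS default). A minimal Python sketch of that rewrite on an option string such as the one used in this report; the function name is hypothetical, and intr is left in place so that processes blocked on a hard mount can still be interrupted.

```python
def harden_nfs_options(options):
    """Return a comma-separated NFS option string with 'soft' replaced by 'hard'.

    The order of the other options is preserved. If neither 'soft' nor
    'hard' is present, 'hard' is appended to make the default explicit.
    """
    opts = [o for o in options.split(",") if o]
    out = ["hard" if o == "soft" else o for o in opts]
    if "hard" not in out:
        out.append("hard")
    return ",".join(out)
```

Applied to rw,v3,rsize=4096,wsize=4096,soft,intr,udp,lock it yields the same string with hard in place of soft, ready for /etc/fstab or mount -o.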

Comment 14 Steve Dickson 2004-08-11 11:35:37 UTC

If using hard mounts takes care of the problem, then I would
argue that this is not a bug. Soft mounts will always return
I/O errors on busy networks and should be used sparingly (or
not at all, IMHO)...
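The timing side of this can be illustrated with a simplified model of RPC retransmission. The sketch below assumes, loosely following the UDP behaviour described in the nfs(5) man page, that the client waits timeo deciseconds for a reply, doubles the timeout after each retransmission up to 60 seconds, and gives up after retrans retransmissions. It ignores the adaptive round-trip estimation the real client performs, and the timeo=7 and retrans=3 values in the test are illustrative, not taken from this report.

```python
def soft_mount_deadline(timeo_ds, retrans, cap_s=60.0):
    """Seconds a request may wait before a soft mount gives up (simplified model).

    The first attempt waits timeo_ds deciseconds; each of the retrans
    retransmissions doubles the previous timeout, capped at cap_s seconds.
    """
    timeout = timeo_ds / 10.0
    total = 0.0
    for _ in range(retrans + 1):  # the original send plus each retransmission
        total += min(timeout, cap_s)
        timeout *= 2
    return total


def fails_on_soft_mount(server_reply_s, timeo_ds, retrans):
    """True if a reply taking server_reply_s seconds arrives after the deadline."""
    return server_reply_s > soft_mount_deadline(timeo_ds, retrans)
```

Under these assumptions a soft mount with timeo=7 and retrans=3 gives up after about 10.5 seconds, so a slow server that needs longer than that to answer a queued WRITE would produce an EIO, while a hard mount would simply keep waiting.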
