From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4a) Gecko/20030501

Description of problem:
Every program that writes a large amount of data from my fast desktop machine to an NFS filesystem on my slower server gets an "Input/output error". The above thread seems to address the same problem.

Version-Release number of selected component (if applicable):
kernel-2.4.20-x

How reproducible:
Always

Steps to Reproduce:
e.g.: vcdxrip -b sfu3x06.bin, with sfu3x06.bin located on an NFS filesystem

Actual Results:
vcdxrip -b sfu3x06.bin
INFO: extracting avseq01.mpg... (start lsn 450 (+357889))
fwrite(): Input/output error
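Any program that streams a few hundred megabytes to the mount should hit the same failure, so vcdxrip is not essential. Below is a hypothetical stand-alone reproducer written in Python (the name `write_large_file` and the 1 MiB chunk size are illustrative, not taken from this report). It writes zeros, prints the offset at which write(2) fails, and calls fsync() before close(), because an NFS client may cache writes and only report an error once the cached data is flushed to the server.

```python
import os
import sys


def write_large_file(path, total_bytes, chunk_size=1 << 20):
    """Write total_bytes of zeros to path, reporting the offset of any failure.

    Returns the number of bytes written. An I/O error from the NFS client
    (e.g. EIO) is printed with its offset and then re-raised as OSError.
    """
    buf = b"\0" * chunk_size
    written = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            try:
                # os.write may write fewer bytes than requested; the loop
                # simply continues from the new offset.
                written += os.write(fd, buf[:n])
            except OSError as e:
                print(f"write failed at offset {written}: {e.strerror}",
                      file=sys.stderr)
                raise
        # Cached data may only be sent to the server now, so a deferred
        # write error can surface from fsync() rather than from write().
        try:
            os.fsync(fd)
        except OSError as e:
            print(f"fsync failed after {written} bytes: {e.strerror}",
                  file=sys.stderr)
            raise
    finally:
        os.close(fd)
    return written
```

For example, `write_large_file("/mnt/nfs/testfile", 800 << 20)` (the path is hypothetical) writes about 800 MiB, comparable to the > 700 MB file mentioned later in this thread. On an affected soft mount it would be expected to raise OSError with errno EIO instead of returning.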
I just tried the recommended patch from the above URL and it doesn't work for me.
Using UDP or TCP? Is the server also running kernel-2.4.20-x? Does an ethereal trace show anything interesting? Anything in /var/log/messages?
The server is running 2.4.18-17.8.0, and I had no problems while the client was running the same version. I use the following NFS mount options:
nfs rw,v3,rsize=4096,wsize=4096,soft,intr,udp,lock
/var/log/messages on the NFS client shows the following kernel error:
May 5 22:43:03 legolas kernel: nfs: server filesrv not responding, timed out
The server does not notice any problems at all. I have attached a gzipped ethereal trace taken while the problem occurred.
Created attachment 91506: gzipped ethereal trace
Well... From a quick look at the trace, it looks like a client-generated error. :( What is the vcdxrip -b sfu3x06.bin command, and what does it do?
How is the filesystem being mounted (i.e., with what mount options)? Also, what type of machine are you using? x86, ia64, ...?
The vcdxrip command extracts the MPEG stream from a CD image file. I used it only to produce a big file (> 700 MB) on the NFS partition. The program has failed ever since I upgraded the client machine to Red Hat 9 (kernel 2.4.20-9). Client and server are ix86 (32-bit) machines. I tried writing from the 2.4.20-9 machine (client) to a sparc-sun4u-solaris2.6 machine and to a mips-sgi-irix6.5 machine, and it worked without any problems. I get the EIO only between the 2.4.18-17.8.0 kernel (server) and the 2.4.20-9 kernel (client). I use the same mount options for all mounts: rw,v3,rsize=4096,wsize=4096,soft,intr,udp,lock
Things are even worse! I just tried the current kernel (2.4.20-9.i586) on the server side and got the same error. So even if both machines are running Red Hat 9, you can expect to run into this with a slow NFS server serving a fast client. To verify, use e.g. a server at <= 500 MHz against a client at >= 2 GHz.
I started seeing similar EIO errors on my NFS client after upgrading it to the latest kernel, Red Hat Linux 8.0 kernel 2.4.20-13.8. The NFS server in this case, though, is running HP-UX 11.0. NFS had worked solidly for a long time before this kernel, and nothing on the server side changed. In fact, I have many other NFS clients using the same server and filesystem, including other HP, AIX, Linux, and even Windows boxes, and none of them has any problems except this Linux client running the new 2.4.20 kernel. I have run strace on several processes (such as ld and even gtar), which all seem to fail with EIO, usually during a write(2) system call, although I've also seen a couple of other calls, such as stat, fail too, all with EIO. The error usually seems to occur after a significant amount of I/O is performed in a short amount of time. It is not a space or quota issue, and I've even seen a write of just 8 bytes fail. BTW, it's using NFS v3 over IPv4 TCP, with the following mount options: rw,nodev,nosuid,soft,intr,bg,posix,rsize=8192,wsize=8192
More info on my RHL 8.0 client: The previous configuration, which worked without errors, was kernel version 2.4.18-27.8.0-i686 with all Red Hat errata installed just before the 2.4.20 kernel was issued. This is a single-processor 1.4 GHz Pentium III with 1 GB of memory. There are 92 separate NFS filesystems mounted on this client, all with the same options. I am using NIS, but no automounting. All networking components are 100 Mbps Ethernet. This NFS client is used as a "server" box in a compile farm, so there is very little graphics workload. The NFS server (HP L2000 with 4 x 440 MHz PA-RISC processors) sees and reports no errors of any kind. The EIO errors also occur sporadically, never at the same file offset or even in the same file. And the errors are not always at the end of a file; I've seen an lseek(2) back into the middle of a file followed by a write(2) that then failed with EIO.
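The lseek-then-write sequence described above can be isolated in a small test. This is a hypothetical Python sketch (the name `overwrite_middle` is illustrative): it opens an existing file, uses os.lseek() to move back to the middle, writes a short buffer with os.write(), and calls os.fsync() so that any deferred NFS write error is raised before the file is closed.

```python
import os


def overwrite_middle(path, data):
    """Overwrite bytes in the middle of an existing file and flush them.

    Mirrors the lseek(2) + write(2) sequence seen in the strace output.
    Returns (offset, bytes_written); an EIO from the NFS client is
    raised as OSError by os.write() or os.fsync().
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        size = os.fstat(fd).st_size
        offset = size // 2
        os.lseek(fd, offset, os.SEEK_SET)
        written = os.write(fd, data)
        # Push the data to the server so a deferred write error surfaces here.
        os.fsync(fd)
    finally:
        os.close(fd)
    return offset, written
```

Running it in a loop against a file on the affected soft mount, with a short payload such as `b"ABCDEFGH"` (8 bytes, like the failing write above), may help show whether the error depends on write size at all.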
Still occurring on the latest RH kernel, 2.4.20-18.8. I noticed the following entry in the kernel.org ChangeLog for 2.4.21. Is this possibly related?

  Summary of changes from v2.4.21-pre5 to v2.4.21-pre6
  ...
  Trond Myklebust <trond.myklebust.no>:
    o Fix misleading EIO on NFS client
    o Fix unbalanced kunmap() in NFS symlink code
I encounter many "Input/output error"s, usually during nightly updates on clients (which use the same NFS share). But compiling and linking four small C modules fails quite often too (on an NFS-mounted home directory). Here, the server is nearly the fastest machine. The server and most of the clients run Red Hat 7.3 (some clients run Red Hat 9), all supplied with the latest patches: kernel RPMs 2.4.20-18.7, 2.4.20-18.7smp, and 2.4.20-18.9.
I found an interim solution for myself: I mounted the NFS filesystems hard instead of soft. In that case the kernel keeps retrying the failed requests itself instead of returning the error to the application. But I still think there is a problem, and I would prefer soft mounting.
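Before switching mounts from soft to hard, it can help to list which NFS mounts are actually soft. A minimal Python sketch, assuming the /proc/mounts line format (device, mount point, type, options, dump, pass); the function name `soft_nfs_mounts` is hypothetical. Whether a given kernel lists "soft" among the options in /proc/mounts may vary by version, so an empty result is worth double-checking against /etc/fstab.

```python
def soft_nfs_mounts(mounts_text):
    """Return mount points of NFS filesystems mounted with the 'soft' option.

    mounts_text is expected in /proc/mounts format, one mount per line:
    device mountpoint fstype options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mountpoint, fstype, options = fields[:4]
        # Match "nfs" and "nfs4"; compare whole options, not substrings.
        if fstype.startswith("nfs") and "soft" in options.split(","):
            result.append(mountpoint)
    return result
```

Typical use is `soft_nfs_mounts(open("/proc/mounts").read())`, which returns a list of mount points to remount with `hard`.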
If using hard mounts takes care of the problem, then I would argue that this is not a bug. Soft mounts will always return I/O errors on busy networks and should be used sparingly (or not at all, IMHO).
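Rough numbers help explain why a slow server produces these errors on soft mounts. On a soft mount, the client gives up on a request after a "major timeout", which is governed by the `timeo` (tenths of a second) and `retrans` mount options. The Python sketch below uses a simplified model that is an assumption, not the kernel's exact algorithm: the first wait is `timeo`, each retransmission doubles the wait (capped at an assumed 60 seconds), and the request fails with EIO once the initial try plus `retrans` retransmissions have all timed out. The function name `soft_mount_timeout` is hypothetical.

```python
def soft_mount_timeout(timeo_tenths, retrans, max_wait=60.0):
    """Estimate seconds before a soft-mounted NFS request fails with EIO.

    Simplified model (assumption): wait timeo/10 s for the first reply,
    double the wait after every timeout up to max_wait seconds, and give
    up after the initial transmission plus `retrans` retransmissions.
    """
    wait = timeo_tenths / 10.0
    total = 0.0
    for _ in range(retrans + 1):  # initial try + retrans retransmissions
        total += min(wait, max_wait)
        wait *= 2
    return total
```

With illustrative values `timeo=7` and `retrans=3`, `soft_mount_timeout(7, 3)` comes to about 10.5 seconds (0.7 + 1.4 + 2.8 + 5.6). Under this model, a server that falls behind for that long during a burst of writes from a fast client would make a soft-mounted write fail, while a hard mount would simply keep retrying; raising `timeo` or `retrans` widens the window but does not remove it.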