100980 – cp large file within nfs share gets input/output error

Summary: cp large file within nfs share gets input/output error

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	nfs-server
Sub Component:
Version:	9
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Steve Dickson
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2003-07-28 11:40 UTC by Need Real Name
Modified:	2007-04-18 16:56 UTC
CC List:	1 user
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2004-08-11 11:41:48 UTC
Embargoed:

Description Need Real Name 2003-07-28 11:40:15 UTC

From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)

Description of problem:
Server: RH9, kernel-2.4.20-18.9, nfs-utils-1.0.1-3.9
Clients: RH8.0, kernel-2.4.20-18.8, fileutils-4.1.9-11

When clients cp large (>30MB) files within the NFS share, they get an input/output error.

Version-Release number of selected component (if applicable):
RH9 nfs-utils-1.0.1-3.9

How reproducible:
Sometimes

Steps to Reproduce:
1. export a share (rw,sync)
2. mount to a client
3. cp a large file within the share from client
    

Actual Results:  Sometimes the file is copied OK, but most of the time cp gets an
input/output error and only part of the file is copied.

Expected Results:  the whole file should be copied
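
Because the failure is intermittent, a small script can make the reproduction repeatable. The following is only a sketch, not part of the original report: it uses Python's shutil.copyfile instead of cp, the function name copy_trials and the file names are hypothetical, and it assumes the directory passed in lies inside the mounted NFS share.

```python
import errno
import os
import shutil


def copy_trials(directory, size_bytes, trials):
    """Copy one large file `trials` times inside `directory`.

    Returns how many copies failed with EIO (input/output error).
    Any other error is re-raised unchanged.
    """
    src = os.path.join(directory, "cp_test_src.bin")
    # Write the source file in 1 MiB chunks so memory use stays small.
    chunk = b"\0" * (1024 * 1024)
    with open(src, "wb") as f:
        remaining = size_bytes
        while remaining > 0:
            n = min(remaining, len(chunk))
            f.write(chunk[:n])
            remaining -= n

    failures = 0
    for i in range(trials):
        dst = os.path.join(directory, "cp_test_dst_%d.bin" % i)
        try:
            shutil.copyfile(src, dst)
        except OSError as exc:
            if exc.errno != errno.EIO:
                raise
            failures += 1
        finally:
            # Remove the (possibly partial) copy before the next trial.
            if os.path.exists(dst):
                os.remove(dst)
    os.remove(src)
    return failures
```

For example, copy_trials("/mnt/share", 40 * 1024 * 1024, 10) would attempt ten 40 MB copies, where /mnt/share stands in for wherever the share is mounted on the client.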

Additional info:

Comment 1 Bill McGonigle 2003-08-13 14:10:52 UTC

Are you mounting with the NFS 'soft' option?   Check out man 5 nfs.
There seems to be something in kernel-2.4.20 that causes more minor timeouts. 
Maybe it's a feature - better timeout reporting or something.
Another suggested workaround was to raise 'retrans' to a higher level to deal
with it, say 20 (from 3).
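
To see why raising 'retrans' helps on a soft mount, here is a rough model of how long the client waits before giving up. It is a simplified sketch based on the description in nfs(5) for UDP mounts, and it rests on assumptions: an initial 'timeo' of 7 tenths of a second, the timeout doubling after each minor timeout, a 60 second cap, and a major timeout after 'retrans' minor timeouts. The exact counting in a given kernel may differ, and the function name is hypothetical.

```python
def major_timeout_tenths(timeo=7, retrans=3, cap=600):
    """Tenths of a second until a major timeout in this simplified model.

    timeo   -- initial timeout in tenths of a second (assumed default: 7)
    retrans -- number of minor timeouts before a major timeout
    cap     -- maximum single timeout in tenths of a second (60 s)
    """
    total = 0
    wait = timeo
    for _ in range(retrans):
        total += min(wait, cap)
        # Each minor timeout doubles the next wait, up to the cap.
        wait = min(wait * 2, cap)
    return total


for retrans in (3, 20):
    seconds = major_timeout_tenths(retrans=retrans) / 10.0
    print("retrans=%d: about %.1f seconds" % (retrans, seconds))
```

Under these assumptions, retrans=3 gives the server roughly 5 seconds to answer before the soft mount returns an I/O error, while retrans=20 stretches that to several minutes.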

Comment 2 Bill McGonigle 2004-08-11 15:31:00 UTC

Steve, do you have an insight into the cause of the I/O errors?  I don't argue with 
NOTABUG, 'cause it's a kernel bug if anything, but there might be some default mount 
options that would be appropriate.

Comment 3 Steve Dickson 2004-08-11 15:50:06 UTC

Yes... Soft mounts are generally the reason for I/O errors.
With soft mounts, the client's requests to the server are
retried only a limited number of times (the retrans count),
which means that on a busy network (especially with UDP)
packets are dropped or, more likely, delayed long enough
that the client times out. On normal mounts (i.e. hard
mounts), the request is retried indefinitely (which
generally works), but with soft mounts it is not.
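
A quick way to check whether a client is affected is to look for NFS file systems mounted with the 'soft' option. The sketch below parses text in the /proc/mounts format (device, mount point, type, options). The function name is hypothetical, and it assumes the running kernel lists 'soft' among the options it reports for NFS mounts.

```python
def soft_nfs_mounts(mounts_text):
    """Return mount points of NFS file systems whose options include 'soft'."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fstype, options = fields[:4]
        if fstype.startswith("nfs") and "soft" in options.split(","):
            result.append(mount_point)
    return result
```

On a client it could be called as soft_nfs_mounts(open("/proc/mounts").read()). Any mount point it returns can give up on a request and report an I/O error instead of continuing to retry.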

Comment 4 Bill McGonigle 2004-08-11 17:12:23 UTC

Right, but do you know if something changed in kernel 2.4.20 to significantly increase the 
soft timeouts?  

The issue at hand was that after applying the update that installed 2.4.20, soft mounts that 
were fine under the previous kernel started getting lots of I/O errors. 

It's been a while, but as I recall it, you could take the machine, boot into the previous 
kernel (2.4.18?) and do all the NFS with softmounts you wanted and it would be fine (on a 
given network). 

Rebooting into 2.4.20 and repeating the tests showed lots of I/O errors.

Comment 5 Steve Dickson 2004-08-11 18:06:27 UTC

> Right, but do you know if something changed in kernel 2.4.20 to 
>significantly increase the soft timeouts?

No, not that I'm aware of... but I know there was a lot of work done
in the 2.4.21 (RHEL3/FC1) kernels on congestion control (actually
I'm pretty sure we increased the timeout a bit to deal with 64-bit
machines), so you might want to try one of those kernels, since 
RH9 is (at this point) an unsupported release...
