Bug 84548

Summary:	NFS Client Transfer Rates Extremely Slow or non-functional
Product:	[Retired] Red Hat Linux	Reporter:	Larry Hauch <larry.hauch>
Component:	kernel	Assignee:	Steve Dickson <steved>
Status:	CLOSED CURRENTRELEASE	QA Contact:	Brian Brock <bbrock>
Severity:	high	Docs Contact:
Priority:	high
Version:	8.0	CC:	dclark, dmay, graham, jadams, jepler, jlcoleman, lnewby, mitr, vipul.lal
Target Milestone:	---
Target Release:	---
Hardware:	i686
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2004-08-11 11:31:15 UTC	Type:	---

Description Larry Hauch 2003-02-18 18:01:58 UTC

Description of problem: NFS connections from RedHat Linux 8.0 clients to either 
a RedHat Linux 8.0 NFS file server or a RedHat Linux 7.x NFS file server are 
much slower than from RH 7.x clients (using either mount -t nfs server:/share 
or automount).
A 128MB transfer from a 7.x client to either a RH 7.x or RH 8.0 NFS file 
server takes an average of 19.77s/0.02s/2.57s (real/user/system) using
dd if=/dev/zero of=/mnt/point/testfile bs=8k count=16384
The same transfer from a RH 8.0 client to either a RH 7.x or RH 8.0 NFS file 
server takes an average of 23m29.64s/0.041s/4.00s using the default NFS mount 
block size negotiation.  With rsize=8192,wsize=8192 set, the transfer time 
drops to an average of 12m12s/0.045s/3.24s.
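
For comparison, the averages above can be turned into throughput figures. This is only a rough sketch: it assumes the full 128 MB (16384 blocks of 8k) was written in the quoted real time, and mb_per_sec is an illustrative helper name, not part of any tool mentioned here.

```shell
# mb_per_sec MIN SEC: throughput in MB/s for the 128 MB dd write
# (bs=8k count=16384), given the elapsed real time.
mb_per_sec() {
    awk -v m="$1" -v s="$2" 'BEGIN { printf "%.2f\n", 128 / (m * 60 + s) }'
}

mb_per_sec 0 19.77    # RH 7.x client: 6.47
mb_per_sec 23 29.64   # RH 8.0 client, default block size: 0.09
mb_per_sec 12 12      # RH 8.0 client, rsize=8192,wsize=8192: 0.17
```

So the RH 8.0 client stays well under 1 MB/s even with the explicit block size.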


Version-Release number of selected component (if applicable):


How reproducible:
Set up a local network with one each of RH 7.x client, RH 7.x server, RH 8.0 
client, and RH 8.0 server.  Network connections should be set to match the 
100Mb/s hub or switch for optimized transfer rates (I used an Intel Express 
330T Hub, which handles 100Mb/s Half-Duplex for maximum transfer rates.)
Create a separate NFS-exported mount point on each of the servers.

Steps to Reproduce:
1. on RH7 client; mount -t nfs rh7server:/export/rh7 /mnt/rh7
2. cd /mnt/rh7; time dd if=/dev/zero of=rh7test bs=8k count=16384
3. record results and repeat using RH7 client and RH8 server
4. record results and repeat using RH8 client and RH7 server
5. record results and repeat using RH8 client and RH8 server
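
The steps above can be wrapped in a small helper so that every client/server pair runs the same write. This is a minimal sketch, not the script used for the numbers in this report: write_test is an illustrative name, it reports whole wall-clock seconds from date instead of the real/user/system split that time gives, and /mnt/rh8 is an assumed mount point for the RH8 server export.

```shell
# write_test DIR [COUNT]: write COUNT 8k blocks of zeros (default 16384,
# i.e. 128 MB) to DIR/testfile and print the elapsed wall-clock seconds.
write_test() {
    dir=$1
    count=${2:-16384}
    start=$(date +%s)
    dd if=/dev/zero of="$dir/testfile" bs=8k count="$count" 2>/dev/null
    end=$(date +%s)
    echo "$dir: $((end - start))s"
}

# Example, after mounting both exports on the client under test:
#   for d in /mnt/rh7 /mnt/rh8; do write_test "$d"; done
```

Whole seconds are coarse, but the difference being reported here is seconds versus tens of minutes.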
    
Actual results:
RH7 client RH7 server: 19.770s/0.020s/2.570s
RH7 client RH8 server: 19.856s/0.030s/1.000s
RH8 client RH7 server: 23m39.636s/0.041s/4.00s
RH8 client RH8 server: 57m679s/0.053s/3.205s
RH8 client RH8 server: (rsize=8192,wsize=8192) 12m55.559s/0.053s/3.234s
RH8 client RH7 server: (rsize=8192,wsize=8192) 11m58.826s/0.037s/3.287s
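
As a rough worked example, the real times above can be expressed as slowdown factors relative to the RH7 client against the same server. The sketch assumes the rsize/wsize rows read as 12m55.559s and 11m58.826s; the default-options RH8 client / RH8 server row is left out because its real time is not a well-formed value. slowdown is an illustrative helper name.

```shell
# slowdown MIN SEC BASELINE_SEC: how many times longer a run took than
# the RH7 client baseline against the same server.
slowdown() {
    awk -v m="$1" -v s="$2" -v b="$3" 'BEGIN { printf "%.1f\n", (m * 60 + s) / b }'
}

slowdown 23 39.636 19.770   # RH8 client, RH7 server, defaults: 71.8
slowdown 11 58.826 19.770   # RH8 client, RH7 server, rsize/wsize=8192: 36.4
slowdown 12 55.559 19.856   # RH8 client, RH8 server, rsize/wsize=8192: 39.1
```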

Expected results:
RH8 client should perform close to RH7 client

Additional info: Using ifconfig on all systems before and after transfer showed 
very few errors (fewer than 2) during transfers.  No other system activity was 
happening, as the test systems were set up on a private network.  The initial 
report coming to me was on a live network, and transfers were taking even 
longer - up to 85x slower than expected.

Comment 1 Need Real Name 2003-03-05 19:27:44 UTC

Our CAD group is currently testing RH 8 and has run into a brick wall
with this same problem. We are waiting on a fix from RH. 
Please copy us on the progress.

Thanks,
dclark

Comment 2 Lew Newby 2003-03-15 00:46:29 UTC

Cadence has also been hit with this issue with major delays in our builds and
testing being a result.

Lew Newby
lnewby

Comment 3 Lew Newby 2003-03-24 19:34:47 UTC

Do we have any ETA on this issue?

Comment 4 Lew Newby 2003-04-02 21:51:13 UTC

This is also starting to have an impact on EE2.1 and the latest kernel revs for
RH7.3

Comment 5 Steve Dickson 2003-04-04 14:39:52 UTC

Do the slower transfers have more rpc badcalls/retrans than the faster ones?
nfsstat -c and nfsstat -s show rpc badcalls/retrans. Also, 
are there more rpc calls with the slower transfers? Again,
nfsstat can be used to see this...
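
One way to compare the counters is to pull the client-side calls and retrans values out of nfsstat -c before and after a transfer. The sketch below is hedged: it assumes the "Client rpc stats" section prints a header line starting with "calls retrans" followed by a line of numbers, which may differ between nfs-utils versions, and rpc_retrans is an illustrative helper name. Server-side badcalls from nfsstat -s are not handled.

```shell
# rpc_retrans: read `nfsstat -c` output on stdin and print the client
# RPC calls and retrans counters.
# Assumed layout: a header line "calls retrans ..." followed by the values.
rpc_retrans() {
    awk '$1 == "calls" && $2 == "retrans" {
        getline
        print "calls=" $1, "retrans=" $2
        exit
    }'
}

# Example: nfsstat -c | rpc_retrans   (run before and after the dd test)
```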

Comment 6 Larry Hauch 2003-04-07 19:52:31 UTC

Checking nfsstat on both the client and the server shows 0 badcalls and only 
2 retrans calls.

Comment 7 Derek May 2003-06-27 22:30:46 UTC

This is critical to us moving forward with Redhat 8.x.

Comment 8 Derek May 2003-06-30 16:53:46 UTC

Is this bug getting any attention? Can we expect to see a patch for Redhat 8.x
or are there no plans to fix this in the current kernel? This performance loss
is unacceptable. What do we need to do to increase the priority of this problem?
How come this bug is still considered NEW and has not been ASSIGNED for 4.5
months even though it is "high" priority and "high" severity?

Comment 9 Jared A. Adams 2003-06-30 19:27:17 UTC

Please expedite this case! We see this as a high priority in moving to Redhat 8.0.
We cannot live with the performance degradation as a result of this bug. This
problem is causing us in various applications to achieve much less performance
than we currently have with our Sun machines.

Comment 10 Martin Graham 2003-07-03 05:14:26 UTC

Yes, this problem is a solid barrier to 8.x support by Cadence 
(or anyone else, you would think)
Why have we seen no activity or comment by Red Hat on this issue?

Comment 11 Lew Newby 2003-07-06 16:51:36 UTC

Within Cadence IT we have found that the following modifications will improve
performance greatly as well as reliability.

Upgrade to at least kernel-2.4.18-27, if not 2.4.20-*.
     (If you are using RedHat as an NFS server and wish to use TCP, you will need
to use the 2.4.20-* kernel and modify the config to include support for NFS over
TCP.)
Set mount options to rsize=8096,wsize=8096,udp

Until kernel-2.4.20 the TCP implementation of NFS has had severe problems.

At this time I would surmise that RedHat has little if any interest in resolving
issues for the consumer editions of RedHat. If you can replicate this problem in
Enterprise edition 2.1 they may address it.

Lew Newby
Cadence

Comment 12 Lew Newby 2003-07-06 16:54:32 UTC

I fumble-fingered the numbers; it should be 8192 for rsize and wsize.

Lew
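
With the corrected values, the mount options from the workaround would look roughly like the following. This is a sketch only: the server, export, and mount point are the example names from the reproduction steps, and the command is printed rather than executed.

```shell
# Option string from the workaround, with the corrected block sizes.
NFS_OPTS="rsize=8192,wsize=8192,udp"

# Print the command instead of running it; drop the leading "echo" to
# actually mount (needs root and a reachable server).
echo mount -t nfs -o "$NFS_OPTS" rh7server:/export/rh7 /mnt/rh7
```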

Comment 13 Justin Coleman 2003-07-11 13:45:19 UTC

I see that this is a high priority high severity bug, but nothing seems to be
happening, except more people getting hit by it.  What does it take before an
issue will get looked at?  Looking at the list of comments is like perusing
the who's who of EDA and hardware companies.  So visibility can't be an issue. 
Is more information required?  Is it assigned to the wrong person?

Comment 14 Steve Dickson 2004-08-11 11:31:15 UTC

I believe this turned out to be a VM issue that has been 
fixed in later kernels. Please upgrade to FC1 since RH8
is no longer supported.