Bug 90717

Summary: NFS client reads are slow when using UDP
Product: Red Hat Enterprise Linux 3
Component: kernel
Version: 3.0
Hardware: i686
OS: Linux
Status: CLOSED WORKSFORME
Severity: medium
Priority: medium
Reporter: Lans Carstensen <lans>
Assignee: Steve Dickson <steved>
QA Contact: Brian Brock <bbrock>
CC: jepler, jim.laverty, lakamine, mkunjal, petrides, tao
Doc Type: Bug Fix
Last Closed: 2005-10-05 22:54:19 UTC

Attachments:
dtest3.cpp for testing many small read/writes over NFS (flags: none)

Description Lans Carstensen 2003-05-12 20:31:27 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225

Description of problem:
NFS read performance is 50-75% slower on Red Hat 9 when utilizing UDP instead of
TCP.  Write performance remains high regardless of the configuration.

BTW, I'm pleased that Red Hat is shipping w/ ttcp.  The next thing I'd like to
see added would be the GPL'ed lmbench tools.  The test case below uses "lmdd".

Version-Release number of selected component (if applicable):
2.4.20-9smp

How reproducible:
Always

Steps to Reproduce:
1. Download and build lmbench 2.0.4 from ftp://ftp.bitmover.com/lmbench/
2. Mount an NFS filesystem with "-o nfsvers=3,udp"
3. Generate a file on the NFS filesystem with lmdd to baseline write
performance: lmdd of=/mnt/tmp/file fsync=y count=8192 bs=32k
4. Unmount, then re-mount the NFS filesystem w/ the same options.
5. Do a read performance test:  lmdd if=/mnt/tmp/file count=8192 bs=32k

I'm using a gigabit-attached Network Appliance F960 filer as my NFS server.
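
The steps above, repeated for each mount option combination reported below, can be sketched as a small script that only prints the commands. This is an illustration and not part of the original report: the export path is a placeholder, the helper name build_plan is made up, and the use of UDP for the NFS v2 runs is an assumption (the report does not state the transport for v2).

```python
from itertools import product

SERVER_EXPORT = "filer:/vol/vol0/test"  # hypothetical export path, not from the report
MOUNT_POINT = "/mnt/tmp"                 # mount point implied by the lmdd paths
TEST_FILE = MOUNT_POINT + "/file"

# (label, base mount options) for the three configurations in the results.
CONFIGS = [
    ("NFS v2", "nfsvers=2,udp"),      # transport for v2 is an assumption
    ("NFS v3 UDP", "nfsvers=3,udp"),
    ("NFS v3 TCP", "nfsvers=3,tcp"),
]
SIZES = [4096, 8192, 16384, 32768]    # values fed to both rsize and wsize


def build_plan():
    """Return a list of (title, [shell commands]) covering the whole sweep."""
    plan = []
    for (label, opts), size in product(CONFIGS, SIZES):
        mount = (f"mount -t nfs -o {opts},rsize={size},wsize={size} "
                 f"{SERVER_EXPORT} {MOUNT_POINT}")
        umount = f"umount {MOUNT_POINT}"
        commands = [
            mount,
            f"lmdd of={TEST_FILE} fsync=y count=8192 bs=32k",  # write baseline
            umount,                                            # drop cached data
            mount,
            f"lmdd if={TEST_FILE} count=8192 bs=32k",          # read test
            umount,
        ]
        plan.append((f"{size // 1024}k {label}", commands))
    return plan


# Print the plan; nothing is mounted or executed here.
for title, commands in build_plan():
    print(f"# {title}")
    for command in commands:
        print(command)
```

Printing the commands rather than running them keeps the sketch safe to try without root; the output can be reviewed and pasted into a root shell.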

Actual Results:  Here are the results from various mount option combinations.
The first number is the value passed to both rsize and wsize (4096, 8192,
16384, 32768).  Reads are always slow when using UDP.

4k NFS v2 write test:
268.4355 MB in 23.3415 secs, 11.5004 MB/sec
4k NFS v2 read test:
268.4355 MB in 48.4379 secs, 5.5418 MB/sec
8k NFS v2 write test:
268.4355 MB in 22.8383 secs, 11.7538 MB/sec
8k NFS v2 read test:
268.4355 MB in 80.1808 secs, 3.3479 MB/sec
16k NFS v2 write test:
268.4355 MB in 22.8416 secs, 11.7520 MB/sec
16k NFS v2 read test:
268.4355 MB in 79.3533 secs, 3.3828 MB/sec
32k NFS v2 write test:
268.4355 MB in 22.7890 secs, 11.7792 MB/sec
32k NFS v2 read test:
268.4355 MB in 82.3852 secs, 3.2583 MB/sec
4k NFS v3 UDP write test:
268.4355 MB in 23.3635 secs, 11.4895 MB/sec
4k NFS v3 UDP read test:
268.4355 MB in 48.6092 secs, 5.5223 MB/sec
8k NFS v3 UDP write test:
268.4355 MB in 22.8077 secs, 11.7695 MB/sec
8k NFS v3 UDP read test:
268.4355 MB in 80.1629 secs, 3.3486 MB/sec
16k NFS v3 UDP write test:
268.4355 MB in 22.8416 secs, 11.7520 MB/sec
16k NFS v3 UDP read test:
268.4355 MB in 73.6243 secs, 3.6460 MB/sec
32k NFS v3 UDP write test:
268.4355 MB in 23.5653 secs, 11.3911 MB/sec
32k NFS v3 UDP read test:
268.4355 MB in 89.6828 secs, 2.9932 MB/sec
4k NFS v3 TCP write test:
268.4355 MB in 23.8105 secs, 11.2738 MB/sec
4k NFS v3 TCP read test:
268.4355 MB in 24.1816 secs, 11.1008 MB/sec
8k NFS v3 TCP write test:
268.4355 MB in 23.3194 secs, 11.5112 MB/sec
8k NFS v3 TCP read test:
268.4355 MB in 23.7496 secs, 11.3027 MB/sec
16k NFS v3 TCP write test:
268.4355 MB in 23.0368 secs, 11.6525 MB/sec
16k NFS v3 TCP read test:
268.4355 MB in 23.5675 secs, 11.3901 MB/sec
32k NFS v3 TCP write test:
268.4355 MB in 22.9225 secs, 11.7105 MB/sec
32k NFS v3 TCP read test:
268.4355 MB in 23.4455 secs, 11.4493 MB/sec


Expected Results:  All results should have been around 11.5 MB/sec, or basically
the same as whatever you can get with "ttcp" - on the same hardware under Red
Hat 7.x you can consistently get around 11.5 MB/sec.
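
The 50-75% figure can be checked against the lmdd output above. The following is a minimal sketch, not part of the original report: the helper names parse_rate and slowdown_percent are made up for illustration, and the regular expression assumes lmdd's summary line always has the shape shown in the results.

```python
import re

# Matches lines such as "268.4355 MB in 48.6092 secs, 5.5223 MB/sec".
LINE_RE = re.compile(r"([\d.]+) MB in ([\d.]+) secs, ([\d.]+) MB/sec")


def parse_rate(line):
    """Extract the MB/sec figure from one lmdd summary line."""
    match = LINE_RE.search(line)
    if match is None:
        raise ValueError(f"not an lmdd summary line: {line!r}")
    return float(match.group(3))


def slowdown_percent(slow_rate, fast_rate):
    """How much slower slow_rate is than fast_rate, as a percentage."""
    return 100.0 * (1.0 - slow_rate / fast_rate)


# NFS v3 read results copied from the report (UDP vs. TCP).
udp_4k = parse_rate("268.4355 MB in 48.6092 secs, 5.5223 MB/sec")
tcp_4k = parse_rate("268.4355 MB in 24.1816 secs, 11.1008 MB/sec")
udp_32k = parse_rate("268.4355 MB in 89.6828 secs, 2.9932 MB/sec")
tcp_32k = parse_rate("268.4355 MB in 23.4455 secs, 11.4493 MB/sec")

print(f"4k  UDP read slowdown vs TCP: {slowdown_percent(udp_4k, tcp_4k):.1f}%")
print(f"32k UDP read slowdown vs TCP: {slowdown_percent(udp_32k, tcp_32k):.1f}%")
```

With these inputs the 4k case comes out at roughly half the TCP rate and the 32k case at roughly a quarter of it, which brackets the 50-75% range quoted in the description.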

Additional info:

I performed a series of tests with ttcp, as shipped with Red Hat 9.  All
application-level UDP and TCP performance is as it should be.  This appears to
just be related to the NFS client code.

This problem can be worked around by using TCP.  The problems with that workaround are:

1. NFS over TCP is much more expensive on the server side, and as such doesn't
scale as efficiently.  We witnessed that first hand the last time we attempted
to switch.

2. Most people's defaults are UDP.  People expect NFS to operate similarly
between Red Hat Linux versions and not experience a 50-75% degradation in
performance.

Comment 1 Lans Carstensen 2003-05-12 21:41:18 UTC
I should have also mentioned that I explicitly chkconfig'ed iptables off and
made sure there were no iptables or other traffic engineering modules loaded. 
The version of mount used was "mount-2.11y-9".  The version of nfs-utils used
was "nfs-utils-1.0.1-2.9".  And as mentioned, the kernel is 2.4.20-9smp. 
Non-SMP kernels seemed similarly afflicted, although I didn't collect as much data.

Comment 2 Jim Laverty 2003-06-04 14:14:25 UTC
We have also spoken to other admins who are seeing speed issues with NFS to
other platforms (Solaris and Unixware), with the newer kernels.

We are seeing similar results with the NetApp FAS 960C, using 2.4.20-13.9smp and
2.4.18-26.7smp.  NFS using 2.4.18-5smp is faster by about 30%.

We have tested with iozone, dd and a tiny in-house C app which writes one
million 256-byte records to a file sequentially.  We can supply the results on
request (files are large).

We have tried various rmem/wmem settings, UDP and TCP mounts, 1k, 4k, 8k, 16k
and 32k r/w sizes, etc.

We are going to run tests on the new RH 9 kernel released yesterday
(2.4.20-18.7) and we will post the results here.


Comment 3 Lans Carstensen 2003-06-05 17:54:28 UTC
Under Red Hat 7.x and 8 with the new 2.4.20-18 erratum kernel, this now amounts
to a serious regression.  Performance tests on a 7.2 system upgraded to
2.4.20-18.7smp are included below.  There were no benefits for Red Hat 9 under
this kernel, either.

This is now a really big deal.  Sites concerned about security can't pick up the
security errata and continue to operate in a common centralized NAS environment.

4k NFS v2 write test:
268.4355 MB in 24.3497 secs, 11.0242 MB/sec
4k NFS v2 read test:
268.4355 MB in 50.0570 secs, 5.3626 MB/sec
8k NFS v2 write test:
268.4355 MB in 22.7779 secs, 11.7849 MB/sec
8k NFS v2 read test:
268.4355 MB in 61.5758 secs, 4.3594 MB/sec
16k NFS v2 write test:
268.4355 MB in 22.7875 secs, 11.7800 MB/sec
16k NFS v2 read test:
268.4355 MB in 59.9235 secs, 4.4796 MB/sec
32k NFS v2 write test:
268.4355 MB in 22.7788 secs, 11.7844 MB/sec
32k NFS v2 read test:
268.4355 MB in 60.4922 secs, 4.4375 MB/sec
4k NFS v3 UDP write test:
268.4355 MB in 23.3406 secs, 11.5008 MB/sec
4k NFS v3 UDP read test:
268.4355 MB in 49.9578 secs, 5.3732 MB/sec
8k NFS v3 UDP write test:
268.4355 MB in 22.9060 secs, 11.7190 MB/sec
8k NFS v3 UDP read test:
268.4355 MB in 66.2322 secs, 4.0529 MB/sec
16k NFS v3 UDP write test:
268.4355 MB in 22.6011 secs, 11.8771 MB/sec
16k NFS v3 UDP read test:
268.4355 MB in 83.4446 secs, 3.2169 MB/sec
32k NFS v3 UDP write test:
268.4355 MB in 22.6790 secs, 11.8363 MB/sec
32k NFS v3 UDP read test:
268.4355 MB in 75.9692 secs, 3.5335 MB/sec
4k NFS v3 TCP write test:
268.4355 MB in 23.7628 secs, 11.2964 MB/sec
4k NFS v3 TCP read test:
268.4355 MB in 24.2271 secs, 11.0800 MB/sec
8k NFS v3 TCP write test:
268.4355 MB in 23.2774 secs, 11.5320 MB/sec
8k NFS v3 TCP read test:
268.4355 MB in 23.1907 secs, 11.5751 MB/sec
16k NFS v3 TCP write test:
268.4355 MB in 22.9976 secs, 11.6724 MB/sec
16k NFS v3 TCP read test:
268.4355 MB in 23.0102 secs, 11.6659 MB/sec
32k NFS v3 TCP write test:
268.4355 MB in 22.8861 secs, 11.7292 MB/sec
32k NFS v3 TCP read test:
268.4355 MB in 23.2306 secs, 11.5553 MB/sec

Comment 4 Jim Laverty 2003-06-05 19:21:51 UTC
This problem seems to go away in the very latest errata kernel 2.4.20-18.9.  We
have been testing it with a NetApp FAS 960C, using ONTAP 6.4.1 and ONTAP 6.1.2.
The testing of this kernel was done with NFS over UDP, not TCP, using 8k r/w
sizes.  TCP testing will be done in the next day.

Prior to this the 2.4.18-5smp performed the best for us with NFS.

Comment 5 Lans Carstensen 2003-06-18 20:53:57 UTC
I haven't updated this lately, but 2.4.20-18.9 and 2.4.20-18.9smp still exhibit
the slow UDP read behavior shown above.  We also can't go to later errata on RH
7.2 without seeing this behavior as well.  We're also testing w/ FAS960c's with
ONTAP 6.4.1.

Comment 6 Jim Laverty 2003-06-27 15:55:30 UTC
Our performance on a NetApp FAS 960C using OnTAP 6.4.1 and Red Hat 9
2.4.20-18.9smp is twice the speed we get using OnTAP 6.1.2 on our 840C.  We are
doing many high speed sequential writes of 256 bytes (8-12MB/s overall) from a
few dozen servers.

We are using 8k r/w mounts, with rmem_max and wmem_max set at 256k (larger
numbers show no tangible gains).  We are using the Broadcom BCM57xx and tg3, on
twin PIII Dell 1650s.  Our TCP performance is within fractions of a second of
our UDP performance.

The OnTAP 6.4.1P1 patch released last week however erases all of the speed gains.  

There is an ongoing conversation about this issue on the valhalla list also.

I will attach the C code (dtest3.cpp) we are using to test the worst case writes
from multiple servers.  It will also do read, random read/write and seek tests
from the command line.  This should build fine with gcc and was not written with
style in mind.

Comment 7 Jim Laverty 2003-06-27 16:01:30 UTC
Created attachment 92657 [details]
dtest3.cpp for testing many small read/writes over NFS

The little piece of test code is by no means at the level of IOZone, yet it
mimics the high speed writes of small blocks used by our applications.
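
For readers without the attachment, the kind of workload described (many sequential writes of small fixed-size records) can be approximated with the sketch below. It is not dtest3.cpp and makes no claim to match its behavior; the function name write_small_records and its parameters are made up for illustration, and only the 256-byte record size and the one-million record count come from the comments above.

```python
import os
import tempfile
import time


def write_small_records(path, count, record_size=256, sync=True):
    """Write `count` records of `record_size` bytes sequentially to `path`.

    Uses unbuffered os.write so that each record is a separate write()
    system call. Returns (bytes_written, elapsed_seconds).
    """
    record = b"x" * record_size
    written = 0
    start = time.perf_counter()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for _ in range(count):
            # os.write returns the number of bytes actually written.
            written += os.write(fd, record)
        if sync:
            os.fsync(fd)  # flush to the server before stopping the clock
    finally:
        os.close(fd)
    elapsed = time.perf_counter() - start
    return written, elapsed
```

To exercise an NFS mount, point `path` at a file on that mount and pass `count=1_000_000`; dividing the returned byte count by the elapsed time gives a throughput figure comparable in spirit, though not in method, to the ones discussed in this bug.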

Comment 9 Steve Dickson 2004-01-22 18:52:51 UTC
Lans,

I guess I don't understand how you're getting your numbers....

when I do 'lmdd of=/mnt/tmp/file fsync=y count=8192 bs=4k', I get
33.5544 MB in 3.3673 secs, 9.9647 MB/sec (i.e. 33MBs of data)

but it seems when you do 
'lmdd of=/mnt/tmp/file fsync=y count=8192 bs=4k', you get
268.4355 MB in 24.3497 secs, 11.0242 MB/sec (i.e. 268MBs of data)

Without really understanding what lmdd is doing it
makes sense I would only get 33MBs of data when
doing 8k of sends with 4k of data (i.e. 8k * 4k = 33M)

The only time I get 268MBs of data is when I do a 
lmdd of=/mnt/pxeon5/home/tmp/file fsync=y count=8192 bs=32k
which is also confusing since 8k * 32k == 256M (not 268MB)...

What am I missing?
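
One possible reading, offered as a sketch rather than a statement about lmdd internals: the figures quoted in this bug are consistent with lmdd reporting decimal megabytes (10^6 bytes) while "8k * 32k == 256M" counts binary units (2^20 bytes). Note also that the reproduction steps use bs=32k for every run; the 4k to 32k labels in the results are rsize/wsize values. The helper names below are made up for illustration.

```python
def lmdd_bytes(count, block_size):
    """Total bytes transferred for `count` blocks of `block_size` bytes."""
    return count * block_size


def as_decimal_mb(num_bytes):
    """Megabytes in the decimal sense: 1 MB = 10**6 bytes."""
    return num_bytes / 10**6


def as_mib(num_bytes):
    """Mebibytes in the binary sense: 1 MiB = 2**20 bytes."""
    return num_bytes / 2**20


for label, block_size in (("bs=4k", 4 * 1024), ("bs=32k", 32 * 1024)):
    total = lmdd_bytes(8192, block_size)
    print(f"count=8192 {label}: {total} bytes = "
          f"{as_decimal_mb(total):.4f} MB (decimal) = {as_mib(total):.0f} MiB")
```

Under that assumption, count=8192 with bs=32k is 268,435,456 bytes, which is both 268.4355 decimal MB and 256 MiB, and bs=4k gives the 33.5544 MB seen in the local test.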

BTW, I'm not seeing the same (or any) performance degradation w.r.t.
v3 using either udp or tcp during writes or reads. With v2
there does seem to be some degradation as well as (or due to)
an excessive number of retrans.

Comment 10 Steve Dickson 2004-10-14 20:39:45 UTC
Is this still an issue with more recent RHEL3 kernels?

Comment 11 Ernie Petrides 2005-10-05 22:54:19 UTC
Closing due to lack of response.