Red Hat Bugzilla – Bug 391021
Slow performance writing to NetApp filer
Last modified: 2009-02-24 19:45:06 EST
Description of problem:
We see very slow performance writing from Fedora 7 and 8 clients to our NetApp
FAS3020 filer. I've tested with dd (from /dev/zero), bonnie++, iozone, tar and
simple cp. In all cases, write speed is limited to ~5MB/s on a relatively
unloaded filer. Tests on identical hardware running CentOS 5 achieve ~50MB/s.
This is not a generic NFS problem, as pointing the Fedora boxes at a Panasas
filer and running similar tests yields write speeds on the order of 40MB/s.
Note that read performance is not similarly affected.
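For anyone trying to reproduce the numbers above, here is a minimal sketch of a dd-style sequential write test. It is not the exact procedure used in this report: the function name, block size and the mount path in the usage note are assumptions chosen for illustration.

```python
import os
import time


def measure_write_throughput(path, total_mb=200, block_kb=1024):
    """Write total_mb of zero bytes to path and return the rate in MB/s."""
    block = b"\0" * (block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    start = time.monotonic()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        # fsync so the timing covers pushing the data to the server,
        # not just filling the client's page cache.
        os.fsync(f.fileno())
    elapsed = time.monotonic() - start
    written_mb = blocks * block_kb / 1024
    return written_mb / max(elapsed, 1e-9)
```

Usage would look like `measure_write_throughput("/mnt/netapp/testfile")`, where `/mnt/netapp` is a hypothetical NFS mount point. Running the same call on the Fedora and CentOS clients should give directly comparable figures.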
I've tested with 2 different client platforms:
1) HP DL140 G3 (dual quad-core Xeon) with NetXtreme BCM5721 NIC (tg3 driver)
running x86_64 OS
2) Dell PowerEdge 1855 (dual NetBurst Xeon) with Intel 82546GB NIC (e1000
driver) running i386 OS
All testing was done using a standard MTU of 1500 and a wide variety of NFS
mount options, none of which made any difference. All networking hardware is
Version-Release number of selected component (if applicable):
Most recent tested is 220.127.116.11-42.fc8.
Steps to Reproduce:
1. Install Fedora 7 or 8.
2. Write to NetApp filer.
Actual results:
Watch it crawl along.
Expected results:
Performance on par with CentOS on the same hardware.
Created attachment 265131
tshark capture of NFS traffic
I've attached a tshark capture of a simple 'cp' of a 200MB file of random data
from a f8 client to the NetApp. The tshark command used was 'tshark -w
/tmp/bz391021 -s 192 -i eth1 host netapp', and the NFS mount options used were
Hmm... it appears you are having a large number
of TCP retransmissions, which would explain the
slowness... Now the question is why?
Has there been a recent network topology change of some
kind? Maybe a misconfigured router, or has there
been a new NIC added to the mix? Are both sides using
the same duplex mode?
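One way to cross-check the retransmission count without a packet capture is to read the kernel's TCP counters on the client. The sketch below parses the text of /proc/net/snmp and computes RetransSegs divided by OutSegs. The function name is made up for illustration, and the counters are cumulative since boot, so sampling before and after a test run and comparing the two is more meaningful than a single reading.

```python
def tcp_retransmit_ratio(snmp_text):
    """Return RetransSegs / OutSegs from the text of /proc/net/snmp."""
    # The file holds a "Tcp:" line of field names followed by a
    # "Tcp:" line of values in the same order.
    tcp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Tcp:")]
    if len(tcp_lines) < 2:
        raise ValueError("no Tcp: header/value pair found")
    names, values = tcp_lines[0][1:], tcp_lines[1][1:]
    counters = dict(zip(names, (int(v) for v in values)))
    out_segs = counters["OutSegs"]
    return counters["RetransSegs"] / out_segs if out_segs else 0.0
```

On a Linux client this could be called as `tcp_retransmit_ratio(open("/proc/net/snmp").read())`.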
There was a network reorganization several months ago (before I got here) that
involved adding several VLANs and ACLs. However, the bulk of the clients (which
are still running FC4 -- we're evaluating which distro to move them to) still
perform just fine. It's only F7 and F8 that I've hit this issue with. Also, if
it were a network topology problem, wouldn't we expect CentOS-5 to have the same
problem? I'll attach another tshark capture of the same test from a CentOS-5
client identical (in the same rack even) to the f8 client above.
And, yes, the f8 client reports 1000Mb/s full duplex, as does everywhere between
it and the NetApp.
Created attachment 265211
tshark capture of NFS traffic from CentOS-5 client
> 'tshark -w /tmp/bz391021 -s 192 -i eth1 host netapp'
Why the -i flag? That stops me from seeing any NFS traffic.
> Also, if it were a network topology problem, wouldn't we expect CentOS-5 to
> have the same problem?
Well, CentOS is based on RHEL, and Fedora is much closer to
the upstream kernel (at the time). Since upstream changes
quite a bit more than RHEL (or CentOS), regressions
are always a possibility.
Erm, doesn't -i just tell tshark which interface to listen on? I guess it is
redundant, since eth1 is the only active non-loopback interface on these
systems. But it shouldn't stop you from seeing *any* traffic between the client
and the netapp.
Let me know if you need me to re-run the packet captures (and any other flags I
may need to use).
Can you set up a port mirror and capture the traffic from there for the slow
client? Packets might be getting delayed in the client's network adapter before
Also, try turning off TCP window scaling on the client. You will probably need to
unmount and remount the filesystem after doing that.
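As background on why window scaling can matter: a single TCP connection cannot move more than one receive window per round trip, and without scaling the window is capped at 65535 bytes. The sketch below works through that arithmetic. The round-trip times in the usage note are illustrative assumptions, not measurements from this network.

```python
def max_window(scale_shift):
    """Largest advertisable TCP window in bytes for a given scale shift (0..14)."""
    return 65535 << scale_shift


def window_limited_throughput(window_bytes, rtt_seconds):
    """Upper bound on one connection's throughput in bytes/s: window / RTT."""
    return window_bytes / rtt_seconds
```

With an assumed RTT of 0.5 ms, `window_limited_throughput(max_window(0), 0.0005)` comes to roughly 131 MB/s, which is above gigabit line rate. With an assumed 13 ms RTT the same unscaled window would cap at about 5 MB/s. Whether window scaling is relevant therefore depends heavily on the real RTT between the client and the filer.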
Oops... I meant the '-s 192' flag... my bad...
Looking at the second post (Comment #4) I still can't
see any NFS traffic... Could you please drop the -s
Ah, that makes sense. Sorry about that. '-s 192' is a tcpdump holdover, where
it defaulted to only grabbing 68 bytes and actually recommended 192 for looking
at NFS traffic.
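To illustrate what the snapshot length does to a capture, here is a rough size estimate for a classic pcap file, which has a 24-byte file header and a 16-byte header per packet record. The function name and the packet counts in the usage note are hypothetical, and a tshark build that writes pcapng instead would have somewhat different overhead.

```python
PCAP_GLOBAL_HEADER = 24  # bytes, classic pcap file header
PCAP_RECORD_HEADER = 16  # bytes, per-packet record header


def estimated_capture_size(packet_lengths, snaplen=None):
    """Estimate pcap file size in bytes; snaplen=None means no truncation."""
    size = PCAP_GLOBAL_HEADER
    for length in packet_lengths:
        # Each packet is stored truncated to at most snaplen bytes.
        captured = length if snaplen is None else min(length, snaplen)
        size += PCAP_RECORD_HEADER + captured
    return size
```

For 1000 full-size 1514-byte Ethernet frames, `estimated_capture_size([1514] * 1000, snaplen=192)` gives 208024 bytes, against 1530024 bytes with no truncation. The truncated capture is much smaller but loses most of each NFS payload.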
I've redone the captures, but they're too big to upload to bugzilla. I've put
Fedora 8, tcp_window_scaling=1
Fedora 8, tcp_window_scaling=0
CentOS 5, tcp_window_scaling=1
On f8, turning off TCP window scaling made no difference in the write speed.
Wow, they are huge... just out of curiosity (without posting them)
could you capture a trace using CentOS to see if the traces
are as large?
The CentOS capture is up there - third one in the list. I'm guessing that the
huge increase in size is due to the dropping of the -s flag as requested, and
the fact that there is 200MB of 'random' data being copied there - probably not
As pointed out, the CentOS trace is posted and it is large as well (although it
is a fair bit smaller than the Fedora traces). I tried reducing the size of the
test file even further to get the trace size down, but going much smaller really
starts to introduce too much measurement error, IMO.
If the big size of the compressed trace files is an issue, let me know and I can
try sourcing the test files from /dev/zero rather than /dev/urandom -- those
might compress a bit better.
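The difference in compressibility between the two sources is easy to demonstrate. This sketch uses zlib on 1 MiB of zero bytes and 1 MiB of random bytes. It only compares raw payloads, so a real capture (which also contains packet headers) will not shrink quite as dramatically.

```python
import os
import zlib


def compressed_ratio(data):
    """Return compressed size divided by original size."""
    return len(zlib.compress(data)) / len(data)


size = 1024 * 1024
zero_ratio = compressed_ratio(b"\0" * size)         # like /dev/zero
random_ratio = compressed_ratio(os.urandom(size))   # like /dev/urandom
```

Here `zero_ratio` should come out as a tiny fraction, while `random_ratio` stays at about 1.0, since random data is essentially incompressible.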
I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the Fedora kernel.
I am CC'ing myself to this bug and will try and assist you in resolving it if I can.
There hasn't been much activity on this bug for a while. Could you tell me if
you are still having problems with the latest kernel?
If the problem no longer exists then please close this bug or I'll do so in a
few days if there is no additional information lodged.
The lack of activity, you may note, was due to the Fedora folks losing interest,
not me. ;) In any case, I haven't been able to track down any spare boxes to
install the most recent kernel on (all my test boxes got put into production),
but I will continue to try to do so and update again sometime next week.
Okay Joshua, thanks for updating this anyway. We're just in the process of
prodding bugs that haven't seen much change - it may generate some renewed
interest. The 2.6.24 kernel has been released and updates-testing should have it
soon so it might be an idea to test with this when it arrives...?
I just tested with the most recent released kernel for F8 (18.104.22.168-107.fc8),
and this problem still exists. It's a *bit* better (I get 10MB/s rather than
5), but still much slower than CentOS-5. If/when a 2.6.24 kernel comes down the
pike, I'll try to give that a shot as well.
For our MRG product we create a "realtime" kernel variant,
2.6.24-7.72.el5rt. For debug purposes we also build a "vanilla" kernel variant,
2.6.24-7.72.el5rtvanilla. These kernels can be installed on top of Red Hat
Enterprise Server 5.2.
If you would like to try the 2.6.24-7.72rtvanilla for debug purposes, let me
know and I can put it up on my people page.
Out of interest, what version of Data ONTAP are you running on the FAS3020? We have a lot of these devices and have found that in some circumstances NetApp bug 226424 causes some interesting NFS performance issues.
(you need a NetApp login to access this BTW).
At the time, we were running 7.0.3 (yeah, a bit old) on the FAS3020. Since then and for unrelated reasons, we've upgraded to a FAS3070 which is running 7.2.4. Looks like it's time to test again, using F9 this time.
There's not much detail in that NetApp bug, but it *still* seems odd to me that it would affect Fedora so much differently than RHEL/CentOS.
This message is a reminder that Fedora 8 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 8. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '8'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 8's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 8 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora please change the 'version' of this
bug to the applicable version. If you are unable to change the version,
please add a comment here and someone will do it for you.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
The process we are following is described here:
Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.
Thank you for reporting this bug and we are sorry it could not be fixed.
Just to close the loop, I finally got a chance to test with Fedora 10, and this appears to have resolved itself. *shrug*