Description of problem:
We see very slow performance writing from Fedora 7 and 8 clients to our NetApp FAS3020 filer. I've tested with dd (from /dev/zero), bonnie++, iozone, tar and simple cp. In all cases, write speed is limited to ~5MB/s on a relatively unloaded filer. Tests on identical hardware running CentOS 5 achieve ~50MB/s. This is not a generic NFS problem, as pointing the Fedora boxes at a Panasas filer and running similar tests yields write speeds on the order of 40MB/s. Note that read performance is not similarly affected.

I've tested with 2 different client platforms:
1) HP DL140 G3 (dual quad-core Xeon) with NetXtreme BCM5721 NIC (tg3 driver) running x86_64 OS
2) Dell PowerEdge 1855 (dual NetBurst Xeon) with Intel 82546GB NIC (e1000 driver) running i386 OS

All testing was done using a standard MTU of 1500 and a wide variety of NFS mount options, none of which made any difference. All networking hardware is from Foundry.

Version-Release number of selected component (if applicable):
Most recent tested is 2.6.23.1-42.fc8.

How reproducible:
Every time.

Steps to Reproduce:
1. Install Fedora 7 or 8.
2. Write to NetApp filer.

Actual results:
Watch it crawl along.

Expected results:
Performance on par with CentOS on the same hardware.

Additional info:
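Editorial sketch: the report mentions a dd write test from /dev/zero but does not give the exact command. The following is one plausible way to run such a test, not the reporter's actual invocation. The block size, the file size, and the `TARGET_DIR` and `SIZE_MB` variables are assumptions made for illustration; `TARGET_DIR` defaults to a local temp directory so the script runs anywhere, and should be pointed at the NFS mount under test. It assumes GNU dd (`bs=1M`, `conv=fsync`).

```shell
#!/bin/sh
# Assumed names: TARGET_DIR and SIZE_MB are illustrative, not from the report.
TARGET_DIR="${TARGET_DIR:-${TMPDIR:-/tmp}}"   # set to the NFS mount to test
SIZE_MB="${SIZE_MB:-16}"                      # use a larger value for real runs
OUT="$TARGET_DIR/ddtest.$$"

# conv=fsync makes dd flush the file before it reports, so the figure
# reflects data actually written out rather than page-cache speed.
# GNU dd prints its throughput summary as the last line on stderr.
dd if=/dev/zero of="$OUT" bs=1M count="$SIZE_MB" conv=fsync 2>&1 | tail -n 1

# Confirm the expected amount of data landed, then clean up.
WRITTEN_BYTES=$(wc -c < "$OUT" | tr -d ' ')
echo "bytes written: $WRITTEN_BYTES"
rm -f "$OUT"
```

Running the same script against the Fedora and CentOS clients with an identical `SIZE_MB` would give directly comparable numbers.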
Created attachment 265131
tshark capture of NFS traffic

I've attached a tshark capture of a simple 'cp' of a 200MB file of random data from a f8 client to the NetApp. The tshark command used was 'tshark -w /tmp/bz391021 -s 192 -i eth1 host netapp', and the NFS mount options used were 'rsize=32768,wsize=32768,hard,intr'.
Hmm... it appears you are seeing a large number of TCP retransmissions, which would explain the slowness. Now the question is why. Has there been a recent network topology change of some kind? Maybe a misconfigured router, or has a new NIC been added to the mix? Are both sides using the same duplex mode?
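Editorial sketch: the retransmissions above were spotted in a packet capture. As a rough cross-check on the client itself, the kernel's host-wide TCP counters in /proc/net/snmp can be sampled before and after a test copy. This is an assumption about a useful complement, not something done in this thread; the function name `retrans_stats` is made up for illustration, and the counters cover all TCP connections on the host, not just the NFS mount.

```shell
#!/bin/sh
# retrans_stats [FILE]: report OutSegs, RetransSegs and their ratio from a
# file in /proc/net/snmp format (defaults to /proc/net/snmp itself).
retrans_stats() {
    awk '
        # The first "Tcp:" line is the header; remember each column position.
        $1 == "Tcp:" && !have_header {
            for (i = 2; i <= NF; i++) col[$i] = i
            have_header = 1
            next
        }
        # The second "Tcp:" line holds the values.
        $1 == "Tcp:" {
            sent = $(col["OutSegs"])
            retrans = $(col["RetransSegs"])
            pct = (sent > 0) ? 100 * retrans / sent : 0
            printf "OutSegs=%s RetransSegs=%s (%.2f%%)\n", sent, retrans, pct
        }
    ' "${1:-/proc/net/snmp}"
}

# Only meaningful on Linux, where /proc/net/snmp exists.
if [ -r /proc/net/snmp ]; then
    retrans_stats
fi
```

Comparing the difference in RetransSegs across one copy on the Fedora client and on the CentOS client would show whether the retransmissions are specific to the slow client.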
There was a network reorganization several months ago (before I got here) that involved adding several VLANs and ACLs. However, the bulk of the clients (which are still running FC4 -- we're evaluating which distro to move them to) still perform just fine. It's only F7 and F8 that I've hit this issue with. Also, if it were a network topology problem, wouldn't we expect CentOS-5 to have the same problem? I'll attach another tshark capture of the same test from a CentOS-5 client identical (in the same rack even) to the f8 client above. And, yes, the f8 client reports 1000Mb/s full duplex, as does everywhere between it and the NetApp.
Created attachment 265211
tshark capture of NFS traffic from CentOS-5 client
> 'tshark -w /tmp/bz391021 -s 192 -i eth1 host netapp'
Why the -i flag? That stops me from seeing any NFS traffic.

> Also, if it were a network topology problem, wouldn't we expect CentOS-5 to
> have the same problem?
Well, CentOS is based off of RHEL, and Fedora is much closer to the upstream kernel (at the time). Since upstream changes quite a bit more than RHEL (or CentOS), regressions are always a possibility.
Erm, doesn't -i just tell tshark which interface to listen on? I guess it is redundant, since eth1 is the only active non-loopback interface on these systems. But it shouldn't stop you from seeing *any* traffic between the client and the netapp. Let me know if you need me to re-run the packet captures (and any other flags I may need to use).
Can you set up a port mirror and capture the traffic from there for the slow client? Packets might be getting delayed in the client's network adapter before delivery.

Also, try turning off TCP window scaling on the client. You will probably need to unmount and remount the filesystem after doing that.

sysctl net.ipv4.tcp_window_scaling=0
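Editorial sketch: the sequence suggested above (change the sysctl, then unmount and remount) could be scripted roughly as follows. The mount point `/mnt/netapp`, the helper names `run` and `toggle_window_scaling`, and the `DRY_RUN` switch are all hypothetical, and the bare `mount` call assumes the filesystem has an /etc/fstab entry. By default the script only prints the commands; set `DRY_RUN=0` and run as root to actually apply them.

```shell
#!/bin/sh
# Print commands instead of executing them unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# toggle_window_scaling VALUE MOUNT_POINT
#   VALUE: 0 disables TCP window scaling, 1 enables it.
#   MOUNT_POINT: hypothetical path, assumed to be listed in /etc/fstab.
toggle_window_scaling() {
    value="$1"
    mount_point="$2"
    run sysctl -w "net.ipv4.tcp_window_scaling=$value"
    # Remount so the NFS client opens a fresh TCP connection
    # that picks up the new setting.
    run umount "$mount_point"
    run mount "$mount_point"
}

toggle_window_scaling 0 /mnt/netapp
```

Calling `toggle_window_scaling 1 /mnt/netapp` afterwards restores the default once the test is done.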
Oops... I meant the '-s 192' flag... my bad... Looking at the second post (Comment #4) I still can't see any NFS traffic... Could you please drop the -s flag? tia...
Ah, that makes sense. Sorry about that. '-s 192' is a tcpdump holdover, where it defaulted to only grabbing 68 bytes and actually recommended 192 for looking at NFS traffic. I've redone the captures, but they're too big to upload to bugzilla. I've put them at:

Fedora 8, tcp_window_scaling=1
http://www.duke.edu/~jlb17/bz391021.f8.ws1.bz2

Fedora 8, tcp_window_scaling=0
http://www.duke.edu/~jlb17/bz391021.f8.ws0.bz2

CentOS 5, tcp_window_scaling=1
http://www.duke.edu/~jlb17/bz391021.c5.ws1.bz2

On f8, turning off TCP window scaling made no difference in the write speed.
Wow, they are huge... just out of curiosity (without posting them), could you capture a trace using CentOS to see if the traces are as large?
The CentOS capture is up there - third one in the list. I'm guessing that the huge increase in size is due to the dropping of the -s flag as requested, and the fact that there is 200MB of 'random' data being copied there - probably not too compressible.
As pointed out, the CentOS trace is posted and it is large as well (although it is a fair bit smaller than the Fedora traces). I tried reducing the size of the test file even further to get the trace size down, but going much smaller really starts to introduce too much measurement error, IMO. If the big size of the compressed trace files is an issue, let me know and I can try sourcing the test files from /dev/zero rather than /dev/urandom -- those might compress a bit better.
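Editorial sketch: the point about /dev/zero versus /dev/urandom test data can be checked locally by compressing one megabyte of each. This uses gzip rather than the bzip2 used for the posted traces, purely because gzip is more commonly installed; the file names and the 1 MiB size are arbitrary choices for illustration. It only shows how compressible the payload is, not how large a full capture (with protocol headers) would end up.

```shell
#!/bin/sh
# Compare how well 1 MiB of random data and 1 MiB of zeros compress.
tmp=$(mktemp -d)

head -c 1048576 /dev/urandom > "$tmp/random.bin"
head -c 1048576 /dev/zero    > "$tmp/zero.bin"

# Compressed size in bytes for each input (tr strips padding from wc).
random_gz=$(gzip -c "$tmp/random.bin" | wc -c | tr -d ' ')
zero_gz=$(gzip -c "$tmp/zero.bin" | wc -c | tr -d ' ')

echo "random data: $random_gz bytes after gzip"
echo "zero data:   $zero_gz bytes after gzip"

rm -rf "$tmp"
```

The random file should stay close to its original size while the zero file shrinks to a small fraction of it, which is consistent with the expectation that zero-filled test files would make the compressed traces smaller.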
Hello, I'm reviewing this bug as part of the kernel bug triage project, an attempt to isolate current bugs in the Fedora kernel. http://fedoraproject.org/wiki/KernelBugTriage I am CC'ing myself to this bug and will try and assist you in resolving it if I can. There hasn't been much activity on this bug for a while. Could you tell me if you are still having problems with the latest kernel? If the problem no longer exists then please close this bug or I'll do so in a few days if there is no additional information lodged.
The lack of activity, you may note, was due to the Fedora folks losing interest, not me. ;) In any case, I haven't been able to track down any spare boxes to install the most recent kernel on (all my test boxes got put into production), but I will continue to try to do so and update again sometime next week. Thanks.
Okay Joshua, thanks for updating this anyway. We're just in the process of prodding bugs that haven't seen much change - it may generate some renewed interest. The 2.6.24 kernel has been released and updates-testing should have it soon so it might be an idea to test with this when it arrives...? Cheers Chris
I just tested with the most recent released kernel for F8 (2.6.23.14-107.fc8), and this problem still exists. It's a *bit* better (I get 10MB/s rather than 5), but still much slower than CentOS-5. If/when a 2.6.24 kernel comes down the pike, I'll try to give that a shot as well.
Joshua,

For our MRG product we create a "realtime" kernel variant, 2.6.24-7.72.el5rt. For debug purposes we also build a "vanilla" kernel variant, 2.6.24-7.72.el5rtvanilla. These kernels can be installed on top of Red Hat Enterprise Server 5.2. If you would like to try the 2.6.24-7.72rtvanilla for debug purposes, let me know and I can put it up on my people page.

Jeff
Out of interest, what version of Data ONTAP are you running on the FAS3020? We have a lot of these devices and have found that in some circumstances NetApp bug 226424 causes some interesting NFS performance issues. http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=226424 (you need a NetApp login to access this, BTW).
At the time, we were running 7.0.3 (yeah, a bit old) on the FAS3020. Since then and for unrelated reasons, we've upgraded to a FAS3070 which is running 7.2.4. Looks like it's time to test again, using F9 this time. There's not much detail in that NetApp bug, but it *still* seems odd to me that it would affect Fedora so much differently than RHEL/CentOS.
This message is a reminder that Fedora 8 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 8. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '8'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 8's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 8 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.
Just to close the loop, I finally got a chance to test with Fedora 10, and this appears to have resolved itself. *shrug*