Bug 196715
| Field | Value |
|---|---|
| Summary | broken network devices cause problems when TCP window scaling enabled |
| Product | [Fedora] Fedora |
| Component | kernel |
| Version | 5 |
| Hardware | All |
| OS | Linux |
| Status | CLOSED NOTABUG |
| Severity | medium |
| Priority | medium |
| Reporter | James Ralston <ralston> |
| Assignee | David Miller <davem> |
| QA Contact | Brian Brock <bbrock> |
| CC | dale, davej, fedora, kalamatee, mishu, trevin, wtogami, zing |
| Doc Type | Bug Fix |
| Last Closed | 2006-10-05 08:59:27 UTC |

Description
James Ralston
2006-06-26 17:26:11 UTC
Does it go back to normal if you do the following (as root)?

echo 0 > /proc/sys/net/ipv4/tcp_window_scaling

Yes; disabling TCP window scaling fixes the problem. Additionally, I did not see the problem with a 3c59x card on i386. (I haven't been able to test i386+tg3 yet.)

For me, TCP window scaling is disabled by default on 2.6.17-1.2139_FC5 x86_64 using the forcedeth driver... Why was it enabled in your case?

I have the same problem as James Ralston, and the workaround does fix it, but I have another problem: when the system reboots, net.ipv4.tcp_window_scaling goes back to enabled. How can we make the setting permanent?

An update: I don't experience this problem with i386+tg3. So:

3c59x on i386: GOOD
tg3 on i386: GOOD
tg3 on x86_64: BAD

I'd really like to be able to test a different driver on x86_64, but I don't have the capability to do so. Others who have piled onto this thread: with TCP window scaling enabled, do you see poor network performance on your x86_64 systems with drivers other than tg3?

Also, 2.6.17-1.2145_FC5 is still broken.

dragoran: TCP window scaling is enabled by default in the kernel. If it's disabled for you, something is disabling it deliberately--I suggest checking /etc/sysctl.conf.

BigZeta: put the following line in /etc/sysctl.conf:

net.ipv4.tcp_window_scaling = 0

Don't forget to remove that line once this problem is (eventually) fixed--unless you're on a dialup, TCP window scaling is pretty much a necessity for good performance. See: http://lwn.net/Articles/92727/

This thing started to bite me when a certain website would just hang while loading in Firefox. It worked fine with the same version of Firefox on a WinXP machine. I fired up Ethereal and noticed that the download stopped midway through a certain JavaScript file. I tried downloading the file with wget and got the same problem: I could only get 4,344 bytes per transaction, and it took several "wget -c" runs in a row to get the whole file. Seems like a TCP windowing problem of some sort.
It may have started with 2139, but I'm up to 2157 and the problem persists. My network card is a motherboard built-in Gigabit card: Ethernet controller: Broadcom Corporation BCM4401 100Base-T (rev 01), using the b44 driver. At work I have the same problem on an old machine, with both a 3COM 3509 and a RealTek card.

Just reread the whole thread and noticed I forgot to add my CPU. I have an ASUS motherboard with an AMD Athlon(TM) XP 2400+, i.e. x86, not x86_64.

Well, so much for the "this problem only affects x86_64 systems" theory. Dave, any ideas on what the problem is, or how to fix it? Having to disable TCP window scaling to work around it is a performance lose...

Another data point: disabling TCP window scaling fixed performance issues I had been seeing since at least 2006-04, and possibly late 2005. So, really, problems with TCP window scaling have existed for quite some time. The 2.6.17-1.2139_FC5 kernel just exacerbated them.

Created attachment 133196: a mail message concerning TCP performance problems

Here's a mail message I sent to an internal mailing list back in April 2006,
trying to track down a network performance problem I was seeing.
Note that the machine I was experiencing problems with was running FC4 (i386)
and kernel 2.6.16-1.2069_FC4!
This FC4 machine is still around, and I just tested to see if disabling TCP
window scaling fixed the network performance problems. It did.
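
The test described here, and the /etc/sysctl.conf line suggested earlier in the thread, can be checked with a short script. What follows is a minimal sketch and not part of the original report: the function names are hypothetical, the parser handles only the dotted key form (not the slash-separated form that sysctl also accepts), and it assumes the last assignment in the file is the one that takes effect.

```python
from typing import Optional

KEY = "net.ipv4.tcp_window_scaling"
PROC_PATH = "/proc/sys/net/ipv4/tcp_window_scaling"  # path quoted in this thread


def window_scaling_override(conf_text: str) -> Optional[int]:
    """Return the last value assigned to KEY in sysctl.conf-style text.

    Returns None when the file does not set the key at all, in which case
    the kernel default applies.
    """
    value = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Blank lines and lines starting with '#' or ';' are comments.
        if not line or line[0] in "#;":
            continue
        name, sep, val = line.partition("=")
        if sep and name.strip() == KEY:
            value = int(val.strip())  # later assignments replace earlier ones
    return value


def live_window_scaling(path: str = PROC_PATH) -> int:
    """Read the value the running kernel is currently using (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    sample = "# workaround for bug 196715\nnet.ipv4.tcp_window_scaling = 0\n"
    print("configured override:", window_scaling_override(sample))
```

Comparing the configured override with the live value shows whether something other than /etc/sysctl.conf changed the setting after boot.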
Conclusion: TCP window scaling is way broken in recent 2.6 kernels, where
"recent" means at least as far back as 2.6.16-1.2069 (and very likely earlier).
An additional clarification: I know about the issues with broken routers rewriting the window scale in the SYN/SYN+ACK packets. That is *not* the case here: I have a mix of servers that do and do not experience TCP window scaling problems sitting on the same network segment. Furthermore, I verified with tcpdump that the single router in between my source PC and target servers was not mucking with the window scaling options; the SYN/SYN+ACK packets captured on both systems are identical. Prior to 2.6.17-1.2139_FC5, I could only reproduce the problem if the target system were an x86_64 system, but with 2.6.17-1.2139_FC5, pretty much everything is broken. :(

Disabling net.ipv4.tcp_window_scaling works for me. This is a killer: my system is unusable past 2133 (x64 CPU, but running i386 with the skge driver).

I think it's pretty much a given at this point that TCP window scaling is badly broken, regardless of the architecture/driver. I haven't yet been able to test FC4test2, so I'm not sure if the 2.6.18 kernel will fix this problem. (I did briefly review patch-2.6.18-rc4.bz2; it seems to contain some changes related to congestion handling, but nothing that seems to directly address TCP window scaling issues.)

As I understand it, the problem is the other end of the connection. Linux changed the default window sizes in .17, and some operating systems don't deal well with this change. DaveM, is that summary correct? I recall something changing in this area, but I'm fuzzy on details.

I don't buy that; I noted in comment 11 that problems related to window scaling have been visible all the way back to 2.6.16-1.2069, and almost certainly earlier. Furthermore, in comment 12, I was testing pushing data from a FC5 box to a RHEL4 box, and I tested the (single) intervening router to make sure it wasn't hosing the window scaling negotiation. Throughput still disintegrated. The available data point to a bug somewhere in the TCP stack.
It might not be a bug with window scaling directly, but enabling window scaling certainly brings the bug to light.

I saw the same thing on my system. At first I thought it was a problem with the open-source forcedeth driver. I have a Tyan Thunder K8WE (dual AMD x86_64) with built-in nVidia nForce dual ethernet ports. The proprietary nvnet driver and forcedeth both work at full speed under FC5, but nVidia's driver won't load/compile under FC6 at all. I installed an old Intel (e100) based dual-port card to see if the e100 driver would run any better. Under FC6, it did not -- it was just as slow as the forcedeth driver. After reading the above bug report, I checked my /proc/sys/net/ipv4/tcp_window_scaling setting and saw that it was turned on. So I turned it off, and now the e100 and forcedeth are both running at full speed under FC6.

First, it's not about an intermediate system "mucking with the window scale options in the SYN and SYN+ACK packets"; rather, it's an intermediate box doing firewalling which doesn't take the window scaling into consideration when it's doing window validation checks. Various BSD based firewalls are known to have this problem. They try to drop packets which are not in the window, but they also try to be 100% stateless, and this means they totally ignore the window scale negotiated at connection startup. Obviously, this doesn't work at all. Stateless validation of TCP windows is simply not possible when the connection uses window scaling. All of these followup comments are totally unrelated problems.

Would the BSD based firewalls known to have this problem include Cisco routers? I have a Cisco C837 DSL router running IOS version 12.2(8r)YN. Cisco's own documentation (http://www.cisco.com/en/US/products/sw/iosswrel/ps1839/products_feature_guide09186a0080087d52.html) states that TCP window scaling was introduced in version 12.2(8)T. It is not enabled by default.
I enabled TCP window scaling on my router with a window size of 2^22 bytes, then re-enabled tcp_window_scaling in my FC6RC3 kernel. The network ran slow again. I also tried downloading a file with window scaling enabled in the router but disabled in the Fedora kernel. In that configuration the network ran at full speed. So either window scaling is broken in Cisco IOS, broken in the kernel, or broken somewhere else upstream that I have no control over. I have no idea how to find out where the fault lies. So I'd much rather it just be turned off by default.

It may make sense to turn it off on a site-specific basis when you run into problems you can't immediately diagnose. But it works fine, and has worked fine, for most people over a very long period of time. Turning things off by default therefore makes no sense here. And it's also not the way you debug things.

*** Bug 220870 has been marked as a duplicate of this bug. ***

After reading the following from http://lwn.net/Articles/92727/:

"To keep from breaking TCP on systems which do not understand window scaling, the TCP option can only be provided in the initial SYN packet which initiates the connection, and scaling can only be used if the SYN+ACK packet sent in response also contains that option. The scale factor is thus set as part of the setup handshake, and cannot be changed thereafter.

"The details are still being figured out, but it would appear that some routers on the net are rewriting the window scale TCP option on SYN packets as they pass through. In particular, they seem to be setting the scale factor to zero, but leaving the option in place. The receiving side sees the option, and responds with a window scale factor of its own. At this point, the initiating system believes that its scale factor has been accepted, and scales its windows accordingly. The other end, however, believes that the scale factor is zero."

there's something about this I still don't understand.
Shouldn't the SYN+ACK packet also contain a scaling factor? And wouldn't that factor be zero if that's what the receiving end thinks it is? Why would the initiating system believe its scale factor has been accepted if it doesn't receive the same scaling factor from the receiving side? If the SYN+ACK packet does not contain the scaling factor, how can the receiving side possibly negotiate a smaller window?

Trevin, the LWN article you're referencing is fairly old, and seems like it's a "first guess" at the problem. I don't think it's correct (for reasons I'll explain shortly).

Since I reported this issue, our Windows guys have spent some time playing with the Windows Vista beta. Vista's TCP/IP stack can perform window scaling. They have discovered that--surprise, surprise--when they enable TCP window scaling, they see exactly the same problems as I'm seeing with my Linux boxes--connections drop, throughput tanks, etc.

I've verified that our intermediate routers aren't mucking with the initial window scale settings in the SYN and SYN+ACK packets. I agree with your assessment in comment 22; I don't see how the initial negotiation could be a problem, as (per RFC1323) both sides have to agree to window scaling for it to be enabled.

One final (and very useful) data point: we recently updated the version of IOS we're running on our intermediate router, and our problems with TCP window scaling seem to have vanished. (I say "seem to" because I haven't heard back from our Windows guys yet, but I can no longer reproduce the problem with my Linux boxes.)

So, I now think David's comment 18 is dead on: problems with TCP window scaling are being caused by buggy intermediate stateful firewalls that aren't properly following the window scale and thus are dropping valid packets because they think they're outside the window size, not by the end boxes themselves.

Furthermore, the negotiation specification of each side is unidirectional. The SYN carries the sending side's window scale value.
At that point, the receiving system has two choices about that value. It can 1) accept it and thus send a window scale option back in the SYN+ACK with the scale it would like to use, or 2) reject it, and not send any window scale option at all in the SYN+ACK. So it's an all-or-nothing affair. Also, by sending a window scale option in the first SYN, that system is implicitly agreeing to whatever scale the other side advertises in its SYN+ACK, since there is no way to reject the window scale that arrives in the SYN+ACK.

It's always intermediate systems that cause the problems, ALWAYS. These bugs fall into two classes:

1) Systems that pretend that stateless firewalling of TCP connections is possible, and ignore the window scale options in the initial connection startup entirely. OpenBSD falls into this category. I haven't kept up with what they've done about this, if anything. They may have provided a way to deal with this in current releases.

2) Systems that do record the window scales and try to handle them properly, but have some implementation bugs that are fixed in system updates. Some Cisco products fall into this category.

Even more insidious is that the intermediate systems don't fail in an immediate and obvious way, like (e.g.) causing the initial handshake to fail. The failure mode of the intermediate systems essentially looks like bad packet loss, which can be hard to distinguish from things like, say, bad packet loss. :p

Also note that netfilter can fall into class #1 if ip_conntrack_tcp_loose is > 0 and ip_conntrack_tcp_be_liberal = 0 (both of which are the default settings); see bug 191336.

That bug had been fixed upstream for a long time. It was difficult to cure that problem in the product the bugzilla is against because we aren't allowed to change data structure layout, in order to preserve KABI. The fix is trivial if you can change the data structure freely.
True, but the entire reason why we filed bug 191336 is because we ran into that nasty little problem when we were upgrading servers from RHEL3 to RHEL4. The fact that our FC boxes weren't affected didn't help us, because we don't use FC for server machines.

Some additional information for this bug: we are fairly confident that upgrading our core router from IOS 12.2(25)S8 to 12.3(19) is what fixed the problem. (I say "fairly confident" because we have also been massively updating our switching infrastructure, so if any part of this problem depended on layer 2, then updating the switches might have also been involved with the resolution. But since I strongly suspect this was a layer 3 problem, then updating IOS on the router is what fixed it, because the router was the only layer 3 device between the end systems I was testing.)
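
The all-or-nothing negotiation and the firewall failure mode described in the comments above can be illustrated with a small model. This is only a sketch under stated assumptions, not code from the kernel or from any firewall: the function names are hypothetical, the window and offset numbers are made up for illustration, and sequence-number wraparound is ignored. The cap of 14 on the shift count comes from RFC 1323.

```python
from typing import Optional, Tuple

MAX_SHIFT = 14  # RFC 1323 limits the window scale shift count to 14


def negotiate(syn_shift: Optional[int],
              synack_shift: Optional[int]) -> Tuple[int, int]:
    """Return (client_shift, server_shift) in effect after the handshake.

    Each argument is the shift count carried in that segment's window scale
    option, or None if the option was absent. Scaling is used only if BOTH
    the SYN and the SYN+ACK carry the option; otherwise both shifts are 0.
    """
    if syn_shift is None or synack_shift is None:
        return (0, 0)
    return (min(syn_shift, MAX_SHIFT), min(synack_shift, MAX_SHIFT))


def real_window(advertised: int, shift: int) -> int:
    """Window in bytes: the 16-bit header field shifted left by the scale."""
    return advertised << shift


def in_window(offset: int, window: int) -> bool:
    """True if data starting `offset` bytes past the last ACK fits the window."""
    return 0 <= offset < window


if __name__ == "__main__":
    # Hypothetical connection: client offers shift 7, server answers shift 2.
    client_shift, server_shift = negotiate(7, 2)

    advertised = 5840   # raw value in the client's TCP header (made up)
    offset = 100_000    # server sends data this far past the last ACK (made up)

    # The endpoints apply the negotiated scale to the client's window.
    endpoint_view = real_window(advertised, client_shift)
    # A middlebox that ignored the handshake treats the raw field as bytes.
    firewall_view = real_window(advertised, 0)

    print("endpoints accept:", in_window(offset, endpoint_view))
    print("scale-blind firewall accepts:", in_window(offset, firewall_view))
```

In this model the endpoints consider the segment valid while the scale-blind check rejects it, which matches the symptom reported in this bug: the handshake succeeds, but later segments are dropped and the connection looks like it is suffering heavy packet loss.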