Bug 196715

Summary: broken network devices cause problems when TCP window scaling enabled
Product: Fedora
Reporter: James Ralston <ralston>
Component: kernel
Assignee: David Miller <davem>
Status: CLOSED NOTABUG
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 5
CC: dale, davej, fedora, kalamatee, mishu, trevin, wtogami, zing
Hardware: All   
OS: Linux   
Last Closed: 2006-10-05 08:59:27 UTC
Attachments:
a mail message concerning TCP performance problems

Description James Ralston 2006-06-26 17:26:11 UTC
Under 2.6.17-1.2139_FC5, it appears as if network throughput is highly
throttled.  Connections that should be running at multiple Mbit/s run at about
8Kbit/s, and often time out entirely.  I see no unusual messages in syslog.

If I revert to 2.6.16-1.2133_FC5, network throughput returns to normal.

So far, I am seeing this problem on x86_64 using the tg3 driver.  (I will test
i386+tg3 and i386+3c59x soon.)

Comment 1 Dave Jones 2006-06-27 03:08:40 UTC
does it go back to normal if you do ..

echo 0 > /proc/sys/net/ipv4/tcp_window_scaling

(as root)



Comment 2 James Ralston 2006-06-27 14:48:10 UTC
Yes; disabling TCP window scaling fixes the problem.

Additionally, I did not see the problem with a 3c59x card on i386.  (I haven't
been able to test i386+tg3 yet.)


Comment 3 drago01 2006-06-29 18:14:47 UTC
For me, TCP window scaling is disabled by default on 2.6.17-1.2139_FC5 x86_64
using the forcedeth driver...
Why was it enabled in your case?

Comment 4 BigZeta 2006-06-29 23:32:43 UTC
I have the same problem as James Ralston, and the suggested workaround does fix
it, but I have another problem: when the system reboots,
net.ipv4.tcp_window_scaling turns back on.
What can I do to make the setting stick permanently?

Comment 5 James Ralston 2006-07-06 15:03:23 UTC
An update:

I don't experience this problem with i386+tg3.  So:

3c59x on i386: GOOD
tg3 on i386: GOOD
tg3 on x86_64: BAD

I'd really like to be able to test a different driver on x86_64, but I don't
have the capability to do so.

Others who have piled onto this thread: with TCP window scaling enabled, do you
see poor network performance on your x86_64 systems with drivers other than tg3?

Also, 2.6.17-1.2145_FC5 is still broken.


Comment 6 James Ralston 2006-07-06 15:05:13 UTC
dragoran: TCP window scaling is enabled by default in the kernel.  If it's
disabled for you, something is disabling it deliberately--I suggest checking
/etc/sysctl.conf.
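
For example, a quick way to check both whether something sets it at boot and
what the current value is:

grep tcp_window_scaling /etc/sysctl.conf
sysctl net.ipv4.tcp_window_scaling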

BigZeta: put the following line in /etc/sysctl.conf:

net.ipv4.tcp_window_scaling = 0

Don't forget to remove that line once this problem is (eventually) fixed--unless
you're on a dialup, TCP window scaling is pretty much a necessity for good
performance.  See:

http://lwn.net/Articles/92727/
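
To apply that change without rebooting, either of these (as root) should do it:

sysctl -w net.ipv4.tcp_window_scaling=0
sysctl -p /etc/sysctl.conf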


Comment 7 Henrique Martins 2006-07-20 04:28:08 UTC
This thing started to bite me when downloading a certain website with firefox
and it would just hang.  It worked fine with the same version of firefox on a
WinXP machine.

I fired up Ethereal and noticed that the transfer stopped midway through
downloading a certain javascript file.  I tried downloading the file with wget
and hit the same problem: I could only get 4,344 per transaction, and it took
several "wget -c" runs in a row to get the whole file.  It seems like a TCP
windowing problem of some sort.

It may have started with 2139, but I'm up to 2157 and the problem persists.  My
network card is the motherboard's built-in Gigabit card:
  Ethernet controller: Broadcom Corporation BCM4401 100Base-T (rev 01)
using the b44 driver.

From work I have this same problem on an old machine, both with a 3COM 3509 and
a RealTek card.


Comment 8 Henrique Martins 2006-07-20 04:33:15 UTC
Just reread the whole thread and noticed I forgot to add my CPU.  I have an ASUS
MB with an AMD Athlon(TM) XP 2400+, i.e. x86 not x86_64.

Comment 9 James Ralston 2006-07-24 16:38:13 UTC
Well, so much for the "this problem only affects x86_64 systems" theory.

Dave, any ideas on what the problem is, or how to fix it?  Having to disable TCP
window scaling to work around it is a performance loss...


Comment 10 James Ralston 2006-07-27 20:35:02 UTC
Another data point: disabling TCP window scaling fixed performance issues I had
been seeing since at least 2006-04, and possibly late 2005.

So, really, problems with TCP window scaling have existed for quite some time. 
The 2.6.17-1.2139_FC5 kernel just exacerbated them.


Comment 11 James Ralston 2006-07-27 20:48:40 UTC
Created attachment 133196 [details]
a mail message concerning TCP performance problems

Here's a mail message I sent to an internal mailing list back in April 2006,
trying to track down network performance problems I was seeing.

Note that the machine I was experiencing problems with was running FC4 (i386)
and kernel 2.6.16-1.2069_FC4!

This FC4 machine is still around, and I just tested to see if disabling TCP
window scaling fixed the network performance problems.  It did.

Conclusion: TCP window scaling is way broken in recent 2.6 kernels, where
"recent" means at least as far back as 2.6.16-1.2069 (and very likely earlier).

Comment 12 James Ralston 2006-07-27 22:25:59 UTC
An additional clarification: I know about the issues with broken routers
rewriting the window scale in the SYN/SYN+ACK packets.  That is *not* the case
here: I have a mix of servers that do and do not experience TCP window scaling
problems sitting on the same network segment.

Furthermore, I verified with tcpdump that the single router in between my source
PC and target servers was not mucking with the window scaling options; the
SYN/SYN+ACK packets captured on both systems are identical.
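
(If anyone wants to double-check their own path: a capture along these lines on
each end will show the wscale option in the SYN and SYN+ACK packets; the
interface name and host below are just placeholders:

tcpdump -n -v -i eth0 'tcp[13] & 2 != 0 and host server.example.com'

The 'tcp[13] & 2 != 0' filter matches any segment with the SYN bit set, and -v
makes tcpdump print the TCP options, including wscale.)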

Prior to 2.6.17-1.2139_FC5, I could only reproduce the problem if the target
system were an x86_64 system, but with 2.6.17-1.2139_FC5, pretty much everything
is broken.  :(


Comment 13 Cott Lang 2006-08-23 23:56:07 UTC
Disabling net.ipv4.tcp_window_scaling works for me too.  This is a killer; my
system is unusable past 2133.  x86_64 CPU, but running i386 with the skge driver.

Comment 14 James Ralston 2006-08-24 15:11:13 UTC
I think it's pretty much a given at this point that TCP window scaling is badly
broken, regardless of the architecture/driver.

I haven't yet been able to test FC6test2, so I'm not sure if the 2.6.18 kernel
will fix this problem.  (I did briefly review patch-2.6.18-rc4.bz2; it seems to
contain some changes related to congestion handling, but nothing that seems to
directly address TCP window scaling issues.)


Comment 15 Dave Jones 2006-09-12 21:35:13 UTC
as I understand it, the problem is the other end of the connection. Linux
changed the default window sizes in .17, and some operating systems don't deal
well with this change.

DaveM, is that summary correct? I recall something changing in this area, but
I'm fuzzy on details.


Comment 16 James Ralston 2006-09-13 00:36:55 UTC
I don't buy that; I noted in comment 11 that problems related to window scaling
have been visible all the way back to 2.6.16-1.2069, and almost certainly earlier.

Furthermore, in comment 12, I was testing pushing data from a FC5 box to a RHEL4
box, and I tested the (single) intervening router to make sure it wasn't hosing
the window scaling negotiation.  Throughput still disintegrated.

The available data point to a bug somewhere in the TCP stack.  It might not be a
bug with window scaling directly, but enabling window scaling certainly brings
the bug to light.

Comment 17 Trevin Beattie 2006-09-29 03:57:22 UTC
I saw the same thing on my system.  At first I thought it was a problem with the
open-source forcedeth driver.  I have a Tyan Thunder K8WE (dual AMD x86_64) with
built-in nVidia nForce dual ethernet ports.  The proprietary nvnet driver and
forcedeth both work at full speed under FC5, but nVidia's driver won't
load/compile under FC6 at all.

I installed an old Intel (e100) based dual-port card to see if the e100 driver
would run any better.  Under FC6, it did not -- it was just as slow as the
forcedeth driver.

After reading the above bug report, I checked my
/proc/sys/net/ipv4/tcp_window_scaling setting and saw that it was turned on.  So
I turned it off, and now the e100 and forcedeth are both running at full speed
under FC6.
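
(For anyone else checking: you can read the current value with

cat /proc/sys/net/ipv4/tcp_window_scaling

which prints 1 when window scaling is enabled and 0 when it's off.)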

Comment 18 David Miller 2006-10-05 08:59:27 UTC
First, it's not about an intermediate system "mucking with the window scale
options in the SYN and SYN+ACK packets"; rather, it's an intermediate box
doing firewalling which doesn't take window scaling into consideration
when it's doing window validation checks.

Various BSD-based firewalls are known to have this problem.  They try to
drop packets which are not in the window, but they also try to be 100%
stateless, and this means they totally ignore the window scale negotiated at
connection startup.  Obviously, this doesn't work at all.  Stateless validation
of TCP windows is simply not possible when the connection uses window scaling.
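
To make that concrete, with made-up numbers: suppose the receiver negotiated a
window scale of 7 and is currently advertising a window field of 8192.  The
real window is 8192 << 7 = 1048576 bytes.  A segment that begins, say, 200000
bytes past the last acknowledged sequence number is well inside that window,
but a middlebox that ignores the scale factor thinks the window is only 8192
bytes and drops the segment as "out of window".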

All of these followup comments are totally unrelated problems.


Comment 19 Trevin Beattie 2006-10-06 02:55:47 UTC
Would the BSD based firewalls known to have this problem include Cisco routers?

I have a Cisco C837 DSL router running IOS version 12.2(8r)YN.  Cisco's own
documentation
(http://www.cisco.com/en/US/products/sw/iosswrel/ps1839/products_feature_guide09186a0080087d52.html)
states that TCP window scaling was introduced in version 12.2(8)T.  It is not
enabled by default.

I enabled TCP window scaling on my router with a window size of 2^22 bytes, then
re-enabled tcp_window_scaling in my FC6RC3 kernel.  The network ran slow again.
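
(For anyone who wants to reproduce this, the router-side change was essentially
the following; I'm writing the syntax from memory, so check it against the IOS
documentation for your release:

configure terminal
 ip tcp window-size 4194304
end

where 4194304 is 2^22.)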

I also tried downloading the file with window scaling enabled in the router but
disabled in the Fedora kernel.  In that configuration the network ran at full speed.

So either window scaling is broken in Cisco IOS, broken in the kernel, or broken
somewhere else upstream that I have no control over.  I have no idea how to find
out where the fault lies.  So I'd much rather it just be turned off by default.


Comment 20 David Miller 2006-10-06 15:05:59 UTC
It may make sense to turn it off on a site-specific basis when you run
into problems you can't immediately diagnose.  But it works fine,
and has worked fine, for most people over a very long period
of time.

Turning things off by default therefore makes no sense here.  And
it's also not the way you debug things.


Comment 21 Jarod Wilson 2007-01-04 15:17:06 UTC
*** Bug 220870 has been marked as a duplicate of this bug. ***

Comment 22 Trevin Beattie 2007-01-04 16:10:12 UTC
After reading the following from http://lwn.net/Articles/92727/:

"To keep from breaking TCP on systems which do not understand window scaling,
the TCP option can only be provided in the initial SYN packet which initiates
the connection, and scaling can only be used if the SYN+ACK packet sent in
response also contains that option. The scale factor is thus set as part of the
setup handshake, and cannot be changed thereafter.

"The details are still being figured out, but it would appear that some routers
on the net are rewriting the window scale TCP option on SYN packets as they pass
through. In particular, they seem to be setting the scale factor to zero, but
leaving the option in place. The receiving side sees the option, and responds
with a window scale factor of its own. At this point, the initiating system
believes that its scale factor has been accepted, and scales its windows
accordingly. The other end, however, believes that the scale factor is zero."

there's something about this I still don't understand.  Shouldn't the SYN+ACK
packet also contain a scaling factor?  And wouldn't that factor be zero if
that's what the receiving end thinks it is?  Why would the initiating system
believe its scale factor has been accepted if it doesn't receive the same
scaling factor from the receiving side?  If the SYN+ACK packet does not contain
the scaling factor, how can the receiving side possibly negotiate a smaller window?

Comment 23 James Ralston 2007-01-04 17:47:21 UTC
Trevin, the LWN article you're referencing is fairly old, and seems like it's a
"first guess" at the problem.  I don't think it's correct (for reasons I'll
explain shortly).

Since I reported this issue, our Windows guys have spent some time playing with
the Windows Vista beta.  Vista's TCP/IP stack can perform window scaling.  They
have discovered that--surprise, surprise--when they enable TCP window scaling,
they see exactly the same problems as I'm seeing with my Linux
boxes--connections drop, throughput tanks, etc.

I've verified that our intermediate routers aren't mucking with the initial
window scale settings in the SYN and SYN+ACK packets.  I agree with your
assessment in comment 22; I don't see how the initial negotiation could be a
problem, as (per RFC1323) both sides have to agree to window scaling for it to
be enabled.

One final (and very useful) data point: we recently updated the version of IOS
we're running on our intermediate router, and our problems with TCP window
scaling seem to have vanished.  (I say "seem to" because I haven't heard back
from our Windows guys yet, but I can no longer reproduce the problem with my
Linux boxes.)

So, I now think David's comment 18 is dead on: the problems with TCP window
scaling are being caused by buggy intermediate stateful firewalls, not by the
end boxes themselves.  The firewalls aren't properly tracking the window scale,
and thus drop valid packets because they think they fall outside the window.

Comment 24 David Miller 2007-01-04 21:23:09 UTC
Furthermore, the window scale negotiation is unidirectional: each side announces
its own scale factor.

The SYN carries that side's window scale value.  At that point, the receiving
system has two choices about that value.  It can 1) accept it, and thus
send a window scale option back in the SYN+ACK with the scale it would
like to use, or 2) reject it, and not send any window scale option at all
in the SYN+ACK.

So it's an all-or-nothing affair.  Also, by sending a window scale option
in the first SYN, that system is implicitly agreeing to whatever scale the
other side advertises in its SYN+ACK, since there is no way to reject the
window scale that arrives in the SYN+ACK.
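
A concrete (made-up) example: host A sends a SYN carrying wscale=7; host B
answers with a SYN+ACK carrying wscale=2.  From then on, A interprets every
window B advertises as shifted left by 2, and B interprets every window A
advertises as shifted left by 7.  If B had omitted the option from its SYN+ACK,
neither side would scale at all.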

It's always intermediate systems that cause the problems, ALWAYS.  These bugs
fall into two classes:

1) Systems that pretend that stateless firewalling of TCP connections is
   possible, and ignore the window scale options in the initial connection
   startup entirely.  OpenBSD falls into this category.  I haven't kept up
   with what they've done about this, if anything.  They may have provided
   a way to deal with this in current releases.

2) Systems that do record the window scales and try to handle them properly,
   but have some implementation bugs that are fixed in system updates.  Some
   Cisco products fall into this category.


Comment 25 James Ralston 2007-01-04 23:09:01 UTC
Even more insidious is that the intermediate systems don't fail in an immediate
and obvious way, like (e.g.) causing the initial handshake to fail.  The failure
mode of the intermediate systems essentially looks like bad packet loss, which
can be hard to distinguish from things like, say, bad packet loss.  :p

Also note that netfilter can fall into class #1 if ip_conntrack_tcp_loose is > 0
and ip_conntrack_tcp_be_liberal = 0 (both of which are the default settings);
see bug 191336.
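
If you're hitting the netfilter variant, the relevant knobs (on kernels using
the old ip_conntrack module; the exact /proc paths may differ on newer
nf_conntrack kernels) are roughly:

echo 0 > /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_loose
echo 1 > /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_be_liberal

The first stops conntrack from picking up connections mid-stream (where it can
never have seen the scale factors); the second tells it not to drop packets it
considers out-of-window.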


Comment 26 David Miller 2007-01-05 00:27:02 UTC
That bug has been fixed upstream for a long time.  It was difficult to
cure in the product that bugzilla is filed against because we aren't
allowed to change data structure layout, in order to preserve KABI.
The fix is trivial if you can change the data structure freely.


Comment 27 James Ralston 2007-01-05 18:45:49 UTC
True, but the entire reason why we filed bug 191336 is because we ran into that
nasty little problem when we were upgrading servers from RHEL3 to RHEL4.  The
fact that our FC boxes weren't affected didn't help us, because we don't use FC
for server machines.

Some additional information for this bug: we are fairly confident that upgrading
our core router from IOS 12.2(25)S8 to 12.3(19) is what fixed the problem.

(I say "fairly confident" because we have also been massively updating our
switching infrastructure, so if any part of this problem depended on layer 2,
then updating the switches might have also been involved with the resolution. 
But since I strongly suspect this was a layer 3 problem, then updating IOS on
the router is what fixed it, because the router was the only layer 3 device
between the end systems I was testing.)