Bug 128215
| Field | Value |
|---|---|
| Summary | Extreme network slowdown (NOT tcp_window_scaling) |
| Product | [Fedora] Fedora |
| Component | kernel |
| Version | rawhide |
| Hardware | All |
| OS | Linux |
| Status | CLOSED RAWHIDE |
| Severity | medium |
| Priority | medium |
| Reporter | Tim Waugh <twaugh> |
| Assignee | David Miller <davem> |
| QA Contact | Brian Brock <bbrock> |
| CC | bnocera, wtogami |
| Fixed In Version | 2.6.7-1.499 |
| Doc Type | Bug Fix |
| Last Closed | 2004-07-29 16:12:49 UTC |

Description
Tim Waugh
2004-07-20 09:18:20 UTC
This also happens when 192.168.1.1 is running 2.6.6-1.435 (and 192.168.1.6 is still running 2.6.7-1.488).

What you are seeing is "window scaling", that is why the advertised window drops right after connection setup. This is a standard TCP feature that has been around for years, and now Linux is advertising a window scale of "7" by default. Some device (usually a router or NAT box, in some cases a DSL box doing port forwarding) is corrupting the connection due to the presence of window scaling. It is not a Linux bug.

I don't think that's it, because the tcpdump capture was performed *on* 192.168.1.6, and so the 192.168.1.6 packets are accurate. I know about the window scaling problem with some hardware, and that's why I reported this separately. This looks different.

The packets where the problem happens are these four:

09:47:35.399458 IP 192.168.1.6.32773 > 192.168.1.1.ssh: P 39777:39825(48) ack 1528466 win 65160 <nop,nop,timestamp 337971 87774586>

Here the window is advertised as 65160 *bytes* -- there is no wscale option in this packet.

09:47:35.399901 IP 192.168.1.1.ssh > 192.168.1.6.32773: . ack 39825 win 9120 <nop,nop,timestamp 87774587 337971>
09:47:35.446854 IP 192.168.1.1.ssh > 192.168.1.6.32773: P 1528466:1529330(864) ack 39825 win 9120 <nop,nop,timestamp 87774634 337971>

There is nothing unusual about the reverse traffic.

09:47:35.447117 IP 192.168.1.6.32773 > 192.168.1.1.ssh: P 39825:39873(48) ack 1529330 win 128 <nop,nop,timestamp 338019 87774634>

Suddenly the window is 128 *bytes* -- again, there is no wscale option here.

These packets were captured *on* 192.168.1.6 using 'tcpdump -w capture -i eth0'.

> and now Linux is advertising
> a window scale of "7" by default.
Incidentally, look at the trace: wscale 0 appears in the first SYN,
not wscale 7. In fact, there are no wscale options in the entire
thing other than that initiating SYN.
Full pcap file available if you want it.
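
For anyone who wants to look for the same symptom in their own capture, here is a minimal Python sketch that scans tcpdump text output for a sudden collapse in a sender's advertised window. The regular expression assumes the line layout of the four packets quoted above, and the function name `window_drops` and the `ratio` threshold are illustrative choices, not part of any tool.

```python
import re

# Assumed layout: "<time> IP <src> > <dst>: ... win <N> ...", as in the
# tcpdump lines quoted in this report.
WIN_RE = re.compile(r"^(\S+) IP (\S+) > (\S+): .*\bwin (\d+)")


def window_drops(lines, ratio=8):
    """Yield (time, sender, previous_win, win) each time a sender's
    advertised window shrinks by more than `ratio` times between two of
    its consecutive packets. `ratio` is an arbitrary illustrative cutoff."""
    last = {}  # sender -> last advertised window
    for line in lines:
        m = WIN_RE.match(line.strip())
        if not m:
            continue
        ts, src, win = m.group(1), m.group(2), int(m.group(4))
        prev = last.get(src)
        if prev is not None and win * ratio < prev:
            yield ts, src, prev, win
        last[src] = win


trace = [
    "09:47:35.399458 IP 192.168.1.6.32773 > 192.168.1.1.ssh: P 39777:39825(48) ack 1528466 win 65160 <nop,nop,timestamp 337971 87774586>",
    "09:47:35.399901 IP 192.168.1.1.ssh > 192.168.1.6.32773: . ack 39825 win 9120 <nop,nop,timestamp 87774587 337971>",
    "09:47:35.446854 IP 192.168.1.1.ssh > 192.168.1.6.32773: P 1528466:1529330(864) ack 39825 win 9120 <nop,nop,timestamp 87774634 337971>",
    "09:47:35.447117 IP 192.168.1.6.32773 > 192.168.1.1.ssh: P 39825:39873(48) ack 1529330 win 128 <nop,nop,timestamp 338019 87774634>",
]

drops = list(window_drops(trace))
for ts, src, prev, win in drops:
    print(f"{ts} {src}: win {prev} -> {win}")
```

On the four packets above this flags only the last one, where 192.168.1.6 goes from win 65160 to win 128; the reverse direction stays at 9120 and is not reported.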
I have also tried disabling tcp_window_scaling on both machines, but still have similar problems: at some point during the connection packets will be (a) fragmented due to a small window size (1072 on 192.168.1.6 running 2.6.7-1.488 compared to 23168 on 192.168.1.1 running 2.6.6-1.435) and (b) delayed at 192.168.1.1, presumably due to it filling up the advertised window on 192.168.1.6. Again netstat -ntoc on 192.168.1.1 shows a send buffer slowly emptying. So this can't be a window scaling problem alone.

Fixing platform field.

This seems to be linked to tcp_moderate_rcvbuf. I haven't seen a repeat of the problem since setting it to zero.

First of all, window scale options don't go into the individual packets. They only appear in the initial SYN/SYN+ACK exchange, and then if enabled they apply for the entirety of the connection. What you will see happen is that the end running the newer 2.6.x kernels will advertise a window scale of "7" and the other end (something running 2.4.x for example) will use a window scale of "0".

Please do an experiment for me: set "tcp_default_win_scale" to zero on the 2.6.x system(s). Let me know what happens. I hope this isn't some weird x86_64 miscompilation issue.

They are both running 2.6.x. I've observed the problem symptoms several times this morning, and this is the state of things:

x86_64 (192.168.1.6) 2.6.7-1.488
tcp_moderate_rcvbuf=1
tcp_window_scaling=1
tcp_default_win_scale=0

i686 (192.168.1.1) 2.6.7-1.435
tcp_moderate_rcvbuf=0
tcp_window_scaling=0
tcp_default_win_scale=0

I've now changed tcp_moderate_rcvbuf to 0 on 192.168.1.6 so I can do some work. I have never seen the problem symptoms with tcp_moderate_rcvbuf=0 on 192.168.1.6.

Thanks to your excellent reporting I think I know where the problem is: we're letting the window overflow 16 bits somewhere. I'll try to figure out what check is missing.

Created attachment 102158
Fix for TCP window wrapping bug
This should fix it. We were only validating the
range of the window field on connection startup,
not in tcp_select_window() where it needs to happen
as well.
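
To picture the wrap described here, the following Python sketch models a 16-bit window field with and without a range check. It is not the attached patch: the function names are made up for illustration, and the value 65664 is a hypothetical receive window picked only because truncating it to 16 bits with wscale 0 gives the "win 128" seen in the trace. The thread does not state the real internal value.

```python
MAX_U16 = 0xFFFF  # the window field in the TCP header is 16 bits wide


def advertised_unchecked(window_bytes, wscale):
    """Value that lands in the header if the scaled window is simply
    truncated to 16 bits, with no range check (illustrative model)."""
    return (window_bytes >> wscale) & MAX_U16


def advertised_clamped(window_bytes, wscale):
    """Same, but first clamp the window to the largest value that the
    16-bit field can represent at this scale (illustrative model of a
    range check, not the kernel's exact logic)."""
    limit = MAX_U16 << wscale
    return min(window_bytes, limit) >> wscale


# Hypothetical: receive window grown just past 64 KiB while wscale is 0.
window = 65536 + 128

print(advertised_unchecked(window, 0))  # 128   (wrapped)
print(advertised_clamped(window, 0))    # 65535 (largest representable)
```

This would also fit the observation that the symptoms disappear with tcp_moderate_rcvbuf=0, on the assumption that receive buffer auto-tuning is what lets the window grow past the 16-bit limit after connection setup.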
This patch certainly fixes the problem here. Thanks!