Description of problem:

When selecting a new window, tcp_select_window() tries not to shrink the offered window by using the maximum of the remaining offered window size and the newly calculated window size. The newly calculated window size is always a multiple of the window scaling factor; the remaining window size, however, might not be, since it depends on rcv_wup/rcv_nxt. This means we're effectively shrinking the window when scaling it down.

The dump below shows the problem (scaling factor 2^7):

- Window size of 557 (71296) is advertised, up to 3111907257:

  IP 172.2.2.3.33000 > 172.2.2.2.33000: . ack 3111835961 win 557 <...>

- New window size of 514 (65792) is advertised, up to 3111907217, 40 bytes below the last end:

  IP 172.2.2.3.33000 > 172.2.2.2.33000: . 3113575668:3113577116(1448) ack 3111841425 win 514 <...>

The number 40 results from downscaling the remaining window:

  3111907257 - 3111841425 = 65832
  65832 / 2^7 = 514
  65832 % 2^7 = 40

If the sender uses up the entire window before it is shrunk, this can have chaotic effects on the connection. When sending ACKs, tcp_acceptable_seq() will notice that the window has been shrunk, since tcp_wnd_end() is before tp->snd_nxt, which makes it choose tcp_wnd_end() as the sequence number. This will, however, fail the receiver's checks in tcp_sequence(), since it is before its tp->rcv_wup, making it respond with a dupack.

If both sides are in this condition, this leads to a constant flood of ACKs until the connection times out.

Make sure the window is never shrunk by aligning the remaining window to the window scaling factor.

Customer would like to get this upstream fix backported:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=607bfbf2d55dd1cfe5368b41c2a81a8c9ccf4723
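The arithmetic in the dump above can be checked with a small user-space sketch. The helper names (advertised_field, peer_window_end, shrink_bytes) are hypothetical and are not kernel functions; the sketch only models how a byte count is scaled down for the wire and scaled back up by the peer.

```c
#include <stdint.h>

/* Window field sent on the wire: the byte count shifted right by the
 * window scale, so any remainder below 2^wscale is dropped. */
static uint32_t advertised_field(uint32_t win_bytes, unsigned wscale)
{
    return win_bytes >> wscale;
}

/* Right window edge the peer reconstructs from the ACK number and the
 * scaled window field. */
static uint32_t peer_window_end(uint32_t ack, uint32_t field, unsigned wscale)
{
    return ack + (field << wscale);
}

/* Number of bytes by which the right edge moved backwards, or 0 if it
 * did not. The signed cast keeps the comparison valid across sequence
 * number wraparound. */
static uint32_t shrink_bytes(uint32_t old_end, uint32_t new_end)
{
    return (int32_t)(old_end - new_end) > 0 ? old_end - new_end : 0;
}
```

With the values from the dump, advertised_field(65832, 7) is 514, peer_window_end(3111841425, 514, 7) is 3111907217, and shrink_bytes(3111907257, 3111907217) is 40, which matches the 40 bytes lost to downscaling.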
Proposed patch for kernel-2.6.18-194:

diff --git a/a/net/ipv4/tcp_output.c b/b/net/ipv4/tcp_output.c
index b4f3ffe..07f1452 100644
--- a/a/net/ipv4/tcp_output.c
+++ b/b/net/ipv4/tcp_output.c
@@ -246,7 +246,7 @@ static u16 tcp_select_window(struct sock *sk)
 		 *
 		 * Relax Will Robinson.
 		 */
-		new_win = cur_win;
+		new_win = ALIGN(cur_win, 1 << tp->rx_opt.rcv_wscale);
 	}
 	tp->rcv_wnd = new_win;
 	tp->rcv_wup = tp->rcv_nxt;

Version-Release number of selected component (if applicable):
RHEL 5.5

How reproducible:
We don't have clear steps to reproduce this issue; it appeared intermittently on customer systems. However, this patch fixed the customer's issue permanently.

The customer's concern is that this patch has already been included upstream as well as in other distributions such as Ubuntu and Debian, and asks why we are still so far behind on this problem.

Ubuntu bug:
https://bugs.launchpad.net/ubuntu/hardy/+source/linux/+bug/230456

Steps to Reproduce:
Intermittent

Actual results:

Expected results:

Additional info:
The customer was given a test kernel with the patch and confirmed that the test kernel resolves the issue.
Created attachment 441142
Upstream patch

Upstream patch which needs to be backported to RHEL 5
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.6, and Red Hat does not plan to fix this issue in the currently developed update. Contact your manager or support representative in case you need to escalate this bug.
in kernel-2.6.18-241.el5

You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
When selecting a new window, the tcp_select_window() function tried not to shrink the offered window by using the maximum of the remaining offered window size and the newly calculated window size. The newly calculated window size was always a multiple of the window scaling factor; however, the remaining window size might not have been, since it depended on rcv_wup/rcv_nxt. As a result, a window was shrunk when it was scaled down. With this update, aligning the remaining window to the window scaling factor ensures that a window is no longer shrunk.
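The effect of the alignment described in the technical note can be illustrated with a minimal user-space sketch. It assumes the patched branch is taken only when the newly calculated window is smaller than the remaining one; align_up() is a stand-in for the kernel's ALIGN() macro, and select_window_field() is a hypothetical simplification, not the actual tcp_select_window() code.

```c
#include <stdint.h>

/* Round x up to the next multiple of a. Assumes a is a power of two,
 * which holds for 1 << wscale. Stand-in for the kernel's ALIGN(). */
static uint32_t align_up(uint32_t x, uint32_t a)
{
    return (x + a - 1) & ~(a - 1);
}

/* Hypothetical model of the patched logic: if the newly calculated
 * window would be smaller than the remaining offered window, keep the
 * remaining window, rounded up to the scaling factor so that shifting
 * it down loses nothing. Returns the scaled window field. */
static uint32_t select_window_field(uint32_t cur_win, uint32_t new_win,
                                    unsigned wscale)
{
    if (new_win < cur_win)
        new_win = align_up(cur_win, 1u << wscale);
    return new_win >> wscale;
}
```

For the remaining window of 65832 bytes from the original report, align_up(65832, 128) gives 65920, so the advertised field becomes 515 instead of 514. The right edge seen by the peer is then 3111841425 + 65920 = 3111907345, which is no longer below the previously advertised end of 3111907257.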
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1065.html