Bug 627496

Summary: Fix shrinking windows with window scaling
Product: Red Hat Enterprise Linux 5 Reporter: Sumeet Gandhare <sgandhar>
Component: kernel Assignee: Jiri Pirko <jpirko>
Status: CLOSED ERRATA QA Contact: Network QE <network-qe>
Severity: high Docs Contact:
Priority: urgent    
Version: 5.5 CC: dhoward, haliu, hjia, jeder, john.daley, jpirko, jwest, nhorman, qcai, rkhan, tao
Target Milestone: rc Keywords: ZStream
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
When selecting a new window, the tcp_select_window() function tried not to shrink the offered window by using the maximum of the remaining offered window size and the newly calculated window size. The newly calculated window size was always a multiple of the window scaling factor; however, the remaining window size was not necessarily one, since it depended on rcv_wup/rcv_nxt. As a result, the window could be shrunk when it was scaled down. With this update, the remaining window is aligned to the window scaling factor, which ensures that the window is no longer shrunk.
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-07-21 09:26:39 UTC Type: ---
Bug Depends On:    
Bug Blocks: 669300    
Attachments:
Description Flags
Upstream patch none

Description Sumeet Gandhare 2010-08-26 08:12:40 UTC
Description of problem:

When selecting a new window, tcp_select_window() tries not to shrink
the offered window by using the maximum of the remaining offered window
size and the newly calculated window size. The newly calculated window
size is always a multiple of the window scaling factor; the remaining
window size, however, might not be, since it depends on rcv_wup/rcv_nxt.
This means we're effectively shrinking the window when scaling it down.

The dump below shows the problem (scaling factor 2^7):

- Window size of 557 (71296) is advertised, up to 3111907257:

IP 172.2.2.3.33000 > 172.2.2.2.33000: . ack 3111835961 win 557 <...>

- New window size of 514 (65792) is advertised, up to 3111907217, 40 bytes
 below the last end:

IP 172.2.2.3.33000 > 172.2.2.2.33000: . 3113575668:3113577116(1448) ack 3111841425 win 514 <...>

The number 40 results from downscaling the remaining window:

3111907257 - 3111841425 = 65832
65832 / 2^7 = 514
65832 % 2^7 = 40
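
The arithmetic above can be re-checked with a small user-space sketch. It is not kernel code: the helper names advertised_end() and truncation_loss() are made up for illustration, and the only inputs are the ack, win, and scaling-factor values visible in the dump.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: right edge of the window that a segment advertises,
 * i.e. ack + (win << wscale), in wrap-around 32-bit sequence space. */
static uint32_t advertised_end(uint32_t ack, uint16_t win, unsigned wscale)
{
    assert(wscale <= 14); /* RFC 7323 limits the shift count to 14 */
    return ack + ((uint32_t)win << wscale);
}

/* Hypothetical helper: bytes lost when a remaining window is truncated
 * to a multiple of 2^wscale for transmission in the 16-bit window field. */
static uint32_t truncation_loss(uint32_t remaining, unsigned wscale)
{
    assert(wscale <= 14);
    return remaining & ((1u << wscale) - 1u);
}
```

With the values from the dump, advertised_end(3111835961, 557, 7) gives 3111907257 and advertised_end(3111841425, 514, 7) gives 3111907217, so the right edge moves back by truncation_loss(65832, 7) = 40 bytes.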

If the sender uses up the entire window before it is shrunk, this can have
chaotic effects on the connection. When sending ACKs, tcp_acceptable_seq()
will notice that the window has been shrunk since tcp_wnd_end() is before
tp->snd_nxt, which makes it choose tcp_wnd_end() as sequence number.
This will, however, fail the receiver's checks in tcp_sequence(), since it
is before its tp->rcv_wup, making it respond with a dupack.

If both sides are in this condition, this leads to a constant flood of
ACKs until the connection times out.
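
The interaction described above can be modelled in user space. The following is a simplified sketch, not the kernel implementation: the function names are invented, only the comparisons mentioned in the text are reproduced, and it assumes the usual wrap-safe signed-difference comparison for sequence numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "a is before b" on 32-bit sequence numbers (assumed semantics). */
static bool seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Model of the sender-side choice: use snd_nxt unless the peer's window
 * end (snd_una + snd_wnd) has moved back behind it. */
static uint32_t model_acceptable_seq(uint32_t snd_una, uint32_t snd_wnd,
                                     uint32_t snd_nxt)
{
    uint32_t wnd_end = snd_una + snd_wnd;

    assert(!seq_before(snd_nxt, snd_una)); /* snd_nxt never trails snd_una */
    return seq_before(wnd_end, snd_nxt) ? wnd_end : snd_nxt;
}

/* Model of the part of the receiver check that fails here: a segment
 * whose end lies before rcv_wup is rejected and answered with a dupack. */
static bool model_ends_before_rcv_wup(uint32_t end_seq, uint32_t rcv_wup)
{
    return seq_before(end_seq, rcv_wup);
}
```

Taking the dump values as an illustration and assuming the sender had already sent up to 3111907257, model_acceptable_seq(3111841425, 65792, 3111907257) picks 3111907217. A peer whose rcv_wup has reached 3111907257 then rejects that ACK, because model_ends_before_rcv_wup(3111907217, 3111907257) is true.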

Make sure the window is never shrunk by aligning the remaining window to
the window scaling factor.

Customer would like to get this upstream fix 

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=607bfbf2d55dd1cfe5368b41c2a81a8c9ccf4723 

backported.
Proposed patch for kernel-2.6.18-194:

diff --git a/a/net/ipv4/tcp_output.c b/b/net/ipv4/tcp_output.c
index b4f3ffe..07f1452 100644
--- a/a/net/ipv4/tcp_output.c
+++ b/b/net/ipv4/tcp_output.c
@@ -246,7 +246,7 @@ static u16 tcp_select_window(struct sock *sk)
                *
                * Relax Will Robinson.
                */
-               new_win = cur_win;
+               new_win = ALIGN(cur_win, 1 << tp->rx_opt.rcv_wscale);
       }
       tp->rcv_wnd = new_win;
       tp->rcv_wup = tp->rcv_nxt;
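
As a rough illustration of what the one-line change does to the numbers from the dump, here is a user-space sketch. align_up() is a hypothetical stand-in that mirrors the usual round-up-to-power-of-two behaviour of the kernel's ALIGN() macro, and window_field() is likewise invented; clamping and other steps of tcp_select_window() are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for ALIGN(x, a): round x up to the next multiple
 * of a, where a must be a power of two. */
static uint32_t align_up(uint32_t x, uint32_t a)
{
    assert(a != 0 && (a & (a - 1u)) == 0); /* power of two only */
    return (x + a - 1u) & ~(a - 1u);
}

/* Hypothetical helper: value placed in the 16-bit window field after
 * scaling a byte count down by wscale. */
static uint16_t window_field(uint32_t win_bytes, unsigned wscale)
{
    return (uint16_t)(win_bytes >> wscale);
}
```

For the remaining window of 65832 bytes and a scaling factor of 2^7, align_up(65832, 1 << 7) yields 65920, which is advertised as window_field(65920, 7) = 515. The advertised end becomes 3111841425 + 65920 = 3111907345, which is no longer below the previous end of 3111907257.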

Version-Release number of selected component (if applicable):
RHEL 5.5

How reproducible:

 We don't have clear steps to reproduce this issue; it appeared intermittently on the customer's systems. However, this patch fixed the customer's issue permanently.

The customer's concern is that this patch is already included upstream and in other distributions such as Ubuntu and Debian, and the customer asks why we are still lagging behind on this problem.

Ubuntu bug: https://bugs.launchpad.net/ubuntu/hardy/+source/linux/+bug/230456



Steps to Reproduce:
Intermittent
  
Actual results:


Expected results:


Additional info:

The customer was given a test kernel with the patch and confirmed that the test kernel resolves the issue.

Comment 1 Sumeet Gandhare 2010-08-26 08:37:43 UTC
Created attachment 441142 [details]
Upstream patch

Upstream patch which needs to be backported to RHEL 5

Comment 4 RHEL Program Management 2010-12-07 10:26:16 UTC
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.6, and Red Hat does not plan to fix this issue in the currently developed update.

Contact your manager or support representative in case you need to escalate this bug.

Comment 10 Jarod Wilson 2011-01-26 21:07:42 UTC
in kernel-2.6.18-241.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 22 Martin Prpič 2011-07-13 20:19:03 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
When selecting a new window, the tcp_select_window() function tried not to shrink the offered window by using the maximum of the remaining offered window size and the newly calculated window size. The newly calculated window size was always a multiple of the window scaling factor; however, the remaining window size was not necessarily one, since it depended on rcv_wup/rcv_nxt. As a result, the window could be shrunk when it was scaled down. With this update, the remaining window is aligned to the window scaling factor, which ensures that the window is no longer shrunk.

Comment 23 errata-xmlrpc 2011-07-21 09:26:39 UTC
An advisory has been issued which should help with the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1065.html