Created attachment 377518 [details]
Proposed patch, tested and verified.

Description of problem:
RHEL-4 contains a non-recoverable TCP receive window clamping problem that was fixed upstream around 2.6.14 and 2.6.15. The upstream discussion starts at:
http://marc.info/?l=linux-kernel&m=112561417224145&w=2

When a TCP connection starts receiving bursts of very small packets and the receive buffer fills with those high-overhead packets, the kernel decides to coalesce the receive buffer and clamp the receive window to slow down the sender. The problem is that the receive window never fully heals from this condition, even when the traffic pattern changes.

Version-Release number of selected component (if applicable):
Any current RHEL-4 kernel.

How reproducible:
Depends on the receive buffer size and on how much data a slow reader allows to linger there. The attached Python scripts reproduce the issue easily by simulating a financial application traffic pattern.

Steps to Reproduce:
1. Run slow_receiver.py on a RHEL-4 box;
2. Point the IP address in small_sender.py to the RHEL-4 box and run it;
3. Capture traffic and analyze the receive window changes.

Actual results:
The receive window gets clamped and never fully heals, as shown in the attached graph.

Expected results:
The receive window expands back to its original value when traffic conditions allow for it. A graph from a patched kernel is attached as well.

Additional info:
Patch attached, based on the upstream thread above and commit 326f36e9e7de362e09745ce6f84b65e7ccac33ba.

Workaround:
This issue can be worked around by tuning net.ipv4.tcp_rmem way up, so that any existing application that receives bursts of small packets and takes too long to read them gets a receive buffer large enough that the clamping is not triggered.
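For anyone applying the workaround above, here is a minimal sketch of how the new net.ipv4.tcp_rmem value could be computed. The sysctl holds three integers (min, default, max). The helper names and the scaling factor of 4 are assumptions made for illustration, not a recommendation from this report; the right size depends on how much data the application lets pile up.

```python
def parse_tcp_rmem(text):
    """Parse the 'min default max' triple as found in
    /proc/sys/net/ipv4/tcp_rmem."""
    fields = text.split()
    if len(fields) != 3:
        raise ValueError("expected three integers: min default max")
    return tuple(int(field) for field in fields)


def raised_tcp_rmem(current, factor=4):
    """Scale the default and max values up, leaving min untouched.

    The factor is an illustrative assumption, not a tested value.
    """
    rmin, rdefault, rmax = current
    new_default = rdefault * factor
    new_max = max(rmax * factor, new_default)
    return (rmin, new_default, new_max)


def sysctl_line(values):
    """Format the triple as a line suitable for /etc/sysctl.conf."""
    return "net.ipv4.tcp_rmem = %d %d %d" % values
```

Feeding it the contents of /proc/sys/net/ipv4/tcp_rmem and passing the result to sysctl_line() gives a line that can be appended to /etc/sysctl.conf and loaded with sysctl -p.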
Created attachment 377519 [details] Python script to implement a slow receiver.
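For readers without access to the attachment, a slow receiver can be sketched roughly as below. This is not the attached script: the function names, the chunk size, and the delay are assumptions. The idea is only to read small amounts with a pause between reads so that data lingers in the kernel receive buffer.

```python
import socket
import time


def drain_slowly(conn, chunk=256, delay=0.5):
    """Read from conn in small chunks, sleeping between reads.

    Returns the total number of bytes read once the peer closes.
    """
    total = 0
    while True:
        data = conn.recv(chunk)
        if not data:
            break
        total += len(data)
        time.sleep(delay)  # let unread data pile up in the receive buffer
    return total


def serve_once(port, chunk=256, delay=0.5):
    """Accept a single TCP connection on port and drain it slowly."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("", port))
    listener.listen(1)
    try:
        conn, _peer = listener.accept()
        try:
            return drain_slowly(conn, chunk, delay)
        finally:
            conn.close()
    finally:
        listener.close()
```

Calling serve_once() with a free port on the RHEL-4 box plays the role of step 1 in the reproduction steps.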
Created attachment 377520 [details] Python script to implement a "small sender"
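A matching sender can be sketched in the same way. Again, this is not the attached script: the names, the message size, and the count are assumptions. TCP_NODELAY is set so that each small write is more likely to leave as its own small segment instead of being merged by Nagle's algorithm.

```python
import socket
import time


def open_nodelay_connection(host, port):
    """Connect to the receiver with Nagle's algorithm disabled."""
    sock = socket.create_connection((host, port))
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def send_small_messages(sock, count=10000, size=16, interval=0.0):
    """Send count messages of size bytes each; return total bytes sent."""
    payload = b"x" * size
    sent = 0
    for _ in range(count):
        sock.sendall(payload)
        sent += size
        if interval:
            time.sleep(interval)  # optional pacing between messages
    return sent
```

Opening the connection with open_nodelay_connection() against the receiver's address and then calling send_small_messages() corresponds to step 2 of the reproduction steps.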
Created attachment 377522 [details] Graph of tcp rwnd values as seen from the receiver side, showing permanent window clamp
Created attachment 377523 [details] Graph of tcp rwnd values as seen from the receiver side, with patch applied
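To compare captures like the two graphs above without eyeballing them, the advertised window values extracted from a capture can be summarized with a small helper. This is a sketch under assumptions: the function name, the 90% recovery threshold, and the sample numbers in the usage note are hypothetical and are not taken from the attached graphs.

```python
def clamp_summary(rwnd_samples, recovered_fraction=0.9):
    """Summarize a time-ordered series of advertised receive windows.

    The window counts as recovered when, after its lowest point, it
    climbs back to at least recovered_fraction of the earlier peak.
    """
    if not rwnd_samples:
        raise ValueError("no samples")
    trough = min(rwnd_samples)
    trough_index = rwnd_samples.index(trough)
    peak = max(rwnd_samples[:trough_index + 1])
    best_after = max(rwnd_samples[trough_index:])
    return {
        "peak": peak,
        "trough": trough,
        "best_after_trough": best_after,
        "recovered": best_after >= recovered_fraction * peak,
    }
```

With a made-up series such as [32768, 32768, 8192, 2920, 2920, 5840, 5840] the helper reports recovered as False, which is the shape the unpatched kernel is described as producing. A series that climbs back to its earlier peak reports True.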
Nice work Fabio. One question, it appears you backported most of the patch, but not all of it. How come you didn't move the sk_rcvbuf check to outside the ofo_win conditional like the upstream patch did?
Neil, With this being RHEL-4, I wanted the simplest possible solution, so I focused on the "two-liner" proposed by Alexey Kuznetsov on <http://marc.info/?l=linux-netdev&m=112568628430571&w=2>. After removing those two lines, I noticed that app_win was no longer needed so I removed any lines calculating it as well. I did not know the effect of applying the whole patch to the older RHEL-4 kernel, so I decided to keep it simple. If you are sure the whole patch applies well and causes no problems, feel free to consider it. :) Regards, Fábio Olivé
Well, I'm a bit worried about the potential for out-of-order windows to allow inappropriate expansion of a socket's rx buffer. But I think you're right, it's more important to minimize change at this point. We'll go with your patch as is. Thanks!
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
*** Bug 523425 has been marked as a duplicate of this bug. ***
Committed in 89.33.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/
Created attachment 469975 [details] receive window size on 4U8
Created attachment 469976 [details] receive window size on 2.6.9-94
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-0263.html