Bug 523425 - TCP Frozen Window
Summary: TCP Frozen Window
Keywords:
Status: CLOSED DUPLICATE of bug 546324
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.4
Hardware: i686
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Danny Feng
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2009-09-15 13:27 UTC by Chris Horton
Modified: 2009-12-29 07:06 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-12-29 07:06:04 UTC



Description Chris Horton 2009-09-15 13:27:16 UTC
Description of problem:

We have a cluster of OAS servers running RHEL4 U4 that are built and configured identically.  On one of the servers we've been having an issue with a TCP frozen window, but it has not occurred on any of the other servers.  When the issue does occur, it only lasts 4-5 seconds, but that is enough to cause timeouts in the application.

I've searched through the bug lists and the knowledge base and have not been able to find anything applicable to this issue.  There is certainly not enough in this ticket to identify a specific new bug, but I would like to know if there is an existing issue/fix that might be known.  I can supply additional information if necessary.


Version-Release number of selected component (if applicable):
$ cat /proc/version
Linux version 2.6.9-42.ELsmp (bhcompile@hs20-bc1-1.build.redhat.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-2)) #1 SMP Wed Jul 12 23:27:17 EDT 2006


How reproducible:
It seems to occur irregularly, with no specific way to reproduce the issue.  When the server is in production, it will occur, but not in a predictable manner or timeframe.  Testing has been unable to reproduce the issue outside of production.


Additional info:
The issue has been identified using the Opnet tool to do packet captures on all devices involved and analyzing after the situation occurs.  The report does not include details of the session, but I'm including the output below:

TCP Frozen Window

Diagnosis
If a tier pair is identified as having a TCP Frozen Window bottleneck, the advertised TCP Receive Window has dropped to a value smaller than the Maximum Segment Size (MSS). This is affecting your application response time.

Explanation
The advertised TCP Receive Window has dropped to a value smaller than the MSS. When this occurs, the sender cannot send any data until the receive window is one MSS or larger.
To determine if the receive window has become larger, the sending side periodically sends one-byte probe packets. The contents of these probe packets depend on the particular implementation, but they are usually sent with an exponential backoff.
The usual cause of a TCP frozen window is that the application on the receiving side is not taking data from the TCP receive buffer quickly enough.
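
As a rough illustration of the condition described above (this is not the Opnet tool's actual logic), the check and the probe timing can be sketched in a few lines of Python. The function names, the starting delay, and the cap are hypothetical; real TCP stacks choose their own probe timers.

```python
from typing import List


def is_frozen_window(advertised_window: int, mss: int) -> bool:
    """True when the receiver advertises less than one full segment."""
    return advertised_window < mss


def probe_intervals(initial: float, cap: float, count: int) -> List[float]:
    """Delays (seconds) between successive window probes.

    Assumption: the delay doubles after each probe until it reaches `cap`.
    """
    intervals = []
    delay = initial
    for _ in range(count):
        intervals.append(delay)
        delay = min(delay * 2, cap)
    return intervals
```

With doubling delays, even a handful of unanswered probes adds up to several seconds, which is roughly the order of the 4-5 second stalls reported here.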

Suggestions
Consider the following solutions:
1.	Send less data. 
2.	Have the receiving application retrieve the data more quickly; if the application cannot process all the data at once, consider storing the data in another buffer. 
3.	Upgrade the receiving machine.
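
Suggestion 2 could look something like the following minimal sketch, assuming a Python receiver: a dedicated thread empties the kernel receive buffer as fast as it can and hands the data to a user-space queue for slower processing. The names `drain_socket` and `start_drainer` are made up for illustration and are not taken from the affected application.

```python
import queue
import socket
import threading


def drain_socket(sock: socket.socket, buf: queue.Queue, chunk: int = 65536) -> None:
    """Read from sock as fast as possible and pass the data to buf."""
    while True:
        data = sock.recv(chunk)
        if not data:        # peer closed the connection
            buf.put(b"")    # empty bytes marks end of stream
            return
        buf.put(data)


def start_drainer(sock: socket.socket):
    """Start a background reader; returns the queue and the thread."""
    buf = queue.Queue()     # unbounded: trades memory for an open receive window
    t = threading.Thread(target=drain_socket, args=(sock, buf), daemon=True)
    t.start()
    return buf, t
```

Note that an unbounded queue only moves the backlog from the kernel into process memory, so this helps with short bursts but not with a reader that is permanently slower than the sender.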

Comment 1 Fabio Olive Leite 2009-12-04 16:26:50 UTC
Unfortunately RHEL-4 contains a TCP receive window clamping issue that was fixed upstream around versions 2.6.14 and 2.6.15. It can be triggered by applications that send a lot of small packets to a slow reader.

http://marc.info/?l=linux-kernel&m=112561417224145&w=2

This has received some attention lately, so I'll be looking for the right patch in the next few days. Let's hope we can still fix this for RHEL-4.

Cheers,
Fábio Olivé

Comment 2 Danny Feng 2009-12-15 08:43:36 UTC
   The TCP frozen window issue could come from the kernel, the application, or the hardware, so the root cause could be hard to locate. How often does the application timeout occur?

   As mentioned in comment #1, the RHEL4 kernel has a receive window clamp issue; the fix is upstream commit 09e9ec87. I'm not sure whether this will fix the reporter's TCP frozen window issue, but we can handle the situation better with this commit.

Comment 3 Danny Feng 2009-12-29 07:06:04 UTC

*** This bug has been marked as a duplicate of bug 546324 ***

