Bug 523425

Summary:	TCP Frozen Window
Product:	Red Hat Enterprise Linux 4	Reporter:	Chris Horton <christopher.horton>
Component:	kernel	Assignee:	Danny Feng <dfeng>
Status:	CLOSED DUPLICATE	QA Contact:	Red Hat Kernel QE team <kernel-qe>
Severity:	medium	Docs Contact:
Priority:	low
Version:	4.4	CC:	fleite
Target Milestone:	rc
Target Release:	---
Hardware:	i686
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2009-12-29 07:06:04 UTC	Type:	---

Description Chris Horton 2009-09-15 13:27:16 UTC

Description of problem:

We have a cluster of OAS servers running RHEL4 U4 that are built and configured identically. One of the servers has been hitting a TCP frozen window; none of the other servers has shown the issue. Each occurrence lasts only 4-5 seconds, but that is long enough to cause timeouts in the application.

I've searched through the bug lists and the knowledge base and have not been able to find anything applicable to this issue. There is certainly not enough in this ticket to identify a specific new bug, but I would like to know if there is an existing issue/fix that might be known. I can supply additional information if necessary.

Version-Release number of selected component (if applicable):
$ cat /proc/version
Linux version 2.6.9-42.ELsmp (bhcompile.redhat.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-2)) #1 SMP Wed Jul 12 23:27:17 EDT 2006

How reproducible:
It occurs irregularly, with no specific way to reproduce it. When the server is in production the issue occurs, but not in a predictable manner or timeframe. Testing has been unable to reproduce it outside of production.

Additional info:
The issue has been identified using the Opnet tool to do packet captures on all devices involved and analyzing after the situation occurs. The report does not include details of the session, but I'm including the output below:

TCP Frozen Window

Diagnosis
If a tier pair is identified as having a TCP Frozen Window bottleneck, the advertised TCP Receive Window has dropped to a value smaller than the Maximum Segment Size (MSS). This is affecting your application response time.

Explanation
The advertised TCP Receive Window has dropped to a value smaller than the MSS. When this occurs, the sender cannot send any data until the receive window is one MSS or larger.
To determine if the receive window has become larger, the sending side periodically sends one-byte probe packets. The contents of these probe packets depend on the particular implementation, but they are usually sent with an exponential backoff.
The usual cause of a TCP frozen window is that the application on the receiving side is not taking data from the TCP receive buffer quickly enough.
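
The condition and the probe timing described above can be sketched in a few lines of Python. This is only an illustration of the report's wording, not kernel code: the function names are hypothetical, and the first interval, the cap and the probe count are assumed values chosen for the example, not the timers any particular TCP stack uses.

```python
def is_frozen(advertised_window, mss):
    """True when the advertised receive window is smaller than one MSS."""
    return advertised_window < mss


def probe_intervals(first_interval, max_interval, count):
    """Waits (in seconds) between successive window probes.

    Each wait doubles the previous one (exponential backoff) and is
    capped at max_interval. All three parameters are illustrative.
    """
    intervals = []
    interval = first_interval
    for _ in range(count):
        intervals.append(interval)
        interval = min(interval * 2, max_interval)
    return intervals


if __name__ == "__main__":
    print(is_frozen(512, 1460))          # window below one MSS
    print(probe_intervals(0.2, 2.0, 5))  # assumed timer values
```

With backoff, the sender may notice a reopened window only at the next probe, so a stall can outlast the moment the receiver actually frees buffer space.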

Suggestions
Consider the following solutions:
1. Send less data.
2. Have the receiving application retrieve the data more quickly; if the application cannot process all the data at once, consider storing the data in another buffer.
3. Upgrade the receiving machine.
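
Suggestion 2 can be sketched as a dedicated reader thread that drains the socket into an in-process queue while the slower processing code consumes from that queue. This is a minimal sketch under stated assumptions: `drain_socket`, `process_all` and `demo` are hypothetical names, and `socket.socketpair()` stands in for the real TCP connection so that the example is self-contained.

```python
import queue
import socket
import threading


def drain_socket(sock, buf, chunk_size=65536):
    """Read from sock as fast as possible and hand each chunk to buf."""
    while True:
        data = sock.recv(chunk_size)
        if not data:       # peer closed the connection
            buf.put(None)  # sentinel: no more data will arrive
            return
        buf.put(data)


def process_all(buf, handle):
    """Consume chunks from buf at the application's own pace."""
    total = 0
    while True:
        data = buf.get()
        if data is None:
            return total
        handle(data)
        total += len(data)


def demo(payload_size=200_000):
    # socketpair() is a local stand-in for the real connection (assumption).
    sender, receiver = socket.socketpair()
    buf = queue.Queue()
    reader = threading.Thread(target=drain_socket, args=(receiver, buf))
    reader.start()
    sender.sendall(b"x" * payload_size)
    sender.close()
    total = process_all(buf, handle=lambda chunk: None)
    reader.join()
    receiver.close()
    return total


if __name__ == "__main__":
    print(demo())
```

The queue here is unbounded, so this trades socket-buffer pressure for process memory; a bounded queue would reintroduce backpressure once it fills.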

Comment 1 Fabio Olive Leite 2009-12-04 16:26:50 UTC

Unfortunately RHEL-4 contains a TCP receive window clamping issue that was fixed upstream around versions 2.6.14 and 2.6.15. It can be triggered by applications that send a lot of small packets to a slow reader.

http://marc.info/?l=linux-kernel&m=112561417224145&w=2
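
For anyone trying to generate the traffic pattern described above (many small writes to a slow reader), a loopback sketch might look like the following. It is not a verified reproducer for this bug: the function names, message sizes, read size and delay are all assumptions, and on an unaffected kernel it will simply complete normally. A packet capture taken while it runs would be needed to observe the advertised window.

```python
import socket
import threading
import time


def slow_reader(conn, read_size, delay, result):
    """Read small amounts and pause between reads, like a slow application."""
    total = 0
    while True:
        data = conn.recv(read_size)
        if not data:
            break
        total += len(data)
        time.sleep(delay)
    conn.close()
    result.append(total)


def run(messages=500, message_size=16, read_size=64, delay=0.001):
    """Send many small messages over loopback TCP to a slow reader."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # let the OS pick a free port
    server.listen(1)
    result = []

    def accept_and_read():
        conn, _ = server.accept()
        slow_reader(conn, read_size, delay, result)

    reader = threading.Thread(target=accept_and_read)
    reader.start()

    client = socket.create_connection(server.getsockname())
    # Disable Nagle so each small write is sent without being held back.
    client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    for _ in range(messages):
        client.sendall(b"x" * message_size)
    client.close()

    reader.join()
    server.close()
    return result[0]


if __name__ == "__main__":
    print(run())
```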

This has received some attention lately, so I'll be looking for the right patch in the next few days. Let's hope we can still fix this for RHEL-4.

Cheers,
Fábio Olivé

Comment 2 Danny Feng 2009-12-15 08:43:36 UTC

   A TCP frozen window can originate in the kernel, the application, or the hardware, so the root cause may be hard to locate. How often does the application timeout occur?

   As mentioned in comment #1, the RHEL-4 kernel has a receive window clamping issue; the fix is upstream commit 09e9ec87. I'm not sure whether it will fix the reporter's TCP frozen window issue, but the kernel should handle this situation better with the commit applied.

Comment 3 Danny Feng 2009-12-29 07:06:04 UTC


*** This bug has been marked as a duplicate of bug 546324 ***