Bug 494400 - TCP: Treason uncloaked! during Network Stress Testing
Summary: TCP: Treason uncloaked! during Network Stress Testing
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: All
OS: Linux
Priority: low
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Thomas Graf
QA Contact: Network QE
URL: http://rhts.redhat.com/cgi-bin/rhts/t...
Whiteboard:
Depends On:
Blocks:
Reported: 2009-04-06 17:36 UTC by Qian Cai
Modified: 2014-06-18 08:30 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-01-13 20:46:49 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Proposed fix for problem 494400. (2.65 KB, patch)
2010-08-21 22:43 UTC, Quentin Barnes
proposed patch (3.35 KB, patch)
2010-08-25 09:05 UTC, Thomas Graf


Links
System: Red Hat Product Errata
ID: RHSA-2011:0017
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.6 kernel security and bug fix update
Last Updated: 2011-01-13 10:37:42 UTC

Description Qian Cai 2009-04-06 17:36:43 UTC
Description of problem:
One of the network stress tests creates multiple asynchronous FTP sessions that upload files to an FTP server running vsftpd. While the test ran for 5369 seconds, the following messages were logged:

TCP: Treason uncloaked! Peer 10.16.65.2:30929/43133 shrinks window 1454493069:1454493181. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:11412/32825 shrinks window 1455223983:1455223999. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:60762/47808 shrinks window 1540525259:1540525283. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:60762/47808 shrinks window 1540525259:1540525283. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:29102/34032 shrinks window 1537279726:1537279782. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:48134/40988 shrinks window 2116907755:2116907827. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:17795/40071 shrinks window 2507860403:2507860411. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:63525/38264 shrinks window 2530768102:2530768214. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:6970/35744 shrinks window 2527092526:2527092638. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:58573/38937 shrinks window 2725737921:2725737953. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:41026/54177 shrinks window 2721859297:2721859305. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:45678/41090 shrinks window 2678731282:2678731394. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:61704/36478 shrinks window 2769746775:2769746791. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:33882/60420 shrinks window 2786137782:2786137790. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:12864/56566 shrinks window 2780615668:2780615724. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:63337/54108 shrinks window 2768169071:2768169159. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:23917/46247 shrinks window 2827003487:2827003599. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:21245/43466 shrinks window 2777907290:2777907298. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:56791/35263 shrinks window 2838361343:2838361399. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:33882/60420 shrinks window 2786137782:2786137790. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:37380/55687 shrinks window 2677457939:2677457947. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:18852/44810 shrinks window 2770259308:2770259356. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:23917/46247 shrinks window 2827003487:2827003599. Repaired.
printk: 10 messages suppressed.
TCP: Treason uncloaked! Peer 10.16.65.2:42493/51720 shrinks window 2957366109:2957366125. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:11898/41197 shrinks window 2993448802:2993448874. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:11898/41197 shrinks window 2993448802:2993448874. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:63715/43089 shrinks window 2993490359:2993490407. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:11898/41197 shrinks window 2993448802:2993448874. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:63715/43089 shrinks window 2993490359:2993490407. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:63715/43089 shrinks window 2993490359:2993490407. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:23910/42185 shrinks window 3591659058:3591659130. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:14029/54342 shrinks window 3589837957:3589837981. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:19314/52612 shrinks window 3579204604:3579204652. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:62788/44666 shrinks window 3749790764:3749790836. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:20801/37999 shrinks window 3733524289:3733524401. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:12261/51059 shrinks window 3748299481:3748299553. Repaired.
TCP: Treason uncloaked! Peer 10.16.65.2:49105/43639 shrinks window 96390206:96390254. Repaired.
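
To triage output like the above, the lines can be split into their fields with a small script. This is a minimal sketch, assuming the kernel prints `Peer <peer-ip>:<peer-port>/<local-port> shrinks window <snd_una>:<snd_nxt>` (my reading of the 2.6.18 source, not something stated in this report); the names `Treason`, `parse_line` and `per_connection` are hypothetical. It also counts repeats per connection, since several connections (for example 10.16.65.2:11898/41197) appear more than once.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Optional

# Assumed field layout: Peer <peer-ip>:<peer-port>/<local-port> shrinks window <snd_una>:<snd_nxt>
TREASON_RE = re.compile(
    r"TCP: Treason uncloaked! Peer (?P<ip>\d+\.\d+\.\d+\.\d+):(?P<pport>\d+)/(?P<lport>\d+) "
    r"shrinks window (?P<una>\d+):(?P<nxt>\d+)\. Repaired\."
)


class Treason(NamedTuple):
    peer_ip: str
    peer_port: int
    local_port: int
    snd_una: int  # assumed: oldest unacknowledged sequence number
    snd_nxt: int  # assumed: next sequence number to be sent

    @property
    def in_flight(self) -> int:
        # TCP sequence numbers are 32-bit and wrap, so subtract modulo 2**32.
        return (self.snd_nxt - self.snd_una) & 0xFFFFFFFF


def parse_line(line: str) -> Optional[Treason]:
    """Return the parsed fields, or None for other lines (e.g. printk suppression notices)."""
    m = TREASON_RE.search(line)  # search() also tolerates syslog timestamp prefixes
    if m is None:
        return None
    return Treason(m["ip"], int(m["pport"]), int(m["lport"]), int(m["una"]), int(m["nxt"]))


def per_connection(lines: Iterable[str]) -> Counter:
    """Count warnings per (peer ip, peer port, local port)."""
    counts: Counter = Counter()
    for line in lines:
        t = parse_line(line)
        if t is not None:
            counts[(t.peer_ip, t.peer_port, t.local_port)] += 1
    return counts
```

Under that assumption, the first line above describes 112 outstanding bytes (1454493181 - 1454493069). The subtraction is done modulo 2^32 because sequence numbers can wrap within a long-lived connection.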

Version-Release number of selected component (if applicable):
kernel-2.6.18-128.el5.i686
vsftpd-2.0.5-12.el5.i386
curl-7.15.5-2.el5.i386

How reproducible:
unknown

Steps to Reproduce:
1. Set up the FTP client (hp-dl360g5-02.rhts.bos.redhat.com, i386) and the vsftpd FTP server (hp-xw6400-01.rhts.bos.redhat.com, i386).
2. Create the file /var/ftp/regularFile, 1048576 bytes in size, on the client.
3. Upload the file to the server asynchronously for 5369 seconds (the server's disk may run out of space).
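
The report does not include the test script itself. A rough equivalent of the steps above, using Python's standard ftplib instead of curl, might look like the sketch below. It rests on assumptions not in the report: the server accepts anonymous uploads into a writable `upload/` directory (in vsftpd that needs options such as `anon_upload_enable=YES` and `write_enable=YES`), eight concurrent sessions stand in for the unspecified "multiple" sessions, and the helper names are hypothetical.

```python
import os
import threading
import time
from ftplib import FTP


def make_test_file(path: str, size: int = 1048576) -> None:
    """Step 2: create the payload file (1048576 bytes of random data)."""
    with open(path, "wb") as f:
        f.write(os.urandom(size))


def upload_loop(host: str, path: str, deadline: float, worker: int) -> int:
    """Upload `path` repeatedly until `deadline` (a time.monotonic() value); return the count."""
    uploads = 0
    while time.monotonic() < deadline:
        with FTP(host) as ftp:   # one FTP session per upload
            ftp.login()          # anonymous login (assumption)
            with open(path, "rb") as f:
                # Hypothetical remote directory and naming scheme.
                ftp.storbinary(f"STOR upload/w{worker}-{uploads}", f)
        uploads += 1
    return uploads


def stress(host: str, path: str, seconds: float = 5369, workers: int = 8) -> None:
    """Step 3: run `workers` concurrent upload loops for `seconds` seconds."""
    deadline = time.monotonic() + seconds
    threads = [
        threading.Thread(target=upload_loop, args=(host, path, deadline, i))
        for i in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

For example, `make_test_file("/var/ftp/regularFile")` on the client followed by `stress("hp-xw6400-01.rhts.bos.redhat.com", "/var/ftp/regularFile")` runs the loop for the 5369 seconds mentioned above. Each upload uses a fresh remote name, so, as step 3 warns, the server's disk may fill up.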

Comment 1 Tim Fox 2010-07-28 13:01:11 UTC
We (JBoss) are also hitting what appears to be the same bug during performance testing of JBoss HornetQ, the messaging component of JBoss Enterprise Application Platform (a Red Hat product).

This causes TCP connections to intermittently hang at very high load. I'd therefore consider this to be a higher priority/severity than is currently set against it, since it means one of Red Hat's main products can hang at high load.

From some googling, this bug appears to have already been fixed in Linux kernel version 2.6.25 (?).

Please see this link:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=5ea3a7480606cef06321cd85bc5113c72d2c7c68

Thanks!

Comment 2 Thomas Graf 2010-08-14 04:17:05 UTC
Right, the bug is fixed upstream and in RHEL6. We should backport this.

Comment 3 Quentin Barnes 2010-08-21 22:42:35 UTC
I think I have a fix for you.  The attached patch has been tested for
three days so far on 350 servers running RHEL5.4 (2.6.18-164.15.1).
With it, the servers no longer suffer pauses in network traffic
or log the erroneous "treason uncloaked" messages.


The function in question, tcp_window_allows(), was changed by several
commits, one of which renamed it to tcp_mss_split_point().

The function in question and how it is invoked have not changed between
RHEL6 Beta2 (2.6.32-44.2) and the current kernel.org source.

The commits which touch the function in question that were evaluated
in creating this patch include:

0e3a4803 Mon Dec 24 21:33:45 2007 -0800
	This is the big one that changes the name of the function and
	rewrites where it is invoked from.  It is completely contained
	within tcp_output.c.  However, it is just a dependent change
	and does not address the problem.  I took the commit wholesale;
	it applies cleanly except for the last delta, to tcp_push_one(),
	which is rejected.  I did not use that delta because it has
	nothing to do with this change: it is an unused-variable
	cleanup, left unaddressed by the earlier commit 66f5fe62, that
	slipped into this commit and that I did not wish to pull in.

90840def Mon Dec 31 04:48:41 2007 -0800
	Rejected this commit, leaving the one line it changes in
	tcp_mss_split_point() unmodified.  This was a cleanup-only
	commit and did not change functionality.  The other changes
	in the commit are unnecessary.  The commit can be applied if
	desired, but it will need more dependent commits.

056834d9 Mon Dec 31 14:57:14 2007 -0800
	Just some whitespace cleanup.  I did not apply it in my patch.
	The commit may be applied if desired, but I skipped it because
	it pulls in other dependencies.

5ea3a748 Tue Mar 11 17:55:27 2008 -0700
	This is the commit that fixes the bug in question and is
	mentioned in Comment 1.  It has to be applied logically by hand
	because the above two dependent commits are not included.  The
	inline function tcp_write_queue_tail() has to be expanded
	manually to its skb_peek_tail() call.

17515408 Tue Apr 15 20:36:55 2008 -0700
	This commit is not necessary to fix the bug, but it makes the
	code more efficient.  It may be skipped if desired, but it
	is an obvious cleanup and brings the tcp_mss_split_point()
	function logically in sync with the RHEL6 Beta2 (2.6.32-44.2)
	and latest kernel.org versions.
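
The comment above identifies the fix only by commit ID. As a hedged illustration, here is a simplified Python model of the split-point calculation in tcp_window_allows()/tcp_mss_split_point() before and after 5ea3a748. It is based on my reading of the upstream function around 2.6.24/2.6.25, not on either patch attached to this bug; plain integers stand in for the socket and skb, and the function and parameter names are hypothetical. What it is meant to show: when the skb is the last one in the write queue, the old ordering could return a cwnd-sized chunk without checking it against the peer's advertised window, while the fixed ordering clamps to the window first.

```python
U32 = 0xFFFFFFFF


def usable_window(snd_una: int, snd_wnd: int, skb_seq: int) -> int:
    """Bytes of the peer's advertised window left, starting at this skb (u32 math)."""
    return (snd_una + snd_wnd - skb_seq) & U32


def split_point_before(skb_seq, skb_len, is_tail, snd_una, snd_wnd, mss, cwnd):
    """Model of the pre-5ea3a748 ordering (simplified, hypothetical signature)."""
    window = usable_window(snd_una, snd_wnd, skb_seq)
    cwnd_len = mss * cwnd
    if cwnd_len <= window and not is_tail:
        return cwnd_len
    if is_tail and cwnd_len <= skb_len:
        # Loophole: the last skb is sized by cwnd alone; `window` is never consulted.
        return cwnd_len
    needed = min(skb_len, window)
    return needed - needed % mss  # round down to whole segments


def split_point_after(skb_seq, skb_len, is_tail, snd_una, snd_wnd, mss, cwnd):
    """Model of the fixed ordering: clamp to the receive window before the tail check."""
    window = usable_window(snd_una, snd_wnd, skb_seq)
    cwnd_len = mss * cwnd
    if cwnd_len <= window and not is_tail:
        return cwnd_len
    needed = min(skb_len, window)
    if is_tail and cwnd_len <= needed:
        return cwnd_len
    return needed - needed % mss  # round down to whole segments
```

With a 4000-byte window, an MSS of 1448 and a cwnd of 10, the old model hands out 14480 bytes for the tail skb, well past the window, while the fixed one returns 2896 (two full segments). When the receive window is not the limiting factor, both return the same value.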


Any feedback on the patch is appreciated.

Comment 4 Quentin Barnes 2010-08-21 22:43:23 UTC
Created attachment 440169 [details]
Proposed fix for problem 494400.

Comment 5 Thomas Graf 2010-08-24 08:15:02 UTC
Patch looks good, will test it.

Comment 7 Thomas Graf 2010-08-25 09:05:44 UTC
Created attachment 440868 [details]
proposed patch

Comment 9 RHEL Program Management 2010-09-07 23:19:19 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 11 Jarod Wilson 2010-09-10 21:38:01 UTC
in kernel-2.6.18-219.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcome.

Comment 13 Quentin Barnes 2010-09-15 20:42:26 UTC
Unfortunately, the group in my company that hit the problem doesn't want to go through the work of installing the -219 kernel above, since their systems are already fully working.  The 2.6.18-164.15.1 version of the patch I submitted earlier has now been working flawlessly on those 350 servers for four weeks.  I also checked your RHEL5u5 version of tcp_output.c in -219 against my 5u5 version of the file, and it matched exactly (even down to the whitespace).  That kernel build has already been rolled out internally in a limited release, so it looks like we're testing the same code change in parallel.

Comment 18 errata-xmlrpc 2011-01-13 20:46:49 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html

