Bug 714670 - TCP_CRR and concurrent TCP stream tests over IPv6 sometimes fail on RHEL 5.7
Summary: TCP_CRR and concurrent TCP stream tests over IPv6 sometimes fail on RHEL 5.7
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.7
Hardware: x86_64
OS: Linux
unspecified
medium
Target Milestone: rc
: ---
Assignee: Jiri Benc
QA Contact: Adam Okuliar
URL:
Whiteboard:
Depends On:
Blocks: 742099 784372
Reported: 2011-06-20 12:04 UTC by Adam Okuliar
Modified: 2012-02-21 03:39 UTC (History)

Fixed In Version: kernel-2.6.18-305.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 742099 (view as bug list)
Environment:
Last Closed: 2012-02-21 03:39:58 UTC
Target Upstream Version:
Embargoed:


Links
System: Red Hat Product Errata
ID: RHSA-2012:0150
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Enterprise Linux 5.8 kernel update
Last Updated: 2012-02-21 07:35:24 UTC

Comment 1 Jiri Benc 2011-10-10 17:02:12 UTC
Cannot reproduce using KVM in ~30 runs. Could you provide more details about the setup (or access to the machines showing the problem, if you still have them)?

Comment 2 Adam Okuliar 2011-10-11 13:53:39 UTC
Hi Jiri,

I prepared two affected systems for you. You can use

redclient-01.rhts.bos.redhat.com
redclient-02.rhts.bos.redhat.com

Please run:
sysctl net.ipv4.tcp_tw_reuse=1
to enable reuse of connections in the TIME_WAIT state.

The test results on the redclient machines are as follows:
for i in `seq 1 30`; do netperf  -H fd20::2 -L fd20::1 -t TCP_CRR -l 60 -P0; done 
16384  87380  1        1       60.00    4365.31   
16384  87380 
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
16384  87380  1        1       60.00    4354.53   
16384  87380 
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
16384  87380  1        1       60.00    4353.37   
16384  87380 
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
16384  87380  1        1       60.00    4352.90   
16384  87380 
send_tcp_conn_rr: data recv error: Connection reset by peer
16384  87380  1        1       60.00    4357.50   
16384  87380 
16384  87380  1        1       60.00    4358.20   
16384  87380 
16384  87380  1        1       60.00    4363.02   
16384  87380 
16384  87380  1        1       60.00    4360.61   
16384  87380 
16384  87380  1        1       60.00    4356.58   
16384  87380 
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer

The redclient machines are reserved for 7 days, but please try to investigate this problem ASAP; the Shacks team uses these machines for NFS testing.
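To compare runs more easily, the netperf output above can be condensed into a few counters. The following is only a minimal sketch: the function name summarize_crr_log and the log file name netperf.log are hypothetical, and it assumes the -P0 output format shown in this comment (a six-column result line whose last column is the transaction rate, and one "Connection reset by peer" line per failed transaction).

```shell
# Hypothetical helper: summarize netperf TCP_CRR output read from stdin.
# Assumes the -P0 format shown above: result lines have six numeric
# columns, the last one being transactions per second.
summarize_crr_log() {
    awk '
        /Connection reset by peer/ { resets++ }
        NF == 6 && $6 ~ /^[0-9.]+$/ { runs++; sum += $6 }
        END {
            printf "completed runs: %d\n", runs
            printf "reset errors: %d\n", resets
            if (runs > 0) printf "mean transactions/s: %.2f\n", sum / runs
        }
    '
}

# Example usage (netperf.log is an assumed file name):
#   summarize_crr_log < netperf.log
```

Note that an error line cannot be attributed to a particular run from this output alone, so the sketch only reports totals.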

Comment 3 Adam Okuliar 2011-10-11 16:02:48 UTC
I believe that sysctl net.ipv4.tcp_tw_reuse=1 does not work for IPv6 connections.

for i in `seq 1 10`; do netperf  -H 192.168.1.2 -t TCP_CRR -l 60 -P0>/dev/null ; netstat -nta | grep TIME_WAIT | wc -l;done
19
17
25
17
18
18
30
15
19
21

During the IPv4 CRR test, the number of connections in the TIME_WAIT state stays small the whole time.

for i in `seq 1 10`; do netperf  -H fd20::2 -Lfd20::1 -t TCP_CRR -l 60 -P0>/dev/null ; netstat -nta | grep TIME_WAIT | wc -l;done
send_tcp_conn_rr: data recv error: Connection reset by peer
109
send_tcp_conn_rr: data recv error: Connection reset by peer
146
send_tcp_conn_rr: data recv error: Connection reset by peer
11553
send_tcp_conn_rr: data recv error: Connection reset by peer

During the IPv6 CRR test, the number of connections in the TIME_WAIT state rises rapidly until it exhausts the whole port range available for TCP source ports.
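The netstat pipeline above counts TIME_WAIT sockets for both address families together. A rough way to split the count per family is to read the kernel socket tables directly. This is a sketch under stated assumptions: the function name count_time_wait is hypothetical, and it relies on the fourth column ("st") of /proc/net/tcp and /proc/net/tcp6 holding the socket state in hex, with 06 meaning TIME_WAIT.

```shell
# Hypothetical helper: count TIME_WAIT sockets in a /proc/net/tcp-style
# table. The first line is a header; column 4 ("st") is the hex socket
# state, and 06 is TIME_WAIT.
count_time_wait() {
    awk 'NR > 1 && $4 == "06" { n++ } END { print n + 0 }' "$1"
}

# Print one count per address family, skipping tables that are absent.
for table in /proc/net/tcp /proc/net/tcp6; do
    if [ -r "$table" ]; then
        printf '%s: %s\n' "$table" "$(count_time_wait "$table")"
    fi
done
```

Running this after each netperf iteration, in place of the netstat pipeline, would show whether the growth is confined to the IPv6 table.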

Comment 4 Jiri Benc 2011-10-17 19:58:31 UTC
> redclient-01.rhts.bos.redhat.com
> redclient-02.rhts.bos.redhat.com

The machines have been down today. I still haven't found the culprit; if I can get access to the machines again for a few hours, I'll gather as much data as possible and continue the analysis offline.

The TIME_WAIT reuse is not the source of the problem, as the sockets are opened with SO_REUSEADDR (but you're correct that tcp_tw_reuse is not supported on IPv6). Btw, the problem is highly timing-sensitive and some attempts to debug it make it irreproducible.

Comment 5 Jiri Benc 2012-01-03 19:58:51 UTC
Successfully tested the fix from bug 742099 comment 10 on RHEL5.8 kernel.

Comment 7 RHEL Program Management 2012-01-03 20:09:44 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 9 Jarod Wilson 2012-01-18 17:02:15 UTC
Patch(es) available in kernel-2.6.18-305.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5/
Detailed testing feedback is always welcomed.
If you require guidance regarding testing, please ask the bug assignee.

Comment 11 Adam Okuliar 2012-01-20 14:14:27 UTC
Reproduced on
Linux redclient-01.rhts.bos.redhat.com 2.6.18-268.el5 

Verified on 
Linux redclient-01.rhts.bos.redhat.com 2.6.18-305.el5

Comment 12 errata-xmlrpc 2012-02-21 03:39:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0150.html
