Bug 714670
Summary: | TCP_CRR and concurrent TCP stream tests over IPv6 sometimes fail on RHEL 5.7 | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Adam Okuliar <aokuliar> | |
Component: | kernel | Assignee: | Jiri Benc <jbenc> | |
Status: | CLOSED ERRATA | QA Contact: | Adam Okuliar <aokuliar> | |
Severity: | medium | Docs Contact: | ||
Priority: | unspecified | |||
Version: | 5.7 | CC: | ccui, davem, haliu, herbert.xu, lmiksik, syeghiay, tgraf | |
Target Milestone: | rc | |||
Target Release: | --- | |||
Hardware: | x86_64 | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | kernel-2.6.18-305.el5 | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 742099 | Environment: | |
Last Closed: | 2012-02-21 03:39:58 UTC | Type: | --- | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 742099, 784372 |
Comment 1
Jiri Benc
2011-10-10 17:02:12 UTC
Hi Jiri,

I prepared two affected systems for you. You can use
redclient-01.rhts.bos.redhat.com
redclient-02.rhts.bos.redhat.com

Please run:
sysctl net.ipv4.tcp_tw_reuse=1
to enable reuse of TIME_WAIT connections.

The test results on the redclients are as follows:

for i in `seq 1 30`; do netperf -H fd20::2 -L fd20::1 -t TCP_CRR -l 60 -P0; done
16384 87380 1 1 60.00 4365.31
16384 87380
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
16384 87380 1 1 60.00 4354.53
16384 87380
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
16384 87380 1 1 60.00 4353.37
16384 87380
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
16384 87380 1 1 60.00 4352.90
16384 87380
send_tcp_conn_rr: data recv error: Connection reset by peer
16384 87380 1 1 60.00 4357.50
16384 87380
16384 87380 1 1 60.00 4358.20
16384 87380
16384 87380 1 1 60.00 4363.02
16384 87380
16384 87380 1 1 60.00 4360.61
16384 87380
16384 87380 1 1 60.00 4356.58
16384 87380
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer

The redclient machines are reserved for 7 days, but please try to investigate this problem ASAP, as the Shacks team uses these machines for NFS testing.

I believe that sysctl net.ipv4.tcp_tw_reuse=1 does not work for IPv6 connections.

for i in `seq 1 10`; do netperf -H 192.168.1.2 -t TCP_CRR -l 60 -P0>/dev/null ; netstat -nta | grep TIME_WAIT | wc -l;done
19
17
25
17
18
18
30
15
19
21

During the IPv4 CRR test, the number of connections in the TIME_WAIT state stays small the whole time.

for i in `seq 1 10`; do netperf -H fd20::2 -Lfd20::1 -t TCP_CRR -l 60 -P0>/dev/null ; netstat -nta | grep TIME_WAIT | wc -l;done
send_tcp_conn_rr: data recv error: Connection reset by peer
109
send_tcp_conn_rr: data recv error: Connection reset by peer
146
send_tcp_conn_rr: data recv error: Connection reset by peer
11553
send_tcp_conn_rr: data recv error: Connection reset by peer

During the IPv6 CRR test, the number of connections in the TIME_WAIT state rises rapidly until it exhausts the whole port range available for assigning TCP source ports.

> redclient-01.rhts.bos.redhat.com
> redclient-02.rhts.bos.redhat.com
The machines have been down today. I still haven't found the culprit; if I can get access to the machines again for a few hours, I'll gather as much data as possible and continue the analysis off-line.
The TIME_WAIT reuse is not the source of the problem, as the sockets are opened with SO_REUSEADDR (but you're correct that tcp_tw_reuse is not supported on IPv6). Btw, the problem is highly timing-sensitive and some attempts to debug it make it irreproducible.
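One way to check the port-exhaustion hypothesis from the original report is to compare the TIME_WAIT count against the size of the local port range. The following is only a minimal sketch and not something taken from this bug: the function names `count_time_wait` and `port_range_size` are hypothetical, and it assumes that `netstat -nta` prints the socket state in the sixth column.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of the original report.

# Count sockets in TIME_WAIT from `netstat -nta`-style text on stdin.
# Assumes the state is the sixth whitespace-separated column.
count_time_wait() {
    awk '$6 == "TIME_WAIT" { n++ } END { print n + 0 }'
}

# Print the number of ports in a range given as "low high", which is the
# format of /proc/sys/net/ipv4/ip_local_port_range.
port_range_size() {
    echo $(( $2 - $1 + 1 ))
}
```

Possible usage on a test machine: `netstat -nta | count_time_wait` for the current count, and `port_range_size $(cat /proc/sys/net/ipv4/ip_local_port_range)` for the upper bound it can be compared against.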
Successfully tested the fix from bug 742099 comment 10 on RHEL5.8 kernel.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Patch(es) available in kernel-2.6.18-305.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5/
Detailed testing feedback is always welcomed. If you require guidance regarding testing, please ask the bug assignee.

Reproduced on Linux redclient-01.rhts.bos.redhat.com 2.6.18-268.el5
Verified on Linux redclient-01.rhts.bos.redhat.com 2.6.18-305.el5

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHSA-2012-0150.html
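To check whether a machine runs a kernel at or above the build named in this report (kernel-2.6.18-305.el5), the build number can be pulled out of the release string. This is a rough sketch only: the helper names `el5_build` and `has_fix` are hypothetical, it assumes a release string of the form 2.6.18-<build>.el5 as printed by `uname -r`, and it is not a substitute for a real RPM version comparison.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of the original report.

# Extract the numeric build (e.g. 305) from a release string such as
# 2.6.18-305.el5. Prints nothing if the string has a different form.
el5_build() {
    printf '%s\n' "$1" | sed -n 's/^2\.6\.18-\([0-9][0-9]*\).*$/\1/p'
}

# Succeed if the given release is build 305 or newer, the build this
# report lists under "Fixed In Version".
has_fix() {
    build=$(el5_build "$1")
    [ -n "$build" ] && [ "$build" -ge 305 ]
}
```

Possible usage: `has_fix "$(uname -r)" && echo "fix should be present"`.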