Cannot reproduce using KVM in ~30 runs. Could you provide more details about the setup (or access to the machines showing the problem, if you still have them)?
Hi Jiri,

I prepared two affected systems for you. You can use
redclient-01.rhts.bos.redhat.com
redclient-02.rhts.bos.redhat.com

Please run:
sysctl net.ipv4.tcp_tw_reuse=1
to enable reuse of TIME_WAIT connections.

Results of the tests on the redclients follow:

for i in `seq 1 30`; do netperf -H fd20::2 -L fd20::1 -t TCP_CRR -l 60 -P0; done
16384 87380 1 1 60.00 4365.31
16384 87380
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
16384 87380 1 1 60.00 4354.53
16384 87380
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
16384 87380 1 1 60.00 4353.37
16384 87380
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
16384 87380 1 1 60.00 4352.90
16384 87380
send_tcp_conn_rr: data recv error: Connection reset by peer
16384 87380 1 1 60.00 4357.50
16384 87380
16384 87380 1 1 60.00 4358.20
16384 87380
16384 87380 1 1 60.00 4363.02
16384 87380
16384 87380 1 1 60.00 4360.61
16384 87380
16384 87380 1 1 60.00 4356.58
16384 87380
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer
send_tcp_conn_rr: data recv error: Connection reset by peer

The redclient machines are reserved for 7 days, but please try to investigate this problem ASAP; the Shacks team uses these machines for NFS testing.
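The netperf loop output above can be tallied with a small helper. This is only a sketch: the function name summarize_crr is hypothetical, and it assumes that with -P0 each successful TCP_CRR run prints one six-column result line whose last column is the transaction rate, and that each failed run prints one "Connection reset by peer" line.

```shell
# Hypothetical helper: tally netperf TCP_CRR loop output read from stdin.
# Assumption: a successful run prints a 6-field result line whose last
# field is the transaction rate; a failed run prints a reset error line.
summarize_crr() {
    awk '
        /Connection reset by peer/ { resets++; next }  # failed run
        NF == 6 { ok++; sum += $6 }                    # successful run
        END {
            printf "ok=%d resets=%d", ok, resets
            if (ok > 0) printf " mean_tps=%.2f", sum / ok
            printf "\n"
        }
    '
}
```

It could be used by appending `2>&1 | summarize_crr` to the loop, so that both the result lines and the error messages reach the helper.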
I believe that sysctl net.ipv4.tcp_tw_reuse=1 does not work for IPv6 connections.

for i in `seq 1 10`; do netperf -H 192.168.1.2 -t TCP_CRR -l 60 -P0>/dev/null ; netstat -nta | grep TIME_WAIT | wc -l;done
19
17
25
17
18
18
30
15
19
21

During the IPv4 CRR test, the number of connections in TIME_WAIT state stays small the whole time.

for i in `seq 1 10`; do netperf -H fd20::2 -Lfd20::1 -t TCP_CRR -l 60 -P0>/dev/null ; netstat -nta | grep TIME_WAIT | wc -l;done
send_tcp_conn_rr: data recv error: Connection reset by peer
109
send_tcp_conn_rr: data recv error: Connection reset by peer
146
send_tcp_conn_rr: data recv error: Connection reset by peer
11553
send_tcp_conn_rr: data recv error: Connection reset by peer

During the IPv6 CRR test, the number of connections in TIME_WAIT state rises rapidly until it exhausts the whole port range available for assigning TCP source ports.
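The grep | wc -l pipeline above counts TIME_WAIT sockets of both address families together. A rough way to split the count is sketched below. The function name count_time_wait is hypothetical, and the sketch assumes the usual netstat -nta column layout (local address in column 4, state in column 6). It treats any local address with more than one colon as IPv6, so IPv4-mapped addresses are counted as IPv6.

```shell
# Hypothetical helper: count TIME_WAIT sockets per address family from
# `netstat -nta` output on stdin.
# Assumption: column 4 is the local "address:port", column 6 is the state.
count_time_wait() {
    awk '
        $6 == "TIME_WAIT" {
            addr = $4
            # gsub returns the number of colons removed; an IPv4
            # "address:port" contains exactly one colon.
            if (gsub(/:/, "", addr) > 1) v6++; else v4++
        }
        END { printf "ipv4=%d ipv6=%d\n", v4, v6 }
    '
}
```

For example, `netstat -nta | count_time_wait` could replace the `grep TIME_WAIT | wc -l` step inside the loop.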
> redclient-01.rhts.bos.redhat.com
> redclient-02.rhts.bos.redhat.com

The machines have been down today. I still haven't found the culprit; if I can get access to the machines again for a few hours, I'll gather as much data as possible and continue the analysis off-line.

TIME_WAIT reuse is not the source of the problem, as the sockets are opened with SO_REUSEADDR (but you're correct that tcp_tw_reuse is not supported on IPv6).

Btw, the problem is highly timing-sensitive, and some attempts to debug it make it irreproducible.
Successfully tested the fix from bug 742099 comment 10 on RHEL5.8 kernel.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Patch(es) available in kernel-2.6.18-305.el5 You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5/ Detailed testing feedback is always welcomed. If you require guidance regarding testing, please ask the bug assignee.
Reproduced on Linux redclient-01.rhts.bos.redhat.com 2.6.18-268.el5
Verified on Linux redclient-01.rhts.bos.redhat.com 2.6.18-305.el5
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2012-0150.html