1189241 – TCP_USER_TIMEOUT does not work when connection is stalled on zero-window probes

Summary: TCP_USER_TIMEOUT does not work when connection is stalled on zero-window probes

Keywords:
Status:	CLOSED DUPLICATE of bug 1151756
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	7.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	7.2
Assignee:	Florian Westphal
QA Contact:	Hangbin Liu
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1175685 1189480 1190776 1190783 1194407

Reported:	2015-02-04 19:04 UTC by John Eckersberg
Modified:	2023-02-22 23:02 UTC
CC List:	13 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	1215924
Environment:
Last Closed:	2015-04-18 08:59:28 UTC
Target Upstream Version:
Embargoed:

Attachments
backported patch (5.85 KB, patch) 2015-02-09 14:55 UTC, John Eckersberg	no flags

Description John Eckersberg 2015-02-04 19:04:17 UTC

Description of problem:
Setting TCP_USER_TIMEOUT via setsockopt() has no effect if the connection is stalled on zero-window probes.  The same behavior occurs if the source address is removed from the host and the packets become unroutable.  Both cases trigger the TCP persist timer, and the TCP_USER_TIMEOUT value is not honored.

Version-Release number of selected component (if applicable):
kernel-3.10.0-123.20.1.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
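1.  Add an IP address to an interface.  In my case, I chose 192.168.201.254.

2.  Open a listening socket bound to the chosen address, and set TCP_USER_TIMEOUT via setsockopt().  In the socat invocation below, 6:18:5000 means level 6 (IPPROTO_TCP), option 18 (TCP_USER_TIMEOUT), and a value of 5000 ms: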

    # socat tcp-listen:12345,bind=192.168.201.254,setsockopt-int=6:18:5000 -

3.  Connect to the listening socket to get an established connection

    # socat tcp:192.168.201.254:12345 -

4.  Delete the bound IP address from the interface

5.  Write data into the connection from the listening end (type into the listening socat's stdin)

6.  Observe the session state and the probe timer with:

    # ss -ntpo src :12345

Actual results:
The connection keeps spinning on the persist timer for RTO * tcp_retries2 seconds before it aborts (typically some tens of minutes).

    State      Recv-Q Send-Q        Local Address:Port          Peer Address:Port 
    ESTAB      0      6           192.168.201.254:12345        192.168.201.3:50457  timer:(persist,24sec,7) users:(("socat",5986,4))

Expected results:
The connection terminates when the user-supplied timeout is reached.  In the above example, that would be 5000 ms.

Additional info:
This was fixed upstream in Linux 3.18 by the following commit:

commit b248230c34970a6c1c17c591d63b464e8d2cfc33
Author: Yuchung Cheng <ycheng>
Date:   Mon Sep 29 13:20:38 2014 -0700

    tcp: abort orphan sockets stalling on zero window probes
    
    Currently we have two different policies for orphan sockets
    that repeatedly stall on zero window ACKs. If a socket gets
    a zero window ACK when it is transmitting data, the RTO is
    used to probe the window. The socket is aborted after roughly
    tcp_orphan_retries() retries (as in tcp_write_timeout()).
    
    But if the socket was idle when it received the zero window ACK,
    and later wants to send more data, we use the probe timer to
    probe the window. If the receiver always returns zero window ACKs,
    icsk_probes keeps getting reset in tcp_ack() and the orphan socket
    can stall forever until the system reaches the orphan limit (as
    commented in tcp_probe_timer()). This opens up a simple attack
    to create lots of hanging orphan sockets to burn the memory
    and the CPU, as demonstrated in the recent netdev post "TCP
    connection will hang in FIN_WAIT1 after closing if zero window is
    advertised." http://www.spinics.net/lists/netdev/msg296539.html
    
    This patch follows the design in RTO-based probe: we abort an orphan
    socket stalling on zero window when the probe timer reaches both
    the maximum backoff and the maximum RTO. For example, an 100ms RTT
    connection will timeout after roughly 153 seconds (0.3 + 0.6 +
    .... + 76.8) if the receiver keeps the window shut. If the orphan
    socket passes this check, but the system already has too many orphans
    (as in tcp_out_of_resources()), we still abort it but we'll also
    send an RST packet as the connection may still be active.
    
    In addition, we change TCP_USER_TIMEOUT to cover (life or dead)
    sockets stalled on zero-window probes. This changes the semantics
    of TCP_USER_TIMEOUT slightly because it previously only applies
    when the socket has pending transmission.
    
    Signed-off-by: Yuchung Cheng <ycheng>
    Signed-off-by: Eric Dumazet <edumazet>
    Signed-off-by: Neal Cardwell <ncardwell>
    Reported-by: Andrey Dmitrov <andrey.dmitrov>
    Signed-off-by: David S. Miller <davem>

I have verified that the 3.17 kernels in Fedora 21 (F21) exhibit this bug, and that the 3.18 kernel from updates fixes it.  I have also verified that the RHEL7 kernel exhibits this bug.

This has been proposed for queuing in the upstream -stable trees:

http://marc.info/?l=linux-netdev&m=142300689120800&w=2

I've also checked that the patch applies cleanly to the stable linux-3.10.y tree.

We're hitting this problem in RHEL OpenStack Platform when using HAProxy and performing VIP failover.  We need to detect quickly when a connection has failed in the manner described above.  I've been working with the HAProxy developers, and they have just implemented the changes on their side to make the socket option configurable:

http://git.haproxy.org/?p=haproxy.git;a=commitdiff;h=2af207a5f5e853

However, we will keep hitting this issue until the kernel is fixed.

Comment 2 John Eckersberg 2015-02-09 14:55:11 UTC

Created attachment 989728
backported patch

Here's the patch, backported onto the RHEL7 kernel.  The only change I made was to replace the newer skb_mstamp bits with skb->when/tcp_time_stamp.  I applied it on top of kernel-3.10.0-229.el7.x86_64, and when running the reproducer I now get the timeout as expected:

    2015/02/09 09:40:26 socat[22956] E read(4, 0x1ab9c80, 8192): Connection timed out

Comment 12 Florian Westphal 2015-06-02 19:51:10 UTC

The upstream fix mentioned in the description is slated for 7.2 via bug 1151756.
A backport of the same upstream fix is on its way to the 7.1 z-stream via bug 1215924.
