Bug 1215927 - Incomplete connection tracking of half-closed tcp connections
Summary: Incomplete connection tracking of half-closed tcp connections
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.6
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
: ---
Assignee: Rashid Khan
QA Contact: Xin Long
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-04-28 07:20 UTC by Michele Baldessari
Modified: 2019-08-15 04:31 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-05-15 14:29:27 UTC
Target Upstream Version:
Embargoed:


Attachments
python server script (672 bytes, text/plain), 2015-04-28 07:21 UTC, Michele Baldessari
client script (1.22 KB, text/x-python), 2015-04-28 07:22 UTC, Michele Baldessari


Links
Red Hat Knowledge Base (Solution) 1427963

Description Michele Baldessari 2015-04-28 07:20:46 UTC
Description of problem:
A RHEL 6.6 client with the following iptables rules:
iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
iptables -A INPUT -m state --state NEW -p tcp --dport 22 -j ACCEPT
iptables -A INPUT -j REJECT

Here are the steps:
1. The client (B) has an established TCP connection with server (A).
   The client (B) has the iptables rules described above.
2. Now (B) does a shutdown(SHUT_WR), closing only the sending direction. The receiving
   direction still works.
3. If the server sends data after 60 seconds, connection tracking simply drops
   it and doesn't recognize that the connection is still half-open (i.e. in FIN_WAIT2).

I think this is because connection tracking simply jumps to CLOSE_WAIT, even when the
connection is in FIN_WAIT2.

Version-Release number of selected component (if applicable):
RHEL 6.6
kernel-2.6.32-504.12.2.el6.x86_64
iptables-1.4.7-14.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Launch "./server.py 90" on (A) (90 seconds is the interval used to send data and is well above the 60-second default defined in
   /proc/sys/net/netfilter/nf_conntrack_tcp_timeout_close_wait)
2. Launch "./client.py" on (B) with the above iptables rules

Actual results:
Observe that no messages reach the client. netstat shows the connection in
FIN_WAIT2, while conntrack shows it in CLOSE_WAIT.

Expected results:
conntrack should realize that the connection is in FIN_WAIT2 and still let traffic through.


Additional info:
Increasing /proc/sys/net/netfilter/nf_conntrack_tcp_timeout_close_wait works around this.
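Concretely, that tuning looks like this (the value is illustrative, not a recommendation; it requires root, and it reverts on reboot unless persisted, e.g. via /etc/sysctl.conf):

```shell
# Raise the conntrack CLOSE_WAIT timeout from its 60-second default to 1 hour
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_close_wait=3600
# equivalently:
echo 3600 > /proc/sys/net/netfilter/nf_conntrack_tcp_timeout_close_wait
```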

Will attach the two simple python scripts I used to reproduce this
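(The actual scripts are attached in the comments below. The following is a hypothetical, self-contained sketch of the same half-close behaviour, not the attachments themselves: the client shuts down only its sending direction, and the server can still deliver data to it afterwards.)

```python
import socket
import threading

# Server and client in one process for a self-contained demo.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

def server():
    conn, _ = listener.accept()
    assert conn.recv(1024) == b""           # client's FIN: its send side is closed
    conn.sendall(b"data-after-half-close")  # the other direction still works
    conn.close()

t = threading.Thread(target=server)
t.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(listener.getsockname())
client.shutdown(socket.SHUT_WR)  # half-close: the client ends up in FIN_WAIT2

chunks = []
while True:                      # read until the server closes its side
    data = client.recv(4096)
    if not data:
        break
    chunks.append(data)
t.join()
payload = b"".join(chunks)
print(payload.decode())
```

With conntrack in the state described above, the late `sendall` is what gets dropped by the firewall once the CLOSE_WAIT timeout has expired.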

Comment 1 Michele Baldessari 2015-04-28 07:21:59 UTC
Created attachment 1019552 [details]
python server script

Comment 2 Michele Baldessari 2015-04-28 07:22:32 UTC
Created attachment 1019553 [details]
client script

Comment 3 Marcelo Ricardo Leitner 2015-05-04 21:24:55 UTC
Ugh, my Solution-finding-fu still isn't good. I found this similar solution, https://access.redhat.com/solutions/342433, but not the one I was looking for. I had a similar situation back in SEG; it was just a bit different: a NAT would stop working in a similar situation because the conntrack entry went away, and as it was missing just the last reset packet, it was unable to pick up the connection after the expiration.

Comment 4 Marcelo Ricardo Leitner 2015-05-07 15:19:03 UTC
Michele, please confirm one thing:
A RHEL 6.6 **client** with the following iptables rules:
...
iptables -A **INPUT** -m state --state NEW -p tcp **--dport** 22 -j ACCEPT

Just so I understand on which side the firewall is running: server or client? The rules look like they are for a server, but the text mentions the client.

Comment 5 Michele Baldessari 2015-05-07 15:32:07 UTC
Hey Marcelo,

the rules are always for the client, as I mentioned. There is a line for ssh
as well because I had it on my system to be able to hop onto it, but it is not
relevant. Sorry if that confused things ;)

You can assume the server has no iptables rules. On the client we basically
allow all outgoing connections and only packets related to those outgoing
connections. If you need a repro to see this live, just let me know

thanks,
Michele

Comment 6 Marcelo Ricardo Leitner 2015-05-07 16:58:28 UTC
Cool, thanks bandini! That's exactly how I'm reproducing it in here. :)

Comment 8 Michele Baldessari 2015-05-12 10:47:42 UTC
Hi Marcelo,

I tried with tcp_loose = 1 but the behaviour is still the same:
[root@rhel66b sfdc01433234]# cat /proc/sys/net/netfilter/nf_conntrack_tcp_loose 
1
[root@rhel66b sfdc01433234]# uname -a
Linux rhel66b.testlab 2.6.32-504.12.2.el6.x86_64 #1 SMP Sun Feb 1 12:14:02 EST 2015 x86_64 x86_64 x86_64 GNU/Linux


I have not tried RHEL 7 though. Maybe something changed about this knob between rhel 6 and 7?

Let me know if you need me to try out something else.

thanks,
Michele

Comment 10 Florian Westphal 2015-05-14 10:14:49 UTC
This does not look like a bug to me; conntrack seems to do the correct state
transitions, namely:

ESTABLISHED -> FIN_WAIT (saw FIN in one direction)
FIN_WAIT -> CLOSE_WAIT  (saw ACK to this FIN in the reply direction, i.e. proof the other end got this FIN)

The expected transition to LAST_ACK doesn't happen since the other side
is silent and doesn't close its side of the connection within the 60-second default timeout of the CLOSE_WAIT state.

Note that conntrack states don't map 1:1 to tcp state machine terminology since
we are an observer. CLOSE_WAIT in tcp conntrack lingo means 'we know the
other side received the FIN notice (we saw the ack for it), so we expect it to
also close its side of the connection soon'.

So the only thing that could, in my opinion, be debated is whether the default 1-minute
timeout is too low.

But what would a good timeout look like?  It would be easy to change the reproducer to defeat a close_wait timeout of 15 too...

IMO the default 1 minute is quite OK, and users who typically have a larger
time delta where no packets are exchanged after one side has closed
can easily increase the timeout.

net.netfilter.nf_conntrack_tcp_loose = 1 will work only if the resulting
'new' connection packet is permitted by the ruleset, which is why it doesn't
help in the reporter's case (we hit the REJECT rule).

Marcelo, Jesper, what's your take on this?

Thanks!
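The transitions described in comment 10 can be sketched as a tiny lookup table (illustrative Python only; the real logic lives in the kernel's nf_conntrack code, and the state names here are the conntrack ones, not TCP endpoint states):

```python
# Simplified sketch of the conntrack view of a TCP teardown.
TRANSITIONS = {
    ("ESTABLISHED", "FIN"): "FIN_WAIT",   # FIN seen in one direction
    ("FIN_WAIT", "ACK"): "CLOSE_WAIT",    # ACK of that FIN in the reply direction
    ("CLOSE_WAIT", "FIN"): "LAST_ACK",    # other side closes as well
    ("LAST_ACK", "ACK"): "TIME_WAIT",
}

def track(state, events):
    """Apply observed packet events to a conntrack-style state."""
    for event in events:
        state = TRANSITIONS.get((state, event), state)
    return state

# In the reported scenario only the first FIN and its ACK are ever seen,
# so the entry parks in CLOSE_WAIT and expires after the 60-second default.
print(track("ESTABLISHED", ["FIN", "ACK"]))
```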

Comment 11 Jeremy Harris 2015-05-14 10:18:56 UTC
(In reply to Florian Westphal from comment #10)
> Note that conntrack states don't map 1:1 to tcp state machine terminology since
> we are an observer. CLOSE_WAIT in tcp conntrack lingo means 'we know the
> other side received the FIN notice (we saw the ack for it), so we expect it to
> also close its side of the connection soon'.

That expectation is a bug.  A half-open connection is a legitimate long-term state for a TCP connection per RFC793.

Comment 12 Florian Westphal 2015-05-14 12:31:29 UTC
(In reply to Jeremy Harris from comment #11)
> (In reply to Florian Westphal from comment #10)
> > Note that conntrack states don't map 1:1 to tcp state machine terminology since
> > we are an observer. CLOSE_WAIT in tcp conntrack lingo means 'we know the
> > other side received the FIN notice (we saw the ack for it), so we expect it to
> > also close its side of the connection soon'.
> 
> That expectation is a bug.  A half-open connection is a legitimate long-term
> state for a TCP connection per RFC793.

Yes, but what is your suggestion?

Any timeout we choose for this will be "too low" by the TCP standard.
So it's about picking a *reasonable* value that will work in practice.

So, in your opinion, what would a sane default value look like?
2 minutes? 5 minutes? 5 days?

Comment 13 Jeremy Harris 2015-05-14 12:34:25 UTC
(In reply to Florian Westphal from comment #12)
> Yes, but what is your suggestion?
> 
> Any timeout we choose for this will be "too low" by the TCP standard.
> So it's about picking a *reasonable* value that will work in practice.

Not using a timeout at all, but properly tracking the actual TCP endpoint state by introducing more state/s in conntrack.

Comment 14 Florian Westphal 2015-05-14 12:43:38 UTC
(In reply to Jeremy Harris from comment #13)
> (In reply to Florian Westphal from comment #12)
> > Yes, but what is your suggestion?
> > 
> > Any timeout we choose for this will be "too low" by the TCP standard.
> > So it's about picking a *reasonable* value that will work in practice.
> 
> Not using a timeout at all, but properly tracking the actual TCP endpoint
> state by introducing more state/s in conntrack.

We can only attempt to guess what the endpoint state is, which we do by tracking/observing packet flow.

In this case, we see fin in one direction and ack in the reply direction.
We can therefore infer that one direction has closed (fin) and the other is aware of it (acknowledged fin).

In this scenario, there are no more packets after this, hence no more
state transitions are possible purely from packet observations.

Not using any timeout could mean that the conntrack entry may hang around
forever.

The only alternative that I see to using timeouts is to inject tcp probe packets
periodically from the conntrack machine and see if we get RST from the peers.

But even that might not work (never mind that it's an active process and kind of orthogonal to the (passive) connection tracking) in case the other box was e.g. powered off.  So I am afraid there is not much that conntrack could do to improve the behaviour.

Comment 15 Marcelo Ricardo Leitner 2015-05-14 12:51:42 UTC
(In reply to Florian Westphal from comment #14)
> (In reply to Jeremy Harris from comment #13)
> > (In reply to Florian Westphal from comment #12)
> > > Yes, but what is your suggestion?
> > > 
> > > Any timeout we choose for this will be "too low" by the TCP standard.
> > > So it's about picking a *reasonable* value that will work in practice.
> > 
> > Not using a timeout at all, but properly tracking the actual TCP endpoint
> > state by introducing more state/s in conntrack.
> 
> We can only attempt to guess what the endpoint state is, which we do by
> tracking/observing packet flow.
> 
> In this case, we see fin in one direction and ack in the reply direction.
> We can therefore infer that one direction has closed (fin) and the other is
> aware of it (acknowledged fin).
> 
> In this scenario, there are no more packets after this, hence no more
> state transitions are possible purely from packet observations.

That's the example I was missing, thx!
This also applies to asymmetric routing, where conntrack simply won't see one direction of the flow at all.

Comment 16 Jeremy Harris 2015-05-14 12:56:45 UTC
(In reply to Florian Westphal from comment #14)
> In this case, we see fin in one direction and ack in the reply direction.
> We can therefore infer that one direction has closed (fin) and the other is
> aware of it (acknowledged fin).
> 
> In this scenario, there are no more packets after this, hence no more
> state transitions are possible purely from packet observations.

I'm not sure where you get that "there are no more".  Per TCP spec, we expect
further packets.  There may be a fin in the other direction, or there may be further data in the other direction; either can happen after an indefinite delay - but either can be observed.

> Not using any timeout could mean that the conntrack entry may hang around
> forever.

This condition is no different to a fully-open connection that has gone
quiescent.

Comment 17 Florian Westphal 2015-05-14 17:43:31 UTC
(In reply to Jeremy Harris from comment #16)
> (In reply to Florian Westphal from comment #14)
> > In this case, we see fin in one direction and ack in the reply direction.
> > We can therefore infer that one direction has closed (fin) and the other is
> > aware of it (acknowledged fin).
> > 
> > In this scenario, there are no more packets after this, hence no more
> > state transitions are possible purely from packet observations.
> 
> I'm not sure where you get that "there are no more".  Per TCP spec, we expect
> further packets.

Yes, but these might not occur within the default timeout period that
we have by default (1 minute).

> There may be a fin in the other direction, or there may be
> further data in the other direction; either can happen after an indefinite
> delay - but either can be observed.

Yes, my point is that there is no guarantee that this will happen.
So we can fix this by changing the default timeout to 10 minutes or whatever,
but this would just "fix" things if further data is exchanged within that time period.

And, as you've already mentioned, even a 5-day timeout would not be sufficient
per the TCP specs.  But we have to be able to cope with any possible scenario,
and that does include "no further packets", no matter what the current state is
(or what we think the current state is).

(Marcelo reminds me of asymmetric routing; yes, that's also something we need to be able to cope with.)

> > Not using any timeout could mean that the conntrack entry may hang around
> > forever.
> 
> This condition is no different to a fully-open connection that has gone
> quiescent.

True, fully-open (established state) conntrack entries that went silent are killed after default timeout of 5 days expires.

Comment 18 Jeremy Harris 2015-05-15 08:07:38 UTC
(In reply to Florian Westphal from comment #17)
> (In reply to Jeremy Harris from comment #16)
> > (In reply to Florian Westphal from comment #14)
> > > In this case, we see fin in one direction and ack in the reply direction.
> > > We can therefore infer that one direction has closed (fin) and the other is
> > > aware of it (acknowledged fin).
> > > 
> > > In this scenario, there are no more packets after this, hence no more
> > > state transitions are possible purely from packet observations.
> > 
> > I'm not sure where you get that "there are no more".  Per TCP spec, we expect
> > further packets.
> 
> Yes, but these might not occur within the default timeout period that
> we have by default (1 minute).

Irrelevant.  They also might not, and in the customer case they do not, despite being present and correct.

> > This condition is no different to a fully-open connection that has gone
> > quiescent.
> 
> True, fully-open (established state) conntrack entries that went silent are
> killed after default timeout of 5 days expires.

What is your rationale for treating the two cases differently?

Comment 19 Jeremy Harris 2015-05-15 08:27:54 UTC
I shouldn't type in such a hurry. Sorry.
>> They also *might*, and in the customer case they *do*

Comment 20 Florian Westphal 2015-05-15 11:39:01 UTC
(In reply to Jeremy Harris from comment #18)
> (In reply to Florian Westphal from comment #17)
> > (In reply to Jeremy Harris from comment #16)
> > > (In reply to Florian Westphal from comment #14)
> > > > In this case, we see fin in one direction and ack in the reply direction.
> > > > We can therefore infer that one direction has closed (fin) and the other is
> > > > aware of it (acknowledged fin).
> > > > 
> > > > In this scenario, there are no more packets after this, hence no more
> > > > state transitions are possible purely from packet observations.
> > > 
> > > I'm not sure where you get that "there are no more".  Per TCP spec, we expect
> > > further packets.
> > 
> > Yes, but these might not occur within the default timeout period that
> > we have by default (1 minute).
> 
> Irrelevant.  They also *might*, and in the customer case they *do*,
> despite being present and correct.
> 
> > > This condition is no different to a fully-open connection that has gone
> > > quiescent.
> > 
> > True, fully-open (established state) conntrack entries that went silent are
> > killed after default timeout of 5 days expires.
> 
> What is your rationale for treating the two cases differently?

Cooperation of endpoints.  Not having seen any FIN/RST in either direction means
that neither end has given up.

Seeing a FIN, however, signals us that one side won't send any more data.

Things get hairy here -- BSD and Linux both violate the RFC in that after a
close() operation, we send a FIN, wait for the ACK, and arm a timewait timer in
FIN_WAIT_2 state to kill the socket even if the other end turns silent.

Otherwise a non-cooperative client could eat server resources indefinitely
by not sending any data.

This won't happen with shutdown(), but conntrack can't tell
close/shutdown apart since the same data is sent on the wire.

conntrack follows a similar rule: if one end wants to close, lower the timeout
so the conntrack entry doesn't stay around for 5 more days.

Comment 21 Jeremy Harris 2015-05-15 13:47:48 UTC
(In reply to Florian Westphal from comment #20)
> (In reply to Jeremy Harris from comment #18)
> > (In reply to Florian Westphal from comment #17)
> > > (In reply to Jeremy Harris from comment #16)
> > > > (In reply to Florian Westphal from comment #14)
> > > > > In this case, we see fin in one direction and ack in the reply direction.
> > > > > We can therefore infer that one direction has closed (fin) and the other is
> > > > > aware of it (acknowledged fin).
> > > > > 
> > > > > In this scenario, there are no more packets after this, hence no more
> > > > > state transitions are possible purely from packet observations.
> > > > 
> > > > I'm not sure where you get that "there are no more".  Per TCP spec, we expect
> > > > further packets.
> > > 
> > > Yes, but these might not occur within the default timeout period that
> > > we have by default (1 minute).
> > 
> > Irrelevant.  They also *might*, and in the customer case they *do*,
> > despite being present and correct.
> > 
> > > > This condition is no different to a fully-open connection that has gone
> > > > quiescent.
> > > 
> > > True, fully-open (established state) conntrack entries that went silent are
> > > killed after default timeout of 5 days expires.
> > 
> > What is your rationale for treating the two cases differently?
> 
> Cooperation of endpoints.  Not having seen any FIN/RST in either direction
> means
> that neither end has given up.

You're begging the question.  Having seen a FIN in one direction (only) does not mean that either end has "given up".  It says that the sending end will not be sending any further data.  It specifically does not say that the end sending the FIN is no longer prepared to receive any data.  The two endpoints are still cooperating.


> Seeing a FIN, however, signals us that one side won't send any more data.
> 
> Things get hairy here -- BSD and Linux both violate the RFC in that after a
> close() operation, we send a FIN, wait for the ACK, and arm a timewait timer in
> FIN_WAIT_2 state to kill the socket even if the other end turns silent.

No close() system call has been made in the customer case at hand.
The client has done a shutdown(fd, SHUT_WR).  The server has done no such thing.

Seeing a FIN in one direction does not signify that a close() was done;
the above RFC violation is not relevant.


> Otherwise a non-cooperative client could eat server resources indefinitely
> by not sending any data.

Again, no different to the full-open connection case, where a "non-cooperating client" combined with a server happening to have no data to send could tie up server resources indefinitely.

Barring keepalive probes, or the 5-day timeout (also violating RFCs, but I'm not fighting that battle).

A dead client would fail to respond to keepalives, and I'd think them still usable in our half-open state.  Has this been considered?

Certainly a malicious client could still tie up resource - but, again, no different to the fully-open case.


> This won't happen with shutdown(), but conntrack can't tell
> close/shutdown apart since the same data is sent on the wire.
> 
> conntrack follows a similar rule: if one end wants to close, lower the timeout
> so the conntrack entry doesn't stay around for 5 more days.

The client TCP endpoint, in the customer case, was in FIN_WAIT2.  Conntrack has implemented a different rule, not a similar one, by going closed.

The conntrack involved is on the client, so arguments dealing with protecting servers from non-cooperative clients are slightly questionable.


The point is that neither end wished to close.  Rather, one end wished to say that it would not be transmitting further.



On violating RFCs:

  This decision has removed previously available functionality, viz: a unidirectional data flow supported by a TCP connection in half-open state.

Polemic:  This is the attitude that makes Linux laughed at by more-serious operating systems.  I'm not sure whether you are agreeing with the decision, taking no position, or saying we're stuck with it and may as well duplicate it (a TCP implementation decision to remove functionality, although no such is visible in the customer case) in the conntrack implementation.  Whichever, we in GSS now get to explain to our customers that, as a company, we do not care about this part of the basic TCP standard.



Proposal:

Do not timeout a half-open connection unilaterally in conntrack after seeing a one-way FIN.  Instead, probe the connection with keepalives.

  Discussion: this may be better implemented in the TCP endpoint, with conntrack only monitoring.
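For reference, endpoint TCP keepalives of the kind alluded to can already be enabled per socket today; a minimal sketch (the TCP_KEEP* options are Linux-specific, and the values are purely illustrative):

```python
import socket

# Enable keepalive probing on a socket: after 60s of idle, probe every
# 10s, and give up (reset the connection) after 5 unanswered probes.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)   # idle time before probing
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)  # seconds between probes
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)     # probes before giving up
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
```

This only answers the "dead client" part, of course; conntrack itself remains a passive observer and would still need a policy for entries whose endpoints stay silent.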


CS theory meta-discussion:

This is an instance of the second major problem in computing: the invalidation of caches.  Conntrack is attempting to second-guess the state of a TCP endpoint.  Perhaps it should be solidly tied instead.

Comment 22 Marcelo Ricardo Leitner 2015-05-15 14:29:27 UTC
That's a good talk to have over a couple of beers, but for this bug it is getting way beyond the scope.

Believe it or not, conntrack has worked like this for several years and works well for 99.99% of our user base, and we (as a community) hardly ever see a report about this behaviour. And when we do see one, it's normally just fixed/worked around (call it whatever you like) by bumping the close_wait timer.

Let me help you explain that to the customer: it's a simplification that has been working well for more than 10 years and had its reasons to be done back then. Conntrack is used in lots of different scenarios, and the way it currently is, it is a solution that fits them all.

We cannot simply bump that default, as this would mean increased memory usage for millions of other systems out there just to cope with a few cases, so please do it on your systems as needed.

No one is left unattended with this.

Your suggestion is far more costly than this tuning and, as conntrack is working as expected, it would have to be submitted as an RFE if you would like to pursue it any further.

Thank you,
Marcelo

