Bug 1215927
| Summary: | Incomplete connection tracking of half-closed tcp connections | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Michele Baldessari <michele> |
| Component: | kernel | Assignee: | Rashid Khan <rkhan> |
| kernel sub component: | Netfilter | QA Contact: | Xin Long <lxin> |
| Status: | CLOSED NOTABUG | Docs Contact: | |
| Severity: | medium | | |
| Priority: | medium | CC: | anrussel, fwestpha, jan.iven, jbrouer, jeharris, michele, mleitner, network-qe, rkhan |
| Version: | 6.6 | | |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-05-15 14:29:27 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Description
Michele Baldessari
2015-04-28 07:20:46 UTC
Created attachment 1019552: python server script
Created attachment 1019553: client script

Ugh, my Solution-finding-fu still isn't good. I found this similar solution https://access.redhat.com/solutions/342433 but didn't find one that I was looking for.

I had a similar situation back in SEG, it was just a bit different. A NAT would stop working with a similar situation because the conntrack entry went away, due to that.. and as it was missing just the last reset pkt, it was unable to pick up the connection after the expiration.

Michele, please confirm one thing:

A RHEL 6.6 **client** with the following iptables rules:
...
iptables -A **INPUT** -m state --state NEW -p tcp **--dport** 22 -j ACCEPT

Just so I understand on which side the firewall is running, on server or client? Because the rules are for server, but the text mentioned client..

Hey Marcelo,

the rules are always for the client as I mentioned. There is the line for ssh as well because I had it on my system to be able to hop on it, but it is not relevant. Sorry if that confused things ;)

You can assume the server has no iptables rules. On the client we basically allow all outgoing connections and only packets related to those outgoing connections.

If you need a repro to see this live, just let me know

thanks,
Michele

Cool, thanks bandini! That's exactly how I'm reproducing it in here. :)

Hi Marcelo,

I tried with tcp_loose = 1 but the behaviour is still the same:

[root@rhel66b sfdc01433234]# cat /proc/sys/net/netfilter/nf_conntrack_tcp_loose
1
[root@rhel66b sfdc01433234]# uname -a
Linux rhel66b.testlab 2.6.32-504.12.2.el6.x86_64 #1 SMP Sun Feb 1 12:14:02 EST 2015 x86_64 x86_64 x86_64 GNU/Linux

I have not tried RHEL 7 though. Maybe something changed about this knob between rhel 6 and 7? Let me know if you need me to try out something else.

thanks,
Michele

This does not look like a bug to me, conntrack seems to do the correct state transitions, namely

ESTABLISHED -> FIN_WAIT (saw FIN in one direction)
FIN_WAIT -> CLOSE_WAIT (saw ACK to this FIN in reply direction, i.e. proof other end got this fin)

The expected transition to LAST_ACK doesn't happen since the other side is silent & doesn't close its side of the connection within the 60 second default timeout of CLOSE_WAIT state.

Note that conntrack states don't map 1:1 to tcp state machine terminology since we are an observer. CLOSE_WAIT in tcp conntrack lingo means 'we know other side received FIN notice (we saw ack for it), so we expect it to also close its side of the connection soon'.

So the only thing that in my opinion could be debated is if the default 1 minute timeout is too low. But what would a good timeout look like? It would be easy to change the reproducer to work with 15 closewait timeout too...

IMO the default 1 minute is quite ok and users that typically have a larger time delta where no packets are exchanged after one side has been closed can easily increase the timeout.

net.netfilter.nf_conntrack_tcp_loose = 1 will work only if the resulting 'new' connection packet is permitted by the ruleset, which is why it doesn't help for the reporter's case (we hit REJECT rule).

Marcelo, Jesper, what's your take on this? Thanks!

(In reply to Florian Westphal from comment #10)
> Note that conntrack states don't map 1:1 to tcp state machine terminology
> since we are an observer. CLOSE_WAIT in tcp conntrack lingo means 'we know
> other side received FIN notice (we saw ack for it), so we expect it to
> also close its side of the connection soon'.

That expectation is a bug. A half-open connection is a legitimate long-term state for a TCP connection per RFC793.

(In reply to Jeremy Harris from comment #11)
> (In reply to Florian Westphal from comment #10)
> > Note that conntrack states don't map 1:1 to tcp state machine terminology
> > since we are an observer. CLOSE_WAIT in tcp conntrack lingo means 'we know
> > other side received FIN notice (we saw ack for it), so we expect it to
> > also close its side of the connection soon'.
>
> That expectation is a bug. A half-open connection is a legitimate long-term
> state for a TCP connection per RFC793.

Yes, but what is your suggestion?

Any timeout we choose for this will be "too low" by TCP standard. So it's about picking a *reasonable* value that will work in practice.

So, in your opinion, what would a sane default value look like? 2 minutes? 5 minutes? 5 days?

(In reply to Florian Westphal from comment #12)
> Yes, but what is your suggestion?
>
> Any timeout we choose for this will be "too low" by TCP standard.
> So it's about picking a *reasonable* value that will work in practice.

Not using a timeout at all, but properly tracking the actual TCP endpoint state by introducing more state/s in conntrack.

(In reply to Jeremy Harris from comment #13)
> (In reply to Florian Westphal from comment #12)
> > Yes, but what is your suggestion?
> >
> > Any timeout we choose for this will be "too low" by TCP standard.
> > So it's about picking a *reasonable* value that will work in practice.
>
> Not using a timeout at all, but properly tracking the actual TCP endpoint
> state by introducing more state/s in conntrack.

We can only attempt to guess what the endpoint state is, which we do by tracking/observing packet flow.

In this case, we see fin in one direction and ack in the reply direction. We can therefore infer that one direction has closed (fin) and the other is aware of it (acknowledged fin).

In this scenario, there are no more packets after this, hence no more state transitions are possible purely from packet observations.

Not using any timeout could mean that the conntrack entry may hang around forever.

The only alternative that I see to using timeouts is to inject tcp probe packets periodically from the conntrack machine and see if we get RST from the peers. But even that might not work (never mind it's an active process and kind-of orthogonal to the (passive) connection tracking) in case other box was e.g. powered off.

So I am afraid that there is not much that conntrack could do to improve behaviour.

(In reply to Florian Westphal from comment #14)
> (In reply to Jeremy Harris from comment #13)
> > (In reply to Florian Westphal from comment #12)
> > > Yes, but what is your suggestion?
> > >
> > > Any timeout we choose for this will be "too low" by TCP standard.
> > > So it's about picking a *reasonable* value that will work in practice.
> >
> > Not using a timeout at all, but properly tracking the actual TCP endpoint
> > state by introducing more state/s in conntrack.
>
> We can only attempt to guess what the endpoint state is, which we do by
> tracking/observing packet flow.
>
> In this case, we see fin in one direction and ack in the reply direction.
> We can therefore infer that one direction has closed (fin) and the other is
> aware of it (acknowledged fin).
>
> In this scenario, there are no more packets after this, hence no more
> state transitions are possible purely from packet observations.

That's the example I was missing, thx! This also applies to async routing, on which conntrack simply won't see one flow entirely.

(In reply to Florian Westphal from comment #14)
> In this case, we see fin in one direction and ack in the reply direction.
> We can therefore infer that one direction has closed (fin) and the other is
> aware of it (acknowledged fin).
>
> In this scenario, there are no more packets after this, hence no more
> state transitions are possible purely from packet observations.

I'm not sure where you get that "there are no more". Per TCP spec, we expect further packets. There may be a fin in the other direction, or there may be further data in the other direction; either can happen after an indefinite delay - but either can be observed.

> Not using any timeout could mean that the conntrack entry may hang around
> forever.

This condition is no different to a fully-open connection that has gone quiescent.

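The transitions described in comment #10 can be sketched as a small passive-observer model. This is a minimal illustration in Python, not the kernel's actual state table: the `Entry` class and the `observe` and `tick` helpers are hypothetical names, the 5-day and 60-second timeouts are the defaults quoted in this thread, and the FIN_WAIT and LAST_ACK values are assumptions included only so the model is complete.

```python
from dataclasses import dataclass
from typing import Optional

# Idle timeouts in seconds, keyed by conntrack state.
TIMEOUTS = {
    "ESTABLISHED": 5 * 24 * 3600,  # 5 days, as quoted in the thread
    "FIN_WAIT": 120,               # assumed value, for illustration only
    "CLOSE_WAIT": 60,              # 1 minute, as quoted in the thread
    "LAST_ACK": 30,                # assumed value, for illustration only
}


@dataclass
class Entry:
    """One tracked connection as seen by a passive observer."""
    state: str = "ESTABLISHED"
    fin_dir: Optional[str] = None  # direction that sent the first FIN
    idle: int = 0                  # seconds since the last observed packet


def observe(entry, direction, flags):
    """Update the entry from one packet; direction is 'orig' or 'reply'."""
    entry.idle = 0
    if entry.state == "ESTABLISHED" and "FIN" in flags:
        entry.state, entry.fin_dir = "FIN_WAIT", direction
    elif (entry.state == "FIN_WAIT" and "ACK" in flags
          and direction != entry.fin_dir):
        # Simplification: any ACK from the other side is taken as the
        # acknowledgement of the FIN; sequence numbers are not modelled.
        entry.state = "CLOSE_WAIT"
    elif (entry.state == "CLOSE_WAIT" and "FIN" in flags
          and direction != entry.fin_dir):
        entry.state = "LAST_ACK"
    return entry.state


def tick(entry, seconds):
    """Advance idle time; return False once the entry would have expired."""
    entry.idle += seconds
    return entry.idle < TIMEOUTS[entry.state]
```

Feeding this model a FIN from the client and an ACK from the server leaves the entry in CLOSE_WAIT. If nothing else is observed, `tick` reports the entry as expired once 60 idle seconds have passed, which matches the reporter's case of a server that stays silent for longer than that.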
(In reply to Jeremy Harris from comment #16)
> (In reply to Florian Westphal from comment #14)
> > In this case, we see fin in one direction and ack in the reply direction.
> > We can therefore infer that one direction has closed (fin) and the other is
> > aware of it (acknowledged fin).
> >
> > In this scenario, there are no more packets after this, hence no more
> > state transitions are possible purely from packet observations.
>
> I'm not sure where you get that "there are no more". Per TCP spec, we expect
> further packets.

Yes, but these might not occur within the default timeout period that we have by default (1 minute).

> There may be a fin in the other direction, or there may be
> further data in the other direction; either can happen after an indefinite
> delay - but either can be observed.

Yes, my point is that there is no guarantee that this will happen. So we can fix this by changing default timeout to 10 minutes or whatever but this would just "fix" things if further data is exchanged within that time period. And, as you've already mentioned, even a 5 day timeout would not be sufficient per tcp specs.

But, we have to be able to cope with any possible scenario and that does include "no further packets", no matter what the current state is (or what we think the current state is). (Marcelo reminds me of async routing, yes, that's also something we need to be able to cope with).

> > Not using any timeout could mean that the conntrack entry may hang around
> > forever.
>
> This condition is no different to a fully-open connection that has gone
> quiescent.

True, fully-open (established state) conntrack entries that went silent are killed after default timeout of 5 days expires.

(In reply to Florian Westphal from comment #17)
> (In reply to Jeremy Harris from comment #16)
> > (In reply to Florian Westphal from comment #14)
> > > In this case, we see fin in one direction and ack in the reply direction.
> > > We can therefore infer that one direction has closed (fin) and the other is
> > > aware of it (acknowledged fin).
> > >
> > > In this scenario, there are no more packets after this, hence no more
> > > state transitions are possible purely from packet observations.
> >
> > I'm not sure where you get that "there are no more". Per TCP spec, we expect
> > further packets.
>
> Yes, but these might not occur within the default timeout period that
> we have by default (1 minute).

Irrelevant. They also might not, and in the customer case they do not, despite being present and correct.

> > This condition is no different to a fully-open connection that has gone
> > quiescent.
>
> True, fully-open (established state) conntrack entries that went silent are
> killed after default timeout of 5 days expires.

What is your rationale for treating the two cases differently?

I shouldn't type in such a hurry. Sorry.
>> They also *might*, and in the customer case they *do*

(In reply to Jeremy Harris from comment #18)
> (In reply to Florian Westphal from comment #17)
> > (In reply to Jeremy Harris from comment #16)
> > > (In reply to Florian Westphal from comment #14)
> > > > In this case, we see fin in one direction and ack in the reply direction.
> > > > We can therefore infer that one direction has closed (fin) and the other is
> > > > aware of it (acknowledged fin).
> > > >
> > > > In this scenario, there are no more packets after this, hence no more
> > > > state transitions are possible purely from packet observations.
> > >
> > > I'm not sure where you get that "there are no more". Per TCP spec, we expect
> > > further packets.
> >
> > Yes, but these might not occur within the default timeout period that
> > we have by default (1 minute).
>
> Irrelevant. They also *might*, and in the customer case they *do*,
> despite being present and correct.
>
> > > This condition is no different to a fully-open connection that has gone
> > > quiescent.
> >
> > True, fully-open (established state) conntrack entries that went silent are
> > killed after default timeout of 5 days expires.
>
> What is your rationale for treating the two cases differently?

Cooperation of endpoints. Not having seen any FIN/RST in either direction means that neither end has given up.

Seeing a FIN, however, signals us that one side won't send any more data.

Things get hairy here -- BSD and Linux both violate RFC in that after a close() operation, we send fin, wait for ack, and arm timewait timer in FIN_WAIT_2 state to kill the socket even if other end turns silent.

Otherwise a non-cooperative client could eat server resources indefinitely by not sending any data. This won't happen with shutdown(), but conntrack can't tell close/shutdown apart since the same data is sent on the wire.

conntrack follows a similar rule: if one end wants to close, lower the timeout to not have the conntrack entry stay around for 5 more days.

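The close()/shutdown() distinction discussed above can be illustrated with a short loopback sketch in Python. It is not the attached reproducer: the `half_close_demo` function is a hypothetical name, and the server here replies immediately, whereas the behaviour reported in this bug needs the server to stay silent for longer than the CLOSE_WAIT timeout before sending.

```python
import socket
import threading


def half_close_demo():
    """Client sends FIN via shutdown(SHUT_WR) and still receives server data."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def server():
        conn, _ = listener.accept()
        with conn:
            # Read until EOF, i.e. until the client's FIN has arrived.
            while conn.recv(4096):
                pass
            # The server's own direction is still open, so it can keep
            # sending. A reproducer for this bug would wait here for more
            # than the CLOSE_WAIT timeout before replying.
            conn.sendall(b"late reply")

    worker = threading.Thread(target=server)
    worker.start()

    client = socket.create_connection(("127.0.0.1", port))
    client.sendall(b"request")
    client.shutdown(socket.SHUT_WR)  # half-close: FIN sent, still readable

    chunks = []
    while True:
        data = client.recv(4096)
        if not data:  # the server closed its side as well
            break
        chunks.append(data)

    client.close()
    worker.join()
    listener.close()
    return b"".join(chunks)
```

After `shutdown(SHUT_WR)` the client can no longer send but keeps reading until the server finishes. On the wire, the FIN looks the same as the one a close() would produce, which is the ambiguity conntrack faces.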
(In reply to Florian Westphal from comment #20)
> (In reply to Jeremy Harris from comment #18)
> > (In reply to Florian Westphal from comment #17)
> > > (In reply to Jeremy Harris from comment #16)
> > > > (In reply to Florian Westphal from comment #14)
> > > > > In this case, we see fin in one direction and ack in the reply direction.
> > > > > We can therefore infer that one direction has closed (fin) and the other is
> > > > > aware of it (acknowledged fin).
> > > > >
> > > > > In this scenario, there are no more packets after this, hence no more
> > > > > state transitions are possible purely from packet observations.
> > > >
> > > > I'm not sure where you get that "there are no more". Per TCP spec, we expect
> > > > further packets.
> > >
> > > Yes, but these might not occur within the default timeout period that
> > > we have by default (1 minute).
> >
> > Irrelevant. They also *might*, and in the customer case they *do*,
> > despite being present and correct.
> >
> > > > This condition is no different to a fully-open connection that has gone
> > > > quiescent.
> > >
> > > True, fully-open (established state) conntrack entries that went silent are
> > > killed after default timeout of 5 days expires.
> >
> > What is your rationale for treating the two cases differently?
>
> Cooperation of endpoints. Not having seen any FIN/RST in either direction
> means that neither end has given up.

You're begging the question. Having seen a FIN in one direction (only) does not mean that either end has "given up". It says that the sending end will not be sending any further data. It specifically does not say that the end sending the FIN is no longer prepared to receive any data. The two endpoints are still cooperating.

> Seeing a FIN, however, signals us that one side won't send any more data.
>
> Things get hairy here -- BSD and Linux both violate RFC in that after a
> close() operation, we send fin, wait for ack, and arm timewait timer in
> FIN_WAIT_2 state to kill the socket even if other end turns silent.

No close() systemcall operation has been done in the customer case at hand. The client has done a shutdown(fd, SHUT_WR). The server has not done any such. Seeing a FIN in one direction does not signify that a close() was done; the above RFC violation is not relevant.

> Otherwise non-cooperative client could eat server resources indefinitely
> by not sending any data.

Again, no different to the full-open connection case, where a "non-cooperating client" combined with a server happening to have no data to send, could tie up server resources indefinitely. Barring keepalive probes, or the 5-day timeout (also violating RFCs, but I'm not fighting that battle).

A dead client would fail to respond to keepalives, and I'd think them still usable in our half-open state. Has this been considered? Certainly a malicious client could still tie up resource - but, again, no different to the fully-open case.

> This won't happen with shutdown() but conntrack can't tell
> close/shutdown apart since same data is sent on wire.
>
> conntrack follows similar rule, if one end wants to close, lower the timeout
> to not have conntrack entry stay around for 5 more days.

The client TCP endpoint, in the customer case, was in FIN_WAIT2. Conntrack has implemented a different rule, not similar, by going closed.

The conntrack involved is on the client, so arguments dealing with protecting servers from non-cooperative clients are slightly questionable.

The point is that neither end wished to close. Rather, one end wished to say that it would not be transmitting further.

On violating RFCs: This decision has removed previously available functionality, viz: a unidirectional data flow supported by a TCP connection in half-open state.

Polemic: This is the attitude that makes Linux laughed at by more-serious operating systems.

I'm not sure whether you are agreeing with the decision, taking no position, or saying we're stuck with it and may as well duplicate it (a TCP implementation decision to remove functionality, although no such is visible in the customer case) in the conntrack implementation. Whichever, we in GSS now get to explain to our customers that, as a company, we do not care about this part of the basic TCP standard.

Proposal: Do not timeout a half-open connection unilaterally in conntrack after seeing a one-way FIN. Instead, probe the connection with keepalives.

Discussion: this may be better implemented in the TCP endpoint, with conntrack only monitoring.

CS theory meta-discussion: This is an instance of the second major problem in computing: the invalidation of caches. Conntrack is attempting to second-guess the state of a TCP endpoint. Perhaps it should be solidly tied instead.

That's a good talk to have over a couple of beers but for this bug this is getting way beyond the scope.

Believe it or not, conntrack has worked like this for several years and works well for 99.99% of our user base, and we (as community) hardly see a report about this feature. And when we see it, it's normally just fixed/workarounded (call it whatever you like) by bumping close_wait timer.

Let me help you on explaining that to customer: it's a simplification that's been working well for more than 10 years and had its reasons to be done back then. Conntrack is used in lots of different scenarios and the way it currently is, it is a solution that fits them all. We cannot simply bump that default as this would mean increased memory usage for millions of other systems out there just to cope with a few cases, so please do it on your systems as needed. No one is left unattended with this.

Your suggestion is far costlier than this tuning and, as conntrack is working as expected, must be submitted as an RFE instead if you would like to pursue it any longer.

Thank you,
Marcelo
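The workaround of bumping the close_wait timer can be sketched in Python as follows. The helper names `read_timeout` and `sysctl_line` are hypothetical, and the sketch assumes the knob is exposed as `nf_conntrack_tcp_timeout_close_wait`, next to the `nf_conntrack_tcp_loose` file shown earlier in the thread. It only reads the current value and formats a configuration line; applying the change is left to the administrator.

```python
from pathlib import Path

# Assumed location of the CLOSE_WAIT timeout knob, in seconds.
CLOSE_WAIT_SYSCTL = Path(
    "/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_close_wait"
)


def read_timeout(path=CLOSE_WAIT_SYSCTL):
    """Return the timeout in seconds, or None if the file is unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (FileNotFoundError, PermissionError):
        # For example, the conntrack module is not loaded on this host.
        return None


def sysctl_line(seconds):
    """Format a line suitable for /etc/sysctl.conf."""
    if seconds <= 0:
        raise ValueError("timeout must be positive")
    return f"net.netfilter.nf_conntrack_tcp_timeout_close_wait = {seconds}"
```

The value to choose depends on how long the silent side of a half-closed connection is expected to stay quiet. As noted above, a larger timeout keeps idle entries in memory longer, so it is a per-system decision.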