191336 – Netfilter TCP retransmission bug causes connection tracking problems

Status:	CLOSED WONTFIX
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	kernel
Version:	4.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Assignee:	Thomas Graf
QA Contact:	Brian Brock

Reported:	2006-05-10 21:57 UTC by Steve Snodgrass
Modified:	2014-06-18 08:29 UTC
Doc Type:	Bug Fix
Last Closed:	2008-06-13 22:29:12 UTC

Attachments
Detailed description of problem including packet traces (6.42 KB, text/plain) 2006-05-10 21:57 UTC, Steve Snodgrass

Description Steve Snodgrass 2006-05-10 21:57:38 UTC

Description of problem: Linux 2.6 kernels before 2.6.11 contain a known bug in
the netfilter connection tracking code that is triggered by legitimate sequences
of ACK packets.  See the attachment.  The problem and patch are described here:

http://lists.netfilter.org/pipermail/netfilter-devel/2005-January/018241.html

Version-Release number of selected component (if applicable): 2.6.9-34.ELsmp,
but this bug is present in all currently-released RHEL4 kernels


How reproducible: Unfortunately not easy to reproduce unless you have the
ability to feed netfilter with arbitrary packet sequences


Steps to Reproduce: See the attachment.

Actual results: The bug may cause netfilter to mark TCP ACK packets as NEW or
INVALID, which may cause session hangs/drops depending on the particular
firewall setup.


Expected results: Legitimate packets should get through.  :-)


Additional info: This may be the root cause of other outstanding bugs including
bug #161898 and bug #182012.

Comment 1 Steve Snodgrass 2006-05-10 21:57:38 UTC

Created attachment 128865
Detailed description of problem including packet traces

Comment 2 Steve Snodgrass 2006-05-15 20:38:29 UTC

FYI I'm willing to test a development kernel with this patch if I can get one.

Comment 3 James Ralston 2006-06-26 16:00:23 UTC

Just FYI, we're getting hit by this problem as well.

In fact, I suspect *many* sites are experiencing this problem, but they don't
realize it because they write off the occasional connection timeout as a fluke.

Comment 4 Andy Gospodarek 2006-07-11 00:43:45 UTC

Test builds are available here:

http://people.redhat.com/agospoda/bz/191336/

Please let us know if this resolves your issue.

Comment 8 Steve Snodgrass 2006-07-13 00:23:35 UTC

I'm deploying the test kernel on one of my firewall clusters, I'll let you know
how it goes.

Comment 9 Andy Gospodarek 2006-07-13 02:23:31 UTC

I have recently found a problem with these builds and though they will probably
work for you, this exact fix will not make its way into RHEL4.  I am going to
see if I can find a suitable solution and I will update you with new kernels
when that happens.  Feel free to continue running these kernels and let me know
if you see an improvement.

Comment 10 Steve Snodgrass 2006-07-21 13:12:08 UTC

So far it is working well, but it needs more time to be sure since I can't
reproduce this on demand.  What sort of problem did you find?  Is it related to
the patch or something else entirely?

Comment 11 Andy Gospodarek 2006-07-21 14:00:14 UTC

This patch manages to cause a change in kABI (something RH makes every attempt
NOT to do with updates), so a different patch that creates similar functionality
is needed before it will be acceptable for an update.

Comment 12 James Ralston 2006-07-24 17:22:06 UTC

That's unfortunate, because the patch author specifically commented that he
couldn't find any way to fix the problem without adding another member to the
ip_ct_tcp structure.  The only other alternative was to disable the
retransmission detection code altogether.

RHEL5 will fix this problem (because it will rebase to a 2.6 kernel that already
contains the patch), but I'm not sure what alternatives exist in the meantime
(other than rolling our own kernels, that is).

Comment 13 Steve Snodgrass 2006-07-25 00:57:19 UTC

Yes, the alternative to changing the data structure is to simply disable the
retransmission detection logic that triggers this bug.  The retransmit detection
is not actually needed for anything; it's an attempt to keep down the size of
the conntrack table.  In my opinion disabling it is a much better alternative
than keeping this bug.  See this message (and replies) for some discussion about
this option:
http://lists.netfilter.org/pipermail/netfilter-devel/2005-January/018244.html

If this is acceptable, I will provide a patch to disable this behavior that does
not alter the kernel ABI.

Comment 14 Andy Gospodarek 2006-07-25 21:48:59 UTC

While rebuilding the kernel to disable the retransmission detection is one
option, it would seem that you could set the sysctl variable
net.ipv4.netfilter.ip_conntrack_tcp_max_retrans (or procfile
/proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_max_retrans) to something higher
than '3' and you might accomplish a similar goal:

# sysctl net.ipv4.netfilter.ip_conntrack_tcp_max_retrans
net.ipv4.netfilter.ip_conntrack_tcp_max_retrans = 3

I realize that this option isn't quite as nice as the original patch, but if you
have a good guesstimate of how many extra packets are being falsely
detected as retransmits then you could try setting this value a little higher
than that and see what happens.  Has either of you tried this previously?
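
For illustration, a minimal sketch of raising the value at runtime and keeping
it across reboots.  The helper name set_sysctl_line, the example value 10 and
the use of /etc/sysctl.conf for persistence are assumptions made for this
sketch, not something prescribed in this report:

```shell
#!/bin/sh
# Sketch only: raise ip_conntrack_tcp_max_retrans and keep the setting.
KEY=net.ipv4.netfilter.ip_conntrack_tcp_max_retrans

# Hypothetical helper: replace (or add) "KEY = VALUE" in a sysctl-style file.
set_sysctl_line() {
    file=$1; key=$2; value=$3
    tmp="$file.tmp.$$"
    # Keep every line except an existing assignment of this key.
    grep -v "^[[:space:]]*$key[[:space:]]*=" "$file" > "$tmp" 2>/dev/null
    # Append the new assignment.
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage (as root); 10 is an arbitrary example value:
#   sysctl -w "$KEY=10"                          # takes effect immediately
#   set_sysctl_line /etc/sysctl.conf "$KEY" 10   # applied again at boot
```

The runtime change is lost on reboot unless the same key is also present in
the sysctl configuration file, which is why the sketch does both.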

Comment 15 Steve Snodgrass 2006-07-27 13:35:30 UTC

OK, now I feel like an idiot, I don't know why I never noticed that before.  I
think because I saw "int ip_ct_tcp_max_retrans = 3;" in the kernel code I was
assuming this was a hardcoded value.  Except of course the code in
ip_conntrack_standalone.c ties this variable into sysctl/proc.  Now that you've
pointed this out, I think that setting this to some arbitrary high value (e.g.
100) could be a very acceptable workaround.

Thus far the test kernel has been working flawlessly for me, but I will revert
back to the regular kernel and set this value.

Comment 16 Steve Snodgrass 2006-07-27 13:40:21 UTC

You might want to consider bumping the default value of this parameter up a bit
in future 2.6.9-based kernel releases (perhaps in the 5-10 range) in an effort
to avoid triggering this bug on regular traffic.

Comment 17 Steve Snodgrass 2006-08-15 01:42:00 UTC

I've been running with "net.ipv4.netfilter.ip_conntrack_tcp_max_retrans = 10"
for some time now and it has been working well.  I think this is a good
workaround for this problem, though I still think you should consider changing
the default.

Comment 19 Andy Gospodarek 2006-08-24 15:56:25 UTC

Does this workaround work for everyone?  I'd like to close out this issue if
that is the case.

Comment 20 Steve Snodgrass 2006-09-07 17:47:14 UTC

I spoke a little too soon - I'm still seeing occasional IMAP session hangs, but
the frequency is much less than before.  I suspect that setting the retrans
value to 10 is still a little low, but I haven't captured a session yet to
confirm.  I'm going to increase it and see what happens.

Comment 21 Steve Snodgrass 2006-10-21 21:10:41 UTC

I've been running with a value of 20 for a while now and it seems to have
eliminated the problem.

Comment 22 James Ralston 2007-01-04 22:53:26 UTC

I've been using a value of 32 for some time now (because I like numbers that are
powers of 2), and it seems to have eliminated the connection drops.

It strikes me that netfilter could also be smarter here: if netfilter picks up a
connection (because ip_conntrack_tcp_loose is set to 1 or higher), netfilter
should implicitly set that connection as "liberal" (regardless of whether the
actual ip_conntrack_tcp_be_liberal value is 0 or 1), because netfilter did not
see the initial window scale negotiation, and can't reliably determine it.

From looking at net/netfilter/nf_conntrack_proto_tcp.c in 2.6.18-1.2869.fc6,
netfilter seems to try to figure out the window size whenever it picks up a
connection, but it doesn't mark the connection in order to let tcp_in_window()
know that the window scale might be wrong.

Generally speaking, one could set ip_conntrack_tcp_be_liberal to 1 globally, but
that's overkill--if netfilter saw the initial SYN/SYN+ACK, then netfilter
*should* reject packets outside of the window.  It's only in the case where
netfilter didn't see the initial SYN/SYN+ACK that it should behave as if
ip_conntrack_tcp_be_liberal were set for that connection.
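
To check how a given machine is currently configured, something like the
following could be used.  It is only a sketch: the helper name
show_conntrack_tcp is made up for illustration, and the /proc directory is
assumed to be the one that holds ip_conntrack_tcp_max_retrans on RHEL4
kernels.

```shell
#!/bin/sh
# Print the conntrack TCP tunables discussed in this bug, if the running
# kernel exposes them under the given directory.
show_conntrack_tcp() {
    dir=${1:-/proc/sys/net/ipv4/netfilter}
    for name in ip_conntrack_tcp_loose \
                ip_conntrack_tcp_be_liberal \
                ip_conntrack_tcp_max_retrans; do
        if [ -r "$dir/$name" ]; then
            printf '%s = %s\n' "$name" "$(cat "$dir/$name")"
        else
            printf '%s: not available\n' "$name"
        fi
    done
}

# Usage:
#   show_conntrack_tcp
```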

Comment 23 David Miller 2007-01-05 00:19:31 UTC

That's a good suggestion, I'll mention this to Patrick McHardy, the
active netfilter maintainer.

Comment 24 Bob Plankers 2007-05-02 22:11:02 UTC

Hey all,

Has there been a resolution to this at all? I am encountering this problem on a
number of machines now, dropping connections. Currently I am trying the
workaround to see if that helps.

Thanks.

Comment 25 Alex Tang 2007-09-05 17:22:57 UTC

Hi folks.

I have a question related to this bug.  

We've been seeing what I think is the same problem with RHEL-4.5,
(kernel-2.6.9-55.0.2, iptables-1.2.11-3.1.RHEL4).

Our iptables rules look like this:

# Generated by iptables-save v1.2.11 on Wed Sep  5 10:38:31 2007
*raw
:PREROUTING ACCEPT [2673477:557179184]
:OUTPUT ACCEPT [2667955:561064344]
# Completed on Wed Sep  5 10:38:31 2007
# Generated by iptables-save v1.2.11 on Wed Sep  5 10:38:31 2007
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [2667955:561064344]
:RH-Firewall-1-INPUT - [0:0]
-A INPUT -j RH-Firewall-1-INPUT
-A FORWARD -j RH-Firewall-1-INPUT
-A RH-Firewall-1-INPUT -i lo -j ACCEPT
-A RH-Firewall-1-INPUT -p icmp -m icmp --icmp-type any -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A RH-Firewall-1-INPUT -s 10.22.242.11 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m tcp --dport 22 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m tcp --dport 25 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m tcp --dport 80 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m tcp -j REJECT --reject-with tcp-reset
-A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-port-unreachable
COMMIT
# Completed on Wed Sep  5 10:38:31 2007
# Generated by iptables-save v1.2.11 on Wed Sep  5 10:38:31 2007
*nat
:PREROUTING ACCEPT [6902:2262800]
:POSTROUTING ACCEPT [2656:181837]
:OUTPUT ACCEPT [2656:181837]
COMMIT
# Completed on Wed Sep  5 10:38:31 2007


Our symptoms are that outbound SMTP connections to high latency servers are
seeing message failures due to TCP resets.

I was thinking that increasing the
net.ipv4.netfilter.ip_conntrack_tcp_max_retrans as per the comments in this bug
would solve the problem.

However, a colleague thinks that it has to do with the --reject-with tcp-reset
rule, and has replaced the two REJECT rules with a single

-A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited

rule.

This seems to have fixed the sendmail issue, but I'm not sure how or why this
has "made things work", or if this opens us up to different problems.

Can you tell me if we would be better off increasing
net.ipv4.netfilter.ip_conntrack_tcp_max_retrans instead of these rule changes?
And what benefits/problems may occur with each method?

Thanks.

Comment 26 James Ralston 2007-09-06 16:07:13 UTC

First: *NEVER* reject traffic from subnets you don't control; an attacker could
flood forged packets at you, and you'd wind up flooding the forged victim with
bogus TCP RSTs (or ICMP host-prohibited messages).  Just DROP
unexpected/unwanted traffic.

Second: since you already accept all ESTABLISHED and RELATED TCP traffic, you
should be using --syn on the TCP ACCEPT rules.

To answer your question: you need to see *why* netfilter is dropping the packets
first; enable invalid packet logging:

echo 255 >/proc/sys/net/ipv4/netfilter/ip_conntrack_log_invalid

Additionally, it might be helpful to LOG the packets before you DROP them
(although I suggest something like "-m limit --limit 1/second --limit-burst 25"
if you're going to do that).
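
Put together, the suggestions above might look roughly like this for the
ruleset in comment 25.  This is a sketch, not a tested ruleset: the helper
name emit_rules and the log prefix are invented for illustration, and the
ports are simply the ones from comment 25.  The output is in iptables-save
syntax so it can be reviewed before it replaces the per-port ACCEPT rules and
the two REJECT rules.

```shell
#!/bin/sh
# Emit replacement rules in iptables-save syntax; nothing is loaded here.
emit_rules() {
    chain=RH-Firewall-1-INPUT
    # Only connection-opening SYNs need a per-port ACCEPT; the existing
    # RELATED,ESTABLISHED rule already covers the rest of each session.
    for port in "$@"; do
        echo "-A $chain -p tcp -m tcp --dport $port --syn -j ACCEPT"
    done
    # Rate-limited logging of whatever falls through, then a silent DROP.
    echo "-A $chain -m limit --limit 1/second --limit-burst 25 -j LOG --log-prefix \"fw-drop: \""
    echo "-A $chain -j DROP"
}

emit_rules 22 25 80
```

With these rules, a stray ACK that conntrack no longer recognizes is logged
and dropped instead of being answered with a RST, which makes it easier to
tell from the logs whether this bug is what is killing the sessions.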

Comment 27 Thomas Graf 2008-06-13 22:29:12 UTC

I'm closing this, since the proposed patch can't be included due to kABI reasons
and the workaround of setting ip_conntrack_tcp_max_retrans to 20 seems to work
quite well.

Comment 28 James Ralston 2008-06-13 23:44:57 UTC

Closing with CURRENTRELEASE would probably be more accurate, as the RHEL5 kernel
includes the netfilter patch that fixes the underlying problem (netfilter also
needs to look at the sequence number of the packet being ACKed, not just the
sequence number of the arriving packet).

Comment 29 Thomas Graf 2008-06-14 00:26:15 UTC

You are absolutely right, I should have stated that this issue has been fixed in
RHEL5. I only use CURRENTRELEASE if the fix actually went into a kernel version
of the same major release. I used WONTFIX in this case in order not to give the
impression that this bug is actually fixed in the current RHEL4 release.

Thanks again.
