Description of problem:
Linux 2.6 kernels before 2.6.11 contain a known bug in the netfilter connection tracking code that is triggered by legitimate sequences of ACK packets. See the attachment. The problem and patch are described here:
http://lists.netfilter.org/pipermail/netfilter-devel/2005-January/018241.html

Version-Release number of selected component (if applicable):
2.6.9-34.ELsmp, but this bug is present in all currently released RHEL4 kernels.

How reproducible:
Unfortunately not easy to reproduce unless you have the ability to feed netfilter arbitrary packet sequences.

Steps to Reproduce:
See the attachment.

Actual results:
The bug may cause netfilter to mark TCP ACK packets as NEW or INVALID, which may cause session hangs or drops depending on the particular firewall setup.

Expected results:
Legitimate packets should get through. :-)

Additional info:
This may be the root cause of other outstanding bugs, including bug #161898 and bug #182012.
Created attachment 128865
Detailed description of problem including packet traces
FYI I'm willing to test a development kernel with this patch if I can get one.
Just FYI, we're getting hit by this problem as well. In fact, I suspect *many* sites are experiencing this problem, but they don't realize it because they write off the occasional connection timeout as a fluke.
Test builds are available here: http://people.redhat.com/agospoda/bz/191336/ Please let us know if this resolves your issue.
I'm deploying the test kernel on one of my firewall clusters, I'll let you know how it goes.
I have recently found a problem with these builds and though they will probably work for you, this exact fix will not make its way into RHEL4. I am going to see if I can find a suitable solution and I will update you with new kernels when that happens. Feel free to continue running these kernels and let me know if you see an improvement.
So far it is working well, but it needs more time to be sure since I can't reproduce this on demand. What sort of problem did you find, is it related to the patch or something else entirely?
This patch manages to cause a change in kABI (something RH makes every attempt NOT to do with updates), so a different patch that creates similar functionality is needed before it will be acceptable for an update.
That's unfortunate, because the patch author specifically commented that he couldn't find any way to fix the problem without adding another member to the ip_ct_tcp structure. The only other alternative was to disable the retransmission detection code altogether. RHEL5 will fix this problem (because it will rebase to a 2.6 kernel that already contains the patch), but I'm not sure what alternatives exist in the meantime (other than rolling our own kernels, that is).
Yes, the alternative to changing the data structure is to simply disable the retransmission detection logic that triggers this bug. The retransmit detection is not actually needed for anything; it's an attempt to keep down the size of the conntrack table. In my opinion, disabling it is a much better alternative than keeping this bug. See this message (and replies) for some discussion about this option: http://lists.netfilter.org/pipermail/netfilter-devel/2005-January/018244.html If this is acceptable, I will provide a patch to disable this behavior that does not alter the kernel ABI.
While rebuilding the kernel to disable the retransmission detection is one option, it would seem that you could set the sysctl variable net.ipv4.netfilter.ip_conntrack_tcp_max_retrans (or the proc file /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_max_retrans) to something higher than '3' and accomplish a similar goal:

# sysctl net.ipv4.netfilter.ip_conntrack_tcp_max_retrans
net.ipv4.netfilter.ip_conntrack_tcp_max_retrans = 3

I realize that this option isn't quite as nice as the original patch, but if you have a good guesstimate of how many extra packets are being falsely detected as retransmits, then you could try setting this value a little higher than that and see what happens. Have either of you tried this previously?
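For anyone following along, here is a minimal sketch of applying this workaround from a shell. It assumes the ip_conntrack module is loaded (so the proc file exists) and that the script runs as root. The helper names get_max_retrans and raise_max_retrans and the RETRANS_FILE variable are made up for illustration; they are not part of any Red Hat tooling.

```shell
#!/bin/sh
# Proc file for the tunable discussed in this bug. It can be overridden
# (hypothetical RETRANS_FILE variable) to try the helpers on an ordinary file.
RETRANS_FILE="${RETRANS_FILE:-/proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_max_retrans}"

# Print the current retransmission threshold.
get_max_retrans() {
    cat "$RETRANS_FILE"
}

# Raise the threshold to $1, but never lower a value that is already higher.
raise_max_retrans() {
    new="$1"
    cur="$(get_max_retrans)"
    if [ "$new" -gt "$cur" ]; then
        echo "$new" > "$RETRANS_FILE"
    fi
}
```

With the helpers defined, "raise_max_retrans 10" would lift the default of 3 to 10 (10 is only an illustrative value). The change does not survive a reboot; adding a "net.ipv4.netfilter.ip_conntrack_tcp_max_retrans = <value>" line to /etc/sysctl.conf should make it persistent, assuming ip_conntrack is loaded by the time that file is applied.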
OK, now I feel like an idiot, I don't know why I never noticed that before. I think because I saw "int ip_ct_tcp_max_retrans = 3;" in the kernel code I was assuming this was a hardcoded value. Except of course the code in ip_conntrack_standalone.c ties this variable into sysctl/proc. Now that you've pointed this out, I think that setting this to some arbitrary high value (e.g. 100) could be a very acceptable workaround. Thus far the test kernel has been working flawlessly for me, but I will revert back to the regular kernel and set this value.
You might want to consider bumping the default value of this parameter up a bit in future 2.6.9-based kernel releases (perhaps in the 5-10 range) in an effort to avoid triggering this bug on regular traffic.
I've been running with "net.ipv4.netfilter.ip_conntrack_tcp_max_retrans = 10" for some time now and it has been working well. I think this is a good workaround for this problem, though I still think you should consider changing the default.
Does this workaround work for everyone? I'd like to close out this issue if that is the case.
I spoke a little too soon - I'm still seeing occasional IMAP session hangs, but the frequency is much less than before. I suspect that setting the retrans value to 10 is still a little low, but I haven't captured a session yet to confirm. I'm going to increase it and see what happens.
I've been running with a value of 20 for a while now and it seems to have eliminated the problem.
I've been using a value of 32 for some time now (because I like numbers that are powers of 2), and it seems to have eliminated the connection drops.

It strikes me that netfilter could also be smarter here: if netfilter picks up a connection (because ip_conntrack_tcp_loose is set to 1 or higher), netfilter should implicitly treat that connection as "liberal" (regardless of whether the actual ip_conntrack_tcp_be_liberal value is 0 or 1), because netfilter did not see the initial window scale negotiation and can't reliably determine it. From looking at net/netfilter/nf_conntrack_proto_tcp.c in 2.6.18-1.2869.fc6, netfilter seems to try to figure out the window size whenever it picks up a connection, but it doesn't mark the connection in order to let tcp_in_window() know that the window scale might be wrong.

Generally speaking, one could set ip_conntrack_tcp_be_liberal to 1 globally, but that's overkill: if netfilter saw the initial SYN/SYN+ACK, then netfilter *should* reject packets outside of the window. It's only in the case where netfilter didn't see the initial SYN/SYN+ACK that it should behave as if ip_conntrack_tcp_be_liberal were set for that connection.
That's a good suggestion, I'll mention this to Patrick McHardy, the active netfilter maintainer.
Hey all, has there been a resolution to this at all? I am encountering this problem on a number of machines now, and it is dropping connections. Currently I am trying the workaround to see if that helps. Thanks.
Hi folks. I have a question related to this bug. We've been seeing what I think is the same problem with RHEL-4.5 (kernel-2.6.9-55.0.2, iptables-1.2.11-3.1.RHEL4). Our iptables rules look like this:

# Generated by iptables-save v1.2.11 on Wed Sep 5 10:38:31 2007
*raw
:PREROUTING ACCEPT [2673477:557179184]
:OUTPUT ACCEPT [2667955:561064344]
# Completed on Wed Sep 5 10:38:31 2007
# Generated by iptables-save v1.2.11 on Wed Sep 5 10:38:31 2007
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [2667955:561064344]
:RH-Firewall-1-INPUT - [0:0]
-A INPUT -j RH-Firewall-1-INPUT
-A FORWARD -j RH-Firewall-1-INPUT
-A RH-Firewall-1-INPUT -i lo -j ACCEPT
-A RH-Firewall-1-INPUT -p icmp -m icmp --icmp-type any -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A RH-Firewall-1-INPUT -s 10.22.242.11 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m tcp --dport 22 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m tcp --dport 25 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m tcp --dport 80 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m tcp -j REJECT --reject-with tcp-reset
-A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-port-unreachable
COMMIT
# Completed on Wed Sep 5 10:38:31 2007
# Generated by iptables-save v1.2.11 on Wed Sep 5 10:38:31 2007
*nat
:PREROUTING ACCEPT [6902:2262800]
:POSTROUTING ACCEPT [2656:181837]
:OUTPUT ACCEPT [2656:181837]
COMMIT
# Completed on Wed Sep 5 10:38:31 2007

Our symptoms are that outbound SMTP connections to high-latency servers are seeing message failures due to TCP resets. I was thinking that increasing net.ipv4.netfilter.ip_conntrack_tcp_max_retrans as per the comments in this bug would solve the problem. However, a colleague thinks that it has to do with the "--reject-with tcp-reset" rule, and has replaced the two REJECT rules with a single rule:

-A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited

This seems to have fixed the sendmail issue, but I'm not sure how or why this has "made things work", or if this opens us up to different problems. Can you tell me if we would be better off increasing net.ipv4.netfilter.ip_conntrack_tcp_max_retrans instead of making these rule changes? And what benefits or problems may occur with each method? Thanks.
First: *NEVER* reject traffic from subnets you don't control; an attacker could flood forged packets at you, and you'd wind up flooding the forged victim with bogus TCP RSTs (or ICMP host-prohibited messages). Just DROP unexpected/unwanted traffic.

Second: since you already accept all ESTABLISHED and RELATED TCP traffic, you should be using --syn on the TCP ACCEPT rules.

To answer your question: you need to see *why* netfilter is dropping the packets first. Enable invalid packet logging:

echo 255 >/proc/sys/net/ipv4/netfilter/ip_conntrack_log_invalid

Additionally, it might be helpful to LOG the packets before you DROP them (although I suggest something like "-m limit --limit 1/second --limit-burst 25" if you're going to do that).
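As a rough sketch, the advice above might translate into the following tail for the RH-Firewall-1-INPUT chain quoted earlier. It assumes the three port ACCEPT rules and the two REJECT rules have already been removed from that chain, since the function only appends. The IPT variable, the apply_rules function name and the "fw-drop: " log prefix are made up for illustration; setting IPT=echo prints the rules instead of installing them.

```shell
#!/bin/sh
# Command used to install rules. Override with IPT=echo for a dry run
# (hypothetical convention for this sketch, not an iptables feature).
IPT="${IPT:-iptables}"
CHAIN="RH-Firewall-1-INPUT"

apply_rules() {
    # Accept only connection-opening SYNs on the service ports. Later packets
    # of those connections match the existing RELATED,ESTABLISHED rule.
    for port in 22 25 80; do
        $IPT -A "$CHAIN" -p tcp -m tcp --syn --dport "$port" -j ACCEPT
    done

    # Rate-limited logging of whatever is about to be dropped.
    $IPT -A "$CHAIN" -m limit --limit 1/second --limit-burst 25 \
        -j LOG --log-prefix "fw-drop: "

    # Silently drop everything else instead of sending RSTs or ICMP errors.
    $IPT -A "$CHAIN" -j DROP
}
```

Running "IPT=echo" followed by "apply_rules" is a cheap way to review the rules before applying them as root with the default IPT value.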
I'm closing this, since the proposed patch can't be included due to kABI reasons and the workaround of setting ip_conntrack_tcp_max_retrans to 20 seems to work quite well.
Closing with CURRENTRELEASE would probably be more accurate, as the RHEL5 kernel includes the netfilter patch that fixes the underlying problem (netfilter also needs to look at the sequence number of the packet being ACKed, not just the sequence number of the arriving packet).
You are absolutely right, I should have stated that this issue has been fixed in RHEL5. I only use CURRENTRELEASE if the fix actually went into a kernel version of the same major release. I used WONTFIX in this case in order not to give the impression that this bug is actually fixed in the current RHEL4 release. Thanks again.