Description of problem:

We use an iptables setup which has a rule allowing packets associated with "ESTABLISHED" connections through, followed by specific rules allowing "NEW" connections to particular ports, followed by fallthrough rules denying all traffic. Very infrequently, packets associated with established connections are not matched by the early-accept rule; since these packets are not initiating new connections, they fall through to the reject rules, leading to either spurious ICMP "prohibited" or TCP reset messages, depending on the particular reject rules. This behavior occurs with connections that are actively exchanging traffic; idle timeouts, TCP keepalives, etc. are not at issue.

An example of a problem configuration is http://www-cse.ucsd.edu/~jbrown/reset/iptables-problem ; a workaround for this problem is to remove the "-m state --state NEW" qualifier, as seen in http://www-cse.ucsd.edu/~jbrown/reset/iptables-okay . With that qualifier removed, when the "ESTABLISHED" rule fails to match, the packets are still allowed in by the per-port accept rules.

An example of a spurious TCP reset caused by the problem configuration is http://www-cse.ucsd.edu/~jbrown/reset/both . Observing the server end, the server transmits sequence 4167818615:4167818687, the client responds with 2733189498:2733189546 and ACKs the receipt of the server's packet (4167818687). At this point the server sends a spurious RST, followed immediately by a retransmit of 4167818615:4167818687 : it's sending data for a connection which it has just reset, and for which it has already received an ACK. We surmise that the ESTABLISHED rule incorrectly failed to match the packet carrying the ACK, leading to that packet being dropped and an RST being generated by the server's firewall. The server's own TCP stack never received the ACK and didn't actually reset the connection, so it dutifully re-transmitted the data. At that point, the client had already received the RST, so it responded with an RST of its own, and the connection was torn down.

In the default iptables configuration of our RHEL4U2 installation -- http://www-cse.ucsd.edu/~jbrown/reset/iptables-orig -- the occasional failure of ESTABLISHED to match leads only to a spurious ICMP message, which seems to be ignored by clients for already-established connections, so no problems are observed by users. (We have confirmed via tcpdump that spurious ICMP messages are in fact generated.) Our addition of the TCP-reset reject rule led to this problem impacting users, as their connections would spontaneously be reset.

Version-Release number of selected component (if applicable):
RHEL 4 U 2

How reproducible:
Unfortunately, this problem does not occur deterministically. We observe it at varying frequencies from different client networks, with frequencies ranging from once every few minutes to once every few days.

Steps to Reproduce:
1. Install the given "problem" rule set on a server.
2. SSH in and work away.
3. Probabilistically observe your connection getting dropped.

Actual results:
"Read from remote host XXXX.SERVER.DOMAIN: Connection reset by peer"

Expected results:
(Connections not being reset)

Additional info:
kernel version "2.6.9-22.ELsmp #1 SMP", behavior observed both on single-processor and SMP machines.
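For anyone trying to confirm the same symptom, the spurious rejects can be watched for with tcpdump. The following is only a sketch of one way to do it, not the exact capture used for this report: the function names `reject_filter` and `capture_rejects` are made up for illustration, and it assumes a tcpdump/libpcap build that understands the `tcpflags`, `tcp-rst`, `icmptype` and `icmp-unreach` keywords.

```shell
# Build a pcap filter matching TCP resets and ICMP destination-unreachable
# messages (the ICMP "prohibited" rejects are codes of that ICMP type).
# Optional $1: limit the capture to traffic to or from one peer address.
reject_filter() {
    f='(tcp[tcpflags] & tcp-rst != 0) or (icmp[icmptype] == icmp-unreach)'
    if [ -n "${1:-}" ]; then
        f="host $1 and ($f)"
    fi
    printf '%s\n' "$f"
}

# Run the capture (requires root).
# $1 = interface name, $2 = optional peer address.
capture_rejects() {
    tcpdump -n -i "$1" "$(reject_filter "${2:-}")"
}
```

Running something like `capture_rejects eth0` on the server (the interface name is a placeholder) should show an RST or ICMP unreachable each time the firewall rejects a packet, including rejects for connections that are still active.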
(Possibly related to bug #112709)
This is a netfilter kernel problem, not an iptables userland problem. Assigning to kernel.
I am seeing the exact same problem here. Since we use stateful iptables firewall rules on a lot of our servers and this is causing a lot of hung connection problems, I have asked our RedHat Network representative to open a formal support ticket.
Jeff, if you're still fighting this take a look at bug #191336 and see if it sounds like it might explain your problem. The only issue with this bug is that it does require there to be a 5-minute idle period at some point. I'd also be interested to know what your conntrack entry looks like after one of the random drops - in particular, does the number of packets match what you've seen in the session, or does the conntrack entry appear to have been destroyed at some point?
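In case it helps, here is a rough sketch of how the conntrack entry could be pulled out right after a drop. It assumes the table is exposed at /proc/net/ip_conntrack, as on 2.6.9-era kernels; the helper name `ct_for_port` and the `CT_FILE` variable are invented for this example. The packets= and bytes= counters only show up if the kernel was built with conntrack accounting, so they may be absent.

```shell
# Location of the connection tracking table (assumed path; override CT_FILE
# if your kernel exposes it elsewhere).
CT_FILE=${CT_FILE:-/proc/net/ip_conntrack}

# Print TCP conntrack entries in which either direction has dport=$1.
ct_for_port() {
    grep '^tcp' "$CT_FILE" | grep "dport=$1 "
}
```

For an SSH session, `ct_for_port 22` run as root should list the entry. If the state is still ESTABLISHED and the counters roughly match the session, the entry probably survived; a missing or freshly created entry would suggest it was destroyed.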
It looks like there may be a few separate bugs affecting different people. In our case, after some more testing, we discovered that our problem is the tcp_sack-related connection tracking bug described in this netfilter mailing list post, which affects kernels <= 2.6.11: https://lists.netfilter.org/pipermail/netfilter/2005-June/061101.html Disabling tcp_sack fixes the problem for us, although this is not a desirable solution for servers that handle a lot of network traffic or suffer from a lot of packet loss, since it increases the number of retransmits needed to recover from any loss.
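For reference, SACK can be turned off through the standard net.ipv4.tcp_sack sysctl. This is a sketch of the usual commands rather than a transcript of what was run on these servers:

```shell
# Show the current setting (1 means SACK is enabled).
sysctl net.ipv4.tcp_sack

# Disable SACK until the next reboot (requires root).
sysctl -w net.ipv4.tcp_sack=0

# To keep the setting across reboots, add this line to /etc/sysctl.conf:
#   net.ipv4.tcp_sack = 0
```

The change should only affect connections opened after it is made, since SACK is negotiated during the TCP handshake.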
Are you still experiencing this problem?
I don't know if recent RHEL4 kernels still exhibit this bug. We long ago worked around this bug by re-structuring our firewall rules on production machines, changing them from the form:

- accept ESTABLISHED
- accept inbound NEW to ports x,y,z
- reject others

...to the form:

- accept ESTABLISHED
- accept inbound to ports x,y,z
- reject others

With the removal of the "NEW" qualifier, when the "ESTABLISHED" test misses a packet for an existing connection, the per-port accept rules still allow it, so we no longer encounter spurious rejects. It's tricky to test conclusively whether it's been fixed, since the original bug was sporadic, and we never had a particular workload that would reliably reproduce it. Sorry.
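The two rule-set shapes can be sketched as iptables commands roughly as follows. This is an illustration only, not a copy of our production rules: the function name `install_rules`, the `IPT` variable, the INPUT chain, ports 22 and 80, and the exact reject targets are all assumptions. By default the script only prints the commands.

```shell
#!/bin/sh
# Dry run by default: prints the iptables commands instead of applying them.
# Set IPT=iptables (as root) to actually install the rules.
IPT=${IPT:-echo iptables}

# $1 = "problem" or "workaround". Ports 22 and 80 are placeholders.
install_rules() {
    if [ "$1" = problem ]; then
        new='-m state --state NEW'   # per-port rules accept only NEW packets
    else
        new=''                       # per-port rules accept any packet
    fi

    # Accept packets that conntrack considers part of an existing connection.
    $IPT -A INPUT -m state --state ESTABLISHED -j ACCEPT

    # Per-port accept rules ($new is left unquoted so it expands to
    # several arguments, or to nothing).
    for port in 22 80; do
        $IPT -A INPUT -p tcp --dport "$port" $new -j ACCEPT
    done

    # Fallthrough: reject everything else.
    $IPT -A INPUT -p tcp -j REJECT --reject-with tcp-reset
    $IPT -A INPUT -j REJECT --reject-with icmp-host-prohibited
}
```

Comparing the output of `install_rules problem` and `install_rules workaround` shows that the only difference is the `-m state --state NEW` match on the per-port rules. The trade-off is that the workaround rules also accept non-SYN packets to those ports that belong to no tracked connection; those are then left to the host's TCP stack to handle.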
I'm closing this bugzilla because I can't reproduce it myself and there is no one left who can reproduce it either.