Bug 182012 - iptables --state ESTABLISHED rule occasionally misses packets, leads to spurious rejects
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Thomas Graf
QA Contact: Brian Brock
URL: http://www-cse.ucsd.edu/~jbrown/reset/
Whiteboard:
Depends On:
Blocks:
 
Reported: 2006-02-19 01:53 UTC by Jeff Brown
Modified: 2014-06-18 08:28 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-11-03 12:57:49 UTC
Target Upstream Version:
Embargoed:



Description Jeff Brown 2006-02-19 01:53:39 UTC
Description of problem:

We use an iptables setup which has a rule allowing packets associated with
"ESTABLISHED" connections through, followed by specific rules allowing "NEW"
connections to particular ports, followed by fallthrough rules denying all
traffic.  Very infrequently, packets associated with established connections are
not matched by the early-accept rule; since these packets are not initiating new
connections, they fall through to the reject rules, leading to either spurious
ICMP "prohibited" or TCP reset messages, depending on the particular reject rules.

This behavior occurs with connections that are actively exchanging traffic; idle
timeouts, TCP keepalives, etc. are not at issue.

An example of a problem configuration is
http://www-cse.ucsd.edu/~jbrown/reset/iptables-problem ; a workaround for this
problem is to remove the "-m state --state NEW" qualifier, as seen in
http://www-cse.ucsd.edu/~jbrown/reset/iptables-okay .  With that qualifier
removed, when the "ESTABLISHED" rule fails to match, the packets are still
allowed in by the per-port accept rules.
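
To make the difference between the two rule sets concrete, here is a minimal
Python model of first-match rule evaluation.  It is only a sketch, not the
netfilter implementation: the Packet type, the verdict function, the port set,
and the "OTHER" state label (standing for a packet that the state match
classified as neither ESTABLISHED nor NEW) are all assumptions made for
illustration.  The real rule sets are at the URLs above.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Packet:
    dport: int
    # State as seen by the rule set. "OTHER" is a hypothetical label for a
    # packet that was classified as neither ESTABLISHED nor NEW.
    state: str


Rule = Tuple[Callable[[Packet], bool], str]

ALLOWED_PORTS = {22}  # hypothetical; stands in for "ports x,y,z"


def problem_rules() -> List[Rule]:
    """ESTABLISHED accept, then NEW-only per-port accept, then reject."""
    return [
        (lambda p: p.state == "ESTABLISHED", "ACCEPT"),
        (lambda p: p.state == "NEW" and p.dport in ALLOWED_PORTS, "ACCEPT"),
        (lambda p: True, "REJECT"),
    ]


def okay_rules() -> List[Rule]:
    """Same as above, but the per-port accept has no NEW qualifier."""
    return [
        (lambda p: p.state == "ESTABLISHED", "ACCEPT"),
        (lambda p: p.dport in ALLOWED_PORTS, "ACCEPT"),
        (lambda p: True, "REJECT"),
    ]


def verdict(rules: List[Rule], packet: Packet) -> str:
    """Return the target of the first rule that matches the packet."""
    for matches, target in rules:
        if matches(packet):
            return target
    return "REJECT"


# A mid-connection SSH packet that the ESTABLISHED rule failed to match.
missed = Packet(dport=22, state="OTHER")
print(verdict(problem_rules(), missed))  # REJECT
print(verdict(okay_rules(), missed))     # ACCEPT
```

In this model the workaround only changes the outcome for packets to the
allowed ports; packets to any other port still reach the reject rule.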

An example of a spurious TCP reset caused by the problem configuration is
http://www-cse.ucsd.edu/~jbrown/reset/both .  Observing the server end, the
server transmits sequence 4167818615:4167818687, the client responds with
2733189498:2733189546 and ACKs the receipt of the server's packet (4167818687).
At this point the server sends a spurious RST, followed immediately by a
retransmit of 4167818615:4167818687: it's sending data for a connection which it
has just reset, and for which it has already received an ACK.  We surmise that the
ESTABLISHED rule incorrectly failed to match the packet carrying the ACK,
leading to that packet being dropped and an RST being generated by the server's
firewall.  The server's own TCP stack never received the ACK and didn't actually
reset the connection, so it dutifully re-transmitted the data.  At that point,
the client had already received the RST, so it responded with an RST of its own,
and the connection was torn down.
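
The signature described above (a server RST immediately followed by a server
retransmit of data the client had already ACKed) can be searched for
mechanically.  The following Python sketch assumes the capture has already been
reduced to simple records; the Segment type and find_spurious_resets function
are hypothetical names, sequence-number wraparound is ignored, and the RST
sequence numbers are placeholders because the report does not give them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    sender: str                # "server" or "client"
    seq_start: int
    seq_end: int
    ack: Optional[int] = None  # ACK number carried by the segment, if any
    rst: bool = False


def find_spurious_resets(trace: List[Segment]) -> List[int]:
    """Return indexes of server RSTs that are immediately followed by a
    server retransmit of data the client had already ACKed."""
    hits = []
    highest_client_ack = None
    for i, seg in enumerate(trace):
        if seg.sender == "client" and seg.ack is not None:
            if highest_client_ack is None or seg.ack > highest_client_ack:
                highest_client_ack = seg.ack
        elif seg.sender == "server" and seg.rst:
            nxt = trace[i + 1] if i + 1 < len(trace) else None
            if (
                nxt is not None
                and nxt.sender == "server"
                and not nxt.rst
                and nxt.seq_end > nxt.seq_start  # carries data
                and highest_client_ack is not None
                and nxt.seq_end <= highest_client_ack  # already ACKed
            ):
                hits.append(i)
    return hits


# The exchange from the report, as seen at the server end.
REPORTED_TRACE = [
    Segment("server", 4167818615, 4167818687),
    Segment("client", 2733189498, 2733189546, ack=4167818687),
    Segment("server", 0, 0, rst=True),          # placeholder seq numbers
    Segment("server", 4167818615, 4167818687),  # retransmit of ACKed data
    Segment("client", 0, 0, rst=True),          # placeholder seq numbers
]

print(find_spurious_resets(REPORTED_TRACE))  # [2]
```

A hit means the RST cannot have come from the server's own TCP stack acting
on the ACK, which is consistent with the firewall having generated it.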

In the default iptables configuration of our RHEL4U2 installation --
http://www-cse.ucsd.edu/~jbrown/reset/iptables-orig -- the occasional failure of
ESTABLISHED to match leads only to a spurious ICMP message, which seems to be
ignored by clients for already-established connections, so no problems are
observed by users.  (We have confirmed via tcpdump that spurious ICMP messages
are in fact generated.)  Our addition of the TCP-reset reject rule led to this
problem impacting users, as their connections would spontaneously be reset.


Version-Release number of selected component (if applicable): RHEL 4 U 2


How reproducible: Unfortunately, this problem does not occur deterministically.
We observe it at varying frequencies from different client networks, with
frequencies ranging from once every few minutes to once every few days.


Steps to Reproduce:
1. Install the given "problem" rule set on a server.
2. SSH in and work away.
3. Probabilistically observe your connection getting dropped.
  
Actual results: "Read from remote host XXXX.SERVER.DOMAIN: Connection reset by peer"

Expected results: (Connections not being reset)


Additional info: kernel version "2.6.9-22.ELsmp #1 SMP"; behavior observed on
both single-processor and SMP machines.

Comment 1 Jeff Brown 2006-02-19 02:07:19 UTC
(Possibly related to bug #112709)

Comment 2 Thomas Woerner 2006-04-28 11:59:31 UTC
This is a netfilter kernel problem, not an iptables userland problem.

Assigning to kernel.

Comment 3 Jason Smith 2006-05-10 18:58:16 UTC
I am seeing the exact same problem here.  Since we use stateful iptables
firewall rules on a lot of our servers and this is causing a lot of hung
connection problems, I have asked our Red Hat Network representative to open a
formal support ticket.

Comment 4 Steve Snodgrass 2006-05-15 20:44:29 UTC
Jeff, if you're still fighting this take a look at bug #191336 and see if it
sounds like it might explain your problem.  The only issue with this bug is that
it does require there to be a 5-minute idle period at some point.  I'd also be
interested to know what your conntrack entry looks like after one of the random
drops - in particular, does the number of packets match what you've seen in the
session, or does the conntrack entry appear to have been destroyed at some point?

Comment 5 Jason Smith 2006-08-22 16:03:43 UTC
It looks like there may be a few separate bugs affecting different people.  In
our case, after some more testing, we discovered that the problem we are having
is the tcp_sack-related connection tracking bug described in this netfilter
mailing list post, which affects kernels <= 2.6.11:

https://lists.netfilter.org/pipermail/netfilter/2005-June/061101.html

Disabling tcp_sack fixes the problem for us, although this is not a desirable
solution for servers that handle a lot of network traffic or suffer from a lot
of loss since it will increase the retransmits necessary to recover from any
packet loss.
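
For anyone checking whether a host is running with this workaround applied,
the setting is exposed at /proc/sys/net/ipv4/tcp_sack (0 means disabled).  The
Python sketch below only reads the value; the function names are hypothetical,
and changing the setting is normally done as root with
"sysctl -w net.ipv4.tcp_sack=0" rather than from a script like this.

```python
from pathlib import Path

# Standard Linux location of the TCP selective-acknowledgement switch.
TCP_SACK_SYSCTL = Path("/proc/sys/net/ipv4/tcp_sack")


def parse_sysctl_bool(text: str) -> bool:
    """Interpret sysctl file contents: "0" means off, anything else on."""
    return text.strip() != "0"


def sack_enabled(path: Path = TCP_SACK_SYSCTL) -> bool:
    """Report whether TCP SACK is enabled according to the given file."""
    return parse_sysctl_bool(path.read_text())


if TCP_SACK_SYSCTL.exists():
    state = "enabled" if sack_enabled() else "disabled"
    print(f"tcp_sack is {state}")
else:
    # Not a Linux host, or /proc is not mounted.
    print("tcp_sack sysctl not available on this system")
```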

Comment 6 Thomas Graf 2008-06-13 20:52:41 UTC
Are you still experiencing this problem?

Comment 7 Jeff Brown 2008-06-13 21:28:40 UTC
I don't know if recent RHEL4 kernels still exhibit this bug.  We long
ago worked around this bug by re-structuring our firewall rules on
production machines, changing them from the form:

  - accept ESTABLISHED
  - accept inbound NEW to ports x,y,z
  - reject others

...to the form:

  - accept ESTABLISHED
  - accept inbound to ports x,y,z
  - reject others

With the removal of the "NEW" qualifier, when the "ESTABLISHED" test
misses a packet for an existing connection, the per-port accept rules
still allow it, so we no longer encounter spurious rejects.

It's tricky to test conclusively whether it's been fixed, since the
original bug was sporadic, and we never had a particular workload that
would reliably reproduce it.  Sorry.


Comment 8 Thomas Graf 2008-11-03 12:57:49 UTC
I'm closing this bugzilla because I can't reproduce it myself and there is no one left who can reproduce it either.

