Bug 174827

Summary: netfilter forgets connections in combination with gre and ipsec
Product: Red Hat Enterprise Linux 4
Reporter: Aleksandar Milivojevic <alex>
Component: kernel
Assignee: Thomas Graf <tgraf>
Status: CLOSED CANTFIX
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 4.0
CC: jbaron, rkhan
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2012-05-10 12:51:45 UTC

Description Aleksandar Milivojevic 2005-12-02 15:51:06 UTC
Description of problem:
Long story short: Netfilter's ip_conntrack module removes entries for valid,
open TCP connections from the ip_conntrack table (cat /proc/net/ip_conntrack)
when GRE and IPsec are used together.  This is not the same bug as the known
problem with IPsec in transport mode and Netfilter (but it might be related).
That bug is about connections never getting into the ESTABLISHED state.  In
this case, connections do get into the ESTABLISHED state and everything works
for a while, and then the ip_conntrack module suddenly loses track of them (the
entry for the connection is somehow removed from the ip_conntrack table).

And now the long story, with many more details...

I have the following configuration: a GRE tunnel between two hosts (vpn1 and
vpn2).  The GRE tunnel is then encapsulated in an IPsec tunnel.  I used IPsec in
tunnel mode to work around other known bugs in Netfilter (otherwise, transport
mode would work just fine).  Both vpn1 and vpn2 have firewall rules on them.
Static routes on both ends route traffic for the remote networks into the GRE
tunnel.  GRE is used for two things: to get around Netfilter bugs (again), and
to have more "natural" routing and simpler firewall rules (there's a virtual
interface I can route to).

The firewall rules used can be summarized like this:

iptables -P FORWARD DROP

iptables -N SSH
iptables -A SSH -i vpn0 -o eth0 -s a.b.c.d/m -d e.f.g.h/n -j ACCEPT
iptables -A SSH -j RETURN

iptables -N LOGFWD
iptables -A LOGFWD -j LOG --log-prefix "FORWARD "
iptables -A LOGFWD -j RETURN

iptables -A FORWARD -m state --state ESTABLISHED -j ACCEPT
iptables -A FORWARD -p icmp -m state --state RELATED -j ACCEPT
iptables -A FORWARD -p tcp --syn --sport 1024: --dport 22 -m state --state NEW -j SSH
iptables -A FORWARD -j LOGFWD

The problem I'm experiencing is that when I connect (for example, with an SSH
session) from vpn1 to a host on the network behind vpn2, the connection freezes
after some time (at random intervals as far as I can tell: sometimes after
several minutes, sometimes after as long as one hour).

At first I suspected IPsec and automatic keying problems (because connections
would survive up to one hour, which is the rekeying interval).  However, after
some debugging I ruled out this possibility (it happens even if no rekeying was
performed since the connection was established).  What actually happens is that
Netfilter's connection tracking module (ip_conntrack) on vpn2 removes the
connection from its tables (checked using cat /proc/net/ip_conntrack), and
therefore all subsequent packets belonging to this connection are dropped.

For example, soon after the connection was established, both vpn1 and vpn2 had
it in their connection tracking tables:

vpn1# cat /proc/net/ip_conntrack | grep 33763
tcp      6 431988 ESTABLISHED src=192.168.xxx.vpn1 dst=192.168.yyy.zzz
sport=33763 dport=22 packets=185 bytes=14814 src=192.168.yyy.zzz
dst=192.168.xxx.vpn1 sport=22 dport=33763 packets=137 bytes=33934 [ASSURED] use=1

vpn2# cat /proc/net/ip_conntrack | grep 33763
tcp      6 431995 ESTABLISHED src=192.168.xxx.vpn1 dst=192.168.yyy.zzz
sport=33763 dport=22 packets=171 bytes=13654 src=192.168.yyy.zzz
dst=192.168.xxx.vpn1 sport=22 dport=33763 packets=125 bytes=31726 [ASSURED] use=1


After some time, the connection gets "frozen".  Checking vpn2 shows that it has
removed the entry from its connection tracking table, while vpn1 still has it:

vpn1# cat /proc/net/ip_conntrack | grep 33763
tcp      6 431988 ESTABLISHED src=192.168.xxx.vpn1 dst=192.168.yyy.zzz
sport=33763 dport=22 packets=185 bytes=14814 src=192.168.yyy.zzz
dst=192.168.xxx.vpn1 sport=22 dport=33763 packets=137 bytes=33934 [ASSURED] use=1

vpn2# cat /proc/net/ip_conntrack | grep 33763
[empty]
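
The manual check above could be scripted along these lines.  This is only a
sketch: the function name has_conntrack_entry is made up for illustration, and
it assumes that each entry occupies a single line in the file (the entries
quoted in this report are wrapped for readability) and that the first
sport=/dport= pair on a line describes the original direction, as in the
samples above.

```shell
#!/bin/sh
# has_conntrack_entry FILE SPORT DPORT   (hypothetical helper, not a real tool)
# Succeeds if FILE, in /proc/net/ip_conntrack format, contains an ESTABLISHED
# TCP entry whose original direction uses the given source and destination port.
has_conntrack_entry() {
    awk -v sp="sport=$2" -v dp="dport=$3" '
        $1 == "tcp" && $4 == "ESTABLISHED" {
            s = ""; d = ""
            # Remember only the first sport=/dport= fields on the line.
            for (i = 5; i <= NF; i++) {
                if (s == "" && $i ~ /^sport=/) s = $i
                if (d == "" && $i ~ /^dport=/) d = $i
            }
            if (s == sp && d == dp) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Example call on a gateway (left commented out, since it needs the proc file):
# has_conntrack_entry /proc/net/ip_conntrack 33763 22 || echo "entry missing"
```

Run against /proc/net/ip_conntrack with ports 33763 and 22, this would be
expected to succeed on both gateways right after connecting, and to fail on
vpn2 once the connection has frozen.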

The log files show packets being dropped for this connection (what the user
sees is that the connection is "frozen"):

Dec  1 13:02:01 vpn2 kernel: FORWARD IN=vpn0 OUT=eth0 SRC=192.168.xxx.vpn1
DST=192.168.yyy.zzz LEN=100 TOS=0x10 PREC=0x00 TTL=63 ID=56722 DF PROTO=TCP
SPT=33763 DPT=22 WINDOW=8354 RES=0x00 ACK PSH URGP=0
Dec  1 13:02:15 vpn2 kernel: FORWARD IN=vpn0 OUT=eth0 SRC=192.168.xxx.vpn1
DST=192.168.yyy.zzz LEN=100 TOS=0x10 PREC=0x00 TTL=63 ID=56724 DF PROTO=TCP
SPT=33763 DPT=22 WINDOW=8354 RES=0x00 ACK PSH URGP=0
Dec  1 13:02:43 vpn2 kernel: FORWARD IN=vpn0 OUT=eth0 SRC=192.168.xxx.vpn1
DST=192.168.yyy.zzz LEN=100 TOS=0x10 PREC=0x00 TTL=63 ID=56726 DF PROTO=TCP
SPT=33763 DPT=22 WINDOW=8354 RES=0x00 ACK PSH URGP=0
Dec  1 13:03:39 vpn2 kernel: FORWARD IN=vpn0 OUT=eth0 SRC=192.168.xxx.vpn1
DST=192.168.yyy.zzz LEN=100 TOS=0x10 PREC=0x00 TTL=63 ID=56728 DF PROTO=TCP
SPT=33763 DPT=22 WINDOW=8354 RES=0x00 ACK PSH URGP=0
Dec  1 13:05:32 vpn2 kernel: FORWARD IN=vpn0 OUT=eth0 SRC=192.168.xxx.vpn1
DST=192.168.yyy.zzz LEN=100 TOS=0x10 PREC=0x00 TTL=63 ID=56730 DF PROTO=TCP
SPT=33763 DPT=22 WINDOW=8354 RES=0x00 ACK PSH URGP=0
Dec  1 13:07:32 vpn2 kernel: FORWARD IN=vpn0 OUT=eth0 SRC=192.168.xxx.vpn1
DST=192.168.yyy.zzz LEN=100 TOS=0x10 PREC=0x00 TTL=63 ID=56732 DF PROTO=TCP
SPT=33763 DPT=22 WINDOW=8354 RES=0x00 ACK PSH URGP=0
Dec  1 13:09:32 vpn2 kernel: FORWARD IN=vpn0 OUT=eth0 SRC=192.168.xxx.vpn1
DST=192.168.yyy.zzz LEN=100 TOS=0x10 PREC=0x00 TTL=63 ID=56734 DF PROTO=TCP
SPT=33763 DPT=22 WINDOW=8354 RES=0x00 ACK PSH URGP=0
Dec  1 13:13:32 vpn2 kernel: FORWARD IN=vpn0 OUT=eth0 SRC=192.168.xxx.vpn1
DST=192.168.yyy.zzz LEN=100 TOS=0x10 PREC=0x00 TTL=63 ID=56738 DF PROTO=TCP
SPT=33763 DPT=22 WINDOW=8354 RES=0x00 ACK PSH URGP=0
Dec  1 13:15:36 vpn2 kernel: FORWARD IN=vpn0 OUT=eth0 SRC=192.168.xxx.vpn1
DST=192.168.yyy.zzz LEN=100 TOS=0x10 PREC=0x00 TTL=63 ID=56740 DF PROTO=TCP
SPT=33763 DPT=22 WINDOW=8354 RES=0x00 ACK PSH URGP=0
Dec  1 13:17:33 vpn2 kernel: FORWARD IN=vpn0 OUT=eth0 SRC=192.168.xxx.vpn1
DST=192.168.yyy.zzz LEN=100 TOS=0x10 PREC=0x00 TTL=63 ID=56742 DF PROTO=TCP
SPT=33763 DPT=22 WINDOW=8354 RES=0x00 ACK PSH URGP=0
Dec  1 13:19:32 vpn2 kernel: FORWARD IN=vpn0 OUT=eth0 SRC=192.168.xxx.vpn1
DST=192.168.yyy.zzz LEN=100 TOS=0x10 PREC=0x00 TTL=63 ID=56744 DF PROTO=TCP
SPT=33763 DPT=22 WINDOW=8354 RES=0x00 ACK PSH URGP=0
Dec  1 13:23:32 vpn2 kernel: FORWARD IN=vpn0 OUT=eth0 SRC=192.168.xxx.vpn1
DST=192.168.yyy.zzz LEN=100 TOS=0x10 PREC=0x00 TTL=63 ID=56748 DF PROTO=TCP
SPT=33763 DPT=22 WINDOW=8354 RES=0x00 ACK PSH URGP=0
Dec  1 13:25:32 vpn2 kernel: FORWARD IN=vpn0 OUT=eth0 SRC=192.168.xxx.vpn1
DST=192.168.yyy.zzz LEN=100 TOS=0x10 PREC=0x00 TTL=63 ID=56750 DF PROTO=TCP
SPT=33763 DPT=22 WINDOW=8354 RES=0x00 ACK PSH URGP=0
Dec  1 13:27:32 vpn2 kernel: FORWARD IN=vpn0 OUT=eth0 SRC=192.168.xxx.vpn1
DST=192.168.yyy.zzz LEN=100 TOS=0x10 PREC=0x00 TTL=63 ID=56752 DF PROTO=TCP
SPT=33763 DPT=22 WINDOW=8354 RES=0x00 ACK PSH URGP=0
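
The timestamps in this log can be turned into intervals with a small helper.
This is a sketch only: log_gaps is a hypothetical name, and it assumes that all
entries fall on the same day and that the time is the third
whitespace-separated field of each entry's first line, as in the lines above.

```shell
#!/bin/sh
# log_gaps (hypothetical helper): read syslog lines on stdin and print the
# number of seconds between consecutive timestamped entries.
# Wrapped continuation lines are skipped because their third field is not
# of the form HH:MM:SS.
log_gaps() {
    awk '
        $3 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
            split($3, t, ":")
            now = t[1] * 3600 + t[2] * 60 + t[3]
            if (seen) print now - prev
            prev = now
            seen = 1
        }
    '
}
```

Fed the entries above, the first gaps come out as 14, 28, 56, 113 and 120
seconds: roughly doubling and then levelling off at about two minutes, which is
the pattern one would expect from TCP retransmission backoff on the sending
side.  The two 240-second gaps coincide with the IP ID jumping by 4 instead of
2 (56734 to 56738, and 56744 to 56748), which suggests a missing log entry
rather than a longer pause.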

Running tcpdump on both the physical and GRE interfaces shows basically the
same thing.  Packets go into the GRE tunnel, leave the eth interface
IPsec-encapsulated, arrive at the other end, come out of the GRE tunnel
interface, and never reach the internal eth interface (towards the internal
network where the destination host is located).  In other words, they are
dropped by the firewall rules after they leave the GRE tunnel.

This is rather strange behaviour.  I don't see any reason why the ip_conntrack
module on vpn2 would all of a sudden remove a valid connection from its
connection tracking tables.  This happens randomly, and I have observed it on
several different VPN gateways that were configured this way.

A possible workaround (I haven't tested it yet) might be to remove the "--syn"
option from the firewall rules, at the cost of slightly weaker security on the
firewall.
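
For reference, the untested workaround would amount to replacing the SSH rule
in the FORWARD chain with a variant that lacks "--syn".  This is a sketch of
that one change only, not a verified fix; the rest of the ruleset is assumed to
stay as summarized earlier.

```shell
# Untested workaround sketch: the SSH rule from the FORWARD chain, minus --syn.
# Packets that conntrack classifies as NEW are sent to the SSH chain even if
# they are not the initial SYN of a connection.
iptables -A FORWARD -p tcp --sport 1024: --dport 22 -m state --state NEW -j SSH
```

With this variant, a mid-stream packet of a connection that conntrack no longer
knows about would be handed to the SSH chain again instead of falling through
to LOGFWD, provided conntrack classifies it as NEW.  That is also why it
weakens the filter: the rule no longer requires a proper connection setup.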

Version-Release number of selected component (if applicable):
kernel-2.6.9-22.EL

How reproducible:
Always

Steps to Reproduce:
1. Create a configuration similar to the one described above
2. SSH into the remote network from the local VPN gateway
3. Wait for the connection to "freeze"

Comment 1 Aleksandar Milivojevic 2005-12-05 16:59:24 UTC
I've asked about this problem on Netfilter mailing lists.  Here's the response I
got from Patrick McHardy:

   The problem is the handling of IPsec packets, not GRE. I'm working on
   a couple of patches to resolve this, hopefully I'll finish them in time
   for 2.6.16.


Comment 2 Thomas Graf 2012-05-10 12:51:45 UTC
RHEL4 has entered the Extended Life Phase. There will be no more minor releases.

I'm closing this bug due to inactivity.

Please reopen and provide an explanation if you need this issue to be addressed in RHEL4. Please note that only security and critical bugfixes are considered at this point.