Bug 1572983 - conntrack doesn't track packets in specific network namespace if those packets were processed by CT --notrack target in other network namespace
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.5
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 7.6
Assignee: Florian Westphal
QA Contact: yiche
URL:
Whiteboard:
Duplicates: 1578889
Depends On:
Blocks: 1588458 1588938
Reported: 2018-04-29 12:40 UTC by Alex Stupnikov
Modified: 2020-09-21 09:27 UTC (History)

Fixed In Version: kernel-3.10.0-898.el7
Doc Type: Bug Fix
Doc Text:
Previously, the connection tracking information was not cleared properly for packets forwarded to another network namespace. Packets that were marked with the "NOTRACK" target in one namespace were excluded from connection tracking even in the new namespace. Consequently, a loss of connectivity occasionally occurred, depending on the packet filtering ruleset of the other network namespaces. This update fixes the nf_reset() function to clear the connection tracking information properly. As a result, configuration properties related to connection tracking in one namespace do not leak into other namespaces, and the connectivity loss due to this behavior no longer occurs.
Clone Of:
: 1588458 1588938 (view as bug list)
Environment:
Last Closed: 2018-10-30 09:09:49 UTC
Target Upstream Version:


Links
Red Hat Knowledge Base (Solution): 3480181 (last updated 2018-06-07 11:18:27 UTC)
Red Hat Product Errata: RHSA-2018:3083 (last updated 2018-10-30 09:11:38 UTC)

Description Alex Stupnikov 2018-04-29 12:40:10 UTC
Description of problem:

Red Hat OpenStack Platform uses multiple network namespaces to implement its virtual networking infrastructure: routers, DHCP servers, firewalls, etc. Those namespaces are interconnected with OVS patch ports and internal interfaces.

We have a problem with one particular type of OpenStack router: the Distributed Virtual Router (DVR). DVR is implemented with two namespaces on the compute host: qrouter-UUID and fip-UUID.

fip-UUID is directly connected to the external network. It serves as a router between the external network and the other DVR namespace, and sends proxy-ARP replies to ARP requests for the floating IP address.

qrouter-UUID is directly connected to the fip-UUID namespace and to the Linux bridge that is used to emulate the network connection to the VM. qrouter-UUID implements a set of NAT rules that translate the floating IP address to the real IP address of the VM.

At the moment it is impossible to use the reference DVR implementation with RHOSP 12, which may become a critical issue as soon as an important customer runs into it. The problem is described in the summary: OpenStack uses a stateless firewall in the FIP namespace, and the raw table there contains the following iptables rules:

-A PREROUTING -j neutron-l3-agent-PREROUTING
-A neutron-l3-agent-PREROUTING -j CT --notrack

I have used iptables counters and /proc/net/nf_conntrack data in qrouter-UUID namespace to troubleshoot this issue and observed the following things:

- traffic from external network to VM:
  
  - raw and mangle PREROUTING counters increase; nat counters do not.
  - connections are not shown in /proc/net/nf_conntrack
  
- traffic from VM to external network:
  
  - raw and mangle PREROUTING counters increase; nat counters do not.
  - connections in /proc/net/nf_conntrack are in UNREPLIED state

After the notrack rule is removed from the raw table of fip-UUID, the VM regains full network connectivity.
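
The counter check described above could be scripted roughly as follows. This is only a sketch: chain_packets is a hypothetical helper name, it sums the per-rule counters only (the chain policy counter is not included), and qrouter-UUID stands for the real namespace name.

```shell
# Sum the packet counters of all rules appended to one chain, reading
# `iptables-save -c` output on stdin. Rule lines look like:
#   [12:1008] -A PREROUTING -j neutron-l3-agent-PREROUTING
chain_packets() {
    awk -v chain="$1" '
        $2 == "-A" && $3 == chain {
            split(substr($1, 2), c, ":")   # "[12:1008]" -> c[1] = "12"
            total += c[1]
        }
        END { print total + 0 }'
}

# Example (needs root, not executed here):
#   ip netns exec qrouter-UUID iptables-save -c -t nat | chain_packets PREROUTING
```

Sampling the value before and after a test connection shows whether the nat table sees any packets at all.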


PS. This rule has been there for a very long time (at least 3 OpenStack releases), so it looks like this issue is caused by a recent kernel change.


How reproducible:

Deploy Red Hat OpenStack 12 with DVR, modify security groups, start VM, assign floating IP and try to ping external destinations (or initiate incoming connections from external network).


Actual results:

The VM cannot ping external destinations, and incoming connections from the external network fail.


Expected results:

The VM can ping external destinations, and incoming connections from the external network succeed.


Additional info:

It is stated that RHOSP 12 uses RHEL 7.4, so I selected version 7.4. Here is the list of installed kernel packages:

rpm -qa | grep ^kernel
kernel-3.10.0-862.el7.x86_64
kernel-tools-libs-3.10.0-862.el7.x86_64
kernel-tools-3.10.0-862.el7.x86_64

Comment 3 Alex Stupnikov 2018-04-29 12:57:05 UTC
Here is the OpenStack code that generates the notrack rule:

https://github.com/openstack/neutron/blob/stable/pike/neutron/agent/l3/dvr_fip_ns.py#L208

Comment 4 Phil Sutter 2018-04-30 07:28:12 UTC
Hi Alex,

As you already suspected, this is a kernel issue and unrelated to libnetfilter_conntrack.

Florian, I'm assigning this to you since you can probably find the cause quickly. Feel free to reassign to me (or someone else) in case you're too busy.

Thanks, Phil

Comment 5 Florian Westphal 2018-04-30 09:17:16 UTC
It's a regression introduced by BZ 1317099 and is RHEL 7 specific.
skb_scrub_packet() calls nf_reset(), but that only resets skb->nfct, not skb->nfctinfo (upstream, the latter no longer exists, so skb->_nfct = 0 clears the untracked state too).
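
If that reading is right, packets arriving in the second namespace still look untracked to conntrack there. Upstream nf_conntrack_in() accounts for packets it skips this way under the per-CPU "ignore" statistic. Assuming the RHEL 7 backport does the same (this is not confirmed anywhere in this bug), a rising ignore count in the affected namespace would be one more symptom. A minimal sketch for summing that statistic from conntrack -S output follows; sum_conntrack_stat is a hypothetical helper name.

```shell
# Sum one key=value statistic across all per-CPU lines of `conntrack -S`
# output read from stdin, e.g. lines such as:
#   cpu=0   found=0 invalid=1 ignore=7 insert=0 ...
sum_conntrack_stat() {
    awk -v key="$1" '
        {
            for (i = 1; i <= NF; i++) {
                n = index($i, "=")
                if (n > 0 && substr($i, 1, n - 1) == key)
                    total += substr($i, n + 1)
            }
        }
        END { print total + 0 }'
}

# Example (needs root, not executed here):
#   ip netns exec qrouter-UUID conntrack -S | sum_conntrack_stat ignore
```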

Comment 7 Andreas Karis 2018-05-16 22:57:02 UTC
Hi,

There's no reason to mark this bugzilla as private, so I have made it public.

- Andreas

Comment 8 Alex Stupnikov 2018-05-17 06:24:30 UTC
FYI: currently the workaround is to downgrade the kernel (tested and worked for the original case).

Comment 9 Andreas Karis 2018-05-17 13:52:30 UTC
That is, downgrade to an older kernel: 3.10.0-693.21.1.el7.

Comment 12 Eric Garver 2018-05-24 13:25:31 UTC
*** Bug 1578889 has been marked as a duplicate of this bug. ***

Comment 17 Bruno Meneguele 2018-06-06 13:32:57 UTC
Patch(es) committed on kernel repository and an interim kernel build is undergoing testing

Comment 22 Bruno Meneguele 2018-06-07 19:58:38 UTC
Patch(es) available on kernel-3.10.0-898.el7
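
To check whether a host already runs a build at or above that one, something like the sketch below could be used. The helper names are hypothetical, and the check only compares the first build number after "3.10.0-". A lower number does not prove the kernel is affected, because z-stream backports are not modelled here.

```shell
# Extract the RHEL 7 build number from a kernel release string,
# e.g. "3.10.0-862.el7.x86_64" -> 862, "3.10.0-693.21.1.el7" -> 693.
# Prints nothing if the string does not start with "3.10.0-<number>.".
rhel7_kernel_build() {
    printf '%s\n' "$1" | sed -n 's/^3\.10\.0-\([0-9][0-9]*\)\..*/\1/p'
}

# Succeeds (exit status 0) only if the build number is at least 898,
# the build named in "Fixed In Version".
has_nf_reset_fix() {
    build=$(rhel7_kernel_build "$1")
    [ -n "$build" ] && [ "$build" -ge 898 ]
}

# Example:
#   has_nf_reset_fix "$(uname -r)" && echo "build is 898 or later"
```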

Comment 25 yiche 2018-06-14 07:31:24 UTC
Set up two network namespaces, ns1 and ns2, add the rule -j CT --notrack in ns1,
then check whether DNAT in ns2 works.

IPv4 reproducer:
---------------------------------------------------------------------------
set -x
ip link del veth_s
ip -all netns del
ip netns add client
ip netns add ns1
ip netns add ns2
ip link add name eth1 netns client type veth peer name eth1 netns ns1
ip link add name eth2 netns ns1 type veth peer name eth1 netns ns2
ip link add name veth_s type veth peer name eth2 netns ns2
for ns in ns1 ns2; do
ip netns exec $ns brctl addbr br0
ip netns exec $ns ifconfig br0 up
ip netns exec $ns brctl addif br0 eth1
ip netns exec $ns brctl addif br0 eth2
done

ip netns exec client  ip -4 addr add 10.167.100.2/24 dev eth1
ip -4 addr add 10.167.100.1/24 dev veth_s
ip netns exec ns1 ip -4 addr add 10.167.100.254/24 dev br0
ip netns exec ns2 ip -4 addr add 10.167.100.253/24 dev br0

ip netns exec client ip link set lo up
ip link set lo up
ip netns exec ns1 ip link set lo up
ip netns exec ns2 ip link set lo up

ip netns exec ns1 ip link set eth1 up
ip netns exec ns1 ip link set eth2 up
ip netns exec ns1 ip link set br0 up
ip netns exec ns2 ip link set eth1 up
ip netns exec ns2 ip link set eth2 up
ip netns exec ns2 ip link set br0 up
ip netns exec client ip link set eth1 up
ip link set veth_s up

#have to do this
ip netns exec ns2 sysctl -w net.ipv4.ip_forward=1

# check that the topology works before adding any rules
sleep 3
ip netns exec client ping -c3 10.167.100.1 || { echo "fail init"; exit 1; }

modprobe br_netfilter
echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables

# answer ARP requests for the DNAT address 10.167.100.4 so that ncat in the client can reach it
ip netns exec ns1 ebtables -t nat -A PREROUTING -p arp --arp-ip-dst 10.167.100.4 -j arpreply --arpreply-mac 00:11:22:33:44:55

ip netns exec ns1 iptables -t raw -A PREROUTING -p tcp -j CT --notrack

ip netns exec ns2 iptables -t nat -A PREROUTING -d 10.167.100.4 -p tcp -j DNAT --to-destination 10.167.100.1:2001

ncat -4 -l 2001 &
sleep 2
ip netns exec ns2 conntrack -F
ip netns exec client ncat -4 --send-only 10.167.100.4 2000 <<<"abc123"
echo "$?"
ip netns exec ns2 conntrack -L -p tcp

pkill ncat
-----------------------------------------------------------------------------
Reproduced on kernel 3.10.0-862.el7.x86_64 (RHEL-7.5)
RESULT:
Ncat: Connection timed out.

The conntrack table in netns ns2 is empty:
conntrack v1.4.4 (conntrack-tools): 0 flow entries have been shown.


Verified on kernel 3.10.0-898.el7.x86_64
RESULT:
Ncat successfully sends "abc123"


++ ip netns exec ns2 conntrack -L -p tcp
tcp      6 119 TIME_WAIT src=10.167.100.2 dst=10.167.100.4 sport=46252 dport=2000 src=10.167.100.1 dst=10.167.100.2 sport=2001 dport=46252 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
conntrack v1.4.4 (conntrack-tools): 1 flow entries have been shown.
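
The pass/fail decision above can be automated by looking for a flow whose reply source differs from its original destination, which is what a DNAT entry looks like in conntrack -L output. A minimal sketch, with count_dnat_flows as a hypothetical helper name; it works the same way on the IPv6 listing.

```shell
# Count flows in `conntrack -L` output (stdin) whose original dst= differs
# from the reply-direction src=, i.e. flows that were destination-NATed.
count_dnat_flows() {
    awk '
        {
            orig_dst = ""; reply_src = ""; nsrc = 0
            for (i = 1; i <= NF; i++) {
                # first dst= on the line belongs to the original direction
                if ($i ~ /^dst=/ && orig_dst == "") orig_dst = substr($i, 5)
                # second src= on the line belongs to the reply direction
                if ($i ~ /^src=/ && ++nsrc == 2) reply_src = substr($i, 5)
            }
            if (orig_dst != "" && reply_src != "" && orig_dst != reply_src) n++
        }
        END { print n + 0 }'
}

# Example (needs root, not executed here):
#   ip netns exec ns2 conntrack -L -p tcp | count_dnat_flows
```

On the affected kernel the table is empty, so the count is 0; on the fixed kernel the entry shown above gives 1.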

Comment 26 yiche 2018-06-14 07:42:00 UTC
There is also an IPv6 reproducer:

--------------------------------------------------------
set -x

ip netns exec client ip link del dev eth1
ip netns exec ns1 ip link del dev eth1
ip netns exec ns1 ip link del dev eth2
ip netns exec ns2 ip link del dev eth1
ip netns exec ns2 ip link del dev eth2
ip netns exec ns1 ip link del dev br0
ip netns exec ns2 ip link del dev br0
ip link del dev veth_bf2_s1
ip -all netns del

ip netns add client
ip netns add ns1
ip netns add ns2
ip link add name eth1 netns client type veth peer name eth1 netns ns1
ip link add name eth2 netns ns1 type veth peer name eth1 netns ns2
ip link add name veth_bf2_s1 type veth peer name eth2 netns ns2
for ns in ns1 ns2; do 
ip netns exec $ns brctl addbr br0 
ip netns exec $ns ifconfig br0 up
ip netns exec $ns brctl addif br0 eth1
ip netns exec $ns brctl addif br0 eth2
done

ip netns exec client  ip -6 addr add 2001:db8:ffff:100::2/64 dev eth1
ip -6 addr add 2001:db8:ffff:100::1/64 dev veth_bf2_s1
ip netns exec ns1 ip -6 addr add 2001:db8:ffff:100::fffe/64 dev br0
ip netns exec ns2 ip -6 addr add 2001:db8:ffff:100::fffd/64 dev br0

ip netns exec client ip link set lo up
ip link set lo up
ip netns exec ns1 ip link set lo up
ip netns exec ns2 ip link set lo up

ip netns exec ns1 ip link set eth1 up
ip netns exec ns1 ip link set eth2 up
ip netns exec ns1 ip link set br0 up
ip netns exec ns2 ip link set eth1 up
ip netns exec ns2 ip link set eth2 up
ip netns exec ns2 ip link set br0 up
ip netns exec client ip link set eth1 up
ip link set veth_bf2_s1 up
sleep 5
set +x
# check that the topology works before adding any rules
ip netns exec client ping6 2001:db8:ffff:100::1 -c 3 || { echo "fail init"; exit 1; }
modprobe br_netfilter

echo 1 > /proc/sys/net/bridge/bridge-nf-call-ip6tables

#ip netns exec ns1 ebtables -tnat -A PREROUTING -p arp --arp-ip-dst 2001:db8:ffff:100::4 -j arpreply --arpreply-mac 00:11:22:33:44:55
ip netns exec client ip neigh add 2001:db8:ffff:100::4 lladdr 00:11:22:33:44:55 nud permanent dev eth1
ip netns exec ns1 ip6tables -t raw -A PREROUTING -p tcp -j CT --notrack

ip netns exec ns2 ip6tables -t nat -A PREROUTING -d 2001:db8:ffff:100::4 -p tcp -j DNAT --to-destination [2001:db8:ffff:100::1]:2001

ncat -6 -l 2001 &
sleep 2
ip netns exec ns2 conntrack -F
ip netns exec client ncat -6 --send-only 2001:db8:ffff:100::4 2000 <<<"abc123"
echo "$?"

pkill ncat
ip netns exec ns1 ebtables -t nat -L --Ln --Lc
ip netns exec ns2 ip6tables -t nat -L -n -v
ip netns exec ns2 conntrack -L -f ipv6 -p tcp
--------------------------------------------------------

Reproduced on kernel 3.10.0-862.el7.x86_64 (RHEL-7.5)

Ncat: Connection timed out.
conntrack v1.4.4 (conntrack-tools): 0 flow entries have been shown.

Verified on kernel 3.10.0-898.el7.x86_64
Ncat successfully sends "abc123"
conntrack:
tcp      6 119 TIME_WAIT src=2001:db8:ffff:100::2 dst=2001:db8:ffff:100::4 sport=51860 dport=2000 src=2001:db8:ffff:100::1 dst=2001:db8:ffff:100::2 sport=2001 dport=51860 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
conntrack v1.4.4 (conntrack-tools): 1 flow entries have been shown.

Comment 37 errata-xmlrpc 2018-10-30 09:09:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3083

