Red Hat Bugzilla – Bug 1572983
conntrack doesn't track packets in specific network namespace if those packets were processed by CT --notrack target in other network namespace
Last modified: 2018-10-30 05:16:14 EDT
Description of problem:

Red Hat OpenStack Platform uses multiple network namespaces to implement its virtual networking infrastructure: routers, DHCP servers, firewalls, etc. These namespaces are interconnected with OVS patch ports and internal interfaces. We have a problem with one particular type of OpenStack router: the Distributed Virtual Router (DVR).

DVR is implemented with two namespaces on the compute host: qrouter-UUID and fip-UUID. fip-UUID is directly connected to the external network, serves as a router between the external network and the other DVR namespace, and sends proxy-ARP replies to ARP requests for floating IP addresses. qrouter-UUID is directly connected to the fip-UUID namespace and to a Linux bridge that emulates the network connection to the VM. qrouter-UUID implements a set of NAT rules that translate the floating IP address to the real IP address of the VM.

At the moment it is impossible to use the reference DVR implementation with RHOSP 12, which may become a very critical issue as soon as an important customer runs into it.

The problem is described in the summary: OpenStack uses a stateless firewall in the FIP namespace, and the raw table there contains the following iptables rules:

-A PREROUTING -j neutron-l3-agent-PREROUTING
-A neutron-l3-agent-PREROUTING -j CT --notrack

I used iptables counters and /proc/net/nf_conntrack data in the qrouter-UUID namespace to troubleshoot this issue (see the command sketch after this description) and observed the following:

- traffic from the external network to the VM:
  - raw and mangle PREROUTING counters increased, nat counters did not.
  - connections are not shown in /proc/net/nf_conntrack
- traffic from the VM to the external network:
  - raw and mangle PREROUTING counters increased, nat counters did not.
  - connections in /proc/net/nf_conntrack stay in UNREPLIED state

After the notrack rule is removed from the raw table of fip-UUID, the VM gets full network connectivity back.

PS. This rule has been in place for a very long time (at least 3 OpenStack releases), so this issue looks like it was caused by a recent kernel change.

How reproducible:

Deploy Red Hat OpenStack 12 with DVR, modify security groups, start a VM, assign a floating IP and try to ping external destinations (or initiate incoming connections from the external network).

Actual results:

It is impossible.

Expected results:

It is possible.

Additional info:

It is stated that RHOSP 12 uses RHEL 7.4, so I selected the 7.4 version. Here is the list of installed kernel packages:

rpm -qa | grep ^kernel
kernel-3.10.0-862.el7.x86_64
kernel-tools-libs-3.10.0-862.el7.x86_64
kernel-tools-3.10.0-862.el7.x86_64
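For reference, a minimal sketch of the commands behind the observations above; qrouter-<UUID> and fip-<UUID> are hypothetical placeholder names that depend on the router UUID in a given deployment:

# Compare per-table PREROUTING packet counters inside the qrouter namespace:
ip netns exec qrouter-<UUID> iptables -t raw    -L PREROUTING -v -n
ip netns exec qrouter-<UUID> iptables -t mangle -L PREROUTING -v -n
ip netns exec qrouter-<UUID> iptables -t nat    -L PREROUTING -v -n

# Check conntrack state for the NATed flows:
ip netns exec qrouter-<UUID> cat /proc/net/nf_conntrack

# Workaround mentioned above: remove the notrack rule from the FIP namespace.
ip netns exec fip-<UUID> iptables -t raw -D neutron-l3-agent-PREROUTING -j CT --notrack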
Here is the OpenStack code that generates the notrack rule: https://github.com/openstack/neutron/blob/stable/pike/neutron/agent/l3/dvr_fip_ns.py#L208
Hi Alex,

As you already supposed, this is a kernel issue and unrelated to libnetfilter_conntrack.

Florian, I'm assigning this to you since you can probably find the cause quickly. Feel free to reassign it to me (or someone else) in case you're too busy.

Thanks, Phil
It's a regression coming from BZ 1317099 and is RHEL 7 specific. skb_scrub_packet() calls nf_reset(), but that only resets skb->nfct, not skb->nfctinfo, so a packet marked untracked in one namespace still carries the stale untracked state after crossing into another namespace. (Upstream, the latter field no longer exists, so skb->_nfct = 0 clears the untracked state too.)
Hi, there's no reason to mark this bugzilla as private, so I have made it public. - Andreas
FYI, the current workaround is to downgrade to an older kernel (tested; it worked for the original case).
Older meaning 3.10.0-693.21.1.el7
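A sketch of one way to apply that workaround, assuming the older build is still available in the configured repositories. kernel is an install-only package, so the older build is installed alongside the current one rather than downgraded in place; the grub menu entry title below is an assumption and should be checked on the host first:

# Install the last known-good kernel build next to the current one:
yum install kernel-3.10.0-693.21.1.el7

# List the boot entries, then make the older kernel the default and reboot
# (the entry title is an assumption; use whatever the awk output shows):
awk -F\' '/^menuentry / {print $2}' /boot/grub2/grub.cfg
grub2-set-default 'Red Hat Enterprise Linux Server (3.10.0-693.21.1.el7.x86_64) 7.4 (Maipo)'
reboot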
*** Bug 1578889 has been marked as a duplicate of this bug. ***
Patch(es) committed on kernel repository and an interim kernel build is undergoing testing
Patch(es) available on kernel-3.10.0-898.el7
Set up two netns, ns1 and ns2, add a -j CT --notrack rule in ns1, and check whether DNAT in ns2 still works.

IPv4 reproducer:
---------------------------------------------------------------------------
set -x

ip link del veth_s
ip -all netns del

ip netns add client
ip netns add ns1
ip netns add ns2

ip link add name eth1 netns client type veth peer name eth1 netns ns1
ip link add name eth2 netns ns1 type veth peer name eth1 netns ns2
ip link add name veth_s type veth peer name eth2 netns ns2

for ns in ns1 ns2; do
    ip netns exec $ns brctl addbr br0
    ip netns exec $ns ifconfig br0 up
    ip netns exec $ns brctl addif br0 eth1
    ip netns exec $ns brctl addif br0 eth2
done

ip netns exec client ip -4 addr add 10.167.100.2/24 dev eth1
ip -4 addr add 10.167.100.1/24 dev veth_s
ip netns exec ns1 ip -4 addr add 10.167.100.254/24 dev br0
ip netns exec ns2 ip -4 addr add 10.167.100.253/24 dev br0

ip netns exec client ip link set lo up
ip link set lo up
ip netns exec ns1 ip link set lo up
ip netns exec ns2 ip link set lo up
ip netns exec ns1 ip link set eth1 up
ip netns exec ns1 ip link set eth2 up
ip netns exec ns1 ip link set br0 up
ip netns exec ns2 ip link set eth1 up
ip netns exec ns2 ip link set eth2 up
ip netns exec ns2 ip link set br0 up
ip netns exec client ip link set eth1 up
ip link set veth_s up

#have to do this
ip netns exec ns2 sysctl -w net.ipv4.ip_forward=1

#check topo
sleep 3
ip netns exec client ping -c3 10.167.100.1 || { echo "fail init"; exit 1; }

modprobe br_netfilter
echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables

#for ncat find route
ip netns exec ns1 ebtables -t nat -A PREROUTING -p arp --arp-ip-dst 10.167.100.4 -j arpreply --arpreply-mac 00:11:22:33:44:55

ip netns exec ns1 iptables -t raw -A PREROUTING -p tcp -j CT --notrack
ip netns exec ns2 iptables -t nat -A PREROUTING -d 10.167.100.4 -p tcp -j DNAT --to-destination 10.167.100.1:2001

ncat -4 -l 2001 &
sleep 2
ip netns exec ns2 conntrack -F
ip netns exec client ncat -4 --send-only 10.167.100.4 2000 <<<"abc123"
echo "$?"
ip netns exec ns2 conntrack -L -p tcp
pkill ncat
-----------------------------------------------------------------------------

Reproduced on kernel 3.10.0-862.el7.x86_64 (RHEL-7.5).

RESULT:
Ncat: Connection timed out.
The conntrack table in netns ns2 is empty:
conntrack v1.4.4 (conntrack-tools): 0 flow entries have been shown.

Verified on kernel 3.10.0-898.el7.x86_64.

RESULT:
Ncat successfully sent "abc123"
++ ip netns exec ns2 conntrack -L -p tcp
tcp 6 119 TIME_WAIT src=10.167.100.2 dst=10.167.100.4 sport=46252 dport=2000 src=10.167.100.1 dst=10.167.100.2 sport=2001 dport=46252 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
conntrack v1.4.4 (conntrack-tools): 1 flow entries have been shown.
There is also an IPv6 reproducer:
--------------------------------------------------------
set -x

ip netns exec client ip link del dev eth1
ip netns exec ns1 ip link del dev eth1
ip netns exec ns1 ip link del dev eth2
ip netns exec ns2 ip link del dev eth1
ip netns exec ns2 ip link del dev eth2
ip netns exec ns1 ip link del dev br0
ip netns exec ns2 ip link del dev br0
ip link del dev veth_bf2_s1
ip -all netns del

ip netns add client
ip netns add ns1
ip netns add ns2

ip link add name eth1 netns client type veth peer name eth1 netns ns1
ip link add name eth2 netns ns1 type veth peer name eth1 netns ns2
ip link add name veth_bf2_s1 type veth peer name eth2 netns ns2

for ns in ns1 ns2; do
    ip netns exec $ns brctl addbr br0
    ip netns exec $ns ifconfig br0 up
    ip netns exec $ns brctl addif br0 eth1
    ip netns exec $ns brctl addif br0 eth2
done

ip netns exec client ip -6 addr add 2001:db8:ffff:100::2/64 dev eth1
ip -6 addr add 2001:db8:ffff:100::1/64 dev veth_bf2_s1
ip netns exec ns1 ip -6 addr add 2001:db8:ffff:100::fffe/64 dev br0
ip netns exec ns2 ip -6 addr add 2001:db8:ffff:100::fffd/64 dev br0

ip netns exec client ip link set lo up
ip link set lo up
ip netns exec ns1 ip link set lo up
ip netns exec ns2 ip link set lo up
ip netns exec ns1 ip link set eth1 up
ip netns exec ns1 ip link set eth2 up
ip netns exec ns1 ip link set br0 up
ip netns exec ns2 ip link set eth1 up
ip netns exec ns2 ip link set eth2 up
ip netns exec ns2 ip link set br0 up
ip netns exec client ip link set eth1 up
ip link set veth_bf2_s1 up

sleep 5
set +x

#check topo
ip netns exec client ping6 2001:db8:ffff:100::1 -c 3 || { echo "fail init"; exit 1; }

modprobe br_netfilter
echo 1 > /proc/sys/net/bridge/bridge-nf-call-ip6tables

#ip netns exec ns1 ebtables -tnat -A PREROUTING -p arp --arp-ip-dst 2001:db8:ffff:100::4 -j arpreply --arpreply-mac 00:11:22:33:44:55
ip netns exec client ip neigh add 2001:db8:ffff:100::4 lladdr 00:11:22:33:44:55 nud permanent dev eth1

ip netns exec ns1 ip6tables -t raw -A PREROUTING -p tcp -j CT --notrack
ip netns exec ns2 ip6tables -t nat -A PREROUTING -d 2001:db8:ffff:100::4 -p tcp -j DNAT --to-destination [2001:db8:ffff:100::1]:2001

ncat -6 -l 2001 &
sleep 2
ip netns exec ns2 conntrack -F
ip netns exec client ncat -6 --send-only 2001:db8:ffff:100::4 2000 <<<"abc123"
echo "$?"
pkill ncat

ip netns exec ns1 ebtables -t nat -L --Ln --Lc
ip netns exec ns2 ip6tables -t nat -L -n -v
ip netns exec ns2 conntrack -L -f ipv6 -p tcp
--------------------------------------------------------

Reproduced on 3.10.0-862.el7.x86_64 (RHEL-7.5):
Ncat: Connection timed out.
conntrack v1.4.4 (conntrack-tools): 0 flow entries have been shown.

Verified on kernel 3.10.0-898.el7.x86_64:
ncat successfully sent "abc123"
conntrack:
tcp 6 119 TIME_WAIT src=2001:db8:ffff:100::2 dst=2001:db8:ffff:100::4 sport=51860 dport=2000 src=2001:db8:ffff:100::1 dst=2001:db8:ffff:100::2 sport=2001 dport=51860 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
conntrack v1.4.4 (conntrack-tools): 1 flow entries have been shown.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:3083