Description of problem:

There is a potential scenario in Kubernetes where a pod that talks both to another pod and to a service backed by that same pod can hit a port collision, and conntrack will refuse to commit the connection.

Consider the following UDP traffic flow between a client and a server:

client (10.129.2.7:5054) -> server (10.129.2.8:5088)

This will create an entry in conntrack similar to this:

udp,orig=(src=10.129.2.7,dst=10.129.2.8,sport=5054,dport=5088),reply=(src=10.129.2.8,dst=10.129.2.7,sport=5088,dport=5054),zone=85

Around the same time, the client sends a packet (using the same source port) to a Kubernetes service whose backend is the same server. In OVN this is treated as a load_balancer; let's assume the VIP on this LB is 172.30.9.90.

client (10.129.2.7:5054) -> 172.30.9.90 (OVN LB DNAT) -> server (10.129.2.8:5088)

This results in a conntrack entry similar to:

udp,orig=(src=10.129.2.7,dst=172.30.9.90,sport=5054,dport=5088),reply=(src=10.129.2.8,dst=10.129.2.7,sport=5088,dport=5054),zone=84,labels=0x2

The problem here is that the reply tuple is the same for both sessions.
This will result in errors in OVS due to a failure to commit the conntrack entries because of the collision:

2021-03-12T08:12:40.670Z|00004|dpif(handler10)|WARN|system@ovs-system: execute ct(commit,zone=84,label=0/0x1),ct(zone=85),recirc(0x19590) failed (Invalid argument) on packet udp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:07,dl_dst=0a:58:0a:81:02:08,nw_src=10.129.2.7,nw_dst=10.129.2.8,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=5054,tp_dst=5088 udp_csum:14c4 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x54),ct_tuple4(src=10.129.2.7,dst=10.129.2.8,proto=17,tp_src=5054,tp_dst=5088),in_port(16) mtu 0
2021-03-12T08:15:12.680Z|00008|dpif(handler13)|WARN|system@ovs-system: execute ct(commit,zone=84,label=0/0x1),ct(zone=85),recirc(0x19697) failed (Invalid argument) on packet udp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:07,dl_dst=0a:58:0a:81:02:08,nw_src=10.129.2.7,nw_dst=10.129.2.8,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=5054,tp_dst=5088 udp_csum:d000 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x54),ct_tuple4(src=10.129.2.7,dst=10.129.2.8,proto=17,tp_src=5054,tp_dst=5088),in_port(16) mtu 0

To avoid this scenario, openshift-sdn has implemented a SNAT(0.0.0.0) CT flow that ensures the port is changed if there is a port collision:
https://github.com/openshift/sdn/pull/269/files#diff-6a28fd6ce020355ee396a0a925f38fd98f5ba7908d2593a65d67b1dd9b341532R117

The request here is to implement a similar fix in OVN, so that such port collisions in the traffic flow are worked around.
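To make the collision in the description concrete, here is a minimal sketch that pulls the reply tuple out of the two conntrack entries quoted there and compares them. The helper name reply_tuple is made up for this illustration, the entries are the textual examples from the description rather than live conntrack output, and zones are ignored just as in the comparison the description makes.

```shell
#!/bin/sh
# Print the contents of reply=(...) from a conntrack entry written in the
# textual form used in the description. reply_tuple is a hypothetical helper.
reply_tuple() {
    printf '%s\n' "$1" | sed -n 's/.*reply=(\([^)]*\)).*/\1/p'
}

# The two example entries from the description (direct and via the LB VIP).
direct='udp,orig=(src=10.129.2.7,dst=10.129.2.8,sport=5054,dport=5088),reply=(src=10.129.2.8,dst=10.129.2.7,sport=5088,dport=5054),zone=85'
via_lb='udp,orig=(src=10.129.2.7,dst=172.30.9.90,sport=5054,dport=5088),reply=(src=10.129.2.8,dst=10.129.2.7,sport=5088,dport=5054),zone=84,labels=0x2'

# The orig tuples differ (different dst), but the reply tuples are identical.
if [ "$(reply_tuple "$direct")" = "$(reply_tuple "$via_lb")" ]; then
    echo "reply tuples collide: $(reply_tuple "$direct")"
else
    echo "reply tuples differ"
fi
```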
Note that this depends on OVS supporting the "null NAT" case, so I am setting a "Depends On" for https://bugzilla.redhat.com/show_bug.cgi?id=1935663
This also depends on OVS userspace conntrack supporting "null SNAT": https://bugzilla.redhat.com/show_bug.cgi?id=1935666
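As a rough illustration of what the "null SNAT" dependency is expected to provide (the source address is left alone and the source port changes only when it would collide), here is a toy model in shell. The function pick_sport and its "next free port" strategy are assumptions made for this sketch; real conntrack port selection is more involved, is not sequential, and handles port range limits that this model ignores.

```shell
#!/bin/sh
# Toy model only, not the kernel or OVS algorithm.
# pick_sport PORT "USED PORTS..." prints PORT if it is not in the
# space-separated used list, otherwise the next higher port that is free.
pick_sport() {
    port=$1
    used=" $2 "
    while :; do
        case "$used" in
            *" $port "*) port=$((port + 1)) ;;  # collision: try another port
            *) break ;;                         # free: keep this port
        esac
    done
    echo "$port"
}

pick_sport 5054 ""        # no existing connection uses 5054: port is kept
pick_sport 5054 "5054"    # 5054 is taken: a different port is chosen
```

The point of the model is only that a connection with no collision is untouched, which is why such a flow can be applied to all traffic.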
Created attachment 1765201: OVS flows and groups
OVN only reproducer:

ovn-nbctl ls-del ls
ovn-nbctl lr-del rtr
ovn-nbctl lb-del lb-test
ovn-nbctl \
    -- lr-add rtr \
    -- lrp-add rtr rtr-ls 00:00:00:00:01:00 42.42.42.1/24 \
    -- ls-add ls \
    -- lsp-add ls ls-rtr \
    -- lsp-set-addresses ls-rtr 00:00:00:00:01:00 \
    -- lsp-set-type ls-rtr router \
    -- lsp-set-options ls-rtr router-port=rtr-ls \
    -- lsp-add ls vm1 -- lsp-set-addresses vm1 00:00:00:00:00:01 \
    -- lsp-add ls vm2 -- lsp-set-addresses vm2 00:00:00:00:00:02 \
    -- lb-add lb-test 66.66.66.66:666 42.42.42.2:4242 tcp \
    -- ls-lb-add ls lb-test

ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal
ovs-vsctl set Interface vm1 external_ids:iface-id=vm1
ovs-vsctl add-port br-int vm2 -- set interface vm2 type=internal
ovs-vsctl set Interface vm2 external_ids:iface-id=vm2

ip netns add vm1
ip link set vm1 netns vm1
ip netns exec vm1 ip link set vm1 address 00:00:00:00:00:01
ip netns exec vm1 ip addr add 42.42.42.2/24 dev vm1
ip netns exec vm1 ip link set vm1 up
ip netns exec vm1 ip r a default via 42.42.42.1

ip netns add vm2
ip link set vm2 netns vm2
ip netns exec vm2 ip link set vm2 address 00:00:00:00:00:02
ip netns exec vm2 ip addr add 42.42.42.3/24 dev vm2
ip netns exec vm2 ip link set vm2 up
ip netns exec vm2 ip r a default via 42.42.42.1

# Start a TCP listener:
ip netns exec vm1 nc -l -k -v 42.42.42.2 4242

# Start a load balanced (DNAT) connection:
ip netns exec vm2 nc 66.66.66.66 666 -p 5555

# Start a "regular" (no NAT) connection to the backend (this fails):
ip netns exec vm2 nc 42.42.42.2 4242 -p 5555

At this point OVS fails to commit the connection:

2021-03-22T10:27:34.763Z|00001|dpif(handler2)|WARN|system@ovs-system: execute ct(commit,zone=2,label=0/0x1),ct(zone=3),recirc(0x7e) failed (Invalid argument) on packet tcp,vlan_tci=0x0000,dl_src=00:00:00:00:00:02,dl_dst=00:00:00:00:00:01,nw_src=42.42.42.3,nw_dst=42.42.42.2,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=5555,tp_dst=4242,tcp_flags=syn tcp_csum:8379 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x2),ct_tuple4(src=42.42.42.3,dst=42.42.42.2,proto=6,tp_src=5555,tp_dst=4242),in_port(1) mtu 0

Attached OVS flows and groups.
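When running the reproducer repeatedly, it can help to count the failed commits instead of reading the log by eye. The following is a small sketch under the assumption that the warnings look like the one quoted above; count_ct_commit_failures is a hypothetical helper name, and the default path is the ovs-vswitchd log location used elsewhere in this report.

```shell
#!/bin/sh
# Count datapath warnings in which an "execute ct(commit,...)" action
# failed with "Invalid argument". Prints the count (0 if none match).
# Usage: count_ct_commit_failures [logfile]
count_ct_commit_failures() {
    log="${1:-/var/log/openvswitch/ovs-vswitchd.log}"
    # -F: match the strings literally, so the parentheses need no escaping.
    grep -F 'execute ct(commit' "$log" | grep -cF 'failed (Invalid argument)'
}
```

A non-zero count after the "regular" (no NAT) connection attempt indicates the collision was hit.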
Fix sent as RFC upstream: http://patchwork.ozlabs.org/project/ovn/list/?series=245843&state=* The series is RFC because it still depends on not-yet merged OVS bits.
V1 sent for review: http://patchwork.ozlabs.org/project/ovn/list/?series=247077&state=*
V2 sent for review: http://patchwork.ozlabs.org/project/ovn/list/?series=247934&state=*
V3 sent for review: http://patchwork.ozlabs.org/project/ovn/list/?series=248327&state=*
Merged upstream to git master on 2021-06-18
Patch to bump the OVS submodule in OVN upstream: http://patchwork.ozlabs.org/project/ovn/list/?series=252717&state=* (We still need the OVS patch to be backported to the openvswitch package downstream before we can use the feature downstream).
reproducer:

systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:127.0.0.1:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=127.0.0.1
systemctl restart ovn-controller

ovn-nbctl ls-del ls
ovn-nbctl lr-del rtr
ovn-nbctl lb-del lb-test
ovn-nbctl \
    -- lr-add rtr \
    -- lrp-add rtr rtr-ls 00:00:00:00:01:00 42.42.42.1/24 4242::1/64 \
    -- ls-add ls \
    -- lsp-add ls ls-rtr \
    -- lsp-set-addresses ls-rtr 00:00:00:00:01:00 \
    -- lsp-set-type ls-rtr router \
    -- lsp-set-options ls-rtr router-port=rtr-ls \
    -- lsp-add ls vm1 -- lsp-set-addresses vm1 00:00:00:00:00:01 \
    -- lsp-add ls vm2 -- lsp-set-addresses vm2 00:00:00:00:00:02 \
    -- lb-add lb-test 66.66.66.66:666 42.42.42.2:4242 tcp \
    -- lb-add lb-test [6666::6]:777 [4242::2]:7272 tcp \
    -- ls-lb-add ls lb-test

ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal
ovs-vsctl set Interface vm1 external_ids:iface-id=vm1
ovs-vsctl add-port br-int vm2 -- set interface vm2 type=internal
ovs-vsctl set Interface vm2 external_ids:iface-id=vm2

ip netns add vm1
ip link set vm1 netns vm1
ip netns exec vm1 ip link set vm1 address 00:00:00:00:00:01
ip netns exec vm1 ip addr add 42.42.42.2/24 dev vm1
ip netns exec vm1 ip addr add 4242::2/64 dev vm1
ip netns exec vm1 ip link set vm1 up
ip netns exec vm1 ip r a default via 42.42.42.1
ip netns exec vm1 ip -6 route add default via 4242::1

ip netns add vm2
ip link set vm2 netns vm2
ip netns exec vm2 ip link set vm2 address 00:00:00:00:00:02
ip netns exec vm2 ip addr add 42.42.42.3/24 dev vm2
ip netns exec vm2 ip addr add 4242::3/64 dev vm2
ip netns exec vm2 ip link set vm2 up
ip netns exec vm2 ip r a default via 42.42.42.1
ip netns exec vm2 ip -6 route add default via 4242::1

# Start a TCP listener:
ip netns exec vm1 nc -l -k -v 42.42.42.2 4242 &
nc_pid=$!
sleep 2

# Start a load balanced (DNAT) connection:
ip netns exec vm2 nc 66.66.66.66 666 -p 5555 <<< h

# Start a "regular" (no NAT) connection to the backend (this fails):
ip netns exec vm2 nc 42.42.42.2 4242 -p 5555 <<< h

tail /var/log/openvswitch/ovs-vswitchd.log
kill $nc_pid

ip netns exec vm1 nc -l -k -v 4242::2 7272 &
nc6_pid=$!
sleep 2
ip netns exec vm2 nc 6666::6 777 -p 4444 <<< h
ip netns exec vm2 nc 4242::2 7272 -p 4444 <<< h
tail /var/log/openvswitch/ovs-vswitchd.log
kill $nc6_pid

reproduced on ovn-2021-21.06.0-12:

[root@wsfd-advnetlab18 bz1939676]# rpm -qa | grep -E "openvswitch2.15|ovn-2021"
python3-openvswitch2.15-2.15.0-32.el8fdp.x86_64
ovn-2021-central-21.06.0-12.el8fdp.x86_64
openvswitch2.15-2.15.0-32.el8fdp.x86_64
ovn-2021-21.06.0-12.el8fdp.x86_64
ovn-2021-host-21.06.0-12.el8fdp.x86_64

+ ip netns exec vm1 nc -l -k -v 42.42.42.2 4242
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Listening on 42.42.42.2:4242
+ ip netns exec vm2 nc 66.66.66.66 666 -p 5555
Ncat: Connection from 42.42.42.3.
Ncat: Connection from 42.42.42.3:5555.
h
+ ip netns exec vm2 nc 42.42.42.2 4242 -p 5555
Ncat: Connection timed out.
+ tail /var/log/openvswitch/ovs-vswitchd.log
2021-08-12T06:51:58.984Z|00030|bridge|INFO|bridge br-int: added interface br-int on port 65534
2021-08-12T06:51:58.985Z|00031|bridge|INFO|bridge br-int: using datapath ID 0000f2dc7985e3f3
2021-08-12T06:51:58.985Z|00032|connmgr|INFO|br-int: added service controller "punix:/var/run/openvswitch/br-int.mgmt"
2021-08-12T06:51:58.997Z|00033|bridge|INFO|bridge br-int: added interface vm1 on port 1
2021-08-12T06:51:59.029Z|00034|bridge|INFO|bridge br-int: added interface vm2 on port 2
2021-08-12T06:52:01.538Z|00001|dpif(handler3)|WARN|system@ovs-system: execute ct(commit,zone=1,label=0/0x1),ct(zone=2,nat),recirc(0xd) failed (Invalid argument) on packet tcp,vlan_tci=0x0000,dl_src=00:00:00:00:00:02,dl_dst=00:00:00:00:00:01,nw_src=42.42.42.3,nw_dst=42.42.42.2,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=5555,tp_dst=4242,tcp_flags=syn tcp_csum:4cea with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x1),ct_tuple4(src=42.42.42.3,dst=42.42.42.2,proto=6,tp_src=5555,tp_dst=4242),in_port(3) mtu 0
2021-08-12T06:52:08.412Z|00035|memory|INFO|159020 kB peak resident set size after 10.0 seconds
2021-08-12T06:52:08.413Z|00036|memory|INFO|handlers:35 ofconns:2 ports:3 revalidators:13 rules:289 udpif keys:24
2021-08-12T06:52:09.864Z|00037|connmgr|INFO|br-int<->unix#0: 286 flow_mods in the 1 s starting 10 s ago (285 adds, 1 deletes)
+ kill 244589
+ nc6_pid=244604
+ sleep 2
+ ip netns exec vm1 nc -l -k -v 4242::2 7272
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Listening on 4242::2:7272
rep.sh: line 65: 244589 Terminated ip netns exec vm1 nc -l -k -v 42.42.42.2 4242
+ ip netns exec vm2 nc 6666::6 777 -p 4444
Ncat: Connection from 4242::3.
Ncat: Connection from 4242::3:4444.
h
+ ip netns exec vm2 nc 4242::2 7272 -p 4444
Ncat: Connection timed out.
+ tail /var/log/openvswitch/ovs-vswitchd.log
2021-08-12T06:51:59.029Z|00034|bridge|INFO|bridge br-int: added interface vm2 on port 2
2021-08-12T06:52:01.538Z|00001|dpif(handler3)|WARN|system@ovs-system: execute ct(commit,zone=1,label=0/0x1),ct(zone=2,nat),recirc(0xd) failed (Invalid argument) on packet tcp,vlan_tci=0x0000,dl_src=00:00:00:00:00:02,dl_dst=00:00:00:00:00:01,nw_src=42.42.42.3,nw_dst=42.42.42.2,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=5555,tp_dst=4242,tcp_flags=syn tcp_csum:4cea with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x1),ct_tuple4(src=42.42.42.3,dst=42.42.42.2,proto=6,tp_src=5555,tp_dst=4242),in_port(3) mtu 0
2021-08-12T06:52:08.412Z|00035|memory|INFO|159020 kB peak resident set size after 10.0 seconds
2021-08-12T06:52:08.413Z|00036|memory|INFO|handlers:35 ofconns:2 ports:3 revalidators:13 rules:289 udpif keys:24
2021-08-12T06:52:09.864Z|00037|connmgr|INFO|br-int<->unix#0: 286 flow_mods in the 1 s starting 10 s ago (285 adds, 1 deletes)
2021-08-12T06:52:13.651Z|00001|dpif(handler2)|WARN|system@ovs-system: execute ct(commit,zone=1,label=0/0x1),ct(zone=2,nat),recirc(0xd) failed (Invalid argument) on packet tcp6,vlan_tci=0x0000,dl_src=00:00:00:00:00:02,dl_dst=00:00:00:00:00:01,ipv6_src=4242::3,ipv6_dst=4242::2,ipv6_label=0x712c6,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=4444,tp_dst=7272,tcp_flags=syn tcp_csum:f2fe with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x1),ct_tuple6(src=4242::3,dst=4242::2,proto=6,src_port=4444,dst_port=7272),in_port(3) mtu 0
2021-08-12T06:52:20.741Z|00001|dpif(handler1)|WARN|system@ovs-system: execute ct(commit,zone=1,label=0/0x1),ct(zone=2,nat),recirc(0x1f) failed (Invalid argument) on packet tcp6,vlan_tci=0x0000,dl_src=00:00:00:00:00:02,dl_dst=00:00:00:00:00:01,ipv6_src=4242::3,ipv6_dst=4242::2,ipv6_label=0x8464e,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=4444,tp_dst=7272,tcp_flags=syn tcp_csum:d74b with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x1),ct_tuple6(src=4242::3,dst=4242::2,proto=6,src_port=4444,dst_port=7272),in_port(3) mtu 0

Verified on ovn-2021-21.06.0-18:

[root@wsfd-advnetlab18 bz1939676]# rpm -qa | grep -E "openvswitch2.15|ovn-2021"
python3-openvswitch2.15-2.15.0-32.el8fdp.x86_64
ovn-2021-host-21.06.0-18.el8fdp.x86_64
openvswitch2.15-2.15.0-32.el8fdp.x86_64
ovn-2021-central-21.06.0-18.el8fdp.x86_64
ovn-2021-21.06.0-18.el8fdp.x86_64

+ ip netns exec vm1 nc -l -k -v 42.42.42.2 4242
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Listening on 42.42.42.2:4242
+ ip netns exec vm2 nc 66.66.66.66 666 -p 5555
Ncat: Connection from 42.42.42.3.
Ncat: Connection from 42.42.42.3:5555.
h
+ ip netns exec vm2 nc 42.42.42.2 4242 -p 5555
Ncat: Connection from 42.42.42.3.
Ncat: Connection from 42.42.42.3:23557.
h
+ tail /var/log/openvswitch/ovs-vswitchd.log
2021-08-12T06:54:12.743Z|00025|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_label
2021-08-12T06:54:12.743Z|00026|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_state_nat
2021-08-12T06:54:12.743Z|00027|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_orig_tuple
2021-08-12T06:54:12.743Z|00028|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_orig_tuple6
2021-08-12T06:54:12.743Z|00029|ofproto_dpif|INFO|system@ovs-system: Datapath does not support IPv6 ND Extensions
2021-08-12T06:54:12.833Z|00030|bridge|INFO|bridge br-int: added interface br-int on port 65534
2021-08-12T06:54:12.833Z|00031|bridge|INFO|bridge br-int: using datapath ID 00001e939193077f
2021-08-12T06:54:12.834Z|00032|connmgr|INFO|br-int: added service controller "punix:/var/run/openvswitch/br-int.mgmt"
2021-08-12T06:54:12.844Z|00033|bridge|INFO|bridge br-int: added interface vm1 on port 1
2021-08-12T06:54:12.879Z|00034|bridge|INFO|bridge br-int: added interface vm2 on port 2
+ kill 245598
+ nc6_pid=245605
+ sleep 2
+ ip netns exec vm1 nc -l -k -v 4242::2 7272
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Listening on 4242::2:7272
rep.sh: line 65: 245598 Terminated ip netns exec vm1 nc -l -k -v 42.42.42.2 4242
+ ip netns exec vm2 nc 6666::6 777 -p 4444
Ncat: Connection from 4242::3.
Ncat: Connection from 4242::3:4444.
h
+ ip netns exec vm2 nc 4242::2 7272 -p 4444
Ncat: Connection from 4242::3.
Ncat: Connection from 4242::3:41666.
h
+ tail /var/log/openvswitch/ovs-vswitchd.log
2021-08-12T06:54:12.743Z|00025|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_label
2021-08-12T06:54:12.743Z|00026|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_state_nat
2021-08-12T06:54:12.743Z|00027|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_orig_tuple
2021-08-12T06:54:12.743Z|00028|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_orig_tuple6
2021-08-12T06:54:12.743Z|00029|ofproto_dpif|INFO|system@ovs-system: Datapath does not support IPv6 ND Extensions
2021-08-12T06:54:12.833Z|00030|bridge|INFO|bridge br-int: added interface br-int on port 65534
2021-08-12T06:54:12.833Z|00031|bridge|INFO|bridge br-int: using datapath ID 00001e939193077f
2021-08-12T06:54:12.834Z|00032|connmgr|INFO|br-int: added service controller "punix:/var/run/openvswitch/br-int.mgmt"
2021-08-12T06:54:12.844Z|00033|bridge|INFO|bridge br-int: added interface vm1 on port 1
2021-08-12T06:54:12.879Z|00034|bridge|INFO|bridge br-int: added interface vm2 on port 2
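In the verified runs the listener reports a source port different from the one requested with -p (for example 23557 instead of 5555), which is the visible sign that the colliding connection was re-ported. A minimal sketch of that check follows; peer_port and port_rewritten are hypothetical helper names, and the parsing assumes the "Ncat: Connection from ADDR:PORT." line format seen in this transcript. Only the line that includes the port should be passed in, since an address-only IPv6 line such as "Ncat: Connection from 4242::3." would be misread.

```shell
#!/bin/sh
# peer_port LINE: print the source port from an Ncat listener line of the
# form "Ncat: Connection from ADDR:PORT." (IPv4 or IPv6 address).
peer_port() {
    printf '%s\n' "$1" | sed -n 's/^Ncat: Connection from .*:\([0-9][0-9]*\)\.$/\1/p'
}

# port_rewritten REQUESTED LINE: succeed if the listener saw a source port
# other than the one the client asked for with -p.
port_rewritten() {
    seen=$(peer_port "$2")
    [ -n "$seen" ] && [ "$seen" != "$1" ]
}

# Second (no NAT) connection from the verified IPv4 run above.
if port_rewritten 5555 'Ncat: Connection from 42.42.42.3:23557.'; then
    echo "source port was rewritten"
fi
```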
also verified on ovn2.13-20.12.0-173:

[root@wsfd-advnetlab18 bz1939676]# rpm -qa | grep -E "openvswitch2.13|ovn2.13"
ovn2.13-20.12.0-173.el8fdp.x86_64
ovn2.13-host-20.12.0-173.el8fdp.x86_64
ovn2.13-central-20.12.0-173.el8fdp.x86_64

+ ip netns exec vm1 nc -l -k -v 42.42.42.2 4242
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Listening on 42.42.42.2:4242
+ ip netns exec vm2 nc 66.66.66.66 666 -p 5555
Ncat: Connection from 42.42.42.3.
Ncat: Connection from 42.42.42.3:5555.
h
+ ip netns exec vm2 nc 42.42.42.2 4242 -p 5555
Ncat: Connection from 42.42.42.3.
Ncat: Connection from 42.42.42.3:54791.
h
+ tail /var/log/openvswitch/ovs-vswitchd.log
2021-08-12T06:56:11.010Z|00025|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_label
2021-08-12T06:56:11.010Z|00026|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_state_nat
2021-08-12T06:56:11.010Z|00027|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_orig_tuple
2021-08-12T06:56:11.010Z|00028|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_orig_tuple6
2021-08-12T06:56:11.010Z|00029|ofproto_dpif|INFO|system@ovs-system: Datapath does not support IPv6 ND Extensions
2021-08-12T06:56:11.107Z|00030|bridge|INFO|bridge br-int: added interface br-int on port 65534
2021-08-12T06:56:11.107Z|00031|bridge|INFO|bridge br-int: using datapath ID 00009a276b4396fe
2021-08-12T06:56:11.107Z|00032|connmgr|INFO|br-int: added service controller "punix:/var/run/openvswitch/br-int.mgmt"
2021-08-12T06:56:11.119Z|00033|bridge|INFO|bridge br-int: added interface vm1 on port 1
2021-08-12T06:56:11.161Z|00034|bridge|INFO|bridge br-int: added interface vm2 on port 2
+ kill 247066
+ nc6_pid=247071
+ sleep 2
+ ip netns exec vm1 nc -l -k -v 4242::2 7272
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Listening on 4242::2:7272
rep.sh: line 65: 247066 Terminated ip netns exec vm1 nc -l -k -v 42.42.42.2 4242
+ ip netns exec vm2 nc 6666::6 777 -p 4444
Ncat: Connection from 4242::3.
Ncat: Connection from 4242::3:4444.
h
+ ip netns exec vm2 nc 4242::2 7272 -p 4444
Ncat: Connection from 4242::3.
Ncat: Connection from 4242::3:8915.
h
+ tail /var/log/openvswitch/ovs-vswitchd.log
2021-08-12T06:56:11.010Z|00025|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_label
2021-08-12T06:56:11.010Z|00026|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_state_nat
2021-08-12T06:56:11.010Z|00027|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_orig_tuple
2021-08-12T06:56:11.010Z|00028|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_orig_tuple6
2021-08-12T06:56:11.010Z|00029|ofproto_dpif|INFO|system@ovs-system: Datapath does not support IPv6 ND Extensions
2021-08-12T06:56:11.107Z|00030|bridge|INFO|bridge br-int: added interface br-int on port 65534
2021-08-12T06:56:11.107Z|00031|bridge|INFO|bridge br-int: using datapath ID 00009a276b4396fe
2021-08-12T06:56:11.107Z|00032|connmgr|INFO|br-int: added service controller "punix:/var/run/openvswitch/br-int.mgmt"
2021-08-12T06:56:11.119Z|00033|bridge|INFO|bridge br-int: added interface vm1 on port 1
2021-08-12T06:56:11.161Z|00034|bridge|INFO|bridge br-int: added interface vm2 on port 2
also verified on ovn2.13-20.12.0-173.el7
the issue occurred again on ovn-2021-21.06.0-24:

[root@wsfd-advnetlab16 bz1939676]# rpm -qa | grep -E "openvswitch2.15|ovn-2021"
openvswitch2.15-2.15.0-35.el8fdp.x86_64
ovn-2021-central-21.06.0-24.el8fdp.x86_64
python3-openvswitch2.15-2.15.0-35.el8fdp.x86_64
ovn-2021-21.06.0-24.el8fdp.x86_64
ovn-2021-host-21.06.0-24.el8fdp.x86_64

+ ip netns exec vm2 nc 66.66.66.66 666 -p 5555
Ncat: Connection from 42.42.42.3.
Ncat: Connection from 42.42.42.3:5555.
h
+ ip netns exec vm2 nc 42.42.42.2 4242 -p 5555
Ncat: Connection timed out.
+ tail /var/log/openvswitch/ovs-vswitchd.log
2021-09-02T09:03:06.499Z|00030|bridge|INFO|bridge br-int: added interface br-int on port 65534
2021-09-02T09:03:06.499Z|00031|bridge|INFO|bridge br-int: using datapath ID 0000f2f331e03706
2021-09-02T09:03:06.499Z|00032|connmgr|INFO|br-int: added service controller "punix:/var/run/openvswitch/br-int.mgmt"
2021-09-02T09:03:06.511Z|00033|bridge|INFO|bridge br-int: added interface vm1 on port 1
2021-09-02T09:03:06.543Z|00034|bridge|INFO|bridge br-int: added interface vm2 on port 2
2021-09-02T09:03:09.065Z|00001|dpif(handler2)|WARN|system@ovs-system: execute ct(commit,zone=1,label=0/0x1),ct(zone=2,nat),recirc(0xc) failed (Invalid argument) on packet tcp,vlan_tci=0x0000,dl_src=00:00:00:00:00:02,dl_dst=00:00:00:00:00:01,nw_src=42.42.42.3,nw_dst=42.42.42.2,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=5555,tp_dst=4242,tcp_flags=syn tcp_csum:68bd with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x1),ct_tuple4(src=42.42.42.3,dst=42.42.42.2,proto=6,tp_src=5555,tp_dst=4242),in_port(3) mtu 0
2021-09-02T09:03:15.897Z|00035|memory|INFO|159312 kB peak resident set size after 10.0 seconds
2021-09-02T09:03:15.897Z|00036|memory|INFO|handlers:35 ofconns:2 ports:3 revalidators:13 rules:287 udpif keys:19
2021-09-02T09:03:17.393Z|00037|connmgr|INFO|br-int<->unix#0: 284 flow_mods in the 1 s starting 10 s ago (283 adds, 1 deletes)

It seems that https://bugzilla.redhat.com/show_bug.cgi?id=1992705 reverted a patch related to this bug: 58683a42 (ovn-controller: Handle DNAT/no-NAT conntrack tuple collisions.)
Fix got reverted from older releases but is available (and will stay) in ovn21.09-21.09.0-17.el8fdp. Moving back to ON_QA and updating "Fixed in version" value.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (ovn2.13 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:9044