Bug 1939676 - OVN should handle DNAT collisions by SNAT'ing port
Summary: OVN should handle DNAT collisions by SNAT'ing port
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: OVN
Version: RHEL 8.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Dumitru Ceara
QA Contact: Jianlin Shi
URL:
Whiteboard:
Depends On: 1935663 1935666 1992012
Blocks: 1939045
Reported: 2021-03-16 19:33 UTC by Tim Rozet
Modified: 2022-12-15 00:21 UTC (History)

Fixed In Version: ovn21.09-21.09.0-17.el8fdp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-12-15 00:21:16 UTC
Target Upstream Version:
Embargoed:


Attachments
OVS flows and groups (59.99 KB, text/plain)
2021-03-22 10:34 UTC, Dumitru Ceara


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker FD-1135 0 None None None 2021-08-10 20:20:24 UTC
Red Hat Product Errata RHBA-2022:9044 0 None None None 2022-12-15 00:21:52 UTC

Internal Links: 1939045

Description Tim Rozet 2021-03-16 19:33:21 UTC
Description of problem:
There's a potential scenario in Kubernetes where a pod talking both to another pod and to a service backed by that same pod can hit a port collision, in which case conntrack will refuse to commit the connection. Consider the following UDP traffic flow between a client and server:

client (10.129.2.7:5054) -> server (10.129.2.8:5088)

This will create an entry in conntrack similar to this:

udp,orig=(src=10.129.2.7,dst=10.129.2.8,sport=5054,dport=5088),reply=(src=10.129.2.8,dst=10.129.2.7,sport=5088,dport=5054),zone=85

Around the same time, the client sends a packet (using the same source port) to a Kubernetes service whose backend is the same server. In OVN this is treated as a load_balancer, and let's assume the VIP on this LB is 172.30.9.90.

client (10.129.2.7:5054) -> 172.30.9.90 (OVN LB DNAT) -> server (10.129.2.8:5088)

This results in a conntrack entry similar to:
udp,orig=(src=10.129.2.7,dst=172.30.9.90,sport=5054,dport=5088),reply=(src=10.129.2.8,dst=10.129.2.7,sport=5088,dport=5054),zone=84,labels=0x2

The problem here is the reply tuple is the same between both sessions. This will result in errors in OVS due to a failure to commit the conntrack entries because of collisions:
2021-03-12T08:12:40.670Z|00004|dpif(handler10)|WARN|system@ovs-system: execute ct(commit,zone=84,label=0/0x1),ct(zone=85),recirc(0x19590) failed (Invalid argument) on packet udp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:07,dl_dst=0a:58:0a:81:02:08,nw_src=10.129.2.7,nw_dst=10.129.2.8,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=5054,tp_dst=5088 udp_csum:14c4
 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x54),ct_tuple4(src=10.129.2.7,dst=10.129.2.8,proto=17,tp_src=5054,tp_dst=5088),in_port(16) mtu 0
2021-03-12T08:15:12.680Z|00008|dpif(handler13)|WARN|system@ovs-system: execute ct(commit,zone=84,label=0/0x1),ct(zone=85),recirc(0x19697) failed (Invalid argument) on packet udp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:07,dl_dst=0a:58:0a:81:02:08,nw_src=10.129.2.7,nw_dst=10.129.2.8,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=5054,tp_dst=5088 udp_csum:d000
 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x54),ct_tuple4(src=10.129.2.7,dst=10.129.2.8,proto=17,tp_src=5054,tp_dst=5088),in_port(16) mtu 0

To avoid this scenario, openshift-sdn has implemented a SNAT(0.0.0.0) CT flow that ensures the source port is changed if there is a port collision:
https://github.com/openshift/sdn/pull/269/files#diff-6a28fd6ce020355ee396a0a925f38fd98f5ba7908d2593a65d67b1dd9b341532R117

The request here is to implement a similar fix in OVN, so that such port collisions in the traffic flow are worked around.
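
The requested behaviour can be illustrated with a toy model. This is only a sketch under stated assumptions: the table ct_reply_tuples, the function ct_commit_null_snat, and the "try the next port" selection are all hypothetical and are not how the kernel or OVS conntrack actually picks a replacement port. It only shows the idea that a commit whose reply tuple is already taken succeeds once the source port is rewritten.

```shell
#!/usr/bin/env bash
# Toy model only: real conntrack selects replacement ports differently.
declare -A ct_reply_tuples=()   # hypothetical table keyed by reply tuple
CT_SPORT=""                     # source port chosen by the last commit

# ct_commit_null_snat SRC_IP SRC_PORT DST_IP DST_PORT
# DST_IP/DST_PORT are the destination after any DNAT, i.e. the backend.
ct_commit_null_snat() {
    local src=$1 sport=$2 dst=$3 dport=$4
    local key="$dst:$dport>$src:$sport"
    # While the reply tuple is already taken, try the next source port.
    # (No exhaustion handling: this sketch assumes a free port exists.)
    while [[ -n ${ct_reply_tuples[$key]+set} ]]; do
        sport=$(( sport % 65535 + 1 ))
        key="$dst:$dport>$src:$sport"
    done
    ct_reply_tuples[$key]=1
    CT_SPORT=$sport
}
```

Committing the load-balanced flow and then the direct flow from the description with the same arguments (10.129.2.7 5054 10.129.2.8 5088) leaves the first on port 5054 and moves the second to a different source port.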

Comment 1 Tim Rozet 2021-03-16 19:37:34 UTC
Note this depends on OVS supporting the "null NAT" case, so I am setting a Depends On for https://bugzilla.redhat.com/show_bug.cgi?id=1935663

Comment 3 Dumitru Ceara 2021-03-17 15:35:11 UTC
This also depends on OVS userspace conntrack supporting "null SNAT": https://bugzilla.redhat.com/show_bug.cgi?id=1935666

Comment 6 Dumitru Ceara 2021-03-22 10:34:19 UTC
Created attachment 1765201
OVS flows and groups

Comment 7 Dumitru Ceara 2021-03-22 10:35:36 UTC
OVN only reproducer:

ovn-nbctl ls-del ls
ovn-nbctl lr-del rtr
ovn-nbctl lb-del lb-test

ovn-nbctl \
    -- lr-add rtr \
    -- lrp-add rtr rtr-ls 00:00:00:00:01:00 42.42.42.1/24 \
    -- ls-add ls \
    -- lsp-add ls ls-rtr \
    -- lsp-set-addresses ls-rtr 00:00:00:00:01:00 \
    -- lsp-set-type ls-rtr router \
    -- lsp-set-options ls-rtr router-port=rtr-ls \
    -- lsp-add ls vm1 -- lsp-set-addresses vm1 00:00:00:00:00:01 \
    -- lsp-add ls vm2 -- lsp-set-addresses vm2 00:00:00:00:00:02 \
    -- lb-add lb-test 66.66.66.66:666 42.42.42.2:4242 tcp \
    -- ls-lb-add ls lb-test

ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal
ovs-vsctl set Interface vm1 external_ids:iface-id=vm1
ovs-vsctl add-port br-int vm2 -- set interface vm2 type=internal
ovs-vsctl set Interface vm2 external_ids:iface-id=vm2

ip netns add vm1
ip link set vm1 netns vm1
ip netns exec vm1 ip link set vm1 address 00:00:00:00:00:01
ip netns exec vm1 ip addr add 42.42.42.2/24 dev vm1
ip netns exec vm1 ip link set vm1 up
ip netns exec vm1 ip r a default via 42.42.42.1
ip netns add vm2
ip link set vm2 netns vm2
ip netns exec vm2 ip link set vm2 address 00:00:00:00:00:02
ip netns exec vm2 ip addr add 42.42.42.3/24 dev vm2
ip netns exec vm2 ip link set vm2 up
ip netns exec vm2 ip r a default via 42.42.42.1

# Start a TCP listener:
ip netns exec vm1 nc -l -k -v 42.42.42.2 4242

# Start a load balanced (DNAT) connection:
ip netns exec vm2 nc 66.66.66.66 666 -p 5555

# Start a "regular" (no NAT) connection to the backend (this fails):
ip netns exec vm2 nc 42.42.42.2 4242 -p 5555


At this point OVS fails to commit the connection:
2021-03-22T10:27:34.763Z|00001|dpif(handler2)|WARN|system@ovs-system: execute ct(commit,zone=2,label=0/0x1),ct(zone=3),recirc(0x7e) failed (Invalid argument) on packet tcp,vlan_tci=0x0000,dl_src=00:00:00:00:00:02,dl_dst=00:00:00:00:00:01,nw_src=42.42.42.3,nw_dst=42.42.42.2,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=5555,tp_dst=4242,tcp_flags=syn tcp_csum:8379
 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x2),ct_tuple4(src=42.42.42.3,dst=42.42.42.2,proto=6,tp_src=5555,tp_dst=4242),in_port(1) mtu 0


Attached OVS flows and groups.
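
The ct_state(0x21) and ct_zone(0x2) fields in the warning are hexadecimal. A minimal helper for reading them is sketched below, assuming the flag bit order used by the OVS datapath (new, est, rel, rpl, inv, trk, snat, dnat from bit 0 upward); the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Decode a ct_state bitmask (decimal or 0x-prefixed hex) into flag names.
decode_ct_state() {
    local value=$(( $1 ))
    local names=(new est rel rpl inv trk snat dnat)
    local out="" i
    for i in "${!names[@]}"; do
        if (( value & (1 << i) )); then
            out+="${out:+|}${names[i]}"
        fi
    done
    printf '%s\n' "${out:-none}"
}

# Convert a hex field such as a ct_zone value to decimal.
hex_to_dec() {
    printf '%d\n' "$1"
}
```

With this, "decode_ct_state 0x21" prints "new|trk", i.e. the failing packet is the first packet of a tracked connection, and "hex_to_dec 0x54" prints 84, matching zone=84 in the original description.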

Comment 9 Dumitru Ceara 2021-05-26 13:40:13 UTC
Fix sent as RFC upstream: http://patchwork.ozlabs.org/project/ovn/list/?series=245843&state=*

The series is RFC because it still depends on not-yet-merged OVS bits.

Comment 10 Dumitru Ceara 2021-06-03 15:07:32 UTC
V1 sent for review: http://patchwork.ozlabs.org/project/ovn/list/?series=247077&state=*

Comment 11 Dumitru Ceara 2021-06-09 12:14:18 UTC
V2 sent for review: http://patchwork.ozlabs.org/project/ovn/list/?series=247934&state=*

Comment 12 Dumitru Ceara 2021-06-11 10:27:50 UTC
v3 sent for review: http://patchwork.ozlabs.org/project/ovn/list/?series=248327&state=*

Comment 13 Dan Williams 2021-07-01 04:34:19 UTC
Merged upstream to git master on 2021-06-18

Comment 14 Dumitru Ceara 2021-07-09 09:24:22 UTC
Patch to bump the OVS submodule in OVN upstream: http://patchwork.ozlabs.org/project/ovn/list/?series=252717&state=*

(We still need the OVS patch to be backported to the openvswitch package downstream before we can use the feature downstream).

Comment 19 Jianlin Shi 2021-08-12 06:54:50 UTC
reproducer:

systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641                         
ovn-sbctl set-connection ptcp:6642                 
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:127.0.0.1:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=127.0.0.1
systemctl restart ovn-controller    
                                               
ovn-nbctl ls-del ls                                  
ovn-nbctl lr-del rtr
ovn-nbctl lb-del lb-test 
                                                           
ovn-nbctl \                                        
    -- lr-add rtr \                             
    -- lrp-add rtr rtr-ls 00:00:00:00:01:00 42.42.42.1/24 4242::1/64 \
    -- ls-add ls \                             
    -- lsp-add ls ls-rtr \                           
    -- lsp-set-addresses ls-rtr 00:00:00:00:01:00 \
    -- lsp-set-type ls-rtr router \
    -- lsp-set-options ls-rtr router-port=rtr-ls \
    -- lsp-add ls vm1 -- lsp-set-addresses vm1 00:00:00:00:00:01 \
    -- lsp-add ls vm2 -- lsp-set-addresses vm2 00:00:00:00:00:02 \
    -- lb-add lb-test 66.66.66.66:666 42.42.42.2:4242 tcp \
    -- lb-add lb-test [6666::6]:777 [4242::2]:7272 tcp \
    -- ls-lb-add ls lb-test                        

ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal    
ovs-vsctl set Interface vm1 external_ids:iface-id=vm1
ovs-vsctl add-port br-int vm2 -- set interface vm2 type=internal
ovs-vsctl set Interface vm2 external_ids:iface-id=vm2

ip netns add vm1
ip link set vm1 netns vm1
ip netns exec vm1 ip link set vm1 address 00:00:00:00:00:01
ip netns exec vm1 ip addr add 42.42.42.2/24 dev vm1
ip netns exec vm1 ip addr add 4242::2/64 dev vm1
ip netns exec vm1 ip link set vm1 up
ip netns exec vm1 ip r a default via 42.42.42.1
ip netns exec vm1 ip -6 route add default via 4242::1
ip netns add vm2                          
ip link set vm2 netns vm2
ip netns exec vm2 ip link set vm2 address 00:00:00:00:00:02
ip netns exec vm2 ip addr add 42.42.42.3/24 dev vm2
ip netns exec vm2 ip addr add 4242::3/64 dev vm2
ip netns exec vm2 ip link set vm2 up
ip netns exec vm2 ip r a default via 42.42.42.1
ip netns exec vm2 ip -6 route add default via 4242::1

# Start a TCP listener:
ip netns exec vm1 nc -l -k -v 42.42.42.2 4242 &
nc_pid=$!
sleep 2

# Start a load balanced (DNAT) connection:
ip netns exec vm2 nc 66.66.66.66 666 -p 5555 <<< h

# Start a "regular" (no NAT) connection to the backend (this fails):
ip netns exec vm2 nc 42.42.42.2 4242 -p 5555 <<< h

tail /var/log/openvswitch/ovs-vswitchd.log

kill $nc_pid

ip netns exec vm1 nc -l -k -v 4242::2 7272 &
nc6_pid=$!
sleep 2

ip netns exec vm2 nc 6666::6 777 -p 4444 <<< h
ip netns exec vm2 nc 4242::2 7272 -p 4444 <<< h
tail /var/log/openvswitch/ovs-vswitchd.log
kill $nc6_pid

reproduced on ovn-2021-21.06.0-12:

[root@wsfd-advnetlab18 bz1939676]# rpm -qa | grep -E "openvswitch2.15|ovn-2021"
python3-openvswitch2.15-2.15.0-32.el8fdp.x86_64
ovn-2021-central-21.06.0-12.el8fdp.x86_64
openvswitch2.15-2.15.0-32.el8fdp.x86_64
ovn-2021-21.06.0-12.el8fdp.x86_64
ovn-2021-host-21.06.0-12.el8fdp.x86_64
+ ip netns exec vm1 nc -l -k -v 42.42.42.2 4242
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Listening on 42.42.42.2:4242
+ ip netns exec vm2 nc 66.66.66.66 666 -p 5555
Ncat: Connection from 42.42.42.3.
Ncat: Connection from 42.42.42.3:5555.
h
+ ip netns exec vm2 nc 42.42.42.2 4242 -p 5555
Ncat: Connection timed out.
+ tail /var/log/openvswitch/ovs-vswitchd.log
2021-08-12T06:51:58.984Z|00030|bridge|INFO|bridge br-int: added interface br-int on port 65534
2021-08-12T06:51:58.985Z|00031|bridge|INFO|bridge br-int: using datapath ID 0000f2dc7985e3f3
2021-08-12T06:51:58.985Z|00032|connmgr|INFO|br-int: added service controller "punix:/var/run/openvswitch/br-int.mgmt"
2021-08-12T06:51:58.997Z|00033|bridge|INFO|bridge br-int: added interface vm1 on port 1
2021-08-12T06:51:59.029Z|00034|bridge|INFO|bridge br-int: added interface vm2 on port 2
2021-08-12T06:52:01.538Z|00001|dpif(handler3)|WARN|system@ovs-system: execute ct(commit,zone=1,label=0/0x1),ct(zone=2,nat),recirc(0xd) failed (Invalid argument) on packet tcp,vlan_tci=0x0000,dl_src=00:00:00:00:00:02,dl_dst=00:00:00:00:00:01,nw_src=42.42.42.3,nw_dst=42.42.42.2,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=5555,tp_dst=4242,tcp_flags=syn tcp_csum:4cea
 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x1),ct_tuple4(src=42.42.42.3,dst=42.42.42.2,proto=6,tp_src=5555,tp_dst=4242),in_port(3) mtu 0
2021-08-12T06:52:08.412Z|00035|memory|INFO|159020 kB peak resident set size after 10.0 seconds
2021-08-12T06:52:08.413Z|00036|memory|INFO|handlers:35 ofconns:2 ports:3 revalidators:13 rules:289 udpif keys:24
2021-08-12T06:52:09.864Z|00037|connmgr|INFO|br-int<->unix#0: 286 flow_mods in the 1 s starting 10 s ago (285 adds, 1 deletes)
+ kill 244589
+ nc6_pid=244604
+ sleep 2
+ ip netns exec vm1 nc -l -k -v 4242::2 7272
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Listening on 4242::2:7272
rep.sh: line 65: 244589 Terminated              ip netns exec vm1 nc -l -k -v 42.42.42.2 4242
+ ip netns exec vm2 nc 6666::6 777 -p 4444
Ncat: Connection from 4242::3.
Ncat: Connection from 4242::3:4444.
h
+ ip netns exec vm2 nc 4242::2 7272 -p 4444
Ncat: Connection timed out.
+ tail /var/log/openvswitch/ovs-vswitchd.log
2021-08-12T06:51:59.029Z|00034|bridge|INFO|bridge br-int: added interface vm2 on port 2
2021-08-12T06:52:01.538Z|00001|dpif(handler3)|WARN|system@ovs-system: execute ct(commit,zone=1,label=0/0x1),ct(zone=2,nat),recirc(0xd) failed (Invalid argument) on packet tcp,vlan_tci=0x0000,dl_src=00:00:00:00:00:02,dl_dst=00:00:00:00:00:01,nw_src=42.42.42.3,nw_dst=42.42.42.2,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=5555,tp_dst=4242,tcp_flags=syn tcp_csum:4cea
 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x1),ct_tuple4(src=42.42.42.3,dst=42.42.42.2,proto=6,tp_src=5555,tp_dst=4242),in_port(3) mtu 0
2021-08-12T06:52:08.412Z|00035|memory|INFO|159020 kB peak resident set size after 10.0 seconds
2021-08-12T06:52:08.413Z|00036|memory|INFO|handlers:35 ofconns:2 ports:3 revalidators:13 rules:289 udpif keys:24
2021-08-12T06:52:09.864Z|00037|connmgr|INFO|br-int<->unix#0: 286 flow_mods in the 1 s starting 10 s ago (285 adds, 1 deletes)
2021-08-12T06:52:13.651Z|00001|dpif(handler2)|WARN|system@ovs-system: execute ct(commit,zone=1,label=0/0x1),ct(zone=2,nat),recirc(0xd) failed (Invalid argument) on packet tcp6,vlan_tci=0x0000,dl_src=00:00:00:00:00:02,dl_dst=00:00:00:00:00:01,ipv6_src=4242::3,ipv6_dst=4242::2,ipv6_label=0x712c6,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=4444,tp_dst=7272,tcp_flags=syn tcp_csum:f2fe
 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x1),ct_tuple6(src=4242::3,dst=4242::2,proto=6,src_port=4444,dst_port=7272),in_port(3) mtu 0
2021-08-12T06:52:20.741Z|00001|dpif(handler1)|WARN|system@ovs-system: execute ct(commit,zone=1,label=0/0x1),ct(zone=2,nat),recirc(0x1f) failed (Invalid argument) on packet tcp6,vlan_tci=0x0000,dl_src=00:00:00:00:00:02,dl_dst=00:00:00:00:00:01,ipv6_src=4242::3,ipv6_dst=4242::2,ipv6_label=0x8464e,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=4444,tp_dst=7272,tcp_flags=syn tcp_csum:d74b
 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x1),ct_tuple6(src=4242::3,dst=4242::2,proto=6,src_port=4444,dst_port=7272),in_port(3) mtu 0
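
Instead of reading the tail output by eye, the commit failures can be counted directly. A minimal sketch, assuming the warning text shown above and the default log path used by the reproducer; the function name count_ct_commit_failures is hypothetical.

```shell
#!/usr/bin/env bash
# Count ct(commit) failures in an ovs-vswitchd log.
# Usage: count_ct_commit_failures [LOGFILE]
count_ct_commit_failures() {
    local logfile=${1:-/var/log/openvswitch/ovs-vswitchd.log}
    # grep -c exits non-zero when nothing matches but still prints 0,
    # so mask the exit status to keep callers running under "set -e".
    grep -c 'execute ct(commit.*failed (Invalid argument)' "$logfile" || true
}
```

A count of 0 after running both nc clients would correspond to the behaviour expected from a fixed build.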

Verified on ovn-2021-21.06.0-18:

[root@wsfd-advnetlab18 bz1939676]# rpm -qa | grep -E "openvswitch2.15|ovn-2021"
python3-openvswitch2.15-2.15.0-32.el8fdp.x86_64
ovn-2021-host-21.06.0-18.el8fdp.x86_64
openvswitch2.15-2.15.0-32.el8fdp.x86_64
ovn-2021-central-21.06.0-18.el8fdp.x86_64
ovn-2021-21.06.0-18.el8fdp.x86_64

+ ip netns exec vm1 nc -l -k -v 42.42.42.2 4242
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Listening on 42.42.42.2:4242
+ ip netns exec vm2 nc 66.66.66.66 666 -p 5555
Ncat: Connection from 42.42.42.3.
Ncat: Connection from 42.42.42.3:5555.
h
+ ip netns exec vm2 nc 42.42.42.2 4242 -p 5555
Ncat: Connection from 42.42.42.3.
Ncat: Connection from 42.42.42.3:23557.
h
+ tail /var/log/openvswitch/ovs-vswitchd.log
2021-08-12T06:54:12.743Z|00025|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_label
2021-08-12T06:54:12.743Z|00026|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_state_nat
2021-08-12T06:54:12.743Z|00027|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_orig_tuple
2021-08-12T06:54:12.743Z|00028|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_orig_tuple6
2021-08-12T06:54:12.743Z|00029|ofproto_dpif|INFO|system@ovs-system: Datapath does not support IPv6 ND Extensions
2021-08-12T06:54:12.833Z|00030|bridge|INFO|bridge br-int: added interface br-int on port 65534
2021-08-12T06:54:12.833Z|00031|bridge|INFO|bridge br-int: using datapath ID 00001e939193077f
2021-08-12T06:54:12.834Z|00032|connmgr|INFO|br-int: added service controller "punix:/var/run/openvswitch/br-int.mgmt"
2021-08-12T06:54:12.844Z|00033|bridge|INFO|bridge br-int: added interface vm1 on port 1
2021-08-12T06:54:12.879Z|00034|bridge|INFO|bridge br-int: added interface vm2 on port 2
+ kill 245598
+ nc6_pid=245605
+ sleep 2
+ ip netns exec vm1 nc -l -k -v 4242::2 7272
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Listening on 4242::2:7272
rep.sh: line 65: 245598 Terminated              ip netns exec vm1 nc -l -k -v 42.42.42.2 4242
+ ip netns exec vm2 nc 6666::6 777 -p 4444
Ncat: Connection from 4242::3.
Ncat: Connection from 4242::3:4444.
h
+ ip netns exec vm2 nc 4242::2 7272 -p 4444
Ncat: Connection from 4242::3.
Ncat: Connection from 4242::3:41666.
h
+ tail /var/log/openvswitch/ovs-vswitchd.log
2021-08-12T06:54:12.743Z|00025|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_label
2021-08-12T06:54:12.743Z|00026|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_state_nat
2021-08-12T06:54:12.743Z|00027|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_orig_tuple
2021-08-12T06:54:12.743Z|00028|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_orig_tuple6
2021-08-12T06:54:12.743Z|00029|ofproto_dpif|INFO|system@ovs-system: Datapath does not support IPv6 ND Extensions
2021-08-12T06:54:12.833Z|00030|bridge|INFO|bridge br-int: added interface br-int on port 65534
2021-08-12T06:54:12.833Z|00031|bridge|INFO|bridge br-int: using datapath ID 00001e939193077f
2021-08-12T06:54:12.834Z|00032|connmgr|INFO|br-int: added service controller "punix:/var/run/openvswitch/br-int.mgmt"
2021-08-12T06:54:12.844Z|00033|bridge|INFO|bridge br-int: added interface vm1 on port 1
2021-08-12T06:54:12.879Z|00034|bridge|INFO|bridge br-int: added interface vm2 on port 2
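
In the verified run the listener sees the second IPv4 connection arrive from port 23557 rather than the requested 5555, which is the visible effect of the source port being rewritten. A minimal sketch for pulling the peer ports out of the listener output follows, assuming the Ncat 7.70 message format shown above. The function name listener_src_ports is hypothetical, and it handles only the IPv4 lines because the IPv6 "address only" line is ambiguous for a simple pattern.

```shell
#!/usr/bin/env bash
# Print the source port of each IPv4 connection reported by an ncat listener.
# Reads the listener's output on stdin.
listener_src_ports() {
    sed -n 's/^Ncat: Connection from [0-9.]*:\([0-9][0-9]*\)\.$/\1/p'
}
```

Comparing the printed ports with the -p value passed to the client shows whether the port was preserved (5555) or rewritten.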

Comment 20 Jianlin Shi 2021-08-12 06:57:06 UTC
also verified on ovn2.13-20.12.0-173:

[root@wsfd-advnetlab18 bz1939676]# rpm -qa | grep -E "openvswitch2.13|ovn2.13"
ovn2.13-20.12.0-173.el8fdp.x86_64
ovn2.13-host-20.12.0-173.el8fdp.x86_64
ovn2.13-central-20.12.0-173.el8fdp.x86_64

+ ip netns exec vm1 nc -l -k -v 42.42.42.2 4242
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Listening on 42.42.42.2:4242
+ ip netns exec vm2 nc 66.66.66.66 666 -p 5555
Ncat: Connection from 42.42.42.3.
Ncat: Connection from 42.42.42.3:5555.
h
+ ip netns exec vm2 nc 42.42.42.2 4242 -p 5555
Ncat: Connection from 42.42.42.3.
Ncat: Connection from 42.42.42.3:54791.
h
+ tail /var/log/openvswitch/ovs-vswitchd.log
2021-08-12T06:56:11.010Z|00025|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_label
2021-08-12T06:56:11.010Z|00026|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_state_nat
2021-08-12T06:56:11.010Z|00027|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_orig_tuple
2021-08-12T06:56:11.010Z|00028|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_orig_tuple6
2021-08-12T06:56:11.010Z|00029|ofproto_dpif|INFO|system@ovs-system: Datapath does not support IPv6 ND Extensions
2021-08-12T06:56:11.107Z|00030|bridge|INFO|bridge br-int: added interface br-int on port 65534
2021-08-12T06:56:11.107Z|00031|bridge|INFO|bridge br-int: using datapath ID 00009a276b4396fe
2021-08-12T06:56:11.107Z|00032|connmgr|INFO|br-int: added service controller "punix:/var/run/openvswitch/br-int.mgmt"
2021-08-12T06:56:11.119Z|00033|bridge|INFO|bridge br-int: added interface vm1 on port 1
2021-08-12T06:56:11.161Z|00034|bridge|INFO|bridge br-int: added interface vm2 on port 2
+ kill 247066
+ nc6_pid=247071
+ sleep 2
+ ip netns exec vm1 nc -l -k -v 4242::2 7272
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Listening on 4242::2:7272
rep.sh: line 65: 247066 Terminated              ip netns exec vm1 nc -l -k -v 42.42.42.2 4242
+ ip netns exec vm2 nc 6666::6 777 -p 4444
Ncat: Connection from 4242::3.
Ncat: Connection from 4242::3:4444.
h
+ ip netns exec vm2 nc 4242::2 7272 -p 4444
Ncat: Connection from 4242::3.
Ncat: Connection from 4242::3:8915.
h
+ tail /var/log/openvswitch/ovs-vswitchd.log
2021-08-12T06:56:11.010Z|00025|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_label
2021-08-12T06:56:11.010Z|00026|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_state_nat
2021-08-12T06:56:11.010Z|00027|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_orig_tuple
2021-08-12T06:56:11.010Z|00028|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_orig_tuple6
2021-08-12T06:56:11.010Z|00029|ofproto_dpif|INFO|system@ovs-system: Datapath does not support IPv6 ND Extensions
2021-08-12T06:56:11.107Z|00030|bridge|INFO|bridge br-int: added interface br-int on port 65534
2021-08-12T06:56:11.107Z|00031|bridge|INFO|bridge br-int: using datapath ID 00009a276b4396fe
2021-08-12T06:56:11.107Z|00032|connmgr|INFO|br-int: added service controller "punix:/var/run/openvswitch/br-int.mgmt"
2021-08-12T06:56:11.119Z|00033|bridge|INFO|bridge br-int: added interface vm1 on port 1
2021-08-12T06:56:11.161Z|00034|bridge|INFO|bridge br-int: added interface vm2 on port 2

Comment 21 Jianlin Shi 2021-08-12 08:32:42 UTC
also verified on ovn2.13-20.12.0-173.el7

Comment 22 Jianlin Shi 2021-09-02 09:06:32 UTC
the issue occurred again on ovn-2021-21.06.0-24:

[root@wsfd-advnetlab16 bz1939676]# rpm -qa | grep -E "openvswitch2.15|ovn-2021"
openvswitch2.15-2.15.0-35.el8fdp.x86_64
ovn-2021-central-21.06.0-24.el8fdp.x86_64
python3-openvswitch2.15-2.15.0-35.el8fdp.x86_64
ovn-2021-21.06.0-24.el8fdp.x86_64
ovn-2021-host-21.06.0-24.el8fdp.x86_64

+ ip netns exec vm2 nc 66.66.66.66 666 -p 5555
Ncat: Connection from 42.42.42.3.
Ncat: Connection from 42.42.42.3:5555.
h
+ ip netns exec vm2 nc 42.42.42.2 4242 -p 5555                                                        
Ncat: Connection timed out.
+ tail /var/log/openvswitch/ovs-vswitchd.log                                                          
2021-09-02T09:03:06.499Z|00030|bridge|INFO|bridge br-int: added interface br-int on port 65534        
2021-09-02T09:03:06.499Z|00031|bridge|INFO|bridge br-int: using datapath ID 0000f2f331e03706          
2021-09-02T09:03:06.499Z|00032|connmgr|INFO|br-int: added service controller "punix:/var/run/openvswitch/br-int.mgmt"
2021-09-02T09:03:06.511Z|00033|bridge|INFO|bridge br-int: added interface vm1 on port 1               
2021-09-02T09:03:06.543Z|00034|bridge|INFO|bridge br-int: added interface vm2 on port 2               
2021-09-02T09:03:09.065Z|00001|dpif(handler2)|WARN|system@ovs-system: execute ct(commit,zone=1,label=0/0x1),ct(zone=2,nat),recirc(0xc) failed (Invalid argument) on packet tcp,vlan_tci=0x0000,dl_src=00:00:00:00:00:02,dl_dst=00:00:00:00:00:01,nw_src=42.42.42.3,nw_dst=42.42.42.2,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=5555,tp_dst=4242,tcp_flags=syn tcp_csum:68bd
 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x1),ct_tuple4(src=42.42.42.3,dst=42.42.42.2,proto=6,tp_src=5555,tp_dst=4242),in_port(3) mtu 0
2021-09-02T09:03:15.897Z|00035|memory|INFO|159312 kB peak resident set size after 10.0 seconds        
2021-09-02T09:03:15.897Z|00036|memory|INFO|handlers:35 ofconns:2 ports:3 revalidators:13 rules:287 udpif keys:19
2021-09-02T09:03:17.393Z|00037|connmgr|INFO|br-int<->unix#0: 284 flow_mods in the 1 s starting 10 s ago (283 adds, 1 deletes)

it seems that https://bugzilla.redhat.com/show_bug.cgi?id=1992705 reverted a patch related to this bug: 58683a42 (ovn-controller: Handle DNAT/no-NAT conntrack tuple collisions.)

Comment 23 Dumitru Ceara 2021-09-06 09:02:10 UTC
Fix got reverted from older releases but is available (and will stay) in ovn21.09-21.09.0-17.el8fdp.

Moving back to ON_QA and updating "Fixed in version" value.

Comment 30 errata-xmlrpc 2022-12-15 00:21:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (ovn2.13 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:9044

