Bug 1939676 - OVN should handle DNAT collisions by SNAT'ing port
Summary: OVN should handle DNAT collisions by SNAT'ing port
Keywords:
Status: VERIFIED
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: OVN
Version: RHEL 8.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
: ---
Assignee: Dumitru Ceara
QA Contact: Jianlin Shi
URL:
Whiteboard:
Depends On: 1935663 1935666 1992012
Blocks: 1939045
Reported: 2021-03-16 19:33 UTC by Tim Rozet
Modified: 2022-07-26 12:44 UTC (History)

Fixed In Version: ovn21.09-21.09.0-17.el8fdp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:


Attachments
OVS flows and groups (59.99 KB, text/plain)
2021-03-22 10:34 UTC, Dumitru Ceara


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker FD-1135 0 None None None 2021-08-10 20:20:24 UTC

Internal Links: 1939045

Description Tim Rozet 2021-03-16 19:33:21 UTC
Description of problem:
There's a potential scenario in Kubernetes where a pod that talks both to another pod directly and to a service backed by that same pod can hit a port collision, and conntrack will refuse to commit the connection. Consider the following UDP traffic flow between a client and a server:

client (10.129.2.7:5054) -> server (10.129.2.8:5088)

This will create an entry in conntrack similar to this:

udp,orig=(src=10.129.2.7,dst=10.129.2.8,sport=5054,dport=5088),reply=(src=10.129.2.8,dst=10.129.2.7,sport=5088,dport=5054),zone=85

Around the same time, the client sends a packet (using the same source port) to a Kubernetes service whose backend is the same server. In OVN this is treated as a load balancer; let's assume the VIP on this LB is 172.30.9.90.

client (10.129.2.7:5054) -> 172.30.9.90 (OVN LB DNAT) -> server (10.129.2.8:5088)

This results in a conntrack entry similar to:
udp,orig=(src=10.129.2.7,dst=172.30.9.90,sport=5054,dport=5088),reply=(src=10.129.2.8,dst=10.129.2.7,sport=5088,dport=5054),zone=84,labels=0x2

The problem here is that the reply tuple is the same for both sessions. This results in errors in OVS, because committing the conntrack entries fails due to the collision:
2021-03-12T08:12:40.670Z|00004|dpif(handler10)|WARN|system@ovs-system: execute ct(commit,zone=84,label=0/0x1),ct(zone=85),recirc(0x19590) failed (Invalid argument) on packet udp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:07,dl_dst=0a:58:0a:81:02:08,nw_src=10.129.2.7,nw_dst=10.129.2.8,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=5054,tp_dst=5088 udp_csum:14c4
 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x54),ct_tuple4(src=10.129.2.7,dst=10.129.2.8,proto=17,tp_src=5054,tp_dst=5088),in_port(16) mtu 0
2021-03-12T08:15:12.680Z|00008|dpif(handler13)|WARN|system@ovs-system: execute ct(commit,zone=84,label=0/0x1),ct(zone=85),recirc(0x19697) failed (Invalid argument) on packet udp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:07,dl_dst=0a:58:0a:81:02:08,nw_src=10.129.2.7,nw_dst=10.129.2.8,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=5054,tp_dst=5088 udp_csum:d000
 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x54),ct_tuple4(src=10.129.2.7,dst=10.129.2.8,proto=17,tp_src=5054,tp_dst=5088),in_port(16) mtu 0

To avoid this scenario, openshift-sdn has implemented a SNAT(0.0.0.0) CT flow that ensures the source port is changed if there is a port collision:
https://github.com/openshift/sdn/pull/269/files#diff-6a28fd6ce020355ee396a0a925f38fd98f5ba7908d2593a65d67b1dd9b341532R117
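
The collision and the effect of a "null" SNAT can be illustrated with a toy model of a single conntrack zone. This is only a sketch under simplifying assumptions: the names REPLY_TUPLES, CT_SPORT and ct_commit are hypothetical, the real kernel/OVS conntrack chooses a replacement port differently, and the model only tracks reply tuples.

```shell
#!/usr/bin/env bash
# Toy model of one conntrack zone; needs bash 4.2+ (associative arrays).
# Illustration only, not how the kernel or OVS implements conntrack.
declare -gA REPLY_TUPLES=()  # reply tuples already committed in the zone
CT_SPORT=""                  # source port used by the last successful commit

# ct_commit SRC SPORT DST DPORT [null_snat]
# DST and DPORT are the destination after any DNAT (i.e. the backend).
# Returns 1 on a reply-tuple clash unless "null_snat" is passed, in which
# case the first free source port (assumed range 1024-65535) is used.
ct_commit() {
    local src=$1 sport=$2 dst=$3 dport=$4 mode=${5:-}
    local key="$dst:$dport->$src:$sport"
    if [[ -n ${REPLY_TUPLES[$key]:-} ]]; then
        [[ $mode == null_snat ]] || return 1
        for ((sport = 1024; sport <= 65535; sport++)); do
            key="$dst:$dport->$src:$sport"
            [[ -z ${REPLY_TUPLES[$key]:-} ]] && break
        done
        ((sport <= 65535)) || return 1
    fi
    REPLY_TUPLES[$key]=1
    CT_SPORT=$sport
}

# Connection through the VIP, already DNAT'ed to the backend:
ct_commit 10.129.2.7 5054 10.129.2.8 5088
# Direct connection with the same source port: same reply tuple, so it clashes.
ct_commit 10.129.2.7 5054 10.129.2.8 5088 || echo "commit refused: reply tuple clash"
# Same connection with null SNAT: the source port is changed instead.
ct_commit 10.129.2.7 5054 10.129.2.8 5088 null_snat && echo "committed with source port $CT_SPORT"
```

The point of the model is that the DNAT'ed and the direct connection are distinguishable by their original tuples but not by their reply tuples, so only a change of source port lets both coexist.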

The request here is to implement a similar fix in OVN, so that such port collisions in a traffic flow are worked around.

Comment 1 Tim Rozet 2021-03-16 19:37:34 UTC
Note that this depends on OVS supporting the "null NAT" case, so I'm setting a "Depends On" for https://bugzilla.redhat.com/show_bug.cgi?id=1935663

Comment 3 Dumitru Ceara 2021-03-17 15:35:11 UTC
This also depends on OVS userspace conntrack supporting "null SNAT": https://bugzilla.redhat.com/show_bug.cgi?id=1935666

Comment 6 Dumitru Ceara 2021-03-22 10:34:19 UTC
Created attachment 1765201
OVS flows and groups

Comment 7 Dumitru Ceara 2021-03-22 10:35:36 UTC
OVN only reproducer:

ovn-nbctl ls-del ls
ovn-nbctl lr-del rtr
ovn-nbctl lb-del lb-test

ovn-nbctl \
    -- lr-add rtr \
    -- lrp-add rtr rtr-ls 00:00:00:00:01:00 42.42.42.1/24 \
    -- ls-add ls \
    -- lsp-add ls ls-rtr \
    -- lsp-set-addresses ls-rtr 00:00:00:00:01:00 \
    -- lsp-set-type ls-rtr router \
    -- lsp-set-options ls-rtr router-port=rtr-ls \
    -- lsp-add ls vm1 -- lsp-set-addresses vm1 00:00:00:00:00:01 \
    -- lsp-add ls vm2 -- lsp-set-addresses vm2 00:00:00:00:00:02 \
    -- lb-add lb-test 66.66.66.66:666 42.42.42.2:4242 tcp \
    -- ls-lb-add ls lb-test

ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal
ovs-vsctl set Interface vm1 external_ids:iface-id=vm1
ovs-vsctl add-port br-int vm2 -- set interface vm2 type=internal
ovs-vsctl set Interface vm2 external_ids:iface-id=vm2

ip netns add vm1
ip link set vm1 netns vm1
ip netns exec vm1 ip link set vm1 address 00:00:00:00:00:01
ip netns exec vm1 ip addr add 42.42.42.2/24 dev vm1
ip netns exec vm1 ip link set vm1 up
ip netns exec vm1 ip r a default via 42.42.42.1
ip netns add vm2
ip link set vm2 netns vm2
ip netns exec vm2 ip link set vm2 address 00:00:00:00:00:02
ip netns exec vm2 ip addr add 42.42.42.3/24 dev vm2
ip netns exec vm2 ip link set vm2 up
ip netns exec vm2 ip r a default via 42.42.42.1

# Start a TCP listener:
ip netns exec vm1 nc -l -k -v 42.42.42.2 4242

# Start a load balanced (DNAT) connection:
ip netns exec vm2 nc 66.66.66.66 666 -p 5555

# Start a "regular" (no NAT) connection to the backend (this fails):
ip netns exec vm2 nc 42.42.42.2 4242 -p 5555


At this point OVS fails to commit the connection:
2021-03-22T10:27:34.763Z|00001|dpif(handler2)|WARN|system@ovs-system: execute ct(commit,zone=2,label=0/0x1),ct(zone=3),recirc(0x7e) failed (Invalid argument) on packet tcp,vlan_tci=0x0000,dl_src=00:00:00:00:00:02,dl_dst=00:00:00:00:00:01,nw_src=42.42.42.3,nw_dst=42.42.42.2,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=5555,tp_dst=4242,tcp_flags=syn tcp_csum:8379
 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x2),ct_tuple4(src=42.42.42.3,dst=42.42.42.2,proto=6,tp_src=5555,tp_dst=4242),in_port(1) mtu 0


Attached OVS flows and groups.
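
To spot this failure without reading the whole log, the colliding flows can be pulled out of ovs-vswitchd.log with a small helper. This is a sketch that assumes the log format shown above; the function name ct_commit_failures is hypothetical, the default path is the one used elsewhere in this bug, and only IPv4 entries (nw_src/nw_dst) are handled.

```shell
#!/usr/bin/env bash
# ct_commit_failures [LOGFILE]
# Prints "src:sport -> dst:dport" for every failed ct(commit) on an IPv4
# packet found in LOGFILE. Assumes the WARN line format shown in this bug.
ct_commit_failures() {
    local log=${1:-/var/log/openvswitch/ovs-vswitchd.log}
    sed -n '/execute ct(commit.*failed (Invalid argument)/ s/.*nw_src=\([0-9.]*\),nw_dst=\([0-9.]*\),.*tp_src=\([0-9]*\),tp_dst=\([0-9]*\).*/\1:\3 -> \2:\4/p' "$log"
}

# Usage: ct_commit_failures /var/log/openvswitch/ovs-vswitchd.log
```

An empty result after running the reproducer suggests that no commit failure of this kind was logged.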

Comment 9 Dumitru Ceara 2021-05-26 13:40:13 UTC
Fix sent as RFC upstream: http://patchwork.ozlabs.org/project/ovn/list/?series=245843&state=*

The series is RFC because it still depends on not-yet merged OVS bits.

Comment 10 Dumitru Ceara 2021-06-03 15:07:32 UTC
V1 sent for review: http://patchwork.ozlabs.org/project/ovn/list/?series=247077&state=*

Comment 11 Dumitru Ceara 2021-06-09 12:14:18 UTC
V2 sent for review: http://patchwork.ozlabs.org/project/ovn/list/?series=247934&state=*

Comment 12 Dumitru Ceara 2021-06-11 10:27:50 UTC
v3 sent for review: http://patchwork.ozlabs.org/project/ovn/list/?series=248327&state=*

Comment 13 Dan Williams 2021-07-01 04:34:19 UTC
Merged upstream to git master on 2021-06-18

Comment 14 Dumitru Ceara 2021-07-09 09:24:22 UTC
Patch to bump the OVS submodule in OVN upstream: http://patchwork.ozlabs.org/project/ovn/list/?series=252717&state=*

(We still need the OVS patch to be backported to the openvswitch package downstream before we can use the feature downstream).

Comment 19 Jianlin Shi 2021-08-12 06:54:50 UTC
reproducer:

systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641                         
ovn-sbctl set-connection ptcp:6642                 
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:127.0.0.1:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=127.0.0.1
systemctl restart ovn-controller    
                                               
ovn-nbctl ls-del ls                                  
ovn-nbctl lr-del rtr
ovn-nbctl lb-del lb-test 
                                                           
ovn-nbctl \
    -- lr-add rtr \
    -- lrp-add rtr rtr-ls 00:00:00:00:01:00 42.42.42.1/24 4242::1/64 \
    -- ls-add ls \
    -- lsp-add ls ls-rtr \
    -- lsp-set-addresses ls-rtr 00:00:00:00:01:00 \
    -- lsp-set-type ls-rtr router \
    -- lsp-set-options ls-rtr router-port=rtr-ls \
    -- lsp-add ls vm1 -- lsp-set-addresses vm1 00:00:00:00:00:01 \
    -- lsp-add ls vm2 -- lsp-set-addresses vm2 00:00:00:00:00:02 \
    -- lb-add lb-test 66.66.66.66:666 42.42.42.2:4242 tcp \
    -- lb-add lb-test [6666::6]:777 [4242::2]:7272 tcp \
    -- ls-lb-add ls lb-test

ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal    
ovs-vsctl set Interface vm1 external_ids:iface-id=vm1
ovs-vsctl add-port br-int vm2 -- set interface vm2 type=internal
ovs-vsctl set Interface vm2 external_ids:iface-id=vm2

ip netns add vm1
ip link set vm1 netns vm1
ip netns exec vm1 ip link set vm1 address 00:00:00:00:00:01
ip netns exec vm1 ip addr add 42.42.42.2/24 dev vm1
ip netns exec vm1 ip addr add 4242::2/64 dev vm1
ip netns exec vm1 ip link set vm1 up
ip netns exec vm1 ip r a default via 42.42.42.1
ip netns exec vm1 ip -6 route add default via 4242::1
ip netns add vm2                          
ip link set vm2 netns vm2
ip netns exec vm2 ip link set vm2 address 00:00:00:00:00:02
ip netns exec vm2 ip addr add 42.42.42.3/24 dev vm2
ip netns exec vm2 ip addr add 4242::3/64 dev vm2
ip netns exec vm2 ip link set vm2 up
ip netns exec vm2 ip r a default via 42.42.42.1
ip netns exec vm2 ip -6 route add default via 4242::1

# Start a TCP listener:
ip netns exec vm1 nc -l -k -v 42.42.42.2 4242 &
nc_pid=$!
sleep 2

# Start a load balanced (DNAT) connection:
ip netns exec vm2 nc 66.66.66.66 666 -p 5555 <<< h

# Start a "regular" (no NAT) connection to the backend (this fails):
ip netns exec vm2 nc 42.42.42.2 4242 -p 5555 <<< h

tail /var/log/openvswitch/ovs-vswitchd.log

kill $nc_pid

ip netns exec vm1 nc -l -k -v 4242::2 7272 &
nc6_pid=$!
sleep 2

ip netns exec vm2 nc 6666::6 777 -p 4444 <<< h
ip netns exec vm2 nc 4242::2 7272 -p 4444 <<< h
tail /var/log/openvswitch/ovs-vswitchd.log
kill $nc6_pid

reproduced on ovn-2021-21.06.0-12:

[root@wsfd-advnetlab18 bz1939676]# rpm -qa | grep -E "openvswitch2.15|ovn-2021"
python3-openvswitch2.15-2.15.0-32.el8fdp.x86_64
ovn-2021-central-21.06.0-12.el8fdp.x86_64
openvswitch2.15-2.15.0-32.el8fdp.x86_64
ovn-2021-21.06.0-12.el8fdp.x86_64
ovn-2021-host-21.06.0-12.el8fdp.x86_64
+ ip netns exec vm1 nc -l -k -v 42.42.42.2 4242
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Listening on 42.42.42.2:4242
+ ip netns exec vm2 nc 66.66.66.66 666 -p 5555
Ncat: Connection from 42.42.42.3.
Ncat: Connection from 42.42.42.3:5555.
h
+ ip netns exec vm2 nc 42.42.42.2 4242 -p 5555
Ncat: Connection timed out.
+ tail /var/log/openvswitch/ovs-vswitchd.log
2021-08-12T06:51:58.984Z|00030|bridge|INFO|bridge br-int: added interface br-int on port 65534
2021-08-12T06:51:58.985Z|00031|bridge|INFO|bridge br-int: using datapath ID 0000f2dc7985e3f3
2021-08-12T06:51:58.985Z|00032|connmgr|INFO|br-int: added service controller "punix:/var/run/openvswitch/br-int.mgmt"
2021-08-12T06:51:58.997Z|00033|bridge|INFO|bridge br-int: added interface vm1 on port 1
2021-08-12T06:51:59.029Z|00034|bridge|INFO|bridge br-int: added interface vm2 on port 2
2021-08-12T06:52:01.538Z|00001|dpif(handler3)|WARN|system@ovs-system: execute ct(commit,zone=1,label=0/0x1),ct(zone=2,nat),recirc(0xd) failed (Invalid argument) on packet tcp,vlan_tci=0x0000,dl_src=00:00:00:00:00:02,dl_dst=00:00:00:00:00:01,nw_src=42.42.42.3,nw_dst=42.42.42.2,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=5555,tp_dst=4242,tcp_flags=syn tcp_csum:4cea
 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x1),ct_tuple4(src=42.42.42.3,dst=42.42.42.2,proto=6,tp_src=5555,tp_dst=4242),in_port(3) mtu 0
2021-08-12T06:52:08.412Z|00035|memory|INFO|159020 kB peak resident set size after 10.0 seconds
2021-08-12T06:52:08.413Z|00036|memory|INFO|handlers:35 ofconns:2 ports:3 revalidators:13 rules:289 udpif keys:24
2021-08-12T06:52:09.864Z|00037|connmgr|INFO|br-int<->unix#0: 286 flow_mods in the 1 s starting 10 s ago (285 adds, 1 deletes)
+ kill 244589
+ nc6_pid=244604
+ sleep 2
+ ip netns exec vm1 nc -l -k -v 4242::2 7272
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Listening on 4242::2:7272
rep.sh: line 65: 244589 Terminated              ip netns exec vm1 nc -l -k -v 42.42.42.2 4242
+ ip netns exec vm2 nc 6666::6 777 -p 4444
Ncat: Connection from 4242::3.
Ncat: Connection from 4242::3:4444.
h
+ ip netns exec vm2 nc 4242::2 7272 -p 4444
Ncat: Connection timed out.
+ tail /var/log/openvswitch/ovs-vswitchd.log
2021-08-12T06:51:59.029Z|00034|bridge|INFO|bridge br-int: added interface vm2 on port 2
2021-08-12T06:52:01.538Z|00001|dpif(handler3)|WARN|system@ovs-system: execute ct(commit,zone=1,label=0/0x1),ct(zone=2,nat),recirc(0xd) failed (Invalid argument) on packet tcp,vlan_tci=0x0000,dl_src=00:00:00:00:00:02,dl_dst=00:00:00:00:00:01,nw_src=42.42.42.3,nw_dst=42.42.42.2,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=5555,tp_dst=4242,tcp_flags=syn tcp_csum:4cea
 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x1),ct_tuple4(src=42.42.42.3,dst=42.42.42.2,proto=6,tp_src=5555,tp_dst=4242),in_port(3) mtu 0
2021-08-12T06:52:08.412Z|00035|memory|INFO|159020 kB peak resident set size after 10.0 seconds
2021-08-12T06:52:08.413Z|00036|memory|INFO|handlers:35 ofconns:2 ports:3 revalidators:13 rules:289 udpif keys:24
2021-08-12T06:52:09.864Z|00037|connmgr|INFO|br-int<->unix#0: 286 flow_mods in the 1 s starting 10 s ago (285 adds, 1 deletes)
2021-08-12T06:52:13.651Z|00001|dpif(handler2)|WARN|system@ovs-system: execute ct(commit,zone=1,label=0/0x1),ct(zone=2,nat),recirc(0xd) failed (Invalid argument) on packet tcp6,vlan_tci=0x0000,dl_src=00:00:00:00:00:02,dl_dst=00:00:00:00:00:01,ipv6_src=4242::3,ipv6_dst=4242::2,ipv6_label=0x712c6,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=4444,tp_dst=7272,tcp_flags=syn tcp_csum:f2fe
 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x1),ct_tuple6(src=4242::3,dst=4242::2,proto=6,src_port=4444,dst_port=7272),in_port(3) mtu 0
2021-08-12T06:52:20.741Z|00001|dpif(handler1)|WARN|system@ovs-system: execute ct(commit,zone=1,label=0/0x1),ct(zone=2,nat),recirc(0x1f) failed (Invalid argument) on packet tcp6,vlan_tci=0x0000,dl_src=00:00:00:00:00:02,dl_dst=00:00:00:00:00:01,ipv6_src=4242::3,ipv6_dst=4242::2,ipv6_label=0x8464e,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=4444,tp_dst=7272,tcp_flags=syn tcp_csum:d74b
 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x1),ct_tuple6(src=4242::3,dst=4242::2,proto=6,src_port=4444,dst_port=7272),in_port(3) mtu 0

Verified on ovn-2021-21.06.0-18:

[root@wsfd-advnetlab18 bz1939676]# rpm -qa | grep -E "openvswitch2.15|ovn-2021"
python3-openvswitch2.15-2.15.0-32.el8fdp.x86_64
ovn-2021-host-21.06.0-18.el8fdp.x86_64
openvswitch2.15-2.15.0-32.el8fdp.x86_64
ovn-2021-central-21.06.0-18.el8fdp.x86_64
ovn-2021-21.06.0-18.el8fdp.x86_64

+ ip netns exec vm1 nc -l -k -v 42.42.42.2 4242
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Listening on 42.42.42.2:4242
+ ip netns exec vm2 nc 66.66.66.66 666 -p 5555
Ncat: Connection from 42.42.42.3.
Ncat: Connection from 42.42.42.3:5555.
h
+ ip netns exec vm2 nc 42.42.42.2 4242 -p 5555
Ncat: Connection from 42.42.42.3.
Ncat: Connection from 42.42.42.3:23557.
h
+ tail /var/log/openvswitch/ovs-vswitchd.log
2021-08-12T06:54:12.743Z|00025|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_label
2021-08-12T06:54:12.743Z|00026|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_state_nat
2021-08-12T06:54:12.743Z|00027|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_orig_tuple
2021-08-12T06:54:12.743Z|00028|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_orig_tuple6
2021-08-12T06:54:12.743Z|00029|ofproto_dpif|INFO|system@ovs-system: Datapath does not support IPv6 ND Extensions
2021-08-12T06:54:12.833Z|00030|bridge|INFO|bridge br-int: added interface br-int on port 65534
2021-08-12T06:54:12.833Z|00031|bridge|INFO|bridge br-int: using datapath ID 00001e939193077f
2021-08-12T06:54:12.834Z|00032|connmgr|INFO|br-int: added service controller "punix:/var/run/openvswitch/br-int.mgmt"
2021-08-12T06:54:12.844Z|00033|bridge|INFO|bridge br-int: added interface vm1 on port 1
2021-08-12T06:54:12.879Z|00034|bridge|INFO|bridge br-int: added interface vm2 on port 2
+ kill 245598
+ nc6_pid=245605
+ sleep 2
+ ip netns exec vm1 nc -l -k -v 4242::2 7272
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Listening on 4242::2:7272
rep.sh: line 65: 245598 Terminated              ip netns exec vm1 nc -l -k -v 42.42.42.2 4242
+ ip netns exec vm2 nc 6666::6 777 -p 4444
Ncat: Connection from 4242::3.
Ncat: Connection from 4242::3:4444.
h
+ ip netns exec vm2 nc 4242::2 7272 -p 4444
Ncat: Connection from 4242::3.
Ncat: Connection from 4242::3:41666.
h
+ tail /var/log/openvswitch/ovs-vswitchd.log
2021-08-12T06:54:12.743Z|00025|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_label
2021-08-12T06:54:12.743Z|00026|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_state_nat
2021-08-12T06:54:12.743Z|00027|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_orig_tuple
2021-08-12T06:54:12.743Z|00028|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_orig_tuple6
2021-08-12T06:54:12.743Z|00029|ofproto_dpif|INFO|system@ovs-system: Datapath does not support IPv6 ND Extensions
2021-08-12T06:54:12.833Z|00030|bridge|INFO|bridge br-int: added interface br-int on port 65534
2021-08-12T06:54:12.833Z|00031|bridge|INFO|bridge br-int: using datapath ID 00001e939193077f
2021-08-12T06:54:12.834Z|00032|connmgr|INFO|br-int: added service controller "punix:/var/run/openvswitch/br-int.mgmt"
2021-08-12T06:54:12.844Z|00033|bridge|INFO|bridge br-int: added interface vm1 on port 1
2021-08-12T06:54:12.879Z|00034|bridge|INFO|bridge br-int: added interface vm2 on port 2
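
Judging from the two runs above, the visible difference on the fixed build is that the listener sees the second IPv4 connection arrive from a port other than the one passed to -p (23557 instead of 5555). A sketch of how that check could be scripted on captured listener output follows; the function names ncat_peer_ports and port_was_rewritten are hypothetical, and only IPv4 "Connection from ADDR:PORT." lines are parsed.

```shell
#!/usr/bin/env bash
# ncat_peer_ports: read Ncat listener output on stdin and print the source
# port of every IPv4 "Connection from ADDR:PORT." line, one per line.
ncat_peer_ports() {
    sed -n 's/^Ncat: Connection from [0-9.]*:\([0-9][0-9]*\)\.$/\1/p'
}

# port_was_rewritten REQUESTED_PORT: succeed if the last connection seen on
# stdin arrived from a source port different from REQUESTED_PORT.
port_was_rewritten() {
    local requested=$1 seen
    seen=$(ncat_peer_ports | tail -n 1)
    [[ -n $seen && $seen != "$requested" ]]
}

# Usage: port_was_rewritten 5555 < listener-output.txt && echo "port rewritten"
```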

Comment 20 Jianlin Shi 2021-08-12 06:57:06 UTC
also verified on ovn2.13-20.12.0-173:

[root@wsfd-advnetlab18 bz1939676]# rpm -qa | grep -E "openvswitch2.13|ovn2.13"
ovn2.13-20.12.0-173.el8fdp.x86_64
ovn2.13-host-20.12.0-173.el8fdp.x86_64
ovn2.13-central-20.12.0-173.el8fdp.x86_64

+ ip netns exec vm1 nc -l -k -v 42.42.42.2 4242
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Listening on 42.42.42.2:4242
+ ip netns exec vm2 nc 66.66.66.66 666 -p 5555
Ncat: Connection from 42.42.42.3.
Ncat: Connection from 42.42.42.3:5555.
h
+ ip netns exec vm2 nc 42.42.42.2 4242 -p 5555
Ncat: Connection from 42.42.42.3.
Ncat: Connection from 42.42.42.3:54791.
h
+ tail /var/log/openvswitch/ovs-vswitchd.log
2021-08-12T06:56:11.010Z|00025|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_label
2021-08-12T06:56:11.010Z|00026|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_state_nat
2021-08-12T06:56:11.010Z|00027|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_orig_tuple
2021-08-12T06:56:11.010Z|00028|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_orig_tuple6
2021-08-12T06:56:11.010Z|00029|ofproto_dpif|INFO|system@ovs-system: Datapath does not support IPv6 ND Extensions
2021-08-12T06:56:11.107Z|00030|bridge|INFO|bridge br-int: added interface br-int on port 65534
2021-08-12T06:56:11.107Z|00031|bridge|INFO|bridge br-int: using datapath ID 00009a276b4396fe
2021-08-12T06:56:11.107Z|00032|connmgr|INFO|br-int: added service controller "punix:/var/run/openvswitch/br-int.mgmt"
2021-08-12T06:56:11.119Z|00033|bridge|INFO|bridge br-int: added interface vm1 on port 1
2021-08-12T06:56:11.161Z|00034|bridge|INFO|bridge br-int: added interface vm2 on port 2
+ kill 247066
+ nc6_pid=247071
+ sleep 2
+ ip netns exec vm1 nc -l -k -v 4242::2 7272
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Listening on 4242::2:7272
rep.sh: line 65: 247066 Terminated              ip netns exec vm1 nc -l -k -v 42.42.42.2 4242
+ ip netns exec vm2 nc 6666::6 777 -p 4444
Ncat: Connection from 4242::3.
Ncat: Connection from 4242::3:4444.
h
+ ip netns exec vm2 nc 4242::2 7272 -p 4444
Ncat: Connection from 4242::3.
Ncat: Connection from 4242::3:8915.
h
+ tail /var/log/openvswitch/ovs-vswitchd.log
2021-08-12T06:56:11.010Z|00025|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_label
2021-08-12T06:56:11.010Z|00026|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_state_nat
2021-08-12T06:56:11.010Z|00027|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_orig_tuple
2021-08-12T06:56:11.010Z|00028|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_orig_tuple6
2021-08-12T06:56:11.010Z|00029|ofproto_dpif|INFO|system@ovs-system: Datapath does not support IPv6 ND Extensions
2021-08-12T06:56:11.107Z|00030|bridge|INFO|bridge br-int: added interface br-int on port 65534
2021-08-12T06:56:11.107Z|00031|bridge|INFO|bridge br-int: using datapath ID 00009a276b4396fe
2021-08-12T06:56:11.107Z|00032|connmgr|INFO|br-int: added service controller "punix:/var/run/openvswitch/br-int.mgmt"
2021-08-12T06:56:11.119Z|00033|bridge|INFO|bridge br-int: added interface vm1 on port 1
2021-08-12T06:56:11.161Z|00034|bridge|INFO|bridge br-int: added interface vm2 on port 2

Comment 21 Jianlin Shi 2021-08-12 08:32:42 UTC
Also verified on ovn2.13-20.12.0-173.el7

Comment 22 Jianlin Shi 2021-09-02 09:06:32 UTC
the issue occurred again on ovn-2021-21.06.0-24:

[root@wsfd-advnetlab16 bz1939676]# rpm -qa | grep -E "openvswitch2.15|ovn-2021"
openvswitch2.15-2.15.0-35.el8fdp.x86_64
ovn-2021-central-21.06.0-24.el8fdp.x86_64
python3-openvswitch2.15-2.15.0-35.el8fdp.x86_64
ovn-2021-21.06.0-24.el8fdp.x86_64
ovn-2021-host-21.06.0-24.el8fdp.x86_64

+ ip netns exec vm2 nc 66.66.66.66 666 -p 5555
Ncat: Connection from 42.42.42.3.
Ncat: Connection from 42.42.42.3:5555.
h
+ ip netns exec vm2 nc 42.42.42.2 4242 -p 5555                                                        
Ncat: Connection timed out.
+ tail /var/log/openvswitch/ovs-vswitchd.log                                                          
2021-09-02T09:03:06.499Z|00030|bridge|INFO|bridge br-int: added interface br-int on port 65534        
2021-09-02T09:03:06.499Z|00031|bridge|INFO|bridge br-int: using datapath ID 0000f2f331e03706          
2021-09-02T09:03:06.499Z|00032|connmgr|INFO|br-int: added service controller "punix:/var/run/openvswitch/br-int.mgmt"
2021-09-02T09:03:06.511Z|00033|bridge|INFO|bridge br-int: added interface vm1 on port 1               
2021-09-02T09:03:06.543Z|00034|bridge|INFO|bridge br-int: added interface vm2 on port 2               
2021-09-02T09:03:09.065Z|00001|dpif(handler2)|WARN|system@ovs-system: execute ct(commit,zone=1,label=0/0x1),ct(zone=2,nat),recirc(0xc) failed (Invalid argument) on packet tcp,vlan_tci=0x0000,dl_src=00:00:00:00:00:02,dl_dst=00:00:00:00:00:01,nw_src=42.42.42.3,nw_dst=42.42.42.2,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=5555,tp_dst=4242,tcp_flags=syn tcp_csum:68bd
 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x1),ct_tuple4(src=42.42.42.3,dst=42.42.42.2,proto=6,tp_src=5555,tp_dst=4242),in_port(3) mtu 0
2021-09-02T09:03:15.897Z|00035|memory|INFO|159312 kB peak resident set size after 10.0 seconds        
2021-09-02T09:03:15.897Z|00036|memory|INFO|handlers:35 ofconns:2 ports:3 revalidators:13 rules:287 udpif keys:19
2021-09-02T09:03:17.393Z|00037|connmgr|INFO|br-int<->unix#0: 284 flow_mods in the 1 s starting 10 s ago (283 adds, 1 deletes)

It seems that https://bugzilla.redhat.com/show_bug.cgi?id=1992705 reverted a patch related to this bug: 58683a42 (ovn-controller: Handle DNAT/no-NAT conntrack tuple collisions.)

Comment 23 Dumitru Ceara 2021-09-06 09:02:10 UTC
Fix got reverted from older releases but is available (and will stay) in ovn21.09-21.09.0-17.el8fdp.

Moving back to ON_QA and updating "Fixed in version" value.

