Description of problem:
The recent change to drop invalid packets in the post-SNAT stage [0] seems to uncover an issue with the SNAT stage: packets are marked as invalid when committed to CT.

Version-Release number of selected component (if applicable):
main and 23.03

How reproducible:
100%

Steps to Reproduce:
1. Run the "testing virtual port with floating IP" test.

Additional info:

ICMP traffic flow dump:
recirc_id(0),in_port(4),ct_state(-inv-trk),ct_mark(0/0x2),eth(src=00:00:00:00:01:11,dst=00:00:00:0f:01:01),eth_type(0x0800),ipv4(src=10.0.0.88,dst=172.0.0.99,proto=1,ttl=64,frag=no),icmp(type=0/0xfe), packets:6, bytes:588, used:0.207s, actions:set(eth(src=10:54:00:00:00:88,dst=0a:0a:b6:fc:03:01)),set(ipv4(ttl=63)),ct(zone=3,nat),recirc(0x7)
recirc_id(0x7),in_port(4),eth(src=10:54:00:00:00:88),eth_type(0x0800),ipv4(src=10.0.0.88,frag=no), packets:5, bytes:490, used:0.207s, actions:ct(commit,zone=3,nat(src=172.0.0.88)),recirc(0x8)
recirc_id(0x8),in_port(4),ct_state(+inv+trk),eth(),eth_type(0x0800),ipv4(proto=1,frag=no),icmp(type=0/0xfe), packets:5, bytes:490, used:0.207s, actions:drop

TCP traffic flow dump:
recirc_id(0),in_port(4),ct_state(-inv-trk),ct_mark(0/0x2),eth(src=00:00:00:00:01:11,dst=00:00:00:0f:01:01),eth_type(0x0800),ipv4(src=10.0.0.88,dst=172.0.0.99,proto=6,ttl=64,frag=no), packets:3, bytes:162, used:3.913s, flags:R., actions:set(eth(src=10:54:00:00:00:88,dst=0a:0a:b6:fc:03:01)),set(ipv4(ttl=63)),ct(zone=1,nat),recirc(0x7)
recirc_id(0x7),in_port(4),eth(src=10:54:00:00:00:88),eth_type(0x0800),ipv4(src=10.0.0.88,frag=no), packets:2, bytes:108, used:3.913s, flags:R., actions:ct(commit,zone=1,nat(src=172.0.0.88)),recirc(0x8)
recirc_id(0x8),in_port(4),ct_state(+inv+trk),eth(),eth_type(0x0800),ipv4(proto=6,frag=no), packets:2, bytes:108, used:3.913s, flags:R., actions:drop

[0] https://github.com/ovn-org/ovn/commit/e3bc68c3be6967916674119b14fe2bef081ac6ad
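The symptom in both dumps above is the last datapath flow, which matches ct_state(+inv+trk) and ends in actions:drop. A minimal sketch for picking such flows out of a saved dump follows. It assumes the dump is fed on stdin (for example from `ovs-appctl dpctl/dump-flows`) and that the state is printed as `+inv+trk`, as it is in this report; the function name `find_invalid_drops` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print only the datapath flows that match invalid
# conntrack state and whose action list is a plain drop.
# Reads a flow dump on stdin, one flow per line.
find_invalid_drops() {
    grep -F 'ct_state(+inv+trk)' | grep -E 'actions:drop[[:space:]]*$'
}
```

Possible usage on the gateway chassis: `ovs-appctl dpctl/dump-flows | find_invalid_drops`. No output would suggest that no flow is currently dropping packets as invalid.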
Bumping severity to "high" as this is a regression in 23.03 and upstream main branch.
I tried with the following script:

systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:127.0.0.1:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=127.0.0.1
systemctl restart ovn-controller

ovs-vsctl add-br br-ex
ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=provider:br-ex

ovn-nbctl lr-add lr1
ovn-nbctl ls-add public1
ovn-nbctl ls-add ls1
ovn-nbctl lsp-add ls1 ls1p1 \
    -- lsp-set-addresses ls1p1 "00:00:00:00:01:11 10.0.0.11"
ovn-nbctl lsp-add ls1 ls1p2 \
    -- lsp-set-addresses ls1p2 "00:00:00:00:01:12 10.0.0.12"
ovn-nbctl lsp-add ls1 ls1-to-lr1 \
    -- lsp-set-type ls1-to-lr1 router \
    -- lsp-set-options ls1-to-lr1 router-port=lr1-to-ls1 \
    -- lsp-set-addresses ls1-to-lr1 router
ovn-nbctl lsp-add ls1 vip \
    -- lsp-set-addresses vip "00:00:00:00:01:88 10.0.0.88" \
    -- lsp-set-type vip virtual \
    -- set logical_switch_port vip options:virtual-ip=10.0.0.88 \
    -- set logical_switch_port vip options:virtual-parents=ls1p1,ls1p2
ovn-nbctl lrp-add lr1 lr1-to-ls1 "00:00:00:0f:01:01" 10.0.0.1/24 \
    -- lrp-add lr1 lr1-to-public1 "00:00:00:0f:02:01" 172.0.0.1/24 \
    -- lrp-set-gateway-chassis lr1-to-public1 hv1 10
ovn-nbctl lr-nat-add lr1 dnat_and_snat 172.0.0.88 10.0.0.88 vip 10:54:00:00:00:88
ovn-nbctl lsp-add public1 public1-to-lr1 \
    -- lsp-set-type public1-to-lr1 router \
    -- lsp-set-options public1-to-lr1 router-port=lr1-to-public1 \
    -- lsp-set-addresses public1-to-lr1 router \
    -- lsp-add public1 ln1 \
    -- lsp-set-type ln1 localnet \
    -- lsp-set-options ln1 network_name=provider \
    -- lsp-set-addresses ln1 unknown
ovn-nbctl --wait=hv sync

ovs-vsctl add-port br-int ls1p1 -- set interface ls1p1 type=internal external_ids:iface-id=ls1p1
ip netns add ls1p1
ip link set ls1p1 netns ls1p1
ip netns exec ls1p1 ip link set ls1p1 address 00:00:00:00:01:11
ip netns exec ls1p1 ip link set ls1p1 up
ip netns exec ls1p1 ip addr add 10.0.0.11/24 dev ls1p1
ip netns exec ls1p1 ip route add default via 10.0.0.1

ovs-vsctl add-port br-int ls1p2 -- set interface ls1p2 type=internal external_ids:iface-id=ls1p2
ip netns add ls1p2
ip link set ls1p2 netns ls1p2
ip netns exec ls1p2 ip link set ls1p2 address 00:00:00:00:01:12
ip netns exec ls1p2 ip link set ls1p2 up
ip netns exec ls1p2 ip addr add 10.0.0.12/24 dev ls1p2
ip netns exec ls1p2 ip route add default via 10.0.0.1

ovs-vsctl add-port br-ex ns_ext1 -- set interface ns_ext1 type=internal
ip netns add ns_ext1
ip link set ns_ext1 netns ns_ext1
ip netns exec ns_ext1 ip link set ns_ext1 up
ip netns exec ns_ext1 ip addr add 172.0.0.99/24 dev ns_ext1
ip netns exec ns_ext1 ip route add default via 172.0.0.1

ip netns exec ls1p1 ip addr del 10.0.0.11/24 dev ls1p1
ip netns exec ls1p1 ip addr add 10.0.0.88/24 dev ls1p1
ip netns exec ls1p1 ip route add default via 10.0.0.1
ip netns exec ls1p1 arping -U -c 1 -w 2 -I ls1p1 10.0.0.88
sleep 5
ovn-sbctl find port_binding logical_port=vip
ip netns exec ns_ext1 ping -c 3 -i 0.3 -w 2 10.0.0.88
ip netns exec ns_ext1 ping -c 3 -i 0.3 -w 2 172.0.0.88

ip netns exec ls1p1 ip addr del 10.0.0.88/24 dev ls1p1
ip netns exec ls1p2 ip addr del 10.0.0.12/24 dev ls1p2
ip netns exec ls1p2 ip addr add 10.0.0.88/24 dev ls1p2
ip netns exec ls1p2 ip route add default via 10.0.0.1
ip netns exec ls1p2 arping -U -c 1 -w 2 -I ls1p2 10.0.0.88
sleep 5
ovn-sbctl find port_binding logical_port=vip
ip netns exec ns_ext1 ping -c 3 -i 0.3 -w 2 10.0.0.88
ip netns exec ns_ext1 ping -c 3 -i 0.3 -w 2 172.0.0.88

and the test passed on ovn23.03-23.03.0-4.el9:

[root@ibm-x3650m4-01-vm-10 bz2183529]# rpm -qa | grep -E "openvswitch3.1|ovn23.03"
openvswitch3.1-3.1.0-46.el9fdp.x86_64
ovn23.03-23.03.0-4.el9fdp.x86_64
ovn23.03-central-23.03.0-4.el9fdp.x86_64
ovn23.03-host-23.03.0-4.el9fdp.x86_64

+ ip netns exec ls1p1 arping -U -c 1 -w 2 -I ls1p1 10.0.0.88
ARPING 10.0.0.88 from 10.0.0.88 ls1p1
Sent 1 probes (1 broadcast(s))
Received 0 response(s)
+ sleep 5
+ ovn-sbctl find port_binding logical_port=vip
_uuid : ed02c1b0-14db-46cb-bc50-f40fe94b5dee
additional_chassis : []
additional_encap : []
chassis : 59360339-b754-450f-8ada-60663bc51c54
datapath : d907be97-e7f1-48d3-96cf-4109fb542647
encap : []
external_ids : {}
gateway_chassis : []
ha_chassis_group : []
logical_port : vip
mac : ["00:00:00:00:01:88 10.0.0.88"]
mirror_rules : []
nat_addresses : []
options : {virtual-ip="10.0.0.88", virtual-parents="ls1p1,ls1p2"}
parent_port : []
port_security : []
requested_additional_chassis: []
requested_chassis : []
tag : []
tunnel_key : 4
type : virtual
up : true
virtual_parent : ls1p1
+ ip netns exec ns_ext1 ping -c 3 -i 0.3 -w 2 10.0.0.88
PING 10.0.0.88 (10.0.0.88) 56(84) bytes of data.
64 bytes from 10.0.0.88: icmp_seq=1 ttl=63 time=53.5 ms
64 bytes from 10.0.0.88: icmp_seq=2 ttl=63 time=0.619 ms
64 bytes from 10.0.0.88: icmp_seq=3 ttl=63 time=0.152 ms
--- 10.0.0.88 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 603ms
rtt min/avg/max/mdev = 0.152/18.099/53.526/25.051 ms
+ ip netns exec ns_ext1 ping -c 3 -i 0.3 -w 2 172.0.0.88
PING 172.0.0.88 (172.0.0.88) 56(84) bytes of data.
64 bytes from 172.0.0.88: icmp_seq=1 ttl=63 time=0.779 ms
64 bytes from 172.0.0.88: icmp_seq=2 ttl=63 time=0.650 ms
64 bytes from 172.0.0.88: icmp_seq=3 ttl=63 time=0.142 ms
--- 172.0.0.88 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 611ms
rtt min/avg/max/mdev = 0.142/0.523/0.779/0.274 ms
+ ip netns exec ls1p1 ip addr del 10.0.0.88/24 dev ls1p1
+ ip netns exec ls1p2 ip addr del 10.0.0.12/24 dev ls1p2
+ ip netns exec ls1p2 ip addr add 10.0.0.88/24 dev ls1p2
+ ip netns exec ls1p2 ip route add default via 10.0.0.1
+ ip netns exec ls1p2 arping -U -c 1 -w 2 -I ls1p2 10.0.0.88
ARPING 10.0.0.88 from 10.0.0.88 ls1p2
Sent 1 probes (1 broadcast(s))
Received 0 response(s)
+ sleep 5
+ ovn-sbctl find port_binding logical_port=vip
_uuid : ed02c1b0-14db-46cb-bc50-f40fe94b5dee
additional_chassis : []
additional_encap : []
chassis : 59360339-b754-450f-8ada-60663bc51c54
datapath : d907be97-e7f1-48d3-96cf-4109fb542647
encap : []
external_ids : {}
gateway_chassis : []
ha_chassis_group : []
logical_port : vip
mac : ["00:00:00:00:01:88 10.0.0.88"]
mirror_rules : []
nat_addresses : []
options : {virtual-ip="10.0.0.88", virtual-parents="ls1p1,ls1p2"}
parent_port : []
port_security : []
requested_additional_chassis: []
requested_chassis : []
tag : []
tunnel_key : 4
type : virtual
up : true
virtual_parent : ls1p2
+ ip netns exec ns_ext1 ping -c 3 -i 0.3 -w 2 10.0.0.88
PING 10.0.0.88 (10.0.0.88) 56(84) bytes of data.
64 bytes from 10.0.0.88: icmp_seq=1 ttl=63 time=0.784 ms
64 bytes from 10.0.0.88: icmp_seq=2 ttl=63 time=0.130 ms
64 bytes from 10.0.0.88: icmp_seq=3 ttl=63 time=0.127 ms
--- 10.0.0.88 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 605ms
rtt min/avg/max/mdev = 0.127/0.347/0.784/0.309 ms
+ ip netns exec ns_ext1 ping -c 3 -i 0.3 -w 2 172.0.0.88
PING 172.0.0.88 (172.0.0.88) 56(84) bytes of data.
64 bytes from 172.0.0.88: icmp_seq=1 ttl=63 time=0.454 ms
64 bytes from 172.0.0.88: icmp_seq=2 ttl=63 time=0.143 ms
64 bytes from 172.0.0.88: icmp_seq=3 ttl=63 time=0.155 ms
--- 172.0.0.88 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 610ms
rtt min/avg/max/mdev = 0.143/0.250/0.454/0.143 ms

Ales, I failed to reproduce the issue on ovn23.03-23.03.0-4.el9. Could you help with this? Thanks so much.
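One thing the transcript above shows is that virtual_parent follows whichever port last announced 10.0.0.88 (ls1p1 first, then ls1p2). A small sketch for pulling that column out of the `ovn-sbctl find port_binding` output, so a script could compare it against the expected parent, might look like this. The function name `virtual_parent_of` is hypothetical, and the parsing assumes the `column : value` layout shown in this report.

```shell
#!/bin/sh
# Hypothetical helper: read "ovn-sbctl find port_binding ..." output on
# stdin and print the value of the virtual_parent column.
# Fields are split on a colon with optional surrounding spaces, so both
# "virtual_parent : ls1p2" and padded column layouts are handled.
virtual_parent_of() {
    awk -F' *: *' '$1 == "virtual_parent" { print $2 }'
}
```

For example, `ovn-sbctl find port_binding logical_port=vip | virtual_parent_of` could be compared with `ls1p2` after the address has been moved to the second port.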
Hi Jianlin,

Lorenzo confirmed my suspicion and this whole BZ should be covered by the revert [0]. So there isn't anything for you to do, as it didn't hit any rpm version AFAIK.

Thanks,
Ales

[0] https://github.com/ovn-org/ovn/commit/0c71712b
The issue does not reproduce on ovn23.03-23.03.0-106.el9 either:

[root@ibm-x3650m4-01-vm-10 bz2183529]# rpm -qa | grep -E "openvswitch3.1|ovn23.03"
openvswitch3.1-3.1.0-46.el9fdp.x86_64
ovn23.03-23.03.0-106.el9fdp.x86_64
ovn23.03-central-23.03.0-106.el9fdp.x86_64
ovn23.03-host-23.03.0-106.el9fdp.x86_64

+ ip netns exec ls1p2 arping -U -c 1 -w 2 -I ls1p2 10.0.0.88
ARPING 10.0.0.88 from 10.0.0.88 ls1p2
Sent 1 probes (1 broadcast(s))
Received 0 response(s)
+ sleep 5
+ ovn-sbctl find port_binding logical_port=vip
_uuid : 6fe773a1-a182-4f47-8065-f7603f8da6b4
additional_chassis : []
additional_encap : []
chassis : 5bdc68a9-ebba-41f6-a8be-245c5f8c45c1
datapath : 721989c7-8a58-4680-8435-18c484f039b0
encap : []
external_ids : {}
gateway_chassis : []
ha_chassis_group : []
logical_port : vip
mac : ["00:00:00:00:01:88 10.0.0.88"]
mirror_rules : []
nat_addresses : []
options : {virtual-ip="10.0.0.88", virtual-parents="ls1p1,ls1p2"}
parent_port : []
port_security : []
requested_additional_chassis: []
requested_chassis : []
tag : []
tunnel_key : 4
type : virtual
up : true
virtual_parent : ls1p2
+ ip netns exec ns_ext1 ping -c 3 -i 0.3 -w 2 10.0.0.88
PING 10.0.0.88 (10.0.0.88) 56(84) bytes of data.
64 bytes from 10.0.0.88: icmp_seq=1 ttl=63 time=0.937 ms
64 bytes from 10.0.0.88: icmp_seq=2 ttl=63 time=0.109 ms
64 bytes from 10.0.0.88: icmp_seq=3 ttl=63 time=0.110 ms
--- 10.0.0.88 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 601ms
rtt min/avg/max/mdev = 0.109/0.385/0.937/0.390 ms
+ ip netns exec ns_ext1 ping -c 3 -i 0.3 -w 2 172.0.0.88
PING 172.0.0.88 (172.0.0.88) 56(84) bytes of data.
64 bytes from 172.0.0.88: icmp_seq=1 ttl=63 time=0.360 ms
64 bytes from 172.0.0.88: icmp_seq=2 ttl=63 time=0.124 ms
64 bytes from 172.0.0.88: icmp_seq=3 ttl=63 time=0.111 ms
--- 172.0.0.88 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 609ms
rtt min/avg/max/mdev = 0.111/0.198/0.360/0.114 ms
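The pass criterion used in these runs is the "0% packet loss" summary line from ping. As a rough sketch, a script could assert on that line instead of relying on reading the output by eye. The function name `ping_ok` is hypothetical, and the match assumes the summary format printed in this report.

```shell
#!/bin/sh
# Hypothetical helper: read ping output on stdin and succeed only if the
# summary reports exactly 0% packet loss. The leading non-digit guard
# keeps "10%" or "100%" from matching.
ping_ok() {
    grep -Eq '(^|[^0-9])0% packet loss'
}
```

For example: `ip netns exec ns_ext1 ping -c 3 -i 0.3 -w 2 172.0.0.88 | ping_ok || echo "floating IP unreachable"`.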
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (ovn23.03 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2023:5305