Description of problem:

In a distributed routing scenario, a VM (VM1) connected to the aggregation switch (public) tries to reach, through a floating IP (dnat_and_snat with external_mac and logical_port set), another VM (VM2) connected to a switch. If VM2 resides on a different chassis, the ARP requests don't reach the chassis of VM2.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1. On a two-chassis (hv1 and hv2) physical topology, configure the following logical topology:

    ovn-nbctl ls-add sw-agg
    ovn-nbctl lsp-add sw-agg sw-agg-ext \
        -- lsp-set-addresses sw-agg-ext 00:00:00:00:00:01
    ovn-nbctl lsp-add sw-agg sw-rtr1 \
        -- lsp-set-type sw-rtr1 router \
        -- lsp-set-addresses sw-rtr1 00:00:00:00:01:00 \
        -- lsp-set-options sw-rtr1 router-port=rtr1-sw
    ovn-nbctl lsp-add sw-agg sw-agg-ln
    ovn-nbctl lsp-set-addresses sw-agg-ln unknown
    ovn-nbctl lsp-set-type sw-agg-ln localnet
    ovn-nbctl lsp-set-options sw-agg-ln network_name=phys
    ovn-nbctl lr-add rtr1
    ovn-nbctl lrp-add rtr1 rtr1-sw 00:00:00:00:01:00 10.0.0.1/24 10::1/64
    ovn-nbctl lrp-add rtr1 rtr1-sw1 00:00:01:00:00:00 20.0.0.1/24 20::1/64
    ovn-nbctl lrp-set-gateway-chassis rtr1-sw hv1 20
    ovn-nbctl lr-nat-add rtr1 dnat_and_snat 10.0.0.122 20.0.0.12 sw1-p2 00:00:00:02:00:00

2. Configure the underlying physical network on both hv1 and hv2:

    ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys

3. Bind sw-agg-ext to an OVS port on hv1.
4. Bind sw1-p2 to an OVS port on hv2.
5. Send an ARP request from sw-agg-ext for 10.0.0.122.

Actual results:

ARP requests don't reach hv2 and are not replied to.

Expected results:

ARP requests reach hv2 and get replied to. The neighbor entry is populated on sw-agg-ext.

Additional info:

Originally reported upstream:
https://mail.openvswitch.org/pipermail/ovs-discuss/2020-March/049856.html
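Steps 3 to 5 do not spell out commands. The following is a minimal sketch of one way to carry them out, not the reporter's exact procedure. The OVS port names vm-ext and vm-p2, the use of internal ports on br-int, and the sender address 10.0.0.2/24 are assumptions made for illustration; only the logical port names, the MAC of sw-agg-ext, and the floating IP come from the report. By default the script only prints the commands (DRY_RUN=1).

```shell
#!/bin/sh
# Hypothetical sketch of steps 3-5. Prints the commands unless DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# bind_lport OVS_PORT LOGICAL_PORT
# Create an internal OVS port on br-int and bind it to an OVN logical port
# by setting external_ids:iface-id on the interface.
bind_lport() {
    run ovs-vsctl add-port br-int "$1" -- \
        set Interface "$1" type=internal external_ids:iface-id="$2"
}

# Step 3, on hv1: bind sw-agg-ext and give the port its logical MAC.
hv1_steps() {
    bind_lport vm-ext sw-agg-ext
    run ip link set vm-ext address 00:00:00:00:00:01
    run ip addr add 10.0.0.2/24 dev vm-ext   # assumed address, not in the report
    run ip link set vm-ext up
}

# Step 4, on hv2: bind sw1-p2.
hv2_steps() {
    bind_lport vm-p2 sw1-p2
}

# Step 5, on hv1: send one ARP request for the floating IP.
send_arp() {
    run arping -c 1 -I vm-ext 10.0.0.122
}

hv1_steps
hv2_steps
send_arp
```

On a real setup, hv1_steps and send_arp would be run on hv1 and hv2_steps on hv2, each with DRY_RUN=0.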
Reproduced on version:

# rpm -qa|grep ovn
ovn2.13-2.13.0-4.el8fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-basic-1.0-23.noarch
ovn2.13-central-2.13.0-4.el8fdp.x86_64
ovn2.13-host-2.13.0-4.el8fdp.x86_64

Set up the environment as below.

topo:

s3-----------r1-------public------localnet
|                        |
hv0vm0                hv1vm0

# ovn-nbctl show
switch 350adf54-a2e0-4b34-94f8-34e05c4c7aca (s3)
    port hv0_vm00_vnet1
        addresses: ["00:de:ad:00:00:01 172.16.103.11"]
    port s3_r1
        type: router
        addresses: ["00:de:ad:ff:01:03 172.16.103.1"]
        router-port: r1_s3
    port hv0_vm01_vnet1
        addresses: ["00:de:ad:00:01:01 172.16.103.12"]
switch 90dcc1b5-8b5d-4bcc-bf11-e4be61ced168 (public)
    port public_r1
        type: router
        router-port: r1_public
    port ln_p1
        type: localnet
        addresses: ["unknown"]
    port hv1_vm00_vnet1
        addresses: ["00:de:ad:01:00:01 172.16.102.11"]
router 826d7a1c-1268-4cd2-8772-a72c3b142336 (r1)
    port r1_public
        mac: "00:de:ad:ff:01:02"
        networks: ["172.16.102.1/24"]
        gateway chassis: [hv0]
    port r1_s3
        mac: "00:de:ad:ff:01:03"
        networks: ["172.16.103.1/24"]
    nat 27138ed3-6fe6-4828-8682-f16793c03034
        external ip: "172.16.102.201"
        logical ip: "172.16.103.11"
        type: "dnat_and_snat"

# ovs-vsctl show
54955998-12e7-4415-8fb0-69dc705bfa0f
    Bridge br-int
        fail_mode: secure
        Port "hv1_vm00_vnet1"
            Interface "hv1_vm00_vnet1"
        Port br-int
            Interface br-int
                type: internal
        Port "ovn-hv0-0"
            Interface "ovn-hv0-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="20.0.10.26"}
        Port "patch-br-int-to-ln_p1"
            Interface "patch-br-int-to-ln_p1"
                type: patch
                options: {peer="patch-ln_p1-to-br-int"}
    Bridge nat_test
        Port nat_test
            Interface nat_test
                type: internal
        Port "enp4s0d1"
            Interface "enp4s0d1"
        Port "patch-ln_p1-to-br-int"
            Interface "patch-ln_p1-to-br-int"
                type: patch
                options: {peer="patch-br-int-to-ln_p1"}
    ovs_version: "2.11.0"

After setting up the environment, set the gateway chassis:

ovn-nbctl lrp-set-gateway-chassis r1_public hv0 20

Then ping from hv1vm0 to hv0vm0. The ping fails:

# ip nei flush all;ping 172.16.102.201 -c10
PING 172.16.102.201 (172.16.102.201) 56(84) bytes of data.
From 172.16.102.11 icmp_seq=1 Destination Host Unreachable
From 172.16.102.11 icmp_seq=2 Destination Host Unreachable
From 172.16.102.11 icmp_seq=3 Destination Host Unreachable
From 172.16.102.11 icmp_seq=4 Destination Host Unreachable
From 172.16.102.11 icmp_seq=5 Destination Host Unreachable
From 172.16.102.11 icmp_seq=6 Destination Host Unreachable
From 172.16.102.11 icmp_seq=7 Destination Host Unreachable
From 172.16.102.11 icmp_seq=8 Destination Host Unreachable
From 172.16.102.11 icmp_seq=9 Destination Host Unreachable
From 172.16.102.11 icmp_seq=10 Destination Host Unreachable

--- 172.16.102.201 ping statistics ---
10 packets transmitted, 0 received, +10 errors, 100% packet loss, time 9001ms

Verified on version:

# rpm -qa|grep ovn
ovn2.13-2.13.0-7.el8fdp.x86_64
ovn2.13-host-2.13.0-7.el8fdp.x86_64
ovn2.13-central-2.13.0-7.el8fdp.x86_64

# ip nei flush all;ping 172.16.102.201 -c10
PING 172.16.102.201 (172.16.102.201) 56(84) bytes of data.
64 bytes from 172.16.102.201: icmp_seq=1 ttl=64 time=2.17 ms
64 bytes from 172.16.102.201: icmp_seq=2 ttl=64 time=0.384 ms
64 bytes from 172.16.102.201: icmp_seq=3 ttl=64 time=1.31 ms
64 bytes from 172.16.102.201: icmp_seq=4 ttl=64 time=0.520 ms
64 bytes from 172.16.102.201: icmp_seq=5 ttl=64 time=0.453 ms
64 bytes from 172.16.102.201: icmp_seq=6 ttl=64 time=0.447 ms
64 bytes from 172.16.102.201: icmp_seq=7 ttl=64 time=0.405 ms
64 bytes from 172.16.102.201: icmp_seq=8 ttl=64 time=0.477 ms
64 bytes from 172.16.102.201: icmp_seq=9 ttl=64 time=0.398 ms
64 bytes from 172.16.102.201: icmp_seq=10 ttl=64 time=0.483 ms

--- 172.16.102.201 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9002ms
rtt min/avg/max/mdev = 0.384/0.705/2.176/0.555 ms
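
The two ping runs differ in the packet-loss figure of the summary line (100% before the fix, 0% after). For a scripted check, a minimal sketch of extracting that figure is shown below. The helper name ping_loss is hypothetical, and the pattern assumes the iputils summary format seen in the output above.

```shell
#!/bin/sh
# ping_loss: read ping output on stdin and print the packet-loss percentage
# taken from the "... N% packet loss ..." summary line (no output if absent).
ping_loss() {
    sed -n 's/.* \([0-9][0-9.]*\)% packet loss.*/\1/p'
}

# Example with the summary line from the failing run:
echo "10 packets transmitted, 0 received, +10 errors, 100% packet loss, time 9001ms" | ping_loss
```

In use this would be something like `ping 172.16.102.201 -c10 | ping_loss`, with a result of 0 indicating that the floating IP is reachable.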
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1434