Bug 1816616

Summary: [OVN] ARP requests not forwarded to chassis owning the logical port behind FIP in DVR scenarios.
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Dumitru Ceara <dceara>
Component: ovn2.13
Assignee: Dumitru Ceara <dceara>
Status: CLOSED ERRATA
QA Contact: ying xu <yinxu>
Severity: unspecified
Priority: unspecified
Version: FDP 20.A
CC: ctrautma, jishi, mmichels, ralongi, yinxu
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-04-14 08:21:39 UTC
Type: Bug
Bug Blocks: 1816617, 1816620

Description Dumitru Ceara 2020-03-24 11:41:29 UTC
Description of problem:

In a distributed routing scenario, suppose a VM (VM1) is connected to the aggregation switch (public) and tries to reach another VM (VM2), which is connected to a different logical switch, through a floating IP (a dnat_and_snat NAT entry with external_mac and logical_port set). If VM2 resides on a different chassis than VM1, the ARP requests for the floating IP never reach the chassis of VM2.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. On a two-chassis physical topology (hv1 and hv2), configure the following logical topology:

ovn-nbctl ls-add sw-agg
ovn-nbctl lsp-add sw-agg sw-agg-ext \
    -- lsp-set-addresses sw-agg-ext 00:00:00:00:00:01

ovn-nbctl lsp-add sw-agg sw-rtr1                   \
    -- lsp-set-type sw-rtr1 router                 \
    -- lsp-set-addresses sw-rtr1 00:00:00:00:01:00 \
    -- lsp-set-options sw-rtr1 router-port=rtr1-sw

ovn-nbctl lsp-add sw-agg sw-agg-ln
ovn-nbctl lsp-set-addresses sw-agg-ln unknown
ovn-nbctl lsp-set-type sw-agg-ln localnet
ovn-nbctl lsp-set-options sw-agg-ln network_name=phys

ovn-nbctl lr-add rtr1
ovn-nbctl lrp-add rtr1 rtr1-sw 00:00:00:00:01:00 10.0.0.1/24 10::1/64

ovn-nbctl lrp-add rtr1 rtr1-sw1 00:00:01:00:00:00 20.0.0.1/24 20::1/64

ovn-nbctl lrp-set-gateway-chassis rtr1-sw hv1 20

ovn-nbctl lr-nat-add rtr1 dnat_and_snat 10.0.0.122 20.0.0.12 sw1-p2 00:00:00:02:00:00

2. Configure the underlying physical network on both hv1 and hv2:
ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys

3. Bind sw-agg-ext to an OVS port on hv1.

4. Bind sw1-p2 to an OVS port on hv2.

5. Send an ARP request from sw-agg-ext for 10.0.0.122.
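
Steps 3 to 5 could be carried out roughly as follows. This is only a sketch under assumptions, not the reporter's exact commands: the interface names vif-ext and vif-p2, the use of OVS internal ports on br-int, the use of iputils arping, and the DRY_RUN switch (added so the commands can be previewed without touching OVS) are all hypothetical.

```shell
#!/bin/sh
# Sketch only. Interface names (vif-ext, vif-p2) and DRY_RUN are assumptions.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Bind an OVN logical switch port to a new OVS internal port on this chassis.
# $1 = OVS interface name (hypothetical), $2 = OVN logical port name
bind_lsp() {
    run ovs-vsctl add-port br-int "$1" \
        -- set Interface "$1" type=internal external_ids:iface-id="$2"
}

# Send a single ARP request for an IP from the given interface
# (iputils arping assumed).
# $1 = interface name, $2 = target IP
send_arp() {
    run arping -c 1 -I "$1" "$2"
}
```

With these helpers, one would run `bind_lsp vif-ext sw-agg-ext` on hv1, `bind_lsp vif-p2 sw1-p2` on hv2, and then `send_arp vif-ext 10.0.0.122` on hv1. For the ARP request to be meaningful, the interface on hv1 would presumably also need to be up, carry the MAC set with lsp-set-addresses (00:00:00:00:00:01), and have an address in 10.0.0.0/24.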

Actual results:
ARP requests don't reach hv2 and are not replied to.

Expected results:
ARP requests reach hv2 and get replied to. The neighbor entry is populated on sw-agg-ext.
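
One way to check the expected neighbor entry is to inspect the output of `ip neigh` on the interface bound to sw-agg-ext. The helper below is a minimal sketch, assuming the usual iproute2 output format (IP address in the first field, `lladdr` present once the address is resolved); the function name and the interface name vif-ext in the usage note are hypothetical.

```shell
#!/bin/sh
# Sketch only: reads `ip neigh` output on stdin and succeeds (exit 0) if
# there is a resolved entry for the IP given as $1.
neigh_resolved() {
    awk -v ip="$1" '
        # A resolved entry has a link-layer address and is neither
        # FAILED nor INCOMPLETE.
        $1 == ip && /lladdr/ && $NF != "FAILED" && $NF != "INCOMPLETE" { found = 1 }
        END { exit !found }
    '
}
```

For example, `ip neigh show dev vif-ext | neigh_resolved 10.0.0.122 && echo resolved` would print "resolved" only once the ARP reply for the floating IP has been received.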

Additional info:
Originally reported upstream: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-March/049856.html

Comment 5 ying xu 2020-03-30 04:05:20 UTC
Reproduced on version:
# rpm -qa|grep ovn
ovn2.13-2.13.0-4.el8fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-basic-1.0-23.noarch
ovn2.13-central-2.13.0-4.el8fdp.x86_64
ovn2.13-host-2.13.0-4.el8fdp.x86_64

Set up the environment as below.
Topology:
s3-----------r1-------public------localnet
|                       |
hv0vm0                 hv1vm0

# ovn-nbctl show
switch 350adf54-a2e0-4b34-94f8-34e05c4c7aca (s3)
    port hv0_vm00_vnet1
        addresses: ["00:de:ad:00:00:01 172.16.103.11"]
    port s3_r1
        type: router
        addresses: ["00:de:ad:ff:01:03 172.16.103.1"]
        router-port: r1_s3
    port hv0_vm01_vnet1
        addresses: ["00:de:ad:00:01:01 172.16.103.12"]
switch 90dcc1b5-8b5d-4bcc-bf11-e4be61ced168 (public)
    port public_r1
        type: router
        router-port: r1_public
    port ln_p1
        type: localnet
        addresses: ["unknown"]
    port hv1_vm00_vnet1
        addresses: ["00:de:ad:01:00:01 172.16.102.11"]
router 826d7a1c-1268-4cd2-8772-a72c3b142336 (r1)
    port r1_public
        mac: "00:de:ad:ff:01:02"
        networks: ["172.16.102.1/24"]
        gateway chassis: [hv0]
    port r1_s3
        mac: "00:de:ad:ff:01:03"
        networks: ["172.16.103.1/24"]
    nat 27138ed3-6fe6-4828-8682-f16793c03034
        external ip: "172.16.102.201"
        logical ip: "172.16.103.11"
        type: "dnat_and_snat"
# ovs-vsctl show
54955998-12e7-4415-8fb0-69dc705bfa0f
    Bridge br-int
        fail_mode: secure
        Port "hv1_vm00_vnet1"
            Interface "hv1_vm00_vnet1"
        Port br-int
            Interface br-int
                type: internal
        Port "ovn-hv0-0"
            Interface "ovn-hv0-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="20.0.10.26"}
        Port "patch-br-int-to-ln_p1"
            Interface "patch-br-int-to-ln_p1"
                type: patch
                options: {peer="patch-ln_p1-to-br-int"}
    Bridge nat_test
        Port nat_test
            Interface nat_test
                type: internal
        Port "enp4s0d1"
            Interface "enp4s0d1"
        Port "patch-ln_p1-to-br-int"
            Interface "patch-ln_p1-to-br-int"
                type: patch
                options: {peer="patch-br-int-to-ln_p1"}
    ovs_version: "2.11.0"
After setting up the environment, run:
ovn-nbctl lrp-set-gateway-chassis r1_public hv0 20

Then ping from hv1vm0 to hv0vm0; the ping fails:
# ip nei flush all;ping 172.16.102.201 -c10
PING 172.16.102.201 (172.16.102.201) 56(84) bytes of data.
From 172.16.102.11 icmp_seq=1 Destination Host Unreachable
From 172.16.102.11 icmp_seq=2 Destination Host Unreachable
From 172.16.102.11 icmp_seq=3 Destination Host Unreachable
From 172.16.102.11 icmp_seq=4 Destination Host Unreachable
From 172.16.102.11 icmp_seq=5 Destination Host Unreachable
From 172.16.102.11 icmp_seq=6 Destination Host Unreachable
From 172.16.102.11 icmp_seq=7 Destination Host Unreachable
From 172.16.102.11 icmp_seq=8 Destination Host Unreachable
From 172.16.102.11 icmp_seq=9 Destination Host Unreachable
From 172.16.102.11 icmp_seq=10 Destination Host Unreachable

--- 172.16.102.201 ping statistics ---
10 packets transmitted, 0 received, +10 errors, 100% packet loss, time 9001ms

Verified on version:
# rpm -qa|grep ovn
ovn2.13-2.13.0-7.el8fdp.x86_64
ovn2.13-host-2.13.0-7.el8fdp.x86_64
ovn2.13-central-2.13.0-7.el8fdp.x86_64

# ip nei flush all;ping 172.16.102.201 -c10
PING 172.16.102.201 (172.16.102.201) 56(84) bytes of data.
64 bytes from 172.16.102.201: icmp_seq=1 ttl=64 time=2.17 ms
64 bytes from 172.16.102.201: icmp_seq=2 ttl=64 time=0.384 ms
64 bytes from 172.16.102.201: icmp_seq=3 ttl=64 time=1.31 ms
64 bytes from 172.16.102.201: icmp_seq=4 ttl=64 time=0.520 ms
64 bytes from 172.16.102.201: icmp_seq=5 ttl=64 time=0.453 ms
64 bytes from 172.16.102.201: icmp_seq=6 ttl=64 time=0.447 ms
64 bytes from 172.16.102.201: icmp_seq=7 ttl=64 time=0.405 ms
64 bytes from 172.16.102.201: icmp_seq=8 ttl=64 time=0.477 ms
64 bytes from 172.16.102.201: icmp_seq=9 ttl=64 time=0.398 ms
64 bytes from 172.16.102.201: icmp_seq=10 ttl=64 time=0.483 ms

--- 172.16.102.201 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9002ms
rtt min/avg/max/mdev = 0.384/0.705/2.176/0.555 ms
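
The reproduce/verify check above can be automated by extracting the packet loss percentage from the ping summary line. This is a small sketch, assuming the iputils ping summary format shown in the two runs above; the function name ping_loss is hypothetical.

```shell
#!/bin/sh
# Sketch only: reads ping output on stdin and prints the packet loss
# percentage taken from the summary line (without the % sign).
ping_loss() {
    sed -n 's/.* \([0-9][0-9.]*\)% packet loss.*/\1/p'
}
```

For example, `ping 172.16.102.201 -c10 | ping_loss` would print 100 for the failing run and 0 for the run on the fixed version.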

Comment 7 errata-xmlrpc 2020-04-14 08:21:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1434