Bug 1836963

Summary: [OVN][DVR] Impossible to ping internet addresses from vm with FIP
Product: Red Hat OpenStack
Reporter: Eran Kuris <ekuris>
Component: python-networking-ovn
Assignee: Jakub Libosvar <jlibosva>
Status: CLOSED ERRATA
QA Contact: Eran Kuris <ekuris>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 16.1 (Train)
CC: agreentr, ahasson, amcleod, amuller, apevec, batkisso, bcafarel, ctrautma, dalvarez, dsevosty, fiezzi, itbrown, jamsmith, jishi, jlibosva, jpretori, lhh, lmartins, lorenzo.bianconi, majopela, njohnston, rheinzma, rsafrono, sclewis, scohen, stephenm, tbarron, yinxu
Target Milestone: ga
Keywords: Regression, TestBlockerForLayeredProduct, Tracking, Triaged
Target Release: 16.1 (Train on RHEL 8.2)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ovn2.13-2.13.0-37.el8fdp.x86_64
Doc Type: Known Issue
Doc Text:
Because of a core OVN bug, virtual machines with floating IP (FIP) addresses cannot route to other networks in an ML2/OVN deployment with distributed virtual routing (DVR) enabled. Core OVN sets a bad next hop when routing SNAT IPv4 traffic from a VM with a floating IP with DVR enabled. Instead of the gateway IP, OVN sets the destination IP. As a result, the router sends an ARP request for an unknown IP instead of routing the request to the gateway.

Workaround: Before you deploy a new overcloud with ML2/OVN, disable DVR by setting `NeutronEnableDVR: false` in an environment file. If you have ML2/OVN in an existing deployment, complete the following steps:

1) Set `enable_distributed_floating_ip` to `False` in the `[ovn]` section of the `ml2_conf.ini` file:

(undercloud) [stack@undercloud-0 ~]$ ansible -i /usr/bin/tripleo-ansible-inventory -m shell -b -a "crudini --set /var/lib/config-data/puppet-generated/neutron/etc/neutron/plugins/ml2/ml2_conf.ini ovn enable_distributed_floating_ip False" Controller

2) Restart neutron server containers:

(undercloud) [stack@undercloud-0 ~]$ ansible -i /usr/bin/tripleo-ansible-inventory -m shell -b -a "podman restart neutron_api" Controller

3) Centralize all of the FIP traffic through gateway nodes. Run the following commands on any overcloud node:

$ export NB=$(sudo ovs-vsctl get open . external_ids:ovn-remote | sed -e 's/\"//g' | sed -e 's/6642/6641/g')
$ alias ovn-nbctl='sudo podman exec ovn_controller ovn-nbctl --db=$NB'
$ for fip in $(ovn-nbctl --bare --columns _uuid find nat type=dnat_and_snat); do ovn-nbctl clear NAT $fip external_mac; done

When the fix is available in RHOSP 16.1.1, you can re-enable distributed FIP traffic:

1) Set `enable_distributed_floating_ip` back to `True` in the `[ovn]` section of the `ml2_conf.ini` file:

(undercloud) [stack@undercloud-0 ~]$ ansible -i /usr/bin/tripleo-ansible-inventory -m shell -b -a "crudini --set /var/lib/config-data/puppet-generated/neutron/etc/neutron/plugins/ml2/ml2_conf.ini ovn enable_distributed_floating_ip True" Controller

2) Restart neutron server containers:

(undercloud) [stack@undercloud-0 ~]$ ansible -i /usr/bin/tripleo-ansible-inventory -m shell -b -a "podman restart neutron_api" Controller

3) Trigger the update in all of the FIPs. Run the following commands on any overcloud node:

$ export NB=$(sudo ovs-vsctl get open . external_ids:ovn-remote | sed -e 's/\"//g' | sed -e 's/6642/6641/g')
$ alias ovn-nbctl='sudo podman exec ovn_controller ovn-nbctl --db=$NB'
$ for i in $(ovn-nbctl --bare --columns logical_port find nat type=dnat_and_snat); do ovn-nbctl set logical_switch_port $i up=false; done

[NOTE] Disabling DVR causes traffic to be centralized. All L3 traffic travels through the Controller/Networker nodes. This might affect scale, data plane performance, and throughput.
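The `export NB=...` line in step 3 derives the OVN Northbound database address from the `ovn-remote` value, which points at the Southbound database. The following is a minimal sketch of just that string transformation, so that it can be checked without touching a live node. The function name `nb_from_sb_remote` and the sample address are hypothetical and do not come from this bug report. The sketch assumes the default OVN ports (6642 for Southbound, 6641 for Northbound) and that `ovs-vsctl get` returns the value wrapped in double quotes.

```shell
#!/bin/sh
# Hypothetical helper (not part of OVN or the workaround itself):
# strip the double quotes from the ovn-remote value and swap the
# Southbound DB port (6642) for the Northbound DB port (6641).
nb_from_sb_remote() {
    printf '%s\n' "$1" | sed -e 's/"//g' -e 's/6642/6641/g'
}

# Sample input shaped like the output of
#   ovs-vsctl get open . external_ids:ovn-remote
# The address is made up for illustration.
nb_from_sb_remote '"tcp:172.17.1.10:6642"'
```

Note that the substitution replaces every occurrence of `6642` in the string, so it is worth inspecting `$NB` before running the `ovn-nbctl` loops if the remote address could contain that digit sequence elsewhere.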
Story Points: ---
Clone Of: 1834433
: 1836998 1837558
Environment:
Last Closed: 2020-07-29 07:52:55 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1836976    
Bug Blocks:    

Comment 1 Daniel Alvarez Sanchez 2020-05-18 14:55:24 UTC
*** Bug 1836964 has been marked as a duplicate of this bug. ***

Comment 6 Jakub Libosvar 2020-06-10 13:08:26 UTC
This is a tracker bug to make sure our images use the correct OVN version in OpenStack. Setting ovn2.13-2.13.0-33.el7fdn as the Fixed In Version, although this bug is reported against python-networking-ovn.

Comment 9 Allan Greentree 2020-06-22 19:11:31 UTC
Hello, I just added a customer case in which they ask when this fix will be released for RHOSP 16.1 on RHEL 8.1.

Thanks, Allan Greentree
Senior Technical Support Engineer
Red Hat OpenStack

Comment 22 Eran Kuris 2020-07-19 07:41:45 UTC
Don't we need to move this bug to ON_QA?

Comment 25 Roman Safronov 2020-07-20 11:36:02 UTC
Tested on puddle RHOS-16.1-RHEL-8-20200714.n.0 with ovn2.13-2.13.0-37.el8fdp.x86_64.

It is possible to ping internet IP addresses from a VM with a floating IP.

Comment 27 errata-xmlrpc 2020-07-29 07:52:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3148