Bug 1666684

Summary: [RFE] OVN - SRIOV support without Neutron DHCP Agent
Product: Red Hat OpenStack Reporter: Daniel Alvarez Sanchez <dalvarez>
Component: python-networking-ovn Assignee: Lucas Alvares Gomes <lmartins>
Status: CLOSED ERRATA QA Contact: Eduardo Olivares <eolivare>
Severity: high Docs Contact:
Priority: high    
Version: 16.1 (Train) CC: amcleod, apevec, atragler, dalvarez, dcadzow, ekuris, fbaudin, fiezzi, gregraka, haili, jamsmith, jlibosva, jschluet, lhh, lmartins, majopela, mburns, nusiddiq, nwolf, ovs-qe, ovs-team, qding, spower, sputhenp, tfreger
Target Milestone: z1 Keywords: FutureFeature, Reopened, ZStream
Target Release: 16.1 (Train on RHEL 8.2)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: docs-accepted
Fixed In Version: python-networking-ovn-7.1.1-0.20200417072234.659ddf5.el8ost Doc Type: Enhancement
Doc Text:
In this release, you can use SR-IOV in an ML2/OVN deployment with native OVN DHCP. SR-IOV in an ML2/OVN deployment no longer requires the Networking service (neutron) DHCP agent.

When virtual machines boot on hypervisors that support SR-IOV NICs, the OVN controllers on the controller or network nodes can reply to the DHCP, internal DNS, and IPv6 router solicitation requests from the virtual machine.

This feature was available as a technology preview in RHOSP 16.1.0. Now it is a supported feature.

The following limitations apply to the feature in this release:
- All external ports are scheduled on a single gateway node because there is only one HA Chassis Group for all of the ports.
- North/south routing on VLAN tenant networks does not work with SR-IOV because the external ports are not co-located with the logical router's gateway ports. See https://bugs.launchpad.net/neutron/+bug/1875852.
Story Points: ---
Clone Of: 1666673 Environment:
Last Closed: 2020-08-27 15:20:02 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1828191, 1828834, 1828889, 1828941, 1829293, 1844487, 1847924    
Bug Blocks: 1700220, 1700229, 1773514, 1832970    

Comment 1 Daniel Alvarez Sanchez 2019-01-16 11:15:59 UTC
This feature [0] being implemented in core OVN needs its counterpart in networking-ovn.
Given how that patch is designed, I'm summarizing what needs to be done in networking-ovn:

[0] https://patchwork.ozlabs.org/patch/1025421/

Comment 2 Daniel Alvarez Sanchez 2019-01-16 11:24:51 UTC
This feature [0] being implemented in core OVN needs its counterpart in networking-ovn.
Given how that patch is designed, I'm summarizing what needs to be done in networking-ovn:

- When an SRIOV port is created in Neutron, networking-ovn has to create an OVN 'external' port.
- At this point, networking-ovn has to figure out if the port belongs to a subnet which is connected to a router with a gw port:
  - If so, set the requested-chassis option to the chassis where the gw port is scheduled with highest priority.
  - Else, schedule it to any network/controller node (would be better if we could pin a subnet to a certain node).
- If the chassis hosting a gw port as master goes down, core OVN will automatically fail over the port (BFD monitoring) to the next highest prio chassis available. However, the external port is not moved, so we need to monitor the event and move all external ports scheduled there (same when the chassis comes back).
- If a subnet which was not connected to a gw router gets connected to one, move all the 'external' ports to the chassis where the gw router is highest prio.
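
The scheduling rules above can be sketched roughly as follows. This is only an illustration under stated assumptions: `GatewayPort`, `pick_chassis`, and `reschedule` are hypothetical names invented here, not networking-ovn or OVN APIs, and chassis liveness is modelled as a plain set rather than BFD state.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class GatewayPort:
    """Hypothetical model of a router gw port: chassis name -> priority."""
    priorities: Dict[str, int]

    def active_chassis(self, alive: Set[str]) -> Optional[str]:
        # The live chassis with the highest priority hosts the gw port.
        candidates = {c: p for c, p in self.priorities.items() if c in alive}
        if not candidates:
            return None
        return max(candidates, key=candidates.get)


def pick_chassis(gw_port: Optional[GatewayPort],
                 network_nodes: List[str],
                 alive: Set[str]) -> Optional[str]:
    """Choose the chassis for one 'external' port (assumed policy)."""
    if gw_port is not None:
        chassis = gw_port.active_chassis(alive)
        if chassis is not None:
            return chassis  # co-locate with the gw port
    # No gw port on this subnet: fall back to any live network node.
    for node in network_nodes:
        if node in alive:
            return node
    return None


def reschedule(external_ports: Dict[str, str],
               gw_ports: Dict[str, GatewayPort],
               network_nodes: List[str],
               alive: Set[str]) -> Dict[str, Optional[str]]:
    """Recompute placement for every external port.

    external_ports maps port name -> subnet; gw_ports maps subnet -> gw port.
    Meant to be re-run on chassis up/down or router attach events.
    """
    return {
        port: pick_chassis(gw_ports.get(subnet), network_nodes, alive)
        for port, subnet in external_ports.items()
    }
```

Re-running `reschedule` with an updated `alive` set stands in for the "monitor the event and move all external ports" step; a real implementation would react to OVN southbound events instead.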

The reason we want the external port to live in the same chassis as the associated gw port is that otherwise, the MAC address of the router port will flap in the ToR as it'll be advertised by both the chassis hosting the gw port and the chassis hosting the external port for the SRIOV instance.
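
To make the flapping concern concrete, here is a toy model of a ToR switch's MAC learning table. It is a simplified sketch, not a model of any particular switch: `count_mac_moves` is a hypothetical helper, and each frame is reduced to a (source MAC, ingress port) pair.

```python
from typing import Dict, Iterable, Tuple


def count_mac_moves(frames: Iterable[Tuple[str, str]]) -> int:
    """Count how often a learned MAC moves to a different ToR port."""
    table: Dict[str, str] = {}
    moves = 0
    for mac, port in frames:
        previous = table.get(mac)
        if previous is not None and previous != port:
            moves += 1  # the MAC "flapped" to another port
        table[mac] = port
    return moves


router_mac = "fa:16:3e:00:00:01"  # made-up address for illustration

# gw port and external port on different chassis: both advertise the MAC.
split = [(router_mac, "tor-port-1"), (router_mac, "tor-port-2")] * 2
# Both on the same chassis: the MAC is only ever seen on one ToR port.
colocated = [(router_mac, "tor-port-1")] * 4
```

In this model `count_mac_moves(split)` reports a move on every frame after the first, while `count_mac_moves(colocated)` reports none, which is the motivation for co-locating the two ports.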

It's not a trivial change and it would be best if the HA/scheduling could happen at core OVN level.

[0] https://patchwork.ozlabs.org/patch/1025421/

Comment 3 Miguel Angel Ajo 2019-01-16 13:46:45 UTC
Is there anything we could do in core OVN itself to make sure the requested chassis goes along with the router master instead of manually tracking it from networking-ovn?

This is equivalent to the old neutron dhcp-agent failover, but also worse, because if the controller fails to move the requested chassis along, it could also flap the router MAC.

ovn-controller is able to determine by itself when it's the master (or the slave) for a specific router/router-port.

Comment 22 spower 2020-07-14 18:55:12 UTC
This issue has conditional approval for the 16.1 Z1 release; it must be in the first compose and tested before the release of 16.1.1. If not, we will move to TM=Z2.

Comment 23 spower 2020-07-17 13:34:38 UTC
Bot should be giving this the rhos-16.1 flag as it has devel/QE/PM acks?

Comment 26 spower 2020-07-20 15:39:01 UTC
Eran, 16.1 GA is closed to Blockers Only now with strict criteria. To have something included it has to go through the blockers Process. 16.1.1 is targeted for mid/late Aug.

Comment 29 Lon Hohberger 2020-07-29 10:48:30 UTC
According to our records, this should be resolved by python-networking-ovn-7.2.1-0.20200611111150.18fabca.el8ost.  This build is available now.

Comment 40 errata-xmlrpc 2020-08-27 15:20:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (openstack-neutron bug fix advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3568