Bug 1647536 - [RFE] Optional NUMA affinity for SR-IOV devices
Summary: [RFE] Optional NUMA affinity for SR-IOV devices
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 13.0 (Queens)
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: Alpha
Target Release: ---
Assignee: smooney
QA Contact: James Parker
URL: https://blueprints.launchpad.net/nova...
Whiteboard:
Depends On: 1366208 1446311
Blocks: 1188000 1419231 1419948 1422243 1427361 1442136 1561961 1650606 1653846 1756916 1757886 1775575 1775576 1783354 1791991
Reported: 2018-11-07 17:22 UTC by Stephen Finucane
Modified: 2023-12-02 04:25 UTC (History)

Fixed In Version: openstack-nova-21.1.0-0.20200425164546.347d656.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1446311
Clones: 1757886 1775576
Environment:
Last Closed: 2022-10-20 10:20:06 UTC
Target Upstream Version: Ussuri
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 674072 | 0 | 'None' | MERGED | support pci numa affinity policies in flavor and image | 2021-01-27 11:50:01 UTC
Red Hat Issue Tracker | OSP-3140 | 0 | None | None | None | 2022-03-13 16:17:27 UTC
Red Hat Knowledge Base (Solution) | 2533751 | 0 | Learn more | None | Starting instances fail while sriov card is on a different numa node | 2019-07-27 14:48:55 UTC
Red Hat Knowledge Base (Solution) | 3721651 | 0 | None | None | About NUMA locality with nova and SR-IOV in Red Hat OpenStack Platform 10 and 13 | 2019-07-27 14:46:45 UTC
Red Hat Knowledge Base (Solution) | 4308231 | 0 | None | None | None | 2019-07-27 14:44:42 UTC

Comment 9 Stephen Finucane 2019-08-02 14:13:46 UTC
As noted in [1], this RFE is necessary to close a gap: it is currently possible to configure a NUMA affinity policy for PCI passthrough devices but not for SR-IOV devices. To restate what's described there, NUMA policies are currently configured as part of the PCI alias configuration in 'nova.conf'. When you request a PCI device using a given alias, you also get the NUMA affinity policy associated with that alias. However, SR-IOV devices are not typically attached to an instance using PCI aliases; instead, you configure a neutron port and attach it when the instance boots. This means the PCI alias-based approach is of no use for SR-IOV devices.

There are two possible approaches we can pursue to resolve this. The first approach is to use flavor extra specs and image metadata to configure instance-wide PCI policies that would apply to all PCI devices attached to the instance, including SR-IOV devices. This was the approach first proposed in the 'share-pci-between-numa-nodes' blueprint [2], before it was modified to use PCI aliases instead [3]. The other approach is to provide a new QoS policy in neutron that nova could consume. This was the approach that was discussed and essentially approved at the most recent Denver PTG. The flavor/image-based approach has the advantage of being much simpler to implement and mostly backportable, but it is very broad and prevents us from specifying NUMA affinity policies on a per-port basis. The neutron QoS policy approach, by comparison, involves API and object changes in neutron, which make it more difficult to implement and prevent us from backporting it, but it does allow for very fine-grained control over the affinity policy of each device.

We propose pursuing both approaches in succession. We will first pursue the flavor extra spec/image metadata-based approach for OSP 16, backporting this to OSP 13 once complete. In a later cycle, we will pursue the neutron QoS policy-based approach. This BZ is tracking the first approach.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1446311#c13
[2] https://review.opendev.org/#/c/361140/30/
[3] https://review.opendev.org/#/c/555000/3/
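To make the flavor extra spec/image metadata approach concrete, here is a minimal Python sketch of how an instance-wide policy could be resolved and then applied to each PCI device. It is an illustration under stated assumptions, not nova's implementation: the function names are hypothetical, and the key names ('hw:pci_numa_affinity_policy' for the flavor, 'hw_pci_numa_affinity_policy' for the image) and the three policy values are assumed here to match what the upstream change uses.

```python
from typing import Mapping, Optional, Set

# Assumed policy values and key names; verify against the nova release in use.
VALID_POLICIES = {"required", "legacy", "preferred"}
FLAVOR_KEY = "hw:pci_numa_affinity_policy"
IMAGE_KEY = "hw_pci_numa_affinity_policy"


def resolve_pci_numa_policy(
    extra_specs: Mapping[str, str],
    image_props: Mapping[str, str],
) -> Optional[str]:
    """Return the instance-wide policy, or None if neither source sets one.

    Hypothetical helper: rejects conflicting flavor/image values rather
    than silently preferring one of them.
    """
    flavor_policy = extra_specs.get(FLAVOR_KEY)
    image_policy = image_props.get(IMAGE_KEY)
    if flavor_policy and image_policy and flavor_policy != image_policy:
        raise ValueError("flavor and image request different PCI NUMA policies")
    policy = flavor_policy or image_policy
    if policy is not None and policy not in VALID_POLICIES:
        raise ValueError(f"unknown PCI NUMA affinity policy: {policy!r}")
    return policy


def device_satisfies_policy(
    policy: Optional[str],
    device_node: Optional[int],
    instance_nodes: Set[int],
) -> bool:
    """Check one PCI device (including an SR-IOV VF) against the policy.

    Assumed semantics: 'preferred' accepts any device, 'required' needs a
    device on one of the instance's NUMA nodes, and 'legacy' behaves like
    'required' but also accepts devices that report no NUMA node.
    """
    policy = policy or "legacy"  # assumption: legacy is the default
    if policy == "preferred":
        return True
    if device_node is None:
        return policy == "legacy"
    return device_node in instance_nodes
```

Because the policy is resolved once per instance, every device attached to that instance is checked against the same value, which is the "very broad" behaviour described above and the reason per-port control would need the neutron-based approach.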

Comment 26 spower 2022-06-03 14:30:24 UTC
This RFE was not marked MVP for OSP 17.0 and so will be moved to OSP 17.1 for verification and docs. Contact rhos-trac if a tech preview is needed for OSP 17.0.

Comment 32 Red Hat Bugzilla 2023-12-02 04:25:02 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days