Bug 1647536 - [RFE] Optional NUMA affinity for SR-IOV devices [NEEDINFO]
Summary: [RFE] Optional NUMA affinity for SR-IOV devices
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 13.0 (Queens)
Hardware: All
OS: Linux
Target Milestone: beta
: ---
Assignee: smooney
QA Contact: nova-maint
URL: https://blueprints.launchpad.net/nova...
Depends On: 1366208 1446311
Blocks: 1188000 1419948 1422243 1427361 1442136 1653846 1756916 1791991 1419231 1561961 1650606 1757886 1775575 1775576 1783354
Reported: 2018-11-07 17:22 UTC by Stephen Finucane
Modified: 2020-02-14 18:36 UTC (History)

Fixed In Version: openstack-nova-17.0.13-2
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1446311
: 1757886 1775576 (view as bug list)
Last Closed:
Target Upstream Version:
lyarwood: needinfo? (smooney)


System | ID | Priority | Status | Summary | Last Updated
OpenStack gerrit | 674072 | 'None' | MERGED | support pci numa affinity policies in flavor and image | 2020-04-01 22:21:23 UTC
Red Hat Knowledge Base (Solution) | 2533751 | Learn more | None | Starting instances fail while sriov card is on a different numa node | 2019-07-27 14:48:55 UTC
Red Hat Knowledge Base (Solution) | 3721651 | None | None | About NUMA locality with nova and SR-IOV in Red Hat OpenStack Platform 10 and 13 | 2019-07-27 14:46:45 UTC
Red Hat Knowledge Base (Solution) | 4308231 | None | None | None | 2019-07-27 14:44:42 UTC

Comment 9 Stephen Finucane 2019-08-02 14:13:46 UTC
As noted in [1], this RFE closes a gap: it is currently possible to configure a NUMA affinity policy for PCI passthrough devices but not for SR-IOV devices. To restate what is described there, NUMA policies are currently configured as part of the PCI alias configuration in 'nova.conf', so requesting a PCI device through a given alias also applies the NUMA affinity policy associated with that alias. However, SR-IOV devices are typically attached to an instance not through PCI aliases but by configuring a neutron port and attaching it when the instance boots. The PCI alias-based approach is therefore of no use for SR-IOV devices.
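To illustrate the gap, here is a minimal Python sketch; it is not nova's actual code. It assumes an alias entry shaped like the JSON value of the 'alias' option in the '[pci]' section of 'nova.conf', with a 'numa_policy' key. The vendor and product IDs, the alias name 'a1', the helper names, and the 'legacy' fallback are all illustrative assumptions.

```python
import json

# Illustrative alias entry, shaped like the JSON value of the `alias` option
# in the [pci] section of nova.conf. IDs and the name "a1" are placeholders.
ALIAS_CONF = (
    '{"name": "a1", "vendor_id": "8086", "product_id": "154d", '
    '"device_type": "type-PF", "numa_policy": "preferred"}'
)


def load_aliases(raw_values):
    """Index alias definitions by name (hypothetical helper)."""
    aliases = {}
    for raw in raw_values:
        spec = json.loads(raw)
        aliases[spec["name"]] = spec
    return aliases


def policy_for_request(aliases, alias_name):
    """Return the NUMA policy for a PCI request, or None if no alias is used.

    Assumption: an alias that omits "numa_policy" falls back to "legacy".
    """
    if alias_name is None:
        # A device requested through a neutron port carries no alias,
        # so there is nowhere to look up a policy.
        return None
    return aliases[alias_name].get("numa_policy", "legacy")


aliases = load_aliases([ALIAS_CONF])
passthrough_policy = policy_for_request(aliases, "a1")  # requested via alias
sriov_policy = policy_for_request(aliases, None)        # requested via port
```

In this sketch the alias-based request resolves to a policy while the port-based request does not, which is the gap this RFE addresses.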

There are two possible approaches we can pursue to resolve this. The first is to use flavor extra specs and image metadata to configure instance-wide PCI policies that would apply to all PCI devices attached to the instance, including SR-IOV devices. This was the approach first proposed in the 'share-pci-between-numa-nodes' blueprint [2], before that blueprint was modified to use PCI aliases instead [3]. The second is to provide a new QoS policy in neutron that nova could consume. This was the approach that was discussed and essentially approved at the most recent Denver PTG. The flavor/image-based approach has the advantage of being much simpler to implement and mostly backportable, but it is very broad and prevents us from specifying NUMA affinity policies on a per-port basis. The neutron QoS policy approach, by comparison, involves API and object changes in neutron, which make it more difficult to implement and prevent us from backporting it, but it does allow for very fine-grained control over the affinity policy of each device.
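The flavor/image-based approach could be sketched roughly as follows. This is a hedged illustration, not the merged implementation. The key names 'hw:pci_numa_affinity_policy' and 'hw_pci_numa_affinity_policy', the set of valid values, and the rule that conflicting flavor and image values are rejected are assumptions made for this example.

```python
VALID_POLICIES = {"required", "preferred", "legacy"}  # assumed value set

FLAVOR_KEY = "hw:pci_numa_affinity_policy"  # assumed flavor extra spec name
IMAGE_KEY = "hw_pci_numa_affinity_policy"   # assumed image property name


def instance_pci_numa_policy(extra_specs, image_props):
    """Resolve one instance-wide policy from flavor and image settings.

    Returns None when neither source sets a policy. Raises ValueError on an
    unknown value, or when flavor and image disagree (assumed behaviour).
    """
    flavor_policy = extra_specs.get(FLAVOR_KEY)
    image_policy = image_props.get(IMAGE_KEY)
    for policy in (flavor_policy, image_policy):
        if policy is not None and policy not in VALID_POLICIES:
            raise ValueError("invalid PCI NUMA affinity policy: %r" % policy)
    if flavor_policy and image_policy and flavor_policy != image_policy:
        raise ValueError("flavor and image request conflicting policies")
    return flavor_policy or image_policy


def apply_policy(pci_requests, policy):
    """Return copies of all PCI requests with the instance-wide policy set.

    The policy applies to every device, whether it was requested through an
    alias or through a neutron port, which is why this approach is broad.
    """
    if policy is None:
        return [dict(req) for req in pci_requests]
    return [dict(req, numa_policy=policy) for req in pci_requests]


requests = [
    {"source": "alias", "alias_name": "a1"},
    {"source": "neutron-port", "port_id": "example-port"},
]
policy = instance_pci_numa_policy({FLAVOR_KEY: "preferred"}, {})
stamped = apply_policy(requests, policy)
```

Because the policy is resolved once per instance, every device receives the same value. This is the trade-off against the per-port control of the neutron QoS policy approach.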

We propose pursuing both approaches in succession. We will first pursue the flavor extra spec/image metadata-based approach for OSP 16, backporting this to OSP 13 once complete. In a later cycle, we will pursue the neutron QoS policy-based approach. This BZ is tracking the first approach.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1446311#c13
[2] https://review.opendev.org/#/c/361140/30/
[3] https://review.opendev.org/#/c/555000/3/
