Bug 1446311 - [RFE] Optional NUMA affinity for PCI devices
Summary: [RFE] Optional NUMA affinity for PCI devices
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 13.0 (Queens)
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: Upstream M2
Target Release: 13.0 (Queens)
Assignee: Stephen Finucane
QA Contact: awaugama
URL: https://blueprints.launchpad.net/nova...
Whiteboard: upstream_milestone_none upstream_defi...
Duplicates: 1663653
Depends On: 1366208
Blocks: 1188000 1419948 1422243 1427361 1442136 1647536 1791991 1419231 1561961 1650606 1757886 1775575 1775576 1783354
Reported: 2017-04-27 16:15 UTC by Stephen Finucane
Modified: 2020-02-14 18:33 UTC
CC List: 33 users

Fixed In Version: openstack-nova-17.0.1-0.20180302144923.9ace6ed.el7ost
Doc Type: Technology Preview
Doc Text:
This release adds support for PCI device NUMA affinity policies, which are configured as part of the “[pci]alias” configuration options. Three policies are supported:

- “required” (must have)
- “legacy” (default; must have, if available)
- “preferred” (nice to have)

In all cases, strict NUMA affinity is provided, if possible. These policies allow you to configure, per PCI alias, how strict your NUMA affinity should be in order to maximize resource utilization. The key difference between the policies is how much NUMA affinity you're willing to forsake before failing to schedule.

When the “preferred” policy is configured for a PCI device, nova uses CPUs on a different NUMA node from the NUMA node of the PCI device if those are the only CPUs available. This results in increased resource utilization, but performance is reduced for these instances.
Clone Of: 1366208
Clones: 1561961 1647536
Environment:
Last Closed: 2019-02-05 10:42:20 UTC
Target Upstream Version:


Links
System ID Priority Status Summary Last Updated
Launchpad 1614882 None None None 2017-04-27 16:15:19 UTC
OpenStack gerrit 361140 'None' MERGED PCI NUMA Policies 2020-04-01 06:46:47 UTC
Red Hat Product Errata RHEA-2018:2086 None None None 2018-06-27 13:33:17 UTC

Comment 1 Stephen Finucane 2017-04-27 16:17:30 UTC
This RFE focuses on making NUMA affinity for SR-IOV/PCI devices optional. The spec missed the Pike deadline so this has been deferred to Queens.

Comment 2 Stephen Finucane 2017-08-30 16:02:26 UTC
As (hopefully) noted previously, this is being taken care of by a Mirantis guy. I plan to keep an eye on it during this cycle and step in if necessary.

Comment 8 errata-xmlrpc 2018-06-27 13:31:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086

Comment 13 Stephen Finucane 2018-11-07 17:20:15 UTC
A mistake was made during implementation of this feature. While this RFE specifically called out support for optional NUMA affinity for SR-IOV devices, what was implemented upstream was support for optional NUMA affinity of standard PCI passthrough devices. These are handled differently. SR-IOV devices are created by neutron and attached as network devices. For example:

  openstack port create ...
  openstack server create --nic port-id=$port_id ...

PCI passthrough devices, by comparison, are requested by specifying PCI aliases in the flavor and are attached by nova at boot time:

  openstack flavor set m1.large --property "pci_passthrough:alias"="a1:2"
  openstack server create --flavor m1.large ...

The feature, as currently implemented, allows PCI policies to be defined in the alias configuration in 'nova.conf' and therefore only supports the latter type of attachment.
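
A rough sketch of what this looks like in 'nova.conf' follows, assuming the 'numa_policy' key that upstream nova documents for the '[pci] alias' option. The vendor and product IDs are placeholders rather than values taken from this bug, and the alias name 'a1' is chosen only to match the flavor example above:

  [pci]
  # Hypothetical device: replace vendor_id/product_id with those of the real device.
  # numa_policy may be "required", "legacy" (the default) or "preferred".
  alias = { "vendor_id": "8086", "product_id": "0443", "device_type": "type-PCI", "name": "a1", "numa_policy": "preferred" }

A flavor requesting "a1:2" would then ask for two devices matching this alias, scheduled under the "preferred" policy.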

Clearly some additional work is required here. However, given that the feature as implemented is useful in its own right (FPGAs jump to mind), we should build upon what's been done rather than replace it. As a result, I'm going to clone this BZ. The cloned BZ will focus on closing the SR-IOV gap, while this BZ will be renamed to handle the PCI passthrough case that has already been addressed.

Comment 14 Stephen Finucane 2019-01-11 11:50:00 UTC
*** Bug 1663653 has been marked as a duplicate of this bug. ***

Comment 15 Vinayak 2019-01-16 14:02:28 UTC
Can you please review https://access.redhat.com/support/cases/#/case/02255851? This issue was observed in RH OSP13. Has it been fixed now?

Comment 16 Stephen Finucane 2019-02-05 10:42:20 UTC
(In reply to Vinayak from comment #15)
> Can you please review
> https://access.redhat.com/support/cases/#/case/02255851? This issue was
> observed in RH OSP13. Has it been fixed now?

I fail to see what hugepage allocation issues have to do with this feature. Could you elaborate (via a new bug), please?

