Bug 1913513 - [OSP17] NUMA instance spawn fails on get_best_cpu_topology when there is no 'threads' preference
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 17.0 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: beta
Target Release: 17.0
Assignee: Stephen Finucane
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On:
Blocks: 1915096
Reported: 2021-01-07 00:14 UTC by melanie witt
Modified: 2024-10-01 17:16 UTC (History)
CC List: 13 users

Fixed In Version: openstack-nova-23.2.1-0.20220606130355.68cad8f.el9ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1915096 (view as bug list)
Environment:
Last Closed: 2022-09-21 12:13:29 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1910466 0 None None None 2021-01-07 00:14:44 UTC
OpenStack gerrit 769601 0 None master: MERGED nova: Test numa and vcpu topologies bug: #1910466 (I333b3d85deed971678141307dd06545e308cf989) 2022-06-13 18:53:30 UTC
OpenStack gerrit 769614 0 None master: MERGED nova: Fix max cpu topologies with numa affinity (Ia81a0fdbd950b51dbcc70c65ba492549a224ce2b) 2022-06-13 18:53:36 UTC
Red Hat Issue Tracker OSP-5438 0 None None None 2022-06-13 19:35:22 UTC
Red Hat Product Errata RHEA-2022:6543 0 None None None 2022-09-21 12:14:33 UTC

Description melanie witt 2021-01-07 00:14:44 UTC
Copied from the upstream bug I opened:

Seen downstream in a customer environment where a NUMA instance fails driver.spawn during get_best_cpu_topology when (1) there was no preference for cpu threads in the flavor and (2) the only possible topologies have > 1 cpu threads:

2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] Instance failed to spawn: IndexError: list index out of range
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] Traceback (most recent call last):
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2273, in _build_resources
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] yield resources
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2053, in _build_and_run_instance
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] block_device_info=block_device_info)
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3133, in spawn
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] mdevs=mdevs)
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5447, in _get_guest_xml
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] context, mdevs)
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5202, in _get_guest_config
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] instance.numa_topology)
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3923, in _get_guest_cpu_config
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] flavor, image_meta, numa_topology=instance_numa_topology)
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] File "/usr/lib/python2.7/site-packages/nova/virt/hardware.py", line 625, in get_best_cpu_topology
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] allow_threads, numa_topology)[0]
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] IndexError: list index out of range

Note that the IndexError above is a separate issue: it is raised because an empty list [] is returned after filtering for NUMA threads and the caller then indexes it with [0]. This bug is about why the list is empty after NUMA threads filtering.
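The failure mode in the traceback can be reproduced in isolation. The sketch below is illustrative only: pick_first is a hypothetical helper, not a nova function, and it simply shows how indexing an empty candidate list raises IndexError and how a guard could turn that into a clearer error.

```python
def pick_first(candidates):
    """Return the first candidate, or raise a descriptive error.

    Hypothetical helper for illustration; nova's get_best_cpu_topology
    indexes the returned list directly with [0].
    """
    if not candidates:
        raise ValueError("no CPU topology satisfies the request")
    return candidates[0]


# Indexing an empty list directly reproduces the error seen in the log.
try:
    [][0]
except IndexError as exc:
    unguarded_error = str(exc)

# The guarded helper fails with a message that names the real problem.
try:
    pick_first([])
except ValueError as exc:
    guarded_error = str(exc)

print(unguarded_error)
print(guarded_error)
```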

In this example failure we have a request for vcpus=8 with limits cores=2, sockets=2, and threads=8:

2020-12-27 23:45:12.322 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Getting desirable topologies for flavor Flavor(created_at=2020-11-23T08:41:17Z,deleted=False,deleted_at=None,description=None,disabled=False,ephemeral_gb=0,extra_specs={hw:cpu_max_cores='2',hw:cpu_max_sockets='2',hw:cpu_max_threads='8',hw:cpu_policy='dedicated',hw:mem_page_size='large',hw:numa_cpus.0='0,1,2,3',hw:numa_cpus.1='4,5,6,7',hw:numa_mem.0='16384',hw:numa_mem.1='16384',hw:numa_mempolicy='strict',hw:numa_nodes='2'},flavorid='2060ed99-654c-4309-88b8-9bceeb794ba3',id=176,is_public=True,memory_mb=32768,name='test',projects=<?>,root_gb=40,rxtx_factor=1.0,swap=0,updated_at=None,vcpu_weight=0,vcpus=8) and image_meta ImageMeta(checksum='157e26aac48c1a02c08d07f4f7a6d1b6',container_format='bare',created_at=2020-03-08T16:12:44Z,direct_url=<?>,disk_format='qcow2',id=7812d228-07c8-4a8f-9878-459a0093cc34,min_disk=0,min_ram=0,name='UAG_generic-180104',owner='7a2328acc0a6451c8b23fb8184932506',properties=ImageMetaProps,protected=<?>,size=769468416,status='active',tags=<?>,updated_at=2020-03-23T06:11:54Z,virtual_size=<?>,visibility=<?>), allow threads: True _get_desirable_cpu_topologies /usr/lib/python2.7/site-packages/nova/virt/hardware.py:567
2020-12-27 23:45:12.323 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Flavor limits 2:2:8 _get_cpu_topology_constraints /usr/lib/python2.7/site-packages/nova/virt/hardware.py:313
2020-12-27 23:45:12.323 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Image limits 2:2:8 _get_cpu_topology_constraints /usr/lib/python2.7/site-packages/nova/virt/hardware.py:324
2020-12-27 23:45:12.324 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Flavor pref -1:-1:-1 _get_cpu_topology_constraints /usr/lib/python2.7/site-packages/nova/virt/hardware.py:347
2020-12-27 23:45:12.324 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Image pref -1:-1:-1 _get_cpu_topology_constraints /usr/lib/python2.7/site-packages/nova/virt/hardware.py:366
2020-12-27 23:45:12.325 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Chosen -1:-1:-1 limits 2:2:8 _get_cpu_topology_constraints /usr/lib/python2.7/site-packages/nova/virt/hardware.py:395
2020-12-27 23:45:12.325 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Topology preferred VirtCPUTopology(cores=-1,sockets=-1,threads=-1), maximum VirtCPUTopology(cores=2,sockets=2,threads=8) _get_desirable_cpu_topologies /usr/lib/python2.7/site-packages/nova/virt/hardware.py:571
2020-12-27 23:45:12.325 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Build topologies for 8 vcpu(s) 2:2:8 _get_possible_cpu_topologies /usr/lib/python2.7/site-packages/nova/virt/hardware.py:434
2020-12-27 23:45:12.326 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Got 4 possible topologies _get_possible_cpu_topologies /usr/lib/python2.7/site-packages/nova/virt/hardware.py:461
2020-12-27 23:45:12.326 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Possible topologies [VirtCPUTopology(cores=2,sockets=2,threads=2), VirtCPUTopology(cores=1,sockets=2,threads=4), VirtCPUTopology(cores=2,sockets=1,threads=4), VirtCPUTopology(cores=1,sockets=1,threads=8)] _get_desirable_cpu_topologies /usr/lib/python2.7/site-packages/nova/virt/hardware.py:576
2020-12-27 23:45:12.326 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Filtering topologies best for 1 threads _get_desirable_cpu_topologies /usr/lib/python2.7/site-packages/nova/virt/hardware.py:594
2020-12-27 23:45:12.327 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Remaining possible topologies [] _get_desirable_cpu_topologies /usr/lib/python2.7/site-packages/nova/virt/hardware.py:599
2020-12-27 23:45:12.327 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Sorted desired topologies [] _get_desirable_cpu_topologies /usr/lib/python2.7/site-packages/nova/virt/hardware.py:602
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] Instance failed to spawn: IndexError: list index out of range

This shows that neither the flavor nor the image specified a preference for sockets, cores, or threads (they are all -1) and that 8 vcpus are required. The possible topologies that satisfy the request for 8 vcpus while staying within the limits of 2 max cores, 2 max sockets, and 8 max threads are: [VirtCPUTopology(cores=2,sockets=2,threads=2), VirtCPUTopology(cores=1,sockets=2,threads=4), VirtCPUTopology(cores=2,sockets=1,threads=4), VirtCPUTopology(cores=1,sockets=1,threads=8)]. When there is no preference for the number of threads, the code uses a value of 1 for the desired number of threads. It then filters for the closest number of threads that does not exceed the desired number. Because every possible topology uses 2, 4, or 8 threads, all of which are greater than 1, all of the possible topologies were filtered out, leaving an empty list, and the request could not be fulfilled.
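The arithmetic in the paragraph above can be checked with a short standalone sketch. It is a simplified model, not nova's code: Topology, possible_topologies, and filter_for_threads are hypothetical names that only mimic the behaviour described here (enumerate every sockets x cores x threads split equal to the vcpu count, then keep the topologies whose thread count is the largest one not exceeding the desired value).

```python
from collections import namedtuple
from itertools import product

# Hypothetical stand-in for nova's VirtCPUTopology object.
Topology = namedtuple("Topology", ["cores", "sockets", "threads"])


def possible_topologies(vcpus, max_sockets, max_cores, max_threads):
    """Enumerate every split within the limits whose product equals vcpus."""
    found = []
    for s, c, t in product(range(1, max_sockets + 1),
                           range(1, max_cores + 1),
                           range(1, max_threads + 1)):
        if s * c * t == vcpus:
            found.append(Topology(cores=c, sockets=s, threads=t))
    return found


def filter_for_threads(possible, want_threads):
    """Keep topologies with the largest thread count <= want_threads."""
    eligible = [t.threads for t in possible if t.threads <= want_threads]
    if not eligible:
        # Nothing fits under the desired thread count: everything is dropped.
        return []
    best = max(eligible)
    return [t for t in possible if t.threads == best]


# Values taken from the log: 8 vcpus, limits 2 sockets : 2 cores : 8 threads.
possible = possible_topologies(vcpus=8, max_sockets=2, max_cores=2, max_threads=8)
# No threads preference is treated as a desired thread count of 1.
remaining = filter_for_threads(possible, want_threads=1)

print(len(possible), remaining)  # prints: 4 []
```

With these inputs the smallest thread count among the four candidates is 2, so a desired value of 1 removes all of them, matching the "Remaining possible topologies []" line in the log.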

Because the request expressed no preference for number of threads, one of the 4 possible cpu topologies should have been chosen instead of filtering all of the topologies out and returning an empty list. We will need to fix the logic around how requests without threads preference are handled.
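One way to express the desired behaviour is sketched below. This is only an illustration of the idea and is not the merged upstream patch: Topology and filter_with_fallback are hypothetical names, and the fallback rule (keep every candidate when none fits under the desired thread count) is an assumption about what "no preference" could mean.

```python
from collections import namedtuple

# Hypothetical stand-in for nova's VirtCPUTopology object.
Topology = namedtuple("Topology", ["cores", "sockets", "threads"])


def filter_with_fallback(possible, want_threads):
    """Prefer the largest thread count <= want_threads.

    If no candidate fits, return all candidates unchanged instead of an
    empty list, so a request without a threads preference can still be
    satisfied by one of the possible topologies.
    """
    eligible = [t for t in possible if t.threads <= want_threads]
    if not eligible:
        return list(possible)
    best = max(t.threads for t in eligible)
    return [t for t in eligible if t.threads == best]


# The four possible topologies reported in the log.
possible = [
    Topology(cores=2, sockets=2, threads=2),
    Topology(cores=1, sockets=2, threads=4),
    Topology(cores=2, sockets=1, threads=4),
    Topology(cores=1, sockets=1, threads=8),
]

chosen = filter_with_fallback(possible, want_threads=1)
print(len(chosen))  # prints: 4
```

Under this rule the caller still has a non-empty list to pick from, so taking element [0] no longer fails for the request shown in this report.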

Comment 16 errata-xmlrpc 2022-09-21 12:13:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543

