Bug 1915098 - [OSP13] NUMA instance spawn fails on get_best_cpu_topology when there is no 'threads' preference
Summary: [OSP13] NUMA instance spawn fails on get_best_cpu_topology when there is no 'threads' preference
Keywords:
Status: CLOSED EOL
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Stephen Finucane
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On: 1915097
Blocks:
 
Reported: 2021-01-11 23:59 UTC by melanie witt
Modified: 2024-10-01 17:18 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1915097
Environment:
Last Closed: 2023-07-11 21:05:26 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 769614 0 None MERGED Fix max cpu topologies with numa affinity 2022-08-03 15:39:02 UTC
Red Hat Issue Tracker OSP-2132 0 None None None 2022-03-25 18:58:42 UTC

Comment 3 Nilesh 2021-02-17 01:17:21 UTC
* Do we have any workaround for this issue?

2021-02-12 22:49:09.761 1 DEBUG nova.virt.hardware [req-e108194b-f631-487f-bcf0-44b22edff115 aaab549c565f47d2a37ad83120ca1206 3c35142d74e947f0a3dd2780e335792c - default default] Sorted desired topologies [] _get_desirable_cpu_topologies /usr/lib/python2.7/site-packages/nova/virt/hardware.py:602
2021-02-12 22:49:09.761 1 ERROR nova.compute.manager [req-e108194b-f631-487f-bcf0-44b22edff115 aaab549c565f47d2a37ad83120ca1206 3c35142d74e947f0a3dd2780e335792c - default default] [instance: 59137f57-641e-4568-b598-e7a827a01081] Instance failed to spawn: IndexError: list index out of range

2021-02-12 23:59:37.342 1 ERROR nova.compute.manager [instance: 2913b133-60cc-4205-9405-0be0c6ac48da] Traceback (most recent call last):
2021-02-12 23:59:37.342 1 ERROR nova.compute.manager [instance: 2913b133-60cc-4205-9405-0be0c6ac48da]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2273, in _build_resources
2021-02-12 23:59:37.342 1 ERROR nova.compute.manager [instance: 2913b133-60cc-4205-9405-0be0c6ac48da]     yield resources
2021-02-12 23:59:37.342 1 ERROR nova.compute.manager [instance: 2913b133-60cc-4205-9405-0be0c6ac48da]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2053, in _build_and_run_instance
2021-02-12 23:59:37.342 1 ERROR nova.compute.manager [instance: 2913b133-60cc-4205-9405-0be0c6ac48da]     block_device_info=block_device_info)
2021-02-12 23:59:37.342 1 ERROR nova.compute.manager [instance: 2913b133-60cc-4205-9405-0be0c6ac48da]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3110, in spawn
2021-02-12 23:59:37.342 1 ERROR nova.compute.manager [instance: 2913b133-60cc-4205-9405-0be0c6ac48da]     mdevs=mdevs)
2021-02-12 23:59:37.342 1 ERROR nova.compute.manager [instance: 2913b133-60cc-4205-9405-0be0c6ac48da]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5429, in _get_guest_xml
2021-02-12 23:59:37.342 1 ERROR nova.compute.manager [instance: 2913b133-60cc-4205-9405-0be0c6ac48da]     context, mdevs)
2021-02-12 23:59:37.342 1 ERROR nova.compute.manager [instance: 2913b133-60cc-4205-9405-0be0c6ac48da]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5184, in _get_guest_config
2021-02-12 23:59:37.342 1 ERROR nova.compute.manager [instance: 2913b133-60cc-4205-9405-0be0c6ac48da]     instance.numa_topology)
2021-02-12 23:59:37.342 1 ERROR nova.compute.manager [instance: 2913b133-60cc-4205-9405-0be0c6ac48da]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3905, in _get_guest_cpu_config
2021-02-12 23:59:37.342 1 ERROR nova.compute.manager [instance: 2913b133-60cc-4205-9405-0be0c6ac48da]     flavor, image_meta, numa_topology=instance_numa_topology)
2021-02-12 23:59:37.342 1 ERROR nova.compute.manager [instance: 2913b133-60cc-4205-9405-0be0c6ac48da]   File "/usr/lib/python2.7/site-packages/nova/virt/hardware.py", line 625, in get_best_cpu_topology
2021-02-12 23:59:37.342 1 ERROR nova.compute.manager [instance: 2913b133-60cc-4205-9405-0be0c6ac48da]     allow_threads, numa_topology)[0]
2021-02-12 23:59:37.342 1 ERROR nova.compute.manager [instance: 2913b133-60cc-4205-9405-0be0c6ac48da] IndexError: list index out of range
2021-02-12 23:59:37.342 1 ERROR nova.compute.manager [instance: 2913b133-60cc-4205-9405-0be0c6ac48da] 

2021-02-12 22:49:09.763 1 DEBUG nova.compute.manager [req-e108194b-f631-487f-bcf0-44b22edff115 aaab549c565f47d2a37ad83120ca1206 3c35142d74e947f0a3dd2780e335792c - default default] [instance: 59137f57-641e-4568-b598-e7a827a01081] Start destroying the instance on the hypervisor. _shutdown_instance /usr/lib/python2.7/site-packages/nova/compute/manager.py:2387

available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 100 102
node 0 size: 261597 MB
node 0 free: 1074 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91 93 95 97 99 101 103
node 1 size: 262144 MB
node 1 free: 62 MB
node distances:
node   0   1 
  0:  10  21 
  1:  21  10

Comment 4 melanie witt 2021-02-17 04:09:42 UTC
To attempt to work around this bug, you could try specifying a preference for the number of threads (hw:cpu_threads) in the flavor or image.
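
For example, via a flavor extra spec (a sketch; "numa-flavor" is a placeholder flavor name):

  $ openstack flavor set --property hw:cpu_threads=2 numa-flavor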

The logic in the code is that when hw:cpu_threads is not specified, a value of hw:cpu_threads=1 is used. Nova then chooses a CPU topology from the list of possible topologies that has threads <= hw:cpu_threads [1]. If hw:cpu_threads=1, then only topologies with threads=1 will match.

The bug occurs when hw:cpu_threads is not specified: hw:cpu_threads defaults to 1, and if no topologies in the list of possible topologies have threads=1, an empty list of valid topologies is returned and the IndexError is raised. The idea behind the workaround is that if you set hw:cpu_threads=2 or hw:cpu_threads=4, then topologies with threads=4, threads=3, threads=2, or threads=1 can match (see the sketch after the footnote below).

[1] https://github.com/openstack/nova/blob/stable/queens/nova/virt/hardware.py#L455-L466
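
For illustration, here is a minimal Python sketch of that selection logic (an approximation of the behavior described above, not the actual nova/virt/hardware.py code; all names and values are made up):

from collections import namedtuple

VirtCPUTopology = namedtuple('VirtCPUTopology', 'sockets cores threads')

def pick_topology(possible, preferred_threads=1):
    # Keep only topologies whose thread count does not exceed the
    # (defaulted) hw:cpu_threads value, then take the first match.
    desired = [t for t in possible if t.threads <= preferred_threads]
    return desired[0]  # raises IndexError when no topology matched

# Hypothetical host/flavor where every possible topology uses 2 threads:
possible = [VirtCPUTopology(sockets=2, cores=26, threads=2)]
pick_topology(possible)  # IndexError: list index out of range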

Comment 5 smooney 2021-02-17 11:21:34 UTC
Actually, not quite: hw:cpu_threads=2 means exactly 2.
We generally recommend that hw:cpu_threads be set to the number of threads per core on your hosts; generally it will be 2. You should set the sockets equal to the number of NUMA nodes requested in the VM, and the cores should be unrestricted to allow the number of cores to increase with the number of vCPUs requested.

Nova only supports two things: either a max for threads, cores, and sockets, or an exact value.
We do not support a preference. See the example below.
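
For example, for a guest requesting two NUMA nodes on hosts with two threads per core (a sketch; flavor name and values are illustrative, and hw:cpu_cores / hw:cpu_max_cores are deliberately left unset so the core count can grow with the vCPU count):

  $ openstack flavor set numa-flavor \
      --property hw:numa_nodes=2 \
      --property hw:cpu_sockets=2 \
      --property hw:cpu_threads=2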

