
Bug 2152723

Summary: Issues when scheduling certain asymmetric multi-numa guest topologies
Product: Red Hat OpenStack
Component: openstack-nova
Version: 17.1 (Wallaby)
Status: CLOSED DUPLICATE
Severity: low
Priority: low
Target Milestone: z2
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Keywords: Triaged
Reporter: Artom Lifshitz <alifshit>
Assignee: OSP DFG:Compute <osp-dfg-compute>
QA Contact: OSP DFG:Compute <osp-dfg-compute>
CC: bgibizer, dasmith, eglynn, jhakimra, kchamart, osp-dfg-compute, sbauza, sgordon, vromanso
Last Closed: 2023-08-08 15:15:49 UTC

Description Artom Lifshitz 2022-12-12 19:58:57 UTC
This bug was initially created as a copy of Bug #2135439

I am copying this bug because: 

Need the fix in 17.1.

Description of problem: This is meant to track the issue originally found when testing [1] for 17.0. I also tried this without the mixed dedicated policy and ran into scheduling issues as well. Below are the working and failing scenarios tried:

Compute Host NUMA Topology:

  NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
  NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39

Host Dedicated/Shared CPU Configurations:
[tripleo-admin@computesriov-1 ~]$ sudo crudini --get /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf compute cpu_dedicated_set
24-39
[tripleo-admin@computesriov-1 ~]$ sudo crudini --get /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf compute cpu_shared_set
20-23

[tripleo-admin@computesriov-0 ~]$ sudo crudini --get /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf compute cpu_dedicated_set
4-19
[tripleo-admin@computesriov-0 ~]$ sudo crudini --get /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf compute cpu_shared_set
0,1,2,3
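
As a rough illustration of how the settings above split across the host's NUMA nodes, the sketch below intersects each configured set with the per-node CPU lists from the topology shown earlier. The helper names (parse_cpu_range, per_node) are hypothetical and are not part of Nova.

```python
# Illustrative sketch only; helper names are hypothetical, not Nova APIs.

def parse_cpu_range(spec):
    """Parse a CPU list such as '24-39' or '0,1,2,3' into a set of ints."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


# Host topology from the report: even CPUs on node0, odd CPUs on node1.
HOST_NODES = {
    0: set(range(0, 40, 2)),
    1: set(range(1, 40, 2)),
}


def per_node(spec):
    """Return {node_id: sorted CPUs from spec that belong to that node}."""
    cpus = parse_cpu_range(spec)
    return {node: sorted(cpus & node_cpus)
            for node, node_cpus in HOST_NODES.items()}


# Values reported for computesriov-1.
shared_1 = per_node("20-23")
dedicated_1 = per_node("24-39")
print("shared:", shared_1)
print("dedicated:", dedicated_1)
```

On computesriov-1 this gives two shared CPUs and eight dedicated CPUs per NUMA node, which is consistent with the cpuset and pcpuset values the scheduler logs for host cell 0.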

##########################################################################
# Working deployment (3 vCPU, asymmetric, and no mixed dedicated policy) #
##########################################################################

(overcloud) [stack@undercloud-0 ~]$ openstack flavor show tempest-MixedCPUPolicyTestMultiNuma-flavor-1395833240
+----------------------------+-------------------------------------------------------------------------------------------------------+
| Field                      | Value                                                                                                 |
+----------------------------+-------------------------------------------------------------------------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                                                                                 |
| OS-FLV-EXT-DATA:ephemeral  | 0                                                                                                     |
| access_project_ids         | None                                                                                                  |
| description                | None                                                                                                  |
| disk                       | 1                                                                                                     |
| id                         | 862970624                                                                                             |
| name                       | tempest-MixedCPUPolicyTestMultiNuma-flavor-1395833240                                                 |
| os-flavor-access:is_public | True                                                                                                  |
| properties                 | hw:numa_cpus.0='0', hw:numa_cpus.1='1,2', hw:numa_mem.0='256', hw:numa_mem.1='768', hw:numa_nodes='2' |
| ram                        | 1024                                                                                                  |
| rxtx_factor                | 1.0                                                                                                   |
| swap                       |                                                                                                       |
| vcpus                      | 3                                                                                                     |
+----------------------------+-------------------------------------------------------------------------------------------------------+

XML Output
  <vcpu placement='static'>3</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='20,22'/>
    <vcpupin vcpu='1' cpuset='21,23'/>
    <vcpupin vcpu='2' cpuset='21,23'/>
    <emulatorpin cpuset='20-23'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='0-1'/>
    <memnode cellid='0' mode='strict' nodeset='0'/>
    <memnode cellid='1' mode='strict' nodeset='1'/>
  </numatune>
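
To check the placement in the XML above against the host topology, the following sketch maps each vCPU to the host NUMA node(s) of its pinned CPUs. It assumes the even/odd CPU numbering of this particular host and comma-separated cpuset values as shown; the function name vcpu_host_nodes is hypothetical.

```python
# Illustrative sketch only; assumes even host CPUs are on node0 and odd
# host CPUs are on node1, as on the compute host in this report.
import xml.etree.ElementTree as ET

CPUTUNE_XML = """
<cputune>
  <vcpupin vcpu='0' cpuset='20,22'/>
  <vcpupin vcpu='1' cpuset='21,23'/>
  <vcpupin vcpu='2' cpuset='21,23'/>
  <emulatorpin cpuset='20-23'/>
</cputune>
"""


def vcpu_host_nodes(xml_text):
    """Return {vcpu_id: sorted host NUMA node IDs its cpuset touches}."""
    root = ET.fromstring(xml_text)
    result = {}
    for pin in root.findall("vcpupin"):
        # Only comma-separated lists are handled, matching the XML above.
        cpus = {int(c) for c in pin.get("cpuset").split(",")}
        result[int(pin.get("vcpu"))] = sorted({c % 2 for c in cpus})
    return result


placement = vcpu_host_nodes(CPUTUNE_XML)
print(placement)
```

In the working case, vCPU 0 floats over shared CPUs on host node 0 and vCPUs 1 and 2 float over shared CPUs on host node 1, matching hw:numa_cpus.0='0' and hw:numa_cpus.1='1,2'.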


######################################################################
# Failed deployment (3 vCPU, asymmetric, and mixed dedicated policy) #
######################################################################

(overcloud) [stack@undercloud-0 ~]$ openstack flavor show tempest-MixedCPUPolicyTestMultiNuma-flavor-1583852539
/usr/lib/python3.9/site-packages/openstack/config/cloud_region.py:452: UserWarning: You have a configured API_VERSION with 'latest' in it. In the context of openstacksdk this doesn't make any sense.
  warnings.warn(
+----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field                      | Value                                                                                                                                                    |
+----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                                                                                                                                    |
| OS-FLV-EXT-DATA:ephemeral  | 0                                                                                                                                                        |
| access_project_ids         | None                                                                                                                                                     |
| description                | None                                                                                                                                                     |
| disk                       | 1                                                                                                                                                        |
| id                         | 1495385529                                                                                                                                               |
| name                       | tempest-MixedCPUPolicyTestMultiNuma-flavor-1583852539                                                                                                    |
| os-flavor-access:is_public | True                                                                                                                                                     |
| properties                 | hw:cpu_dedicated_mask='^0', hw:cpu_policy='mixed', hw:numa_cpus.0='0', hw:numa_cpus.1='1,2', hw:numa_mem.0='256', hw:numa_mem.1='768', hw:numa_nodes='2' |
| ram                        | 1024                                                                                                                                                     |
| rxtx_factor                | 1.0                                                                                                                                                      |
| swap                       |                                                                                                                                                          |
| vcpus                      | 3                                                                                                                                                        |
+----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+

nova-scheduler.log:2022-10-17 16:25:17.163 12 DEBUG nova.virt.hardware [req-6307d41d-02a6-40f7-88d8-c7fcc2714202 0bfbbc8dfa1548819b0af96786b1ad43 12a4c6a7df9947a39f54dedf04b22fea - default default] Attempting to fit instance cell InstanceNUMACell(cpu_pinning_raw=None,cpu_policy='mixed',cpu_thread_policy=None,cpu_topology=<?>,cpuset=set([0]),cpuset_reserved=None,id=0,memory=256,pagesize=None,pcpuset=set([])) on host_cell NUMACell(cpu_usage=0,cpuset=set([20,22]),id=0,memory=128210,memory_usage=0,mempages=[NUMAPagesTopology,NUMAPagesTopology,NUMAPagesTopology],network_metadata=NetworkMetadata,pcpuset=set([32,34,36,38,24,26,28,30]),pinned_cpus=set([]),siblings=[set([36]),set([24]),set([20]),set([30]),set([38]),set([22]),set([26]),set([28]),set([32]),set([34])],socket=0) _numa_fit_instance_cell /usr/lib/python3.9/site-packages/nova/virt/hardware.py:929
nova-scheduler.log:2022-10-17 16:25:17.163 12 DEBUG nova.virt.hardware [req-6307d41d-02a6-40f7-88d8-c7fcc2714202 0bfbbc8dfa1548819b0af96786b1ad43 12a4c6a7df9947a39f54dedf04b22fea - default default] No specific pagesize requested for instance, selected pagesize: 4 _numa_fit_instance_cell /usr/lib/python3.9/site-packages/nova/virt/hardware.py:956
nova-scheduler.log:2022-10-17 16:25:17.163 12 DEBUG nova.virt.hardware [req-6307d41d-02a6-40f7-88d8-c7fcc2714202 0bfbbc8dfa1548819b0af96786b1ad43 12a4c6a7df9947a39f54dedf04b22fea - default default] Instance has requested pinned CPUs _numa_fit_instance_cell /usr/lib/python3.9/site-packages/nova/virt/hardware.py:1021
nova-scheduler.log:2022-10-17 16:25:17.164 12 DEBUG nova.virt.hardware [req-6307d41d-02a6-40f7-88d8-c7fcc2714202 0bfbbc8dfa1548819b0af96786b1ad43 12a4c6a7df9947a39f54dedf04b22fea - default default] Packing an instance onto a set of siblings:     host_cell_free_siblings: [{36}, {24}, set(), {30}, {38}, set(), {26}, {28}, {32}, {34}]    instance_cell: InstanceNUMACell(cpu_pinning_raw=None,cpu_policy='mixed',cpu_thread_policy=None,cpu_topology=<?>,cpuset=set([0]),cpuset_reserved=None,id=0,memory=256,pagesize=None,pcpuset=set([]))    host_cell_id: 0    threads_per_core: 1    num_cpu_reserved: 0 _pack_instance_onto_cores /usr/lib/python3.9/site-packages/nova/virt/hardware.py:658
nova-scheduler.log:2022-10-17 16:25:17.164 12 DEBUG nova.virt.hardware [req-6307d41d-02a6-40f7-88d8-c7fcc2714202 0bfbbc8dfa1548819b0af96786b1ad43 12a4c6a7df9947a39f54dedf04b22fea - default default] Built sibling_sets: defaultdict(<class 'list'>, {1: [{36}, {24}, {30}, {38}, {26}, {28}, {32}, {34}]}) _pack_instance_onto_cores /usr/lib/python3.9/site-packages/nova/virt/hardware.py:679
nova-scheduler.log:2022-10-17 16:25:17.164 12 DEBUG nova.virt.hardware [req-6307d41d-02a6-40f7-88d8-c7fcc2714202 0bfbbc8dfa1548819b0af96786b1ad43 12a4c6a7df9947a39f54dedf04b22fea - default default] User did not specify a thread policy. Using default for 1 cores _pack_instance_onto_cores /usr/lib/python3.9/site-packages/nova/virt/hardware.py:794
nova-scheduler.log:2022-10-17 16:25:17.164 12 INFO nova.virt.hardware [req-6307d41d-02a6-40f7-88d8-c7fcc2714202 0bfbbc8dfa1548819b0af96786b1ad43 12a4c6a7df9947a39f54dedf04b22fea - default default] Computed NUMA topology CPU pinning: usable pCPUs: [[36], [24], [30], [38], [26], [28], [32], [34]], vCPUs mapping: []
nova-scheduler.log:2022-10-17 16:25:17.165 12 INFO nova.virt.hardware [req-6307d41d-02a6-40f7-88d8-c7fcc2714202 0bfbbc8dfa1548819b0af96786b1ad43 12a4c6a7df9947a39f54dedf04b22fea - default default] Computed NUMA topology CPU pinning: usable pCPUs: [[36], [24], [30], [38], [26], [28], [32], [34]], vCPUs mapping: []
nova-scheduler.log:2022-10-17 16:25:17.165 12 DEBUG nova.virt.hardware [req-6307d41d-02a6-40f7-88d8-c7fcc2714202 0bfbbc8dfa1548819b0af96786b1ad43 12a4c6a7df9947a39f54dedf04b22fea - default default] Failed to map instance cell CPUs to host cell CPUs _numa_fit_instance_cell /usr/lib/python3.9/site-packages/nova/virt/hardware.py:1049
nova-scheduler.log:2022-10-17 16:25:17.165 12 DEBUG nova.scheduler.filters.numa_topology_filter [req-6307d41d-02a6-40f7-88d8-c7fcc2714202 0bfbbc8dfa1548819b0af96786b1ad43 12a4c6a7df9947a39f54dedf04b22fea - default default] [instance: fa4f0399-3507-42e8-a02d-5fd1269b3762] computesriov-1.localdomain, computesriov-1.localdomain fails NUMA topology requirements. The instance does not fit on this host. host_passes /usr/lib/python3.9/site-packages/nova/scheduler/filters/numa_topology_filter.py:106
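
To make the log above easier to read, the sketch below splits each guest NUMA cell of this flavor into shared and dedicated vCPUs. It is a simplified illustration, not Nova's implementation: parse_mask is a hypothetical helper, and it assumes that a mask made only of '^' exclusions means every other vCPU is dedicated.

```python
# Illustrative sketch only; not Nova's implementation.

def parse_mask(mask, vcpus):
    """Return the dedicated vCPU IDs for a simple hw:cpu_dedicated_mask.

    Supports comma-separated IDs and ranges, with '^' marking exclusions.
    Assumption: a mask with only exclusions selects all other vCPUs.
    """
    include, exclude = set(), set()
    for part in mask.split(","):
        part = part.strip()
        target = exclude if part.startswith("^") else include
        part = part.lstrip("^")
        if "-" in part:
            start, end = part.split("-")
            target.update(range(int(start), int(end) + 1))
        else:
            target.add(int(part))
    if not include:
        include = set(range(vcpus))
    return include - exclude


# Guest cells from hw:numa_cpus.0='0' and hw:numa_cpus.1='1,2'.
GUEST_CELLS = {0: {0}, 1: {1, 2}}

dedicated = parse_mask("^0", vcpus=3)
split = {
    cell: {
        "shared": sorted(cpus - dedicated),
        "dedicated": sorted(cpus & dedicated),
    }
    for cell, cpus in GUEST_CELLS.items()
}
print(split)
```

Guest cell 0 ends up with one shared vCPU and no dedicated vCPUs, which matches cpuset=set([0]) and pcpuset=set([]) in the log. The log then shows the pinning path running for that cell and producing an empty vCPU mapping before the cell is reported as failing to map. This suggests, though the report does not confirm it, that a guest cell with no dedicated vCPUs is being treated as a pinning failure.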

###########################################################
# Failed deployment (4 vCPU, asymmetric, no mixed policy) #
###########################################################
(overcloud) [stack@undercloud-0 ~]$ openstack flavor show tempest-MixedCPUPolicyTestMultiNuma-flavor-1023626242
/usr/lib/python3.9/site-packages/openstack/config/cloud_region.py:452: UserWarning: You have a configured API_VERSION with 'latest' in it. In the context of openstacksdk this doesn't make any sense.
  warnings.warn(
+----------------------------+---------------------------------------------------------------------------------------------------------+
| Field                      | Value                                                                                                   |
+----------------------------+---------------------------------------------------------------------------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                                                                                   |
| OS-FLV-EXT-DATA:ephemeral  | 0                                                                                                       |
| access_project_ids         | None                                                                                                    |
| description                | None                                                                                                    |
| disk                       | 1                                                                                                       |
| id                         | 2041661630                                                                                              |
| name                       | tempest-MixedCPUPolicyTestMultiNuma-flavor-1023626242                                                   |
| os-flavor-access:is_public | True                                                                                                    |
| properties                 | hw:numa_cpus.0='0', hw:numa_cpus.1='1,2,3', hw:numa_mem.0='256', hw:numa_mem.1='768', hw:numa_nodes='2' |
| ram                        | 1024                                                                                                    |
| rxtx_factor                | 1.0                                                                                                     |
| swap                       |                                                                                                         |
| vcpus                      | 4                                                                                                       |
+----------------------------+---------------------------------------------------------------------------------------------------------+
nova-scheduler.log:2022-10-17 16:32:58.191 12 DEBUG nova.virt.hardware [req-96e2e8a3-8ea4-49ba-94e5-44bc00cb3081 e0c95053ea3742a79ff16dcdf2c40b0c b29448db925f45c58bd7aa693df43f9f - default default] Attempting to fit instance cell InstanceNUMACell(cpu_pinning_raw=None,cpu_policy=None,cpu_thread_policy=None,cpu_topology=<?>,cpuset=set([1,2,3]),cpuset_reserved=None,id=1,memory=768,pagesize=None,pcpuset=set([])) on host_cell NUMACell(cpu_usage=0,cpuset=set([20,22]),id=0,memory=128210,memory_usage=0,mempages=[NUMAPagesTopology,NUMAPagesTopology,NUMAPagesTopology],network_metadata=NetworkMetadata,pcpuset=set([32,34,36,38,24,26,28,30]),pinned_cpus=set([]),siblings=[set([36]),set([24]),set([20]),set([30]),set([38]),set([22]),set([26]),set([28]),set([32]),set([34])],socket=0) _numa_fit_instance_cell /usr/lib/python3.9/site-packages/nova/virt/hardware.py:929
nova-scheduler.log:2022-10-17 16:32:58.191 12 DEBUG nova.virt.hardware [req-96e2e8a3-8ea4-49ba-94e5-44bc00cb3081 e0c95053ea3742a79ff16dcdf2c40b0c b29448db925f45c58bd7aa693df43f9f - default default] No specific pagesize requested for instance, selected pagesize: 4 _numa_fit_instance_cell /usr/lib/python3.9/site-packages/nova/virt/hardware.py:956
nova-scheduler.log:2022-10-17 16:32:58.192 12 DEBUG nova.virt.hardware [req-96e2e8a3-8ea4-49ba-94e5-44bc00cb3081 e0c95053ea3742a79ff16dcdf2c40b0c b29448db925f45c58bd7aa693df43f9f - default default] Not enough host cell CPUs to fit instance cell; required: 3, actual: 2 _numa_fit_instance_cell /usr/lib/python3.9/site-packages/nova/virt/hardware.py:1010
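
The last log line can be reproduced with a simple count. The sketch below is inferred from the log message only and is much simpler than the real logic in nova/virt/hardware.py; fits_shared is a hypothetical name. It compares the number of vCPUs in a guest cell with the number of shared CPUs in a host cell.

```python
# Simplified illustration inferred from the log message; not Nova's code.

def fits_shared(instance_cpuset, host_cpuset):
    """Return (fits, required, actual) for a guest cell on a host cell."""
    required = len(instance_cpuset)
    actual = len(host_cpuset)
    return required <= actual, required, actual


# Shared CPUs per host NUMA node on computesriov-1 (cpu_shared_set 20-23).
HOST_SHARED = {0: {20, 22}, 1: {21, 23}}

# Guest cell 1 of the failing 4 vCPU flavor and of the working 3 vCPU flavor.
four_vcpu_cell1 = {1, 2, 3}
three_vcpu_cell1 = {1, 2}

results_4 = {node: fits_shared(four_vcpu_cell1, cpus)
             for node, cpus in HOST_SHARED.items()}
results_3 = {node: fits_shared(three_vcpu_cell1, cpus)
             for node, cpus in HOST_SHARED.items()}
print("4 vCPU flavor, guest cell 1:", results_4)
print("3 vCPU flavor, guest cell 1:", results_3)
```

Under this reading, the three-vCPU guest cell cannot fit on either host node because each node exposes only two shared CPUs, while the two-vCPU cell of the working flavor fits exactly.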



Version-Release number of selected component (if applicable):
RHOS-17.1-RHEL-9-20221004.n.1

How reproducible:
100%

Steps to Reproduce:
1. Deploy 17.1 and attempt to deploy a guest with each of the flavors above.

Actual results:
Asymmetric topologies with mixed dedicated policies and/or 4+ vCPUs fail to schedule.

Expected results:
The failing flavors above schedule correctly.


Additional info:
A test bed is available upon request.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1693377

Comment 3 Artom Lifshitz 2023-08-08 15:15:49 UTC
BZ 2135439 was moved to 17.1, closing this as a duplicate.

*** This bug has been marked as a duplicate of bug 2135439 ***

Comment 4 Red Hat Bugzilla 2023-12-07 04:25:53 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days