Bug 1850400 - VM resize via heat-stack fails with 'CPU set to unpin [3, 37, 38, 9, 10, 31] must be a subset of pinned CPU set [32, 35, 36, 4, 39, 7, 8, 41, 11, 13]'
Summary: VM resize via heat-stack fails with 'CPU set to unpin [3, 37, 38, 9, 10, 31] ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 16.0 (Train)
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: z2
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Stephen Finucane
QA Contact: James Parker
URL:
Whiteboard:
Depends On: 1862396
Blocks: 1866161 2074195
TreeView+ depends on / blocked
 
Reported: 2020-06-24 09:09 UTC by Shailesh Chhabdiya
Modified: 2023-12-15 18:15 UTC
18 users

Fixed In Version: openstack-nova-20.4.1-1.20200914172612.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 1866161 2074195 (view as bug list)
Environment:
Last Closed: 2020-10-28 15:38:11 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Launchpad 1879878 0 None None None 2020-07-09 01:39:54 UTC
OpenStack gerrit 744950 0 None MERGED tests: Add reproducer for bug #1879878 2021-01-07 23:47:13 UTC
OpenStack gerrit 744958 0 None MERGED Don't unset Instance.old_flavor, new_flavor until necessary 2021-01-07 23:47:13 UTC
Red Hat Issue Tracker OSP-3907 0 None None None 2022-04-11 18:33:38 UTC
Red Hat Product Errata RHEA-2020:4284 0 None None None 2020-10-28 15:38:31 UTC

Description Shailesh Chhabdiya 2020-06-24 09:09:03 UTC
Description of problem:
Stack creation works well via a Heat stack, but updating the stack to resize the VM fails with 'CPU set to unpin [3, 37, 38, 9, 10, 31] must be a subset of pinned CPU set [32, 35, 36, 4, 39, 7, 8, 41, 11, 13]'.

Version-Release number of selected component (if applicable):
Red Hat OpenStack 16.0

Actual results:
Fails while resizing

Expected results:
The stack update should complete successfully and the VM should be resized.

Additional info:

The VM gets the new flavor but still fails with the error; after a hard reboot it works well.

Comment 2 Zane Bitter 2020-06-24 14:51:37 UTC
It's difficult to see how Heat could be the cause of this. It's likely the linked Nova issue https://bugs.launchpad.net/nova/+bug/1879878

Comment 5 Colum Gaynor 2020-07-05 14:53:19 UTC
Hi Team...

We are getting complaints from Nokia about the lack of a public update on Red Hat Support Case #02673096, which is associated with this Bugzilla. An escalation from the customer is imminent, as the situation has a business impact, so we would appreciate an update on this BZ so that the support case can respond better to Nokia (the case creator).

Colum Gaynor - Senior Customer Success Manager - CEE Focused Solutions + Nokia Global CSM

Comment 7 Artom Lifshitz 2020-07-08 19:36:11 UTC
I've looked at the logs, and I've formed a theory as to what is going on. However, because of hardware requirements which are difficult for me to meet with the virtual environments that I have easy access to, can I ask the customer to perform some actions to help me test my theory?

Here's what I think is happening.

First, a few relevant log snippets:

2020-06-10 10:55:42.720 [./0080-sosreport-destination-compute-13-2020-06-11-thpyiew.tar.xz/sosreport-compute-13-2020-06-11-thpyiew/var/log/containers/nova/nova-compute_destinaton.log.1] 7 ERROR nova.compute.manager [req-fa13d883-27ca-44f8-8b03-8d7f0761c06d 377fff61ff514aff98d801d865842d9b 081fd6e230c24b5aa47306c5ca50534e - default default] [instance: 493380a3-de3e-412a-9331-84c77a415e2a] Confirm resize failed on source host compute-13.localdomain. Resource allocations in the placement service will be removed regardless because the instance is now on the destination host compute-10.localdomain. You can try hard rebooting the instance to correct its state.: nova.exception.CPUUnpinningInvalid: CPU set to unpin [3, 37, 38, 9, 10, 31] must be a subset of pinned CPU set [32, 35, 36, 4, 39, 7, 8, 41, 11, 13]

And, within the stack trace that leads to the above error:

2020-06-10 10:55:42.720 [./0080-sosreport-destination-compute-13-2020-06-11-thpyiew.tar.xz/sosreport-compute-13-2020-06-11-thpyiew/var/log/containers/nova/nova-compute_destinaton.log.1] 7 ERROR nova.compute.manager [instance: 493380a3-de3e-412a-9331-84c77a415e2a]   File "/usr/lib/python3.6/site-packages/nova/objects/numa.py", line 120, in unpin_cpus_with_siblings

The instance is being resized from cpu_thread_policy=isolate (no SMT, not placed on thread siblings) to cpu_thread_policy=require (SMT, placed on thread siblings). When Nova removes the instance on the source (compute-13), it attempts to unpin its CPUs using the *new cpu_thread_policy, require* (by calling unpin_cpus_with_siblings and not just unpin_cpus). This is not valid, as it was pinned according to the isolate thread policy. The exact code snippet is in nova/virt/hardware.py:

  if free:
    if (instance_cell.cpu_thread_policy == fields.CPUThreadAllocationPolicy.ISOLATE):
      new_cell.unpin_cpus_with_siblings(pinned_cpus)
    else:
      new_cell.unpin_cpus(pinned_cpus)

It's difficult to fake SMT (or lack thereof) in virtual machines, so it's hard for me to reproduce this with the upstream master branch code. Can I ask for the customer's help? Here's what I'd like them to try:

1. Perform a few more resizes from cpu_thread_policy=isolate to cpu_thread_policy=require. This should be the only difference between the old and new flavor. If I'm correct, this should consistently fail.

2. Perform a few more resizes *without changing the cpu_thread_policy*. In other words, in the new flavor in comment #1, change hw:cpu_thread_policy='require' to hw:cpu_thread_policy='isolate'. Keep everything else identical, and perform the resize. If I'm correct, this should consistently succeed.
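For reference, a rough sketch of how these two tests could be driven from the CLI (flavor names, sizes and the server name here are hypothetical, not taken from the customer's environment):

# Test 1: old and new flavor differ only in hw:cpu_thread_policy (isolate -> require)
openstack flavor create --vcpus 4 --ram 512 --disk 1 \
    --property hw:cpu_policy=dedicated \
    --property hw:cpu_thread_policy=isolate test.isolate
openstack flavor create --vcpus 4 --ram 512 --disk 1 \
    --property hw:cpu_policy=dedicated \
    --property hw:cpu_thread_policy=require test.require
# (test-vm assumed to have been booted from test.isolate)
openstack server resize --flavor test.require test-vm
openstack server resize confirm test-vm   # or 'openstack server resize --confirm test-vm' on older clients

# Test 2: resize between two flavors that both keep hw:cpu_thread_policy=isolate
# (everything else identical); if the theory holds, this should succeed.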

Thanks!

Comment 8 Artom Lifshitz 2020-07-08 20:14:48 UTC
(In reply to Artom Lifshitz from comment #7)
> [...]
> It's difficult to fake SMT (or lack thereof) in virtual machines,
I was instructed that it's in fact trivially easy to fake SMT in virtual machines, so I've done so in my environment. I've tested resizing from isolate to require, and it worked, so clearly I've missed something. I'll continue digging. That being said, it wouldn't be pointless for the customer to run the tests I've proposed, as it could shed further light on the reproducibility of this issue.

Comment 9 smooney 2020-07-09 01:19:10 UTC
I was able to reproduce the error once, after several attempts, while testing an alternate theory.

Operating under the assumption that the flavors were being updated such that VMs which previously could only run
on hosts with hyperthreading disabled would now only run on hosts where hyperthreading was enabled, I speculated that in addition to
altering the flavor they may also be re-configuring the BIOS to enable hyperthreading/SMT.

To that end I created a VM with SMT enabled, offlined the odd-numbered thread siblings of the pinned cores, then restarted libvirt and nova-compute.

i.e.
ubuntu@numa-2:/opt/repos/devstack$ for cpu in $(seq 1 2 $(nproc --all) | xargs -n1 -I '{}'  echo /sys/devices/system/cpu/cpu{}/online); do echo 0 | sudo tee $cpu; done
0
0
0
0
0
0
ubuntu@numa-2:/opt/repos/devstack$ sudo systemctl restart libvirtd
ubuntu@numa-2:/opt/repos/devstack$ sudo systemctl restart devstack@n-cpu


On this host I booted a VM with the isolate policy; I then resized it to the require flavor,
moving it to another host with SMT enabled.

ubuntu@numa-1:/opt/repos/devstack$ openstack flavor show isolate
+----------------------------+----------------------------------------------------------------------------------------------------------------------+
| Field                      | Value                                                                                                                |
+----------------------------+----------------------------------------------------------------------------------------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                                                                                                |
| OS-FLV-EXT-DATA:ephemeral  | 0                                                                                                                    |
| access_project_ids         | None                                                                                                                 |
| disk                       | 1                                                                                                                    |
| id                         | 9ac09778-5157-42b6-a4c0-5ade60087b6d                                                                                 |
| name                       | isolate                                                                                                              |
| os-flavor-access:is_public | True                                                                                                                 |
| properties                 | hw:cpu_policy='dedicated', hw:cpu_sockets='1', hw:cpu_thread_policy='isolate', hw:cpu_threads='2', hw:numa_nodes='2' |
| ram                        | 512                                                                                                                  |
| rxtx_factor                | 1.0                                                                                                                  |
| swap                       |                                                                                                                      |
| vcpus                      | 4                                                                                                                    |
+----------------------------+----------------------------------------------------------------------------------------------------------------------+
ubuntu@numa-1:/opt/repos/devstack$ openstack flavor show require
+----------------------------+----------------------------------------------------------------------------------------------------------------------+
| Field                      | Value                                                                                                                |
+----------------------------+----------------------------------------------------------------------------------------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                                                                                                |
| OS-FLV-EXT-DATA:ephemeral  | 0                                                                                                                    |
| access_project_ids         | None                                                                                                                 |
| disk                       | 1                                                                                                                    |
| id                         | 4ab63c8b-4530-4619-83de-db088564b341                                                                                 |
| name                       | require                                                                                                              |
| os-flavor-access:is_public | True                                                                                                                 |
| properties                 | hw:cpu_policy='dedicated', hw:cpu_sockets='1', hw:cpu_thread_policy='require', hw:cpu_threads='2', hw:numa_nodes='2' |
| ram                        | 512                                                                                                                  |
| rxtx_factor                | 1.0                                                                                                                  |
| swap                       |                                                                                                                      |
| vcpus                      | 4                                                                                                                    |
+----------------------------+----------------------------------------------------------------------------------------------------------------------+

Before confirming the resize I onlined the hyperthreads and restarted libvirt and the compute service:

ubuntu@numa-2:/opt/repos/devstack$ for cpu in $(seq 1 2 $(nproc --all) | xargs -n1 -I '{}'  echo /sys/devices/system/cpu/cpu{}/online); do echo 1 | sudo tee $cpu; done
1
1
1
1
1
1
ubuntu@numa-2:/opt/repos/devstack$ journalctl -u devstack@n-cpu -f -n 5000 | lnav -q

ubuntu@numa-2:/opt/repos/devstack$ sudo systemctl restart libvirtd
ubuntu@numa-2:/opt/repos/devstack$ sudo systemctl restart devstack@n-cpu

And it worked flawlessly.

I re-offlined the cores and resized the VM back to the isolate flavor,
and after 8 or 9 attempts I got lucky and triggered the issue.

http://paste.openstack.org/show/795679/

I do not know whether this is related to onlining or offlining the hyperthread siblings;
however, it only happened after I booted the VM with SMT disabled, enabled SMT,
started the resize, disabled SMT, waited for the periodic update_available_resource task to run,
and then confirmed the migration.
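Condensed, that triggering sequence looks roughly like the following (hedged sketch; the image, network and server names are placeholders, the target flavor is illustrative, and the online/offline loops are the ones shown above):

# 1. with the thread siblings offlined on the host, boot a VM from the isolate flavor
openstack server create --flavor isolate --image <image> --network <network> test-vm
# 2. online the thread siblings and restart libvirtd / nova-compute (loops above)
# 3. start the resize
openstack server resize --flavor require test-vm
# 4. offline the thread siblings again, restart the services, and wait for the
#    periodic update_available_resource task to run
# 5. confirm
openstack server resize confirm test-vm   # or 'openstack server resize --confirm' on older clients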

We do not have a reliable reproducer yet, but we can retest this tomorrow and see if the issue can be recreated.
The behaviour does not really fit either of our working theories so far, so this will require more investigation
to understand why and how it happens before we can advise further.

Comment 12 smooney 2020-07-09 22:02:09 UTC
While we have not fully root-caused the issue, we have made some determinations.

First, we have seen from the virsh capabilities output that both compute-10 and compute-13
have SMT (hyper-threading) enabled.

The deployment also has the NUMATopologyFilter disabled.
This is an unsupported configuration.

If a customer is using any NUMA-related feature, such as CPU pinning, we require the
use of the NUMATopologyFilter, but it is not enabled on controller-0:

enabled_filters=RetryFilter,AvailabilityZoneFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,PciPassthroughFilter

The workaround config option for disabling the fallback to VCPU allocations for pinned instances is set to false,
or rather it is commented out and the default value is false:

#disable_fallback_pcpu_query=false

As a result, VMs with the isolate policy can be booted on hosts with SMT enabled, which is not supported when using the cpu_dedicated_set config option.

As an intermediate step while we continue to investigate this, can you ask Nokia to enable the NUMATopologyFilter
by appending it to the end of the enabled_filters list, for example:
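(Hedged sketch only; in OSP 16 this would normally be applied through the director templates rather than edited directly in nova.conf on the controllers.)

[filter_scheduler]
enabled_filters=RetryFilter,AvailabilityZoneFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,PciPassthroughFilter,NUMATopologyFilter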


This will be required to support NUMA in the deployment regardless of what else we find, but it may also be enough to mitigate the issue by preventing invalid placement of VMs initially.

Comment 15 Bertrand 2020-07-23 15:49:19 UTC
Hi Sean,
Trying to sum up some of the config at play here to ensure we're in agreement; can you take a look?

In nova.conf:
cpu_dedicated_set=3-13,31-41,17-27,45-55
cpu_shared_set=1-2,29-30,15-16,43-44

#enabled_filters=AvailabilityZoneFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter # What about NUMATopologyFilter ? Should we make sure it is enabled?

#disable_fallback_pcpu_query=false # Should this be set to true?

Flavor change from isolate to require as part of the update.
Going from 
hw:cpu_policy='dedicated', 
hw:cpu_thread_policy='isolate', # only valid on hosts with SMT disabled - how could this work if this is Nokia's starting point prior to the flavor update?
hw:cpu_sockets='1', 
hw:cpu_threads='2', 
hw:numa_nodes='2'

to
hw:cpu_policy='dedicated', 
hw:cpu_thread_policy='require', # only valid on hosts with SMT enabled
hw:cpu_sockets='1', 
hw:cpu_threads='2', 
hw:numa_nodes='2'

Comment 16 smooney 2020-07-24 10:35:31 UTC
As I have stated previously, yes, the NUMATopologyFilter is unconditionally required if you use any NUMA-related feature for the deployment to be supported.
If the deployment does not have the NUMATopologyFilter enabled and they use CPU pinning, multiple virtual NUMA nodes, explicit mempages/hugepages, or pmem,
then the deployment is in an unsupported configuration.

So the customer was starting from an unsupported configuration for multiple reasons.

By using cpu_dedicated_set and cpu_shared_set, the host reports separate inventories of pCPUs and vCPUs to placement.

The disable_fallback_pcpu_query config option is there to control the behaviour on upgrade.
If you start with a Stein host and move to Train, it would be using vcpu_pin_set and possibly cpu_shared_set, but it would report only vCPUs.
When disable_fallback_pcpu_query=false (the default), the scheduler is allowed to use vCPUs for pinned instances in placement, which is the pre-Train/OSP 16 behaviour.
This is required for upgrades: initially you need to upgrade the cloud to Train using vcpu_pin_set, then once all compute nodes reach Train you can
replace the usage of vcpu_pin_set with cpu_dedicated_set, triggering a reshape of the allocations of pinned instances from vCPUs to pCPUs, and you can then disable the fallback.
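For illustration, a hedged sketch of the two configurations being contrasted (the CPU ranges are the ones quoted in comment 15; section names follow the Train nova.conf layout, and in OSP these would normally be managed by director rather than edited by hand):

# legacy / upgrade-in-progress style: pinned instances still tracked as vCPUs
[DEFAULT]
vcpu_pin_set=3-13,31-41,17-27,45-55
[workarounds]
#disable_fallback_pcpu_query=false   # default

# new style: pinned instances tracked as pCPUs in placement
[compute]
cpu_dedicated_set=3-13,31-41,17-27,45-55
cpu_shared_set=1-2,29-30,15-16,43-44
[workarounds]
disable_fallback_pcpu_query=true     # set on the scheduler/controller nodes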

OSP 16 does not currently support upgrades from any older release, so this deployment should be a greenfield.
In this case, starting with

cpu_dedicated_set=3-13,31-41,17-27,45-55
cpu_shared_set=1-2,29-30,15-16,43-44

is the correct thing to do, and disable_fallback_pcpu_query should be set to true.
However, the NUMATopologyFilter would have blocked the isolate instance from landing on a host with SMT enabled if it was used.
Without the NUMATopologyFilter, an instance with the isolate policy would get no results from the query for pCPUs, as the isolate policy adds the hyperthread trait as forbidden;
however, the fallback query will return allocations using vCPUs. This is one of the reasons the NUMATopologyFilter is required when using CPU pinning, although there are others.


By using cpu_dedicated_set while not enabling the NUMATopologyFilter and not setting disable_fallback_pcpu_query=true, nova has been misconfigured in such a way that
the isolate policy will not work, which is not a bug. Starting from that misconfigured state, there then seems to be an issue with resize whereby unpinning
the CPUs on the host can fail. It is not clear why this happens yet; so far I have only triggered the unpinning exception in a misconfigured state.

I have not seen it when the fallback is disabled, as hosts with SMT enabled will be eliminated,
and cpu_dedicated_set should only be defined when disable_fallback_pcpu_query=true.

Comment 17 smooney 2020-07-24 10:51:44 UTC
Actually, sorry, that is not quite right.

"however the NUMATopologyFilter would have blocked the isolate instance landing on host with smt enable if it was used.
Without the NUMATopologyFilter an instance with the isolate policy would get no results form the query for pCPUS as the isolate policy add the hyperthread trait as forbidden
however the fallback query will return allocation using vcpus."

I did not mean the NUMATopologyFilter; I meant disable_fallback_pcpu_query=true.

The NUMATopologyFilter would allow the old behaviour, where a flavor with the isolate policy would attempt to claim all thread siblings on a physical core,
which is only valid when cpu_dedicated_set is not used and vcpu_pin_set is used with disable_fallback_pcpu_query=false or unset.

That is to say, the NUMATopologyFilter allows a larger set of hosts than is permissible when cpu_dedicated_set is used;
however, placement will filter out those impermissible hosts before the NUMATopologyFilter executes if disable_fallback_pcpu_query=true.

Comment 18 Bertrand 2020-07-24 12:00:37 UTC
So before doing the resize, we should start with a valid configuration.
Given the following assumptions:
a) All hosts have SMT enabled.
b) We're not doing an upgrade from Stein to Train (OSP 16 / Train greenfield deployment).

With that said, a valid Nova / flavor configuration should be:

In nova.conf:
cpu_dedicated_set=3-13,31-41,17-27,45-55
cpu_shared_set=1-2,29-30,15-16,43-44

enabled_filters=NUMATopologyFilter # plus others as needed
disable_fallback_pcpu_query=true # Still not clear if this is actually needed.

hw:cpu_policy='dedicated', 
hw:cpu_thread_policy='require',
hw:cpu_sockets='1', 
hw:cpu_threads='2', 
hw:numa_nodes='2'
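For completeness, a rough sketch of applying those extra specs to an existing flavor from the CLI (the flavor name 'require' is just the example used earlier in this bug):

openstack flavor set \
    --property hw:cpu_policy=dedicated \
    --property hw:cpu_thread_policy=require \
    --property hw:cpu_sockets=1 \
    --property hw:cpu_threads=2 \
    --property hw:numa_nodes=2 \
    require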

Comment 19 smooney 2020-07-24 12:20:54 UTC
cpu_dedicated_set is used on the compute nodes and disable_fallback_pcpu_query is used by the scheduler.
If you use cpu_dedicated_set on the compute nodes instead of the deprecated vcpu_pin_set, then "disable_fallback_pcpu_query=true" should be defined in the scheduler/controller configs.
If you are using vcpu_pin_set instead of cpu_dedicated_set, then disable_fallback_pcpu_query should be false or unset.
We have two different config options because they are used in different places, but their values are closely related:
when you choose to start tracking pCPUs in placement, you need to disable the fallback.

If we assume that a) is true and all hosts have SMT enabled, then no host is a valid candidate for VMs using the isolate policy if you use the new way of tracking CPUs via placement.
That is, if you use cpu_dedicated_set and disable_fallback_pcpu_query=true, the isolate policy means "select a host with SMT disabled", and in this case it would get a No Valid Host error, as all hosts have SMT enabled.

Comment 20 Stephen Finucane 2020-07-31 10:39:36 UTC
There are likely two bugs here. There is the initial issue described in comment 0, whereby nova apparently ends up confused about what host cores the instance is pinned to. This is still being investigated and continues to be tracked by this bug. There is also a second issue, whereby instances end up consuming 'VCPU' instead of 'PCPU' on hosts with new-style configuration ('[compute] cpu_dedicated_set' and '[compute] cpu_shared_set'). This is now being tracked by #1862396 and has a fix targeted at z2.

Comment 21 Stephen Finucane 2020-08-05 18:02:50 UTC
I've identified the primary bug and proposed a fix upstream. As mentioned in comment 1, this is the same bug as 1879878 on Launchpad. The bug appears to be caused by a race between a periodic task and the cleanup of a confirmed instance. The fix is unlikely to land until 16.1.2; however, in the interim you should be able to work around it by setting the '[DEFAULT] update_resources_interval' config option to a reasonably high value (e.g. 120 seconds). I will update when I have a target release.
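A hedged sketch of that interim workaround in nova.conf on the compute nodes (the 120-second value is just the example given above; in OSP the setting would normally be applied via the deployment tooling rather than edited in place):

[DEFAULT]
# Increase the interval of the update_available_resource periodic task so it is
# less likely to race with the cleanup of a confirmed resize.
update_resources_interval = 120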

Comment 24 Stephen Finucane 2020-09-11 17:27:48 UTC
This is merged to master upstream. Proceeding with backports.

Comment 33 errata-xmlrpc 2020-10-28 15:38:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:4284

