Bug 1921070 - [Upgrades] Live migration from OSP 13 (RHEL 7.9) to OSP 16.2 (RHEL 8.3) fails due to CPU incompatibility
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 16.2 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: beta
Target Release: 16.2 (Train on RHEL 8.4)
Assignee: Kashyap Chamarthy
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On: 1923165 1965811 1981432 2002346
Blocks:
 
Reported: 2021-01-27 13:33 UTC by Lukas Bezdicka
Modified: 2023-03-21 19:43 UTC

Fixed In Version: openstack-nova-20.6.1-2.20210322095026.7139634.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-09-15 07:11:33 UTC
Target Upstream Version:
Embargoed:


Attachments
Nova Compute log from the destination host that has the XML being passed to compareCPU(), the libvirt API (6.27 MB, text/plain)
2021-01-27 17:28 UTC, Kashyap Chamarthy
guest.xml (generated by putting the <cpu> </cpu> elements from the nova-compute.log.1 at the timestamp 2021-01-26 22:37:31.346) (2.33 KB, text/plain)
2021-01-27 18:40 UTC, Kashyap Chamarthy
domcaps.xml from L0 (1.04 KB, text/plain)
2021-01-27 18:42 UTC, Kashyap Chamarthy
hostcaps.xml from L0 (12.56 KB, text/plain)
2021-01-27 18:46 UTC, Kashyap Chamarthy


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1913716 0 None None None 2021-01-29 10:05:17 UTC
OpenStack gerrit 772917 0 None NEW libvirt: Remove compareCPU() check on the destination 2021-06-16 14:46:57 UTC
OpenStack gerrit 783969 0 None ABANDONED Computes the TSX Flags for the compute nodes 2021-06-16 14:46:59 UTC
Red Hat Issue Tracker OSP-1634 0 None None None 2023-03-21 19:43:08 UTC
Red Hat Issue Tracker OSP-1635 0 None None None 2023-03-21 19:43:17 UTC
Red Hat Issue Tracker OSP-1636 0 None None None 2023-03-21 19:43:21 UTC
Red Hat Issue Tracker UPG-2540 0 None None None 2021-08-27 16:43:05 UTC
Red Hat Product Errata RHEA-2021:3483 0 None None None 2021-09-15 07:12:16 UTC

Description Lukas Bezdicka 2021-01-27 13:33:05 UTC
A crucial feature of the OpenStack 13->16 FFWD upgrade stopped working on the latest RHEL 8.3.

Nova logs:

2021-01-26 23:30:25.169 7 ERROR nova.virt.libvirt.driver [req-774be110-7fb6-4865-a177-d624a821cf9e 19ec0130b8714aac8c64a5c2ee5b914b 352675f5f34d45d59bdd61fde58e4bd0 - default default] CPU doesn't have compatibility.
Refer to http://libvirt.org/html/libvirt-libvirt-host.html#virCPUCompareResult
2021-01-26 23:30:25.242 7 ERROR oslo_messaging.rpc.server [req-774be110-7fb6-4865-a177-d624a821cf9e 19ec0130b8714aac8c64a5c2ee5b914b 352675f5f34d45d59bdd61fde58e4bd0 - default default] Exception during message handling: nova.exception.InvalidCPUInfo: Unacceptable CPU info: CPU doesn't have compatibility.

In a nested-virt scenario, we have a hypervisor hosting two VMs that act as the OpenStack compute nodes. Live migration of VMs running on these computes fails with the Nova error above.

Comparison of the computes and virsh domcapabilities diff:

OSP13 RHEL7.9 libvirt-daemon-4.5.0-36.el7_9.3.x86_64          |      RHEL8.3 libvirt-daemon-6.6.0-7.1.module+el8.3.0+8852+b44fca9f.x86_64
  <cpu>                                                       <
    <mode name='host-passthrough' supported='yes'/>           <
    <mode name='host-model' supported='yes'>                        <mode name='host-model' supported='yes'>         
      <model fallback='forbid'>Skylake-Server-IBRS</model>    |       <model fallback='forbid'>Cascadelake-Server</model>
      <vendor>Intel</vendor>                                          <vendor>Intel</vendor>
      <feature policy='require' name='ss'/>                           <feature policy='require' name='ss'/>          
                                                              >       <feature policy='require' name='vmx'/>         
      <feature policy='require' name='hypervisor'/>                   <feature policy='require' name='hypervisor'/>  
      <feature policy='require' name='tsc_adjust'/>                   <feature policy='require' name='tsc_adjust'/>  
      <feature policy='require' name='clflushopt'/>           <
      <feature policy='require' name='umip'/>                         <feature policy='require' name='umip'/>        
      <feature policy='require' name='pku'/>                          <feature policy='require' name='pku'/>         
      <feature policy='require' name='avx512vnni'/>           <
      <feature policy='require' name='md-clear'/>                     <feature policy='require' name='md-clear'/>    
      <feature policy='require' name='stibp'/>                        <feature policy='require' name='stibp'/>       
      <feature policy='require' name='ssbd'/>                 |       <feature policy='require' name='arch-capabilities'/>
      <feature policy='require' name='ibpb'/>                         <feature policy='require' name='ibpb'/>        
                                                              >       <feature policy='require' name='amd-stibp'/>   
                                                              >       <feature policy='require' name='amd-ssbd'/>    
                                                              >       <feature policy='require' name='rdctl-no'/>    
                                                              >       <feature policy='require' name='ibrs-all'/>    
                                                              >       <feature policy='require' name='skip-l1dfl-vmentry'/>
                                                              >       <feature policy='require' name='mds-no'/>      
                                                              >       <feature policy='require' name='pschange-mc-no'/>
                                                              >       <feature policy='disable' name='hle'/>         
                                                              >       <feature policy='disable' name='rtm'/>         
      <feature policy='disable' name='mpx'/>                          <feature policy='disable' name='mpx'/>         
    </mode>                                                         </mode>

virsh capabilities diff:
    <cpu>                                                           <cpu>
      <arch>x86_64</arch>                                             <arch>x86_64</arch>
      <model>Skylake-Server-IBRS</model>                      |       <model>Cascadelake-Server-noTSX</model>        
      <vendor>Intel</vendor>                                          <vendor>Intel</vendor>
      <microcode version='1'/>                                        <microcode version='1'/>
      <topology sockets='8' cores='1' threads='1'/>           |       <topology sockets='8' dies='1' cores='1' threads='1'/>
      <feature name='ss'/>                                            <feature name='ss'/>
      <feature name='vmx'/>                                           <feature name='vmx'/>
      <feature name='osxsave'/>                                       <feature name='osxsave'/>
      <feature name='hypervisor'/>                                    <feature name='hypervisor'/>
      <feature name='tsc_adjust'/>                                    <feature name='tsc_adjust'/>
      <feature name='clflushopt'/>                            <
      <feature name='umip'/>                                          <feature name='umip'/>
      <feature name='pku'/>                                           <feature name='pku'/>
      <feature name='ospke'/>                                         <feature name='ospke'/>
      <feature name='avx512vnni'/>                            <
      <feature name='md-clear'/>                                      <feature name='md-clear'/>
      <feature name='stibp'/>                                         <feature name='stibp'/>
      <feature name='arch-facilities'/>                       |       <feature name='arch-capabilities'/>            
      <feature name='ssbd'/>                                  <
      <feature name='xsaves'/>                                        <feature name='xsaves'/>
      <feature name='ibpb'/>                                          <feature name='ibpb'/>
                                                              >       <feature name='amd-ssbd'/>
                                                              >       <feature name='rdctl-no'/>
                                                              >       <feature name='ibrs-all'/>
                                                              >       <feature name='skip-l1dfl-vmentry'/>           
                                                              >       <feature name='mds-no'/>
                                                              >       <feature name='pschange-mc-no'/>
                                                              >       <feature name='tsx-ctrl'/>
      <pages unit='KiB' size='4'/>                                    <pages unit='KiB' size='4'/>
      <pages unit='KiB' size='2048'/>                                 <pages unit='KiB' size='2048'/>
      <pages unit='KiB' size='1048576'/>                              <pages unit='KiB' size='1048576'/>             
    </cpu>                                                          </cpu>





Hypervisor:
RHEL8.2 libvirt-client-6.0.0-25.5.module+el8.2.1+8680+ea98947b.x86_64
    <mode name='host-passthrough' supported='yes'/>
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>Cascadelake-Server</model>
      <vendor>Intel</vendor>
      <feature policy='require' name='ss'/>
      <feature policy='require' name='vmx'/>
      <feature policy='require' name='hypervisor'/>
      <feature policy='require' name='tsc_adjust'/>
      <feature policy='require' name='umip'/>
      <feature policy='require' name='pku'/>
      <feature policy='require' name='md-clear'/>
      <feature policy='require' name='stibp'/>
      <feature policy='require' name='arch-capabilities'/>
      <feature policy='require' name='xsaves'/>
      <feature policy='require' name='invtsc'/>
      <feature policy='require' name='ibpb'/>
      <feature policy='require' name='amd-ssbd'/>
      <feature policy='require' name='rdctl-no'/>
      <feature policy='require' name='ibrs-all'/>
      <feature policy='require' name='skip-l1dfl-vmentry'/>
      <feature policy='require' name='mds-no'/>
      <feature policy='require' name='pschange-mc-no'/>
      <feature policy='require' name='tsx-ctrl'/>
    </mode>

virsh capabilities:
    <cpu>
      <arch>x86_64</arch>
      <model>Cascadelake-Server</model>
      <vendor>Intel</vendor>
      <microcode version='83898371'/>
      <counter name='tsc' frequency='2095077000' scaling='yes'/>                                                     
      <topology sockets='1' dies='1' cores='20' threads='2'/>                                                        
      <feature name='ds'/>
      <feature name='acpi'/>
      <feature name='ss'/>
      <feature name='ht'/>
      <feature name='tm'/>
      <feature name='pbe'/>
      <feature name='dtes64'/>
      <feature name='monitor'/>
      <feature name='ds_cpl'/>
      <feature name='vmx'/>
      <feature name='smx'/>
      <feature name='est'/>
      <feature name='tm2'/>
      <feature name='xtpr'/>
      <feature name='pdcm'/>
      <feature name='dca'/>
      <feature name='osxsave'/>
      <feature name='tsc_adjust'/>
      <feature name='cmt'/>
      <feature name='intel-pt'/>
      <feature name='pku'/>
      <feature name='ospke'/>
      <feature name='md-clear'/>
      <feature name='stibp'/>
      <feature name='arch-capabilities'/>
      <feature name='xsaves'/>
      <feature name='mbm_total'/>
      <feature name='mbm_local'/>
      <feature name='invtsc'/>
      <feature name='rdctl-no'/>
      <feature name='ibrs-all'/>
      <feature name='skip-l1dfl-vmentry'/>
      <feature name='mds-no'/>
      <feature name='tsx-ctrl'/>
      <pages unit='KiB' size='4'/>
      <pages unit='KiB' size='2048'/>
      <pages unit='KiB' size='1048576'/>
    </cpu>

Comment 1 Kashyap Chamarthy 2021-01-27 14:03:27 UTC
(I'm just looking at this bug.)

IIUC, this is not the 'arch-facilities' (a RHEL-7-only thing) vs.
'arch-capabilities' (RHEL-8) issue from last year, which OSP fixed by
not advertising the 'arch-facilities' CPU feature on the source host
(RHEL-7):


    https://bugzilla.redhat.com/show_bug.cgi?id=1867128 — "[OSP-16]
    [Downstream-Only] Don't provide 'arch-facilities' CPU feature to
    migration XML, to avoid live migration breakage from EL7 to EL8"

Comment 2 Kashyap Chamarthy 2021-01-27 15:00:08 UTC
Version details:

(NB: This is a nested KVM environment.)

Source
------

On the "host" (a level-1 guest running RHEL-7):

    kernel-3.10.0-1160.6.1.el7.x86_64
    microcode_ctl-2.1-73.2.el7_9.x86_64

    libvirt-daemon-kvm-4.5.0-36.el7_9.3.x86_64
    qemu-kvm-rhev-2.12.0-48.el7_9.1.x86_64

Destination
------------

On the "host" (a level-1 guest running RHEL-8):

    kernel-4.18.0-240.el8.x86_64
    microcode_ctl-20200609-2.el8.x86_64

    qemu-kvm-5.1.0-14.module+el8.3.0+8790+80f9c6d8.1.x86_64
    libvirt-daemon-kvm-6.6.0-7.1.module+el8.3.0+8852+b44fca9f.x86_64

                - - -

(Note: on both source and destination QEMU / libvirt are running within
the 'nova_libvirt' container running on respective "hosts".)

Comment 3 Kashyap Chamarthy 2021-01-27 17:28:47 UTC
Created attachment 1751325 [details]
Nova Compute log from the destination host that has the XML being passed to compareCPU(), the libvirt API

Comment 4 Daniel Berrangé 2021-01-27 17:29:48 UTC
In the compute log file for the target host we can find the guest CPU XML that is being checked at:

2021-01-26 22:37:31.346 7 DEBUG nova.virt.libvirt.driver [req-a141dd51-ffb9-46c8-a8e7-e37d2bf68422 19ec0130b8714aac8c64a5c2ee5b914b 352675f5f34d45d59bdd61fde58e4bd0 - default default] [instance: 462432e3-cd25-4c52-9c61-34aca9174bf4] cpu compare xml: <cpu>

Save that to guest.xml.

Now take the CPU from `virsh capabilities` and `virsh domcapabilities`, saving them to hostcaps.xml and domcaps.xml respectively.

$ virsh cpu-baseline --features guest.xml > guest-full.xml
$ virsh cpu-baseline --features hostcaps.xml > hostcaps-full.xml
$ virsh cpu-baseline --features domcaps.xml > domcaps-full.xml


When I now compare them

$ diff -u guest-full.xml hostcaps-full.xml  | grep -E '^(-|\+)'
--- guest-full.xml	2021-01-27 17:22:20.655831989 +0000
+++ hostcaps-full.xml	2021-01-27 17:22:27.262779417 +0000
-  <model fallback='forbid'>Skylake-Server-IBRS</model>
+  <model fallback='forbid'>Cascadelake-Server</model>
+  <feature policy='require' name='acpi'/>
+  <feature policy='require' name='arch-capabilities'/>
+  <feature policy='require' name='dca'/>
+  <feature policy='require' name='ds'/>
+  <feature policy='require' name='ds_cpl'/>
+  <feature policy='require' name='dtes64'/>
+  <feature policy='require' name='est'/>
-  <feature policy='require' name='hypervisor'/>
-  <feature policy='require' name='ibpb'/>
+  <feature policy='require' name='ht'/>
+  <feature policy='require' name='ibrs-all'/>
+  <feature policy='require' name='intel-pt'/>
+  <feature policy='require' name='invtsc'/>
+  <feature policy='require' name='mds-no'/>
+  <feature policy='require' name='monitor'/>
+  <feature policy='require' name='pbe'/>
+  <feature policy='require' name='pdcm'/>
+  <feature policy='require' name='rdctl-no'/>
+  <feature policy='require' name='skip-l1dfl-vmentry'/>
+  <feature policy='require' name='smx'/>
+  <feature policy='require' name='tm'/>
+  <feature policy='require' name='tm2'/>
-  <feature policy='require' name='umip'/>
+  <feature policy='require' name='tsx-ctrl'/>
+  <feature policy='require' name='xtpr'/>

The three features missing from hostcaps-full.xml (hypervisor, ibpb, and umip) are what will cause the virConnectCompareCPU() method to return failure.

If we meanwhile compare against the domcaps

$ diff -u guest-full.xml domcaps-full.xml  | grep -E '^(-|\+)'
--- guest-full.xml	2021-01-27 17:22:20.655831989 +0000
+++ domcaps-full.xml	2021-01-27 17:22:32.321739161 +0000
-  <model fallback='forbid'>Skylake-Server-IBRS</model>
+  <model fallback='forbid'>Cascadelake-Server</model>
+  <feature policy='require' name='amd-ssbd'/>
+  <feature policy='require' name='arch-capabilities'/>
+  <feature policy='require' name='ibrs-all'/>
+  <feature policy='require' name='invtsc'/>
+  <feature policy='require' name='mds-no'/>
+  <feature policy='require' name='pschange-mc-no'/>
+  <feature policy='require' name='rdctl-no'/>
+  <feature policy='require' name='skip-l1dfl-vmentry'/>
+  <feature policy='require' name='tsx-ctrl'/>


we see full compatibility.
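
As a rough illustration of the two comparisons above, the check boils down to a set difference over feature names: any feature the guest CPU requires that the host description lacks makes the comparison fail. The Python sketch below uses heavily abbreviated, made-up XML snippets rather than the attached files. The helper names `feature_names` and `missing_on_host` are hypothetical, and the sketch ignores the CPU model entirely, which libvirt does not.

```python
import xml.etree.ElementTree as ET


def feature_names(cpu_xml):
    """Return the set of feature names listed under a <cpu> element.

    Assumes the input was expanded with `virsh cpu-baseline --features`,
    so every listed feature is one the CPU definition requires.
    """
    root = ET.fromstring(cpu_xml)
    return {feat.get("name") for feat in root.iter("feature")}


def missing_on_host(guest_xml, host_xml):
    """Features the guest requires that the host description lacks."""
    return feature_names(guest_xml) - feature_names(host_xml)


# Abbreviated, illustrative inputs (NOT the attached files).
GUEST = """<cpu>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='ibpb'/>
  <feature policy='require' name='umip'/>
  <feature policy='require' name='ss'/>
</cpu>"""

HOSTCAPS = """<cpu>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='tsx-ctrl'/>
</cpu>"""

DOMCAPS = """<cpu>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='ibpb'/>
  <feature policy='require' name='umip'/>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='tsx-ctrl'/>
</cpu>"""

print(sorted(missing_on_host(GUEST, HOSTCAPS)))  # -> ['hypervisor', 'ibpb', 'umip']
print(sorted(missing_on_host(GUEST, DOMCAPS)))   # -> []
```

An empty result corresponds to the "full compatibility" case; a non-empty one corresponds to the failure seen against the host capabilities.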


This difference reflects the design limitations of the original  virConnectCompareCPU() API that Nova is using. This API compares against the host physical CPUID. There are features in this CPUID that KVM doesn't expose, and there are also features KVM emulates which are not in the host CPUID.  The latter is what's causing the problem.

If Nova simply didn't call virConnectCompareCPU at all, then libvirt would do the CPU comparison itself during migration and "do the right thing".

If Nova absolutely must do a CPU comparison itself, then it needs to change to use virConnectCompareHypervisorCPU instead, which reflects the CPUID that KVM is actually able to expose.
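
A minimal sketch of what such a check could look like from Python. It assumes the libvirt-python binding `virConnect.compareHypervisorCPU(emulator, arch, machine, virttype, xmlCPU, flags)`, and it assumes that passing None for the first four arguments lets libvirt pick its defaults. The function name `guest_cpu_fits_hypervisor` is hypothetical, and this is not Nova's implementation.

```python
# Values of libvirt's virCPUCompareResult enum, repeated here so the
# sketch does not need the libvirt module to be importable.
VIR_CPU_COMPARE_ERROR = -1
VIR_CPU_COMPARE_INCOMPATIBLE = 0
VIR_CPU_COMPARE_IDENTICAL = 1
VIR_CPU_COMPARE_SUPERSET = 2


def guest_cpu_fits_hypervisor(conn, cpu_xml):
    """Return True if the hypervisor behind `conn` can provide `cpu_xml`.

    `conn` is expected to be a libvirt.virConnect, or any object with a
    compatible compareHypervisorCPU() method.  None for emulator, arch,
    machine and virttype is assumed to mean "use libvirt's defaults".
    """
    result = conn.compareHypervisorCPU(None, None, None, None, cpu_xml, 0)
    # IDENTICAL or SUPERSET: the hypervisor offers everything the guest needs.
    return result in (VIR_CPU_COMPARE_IDENTICAL, VIR_CPU_COMPARE_SUPERSET)
```

With libvirt-python installed, `conn` would typically come from `libvirt.open("qemu:///system")`, and `cpu_xml` would be the guest `<cpu>` element saved as guest.xml above.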

Comment 5 Kashyap Chamarthy 2021-01-27 17:53:25 UTC
Thanks for the detailed comment, Dan.

I agree, currently Nova's usage of compareCPU() and baselineCPU() is 
outdated.  And it should switch to compareHypervisorCPU() and
baselineHypervisorCPU() APIs.


FWIW, that is what I've outlined in the design of this Nova spec here[1]

Where the commit message does recognize the problem:

    Make Nova's guest CPU selection approach more effective and reliable
    by introducing two new QEMU- and libvirt-based CPU configuration
    APIs: baselineHypervisorCPU() and compareHypervisorCPU().  These new
    APIs are more "hypervisor-literate" compared to the existing libvirt
    APIs that Nova uses.  As in, the new APIs take into account what the
    "host hypervisor" (meaning: KVM, QEMU, and what libvirt knows about
    the host) is capable of.

    Taking advantage of these newer APIs will allow Nova to make more
    well-informed decisions when determining CPU models that are
    compatible across different hosts.

And there's WIP to that end[2] here.  I'll work with upstream Nova
to accelerate it.

[1] Add "CPU selection with hypervisor consideration" spec —
    https://opendev.org/openstack/nova-specs/commit/70811da221035044e27

[2] https://review.opendev.org/c/openstack/nova/+/762330/ — CPU
    selection with hypervisor consideration

Comment 6 Kashyap Chamarthy 2021-01-27 18:40:13 UTC
Created attachment 1751339 [details]
guest.xml (generated by putting the <cpu> </cpu> elements from the nova-compute.log.1 at the timestamp 2021-01-26 22:37:31.346)

Comment 7 Kashyap Chamarthy 2021-01-27 18:42:01 UTC
Created attachment 1751342 [details]
domcaps.xml from L0

The domcaps.xml is generated by taking the 'arch', 'model', 'vendor' and 'feature' (only from the CPU) from the `virsh domcapabilities` on the baremetal host (L0), and putting it all under a <cpu> element in domcaps.xml
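
The manual extraction described above could be sketched in Python roughly as follows. The function name `host_model_cpu` and the sample document are hypothetical; the sketch assumes the usual `virsh domcapabilities` layout, with a top-level `<arch>` element and a `<mode name='host-model'>` element under `<cpu>`.

```python
import xml.etree.ElementTree as ET


def host_model_cpu(domcaps_xml):
    """Build a standalone <cpu> element from domcapabilities output.

    Copies the top-level <arch> plus the <model>, <vendor> and <feature>
    children of the host-model mode into a new <cpu> element.
    """
    root = ET.fromstring(domcaps_xml)
    mode = root.find("./cpu/mode[@name='host-model']")
    if mode is None:
        raise ValueError("no host-model CPU in domain capabilities")

    cpu = ET.Element("cpu")
    arch = root.findtext("arch")
    if arch:
        ET.SubElement(cpu, "arch").text = arch
    for child in mode:
        if child.tag in ("model", "vendor", "feature"):
            cpu.append(child)
    return ET.tostring(cpu, encoding="unicode")


# Abbreviated, illustrative input (NOT the real L0 output).
SAMPLE = """<domainCapabilities>
  <arch>x86_64</arch>
  <cpu>
    <mode name='host-passthrough' supported='yes'/>
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>Cascadelake-Server</model>
      <vendor>Intel</vendor>
      <feature policy='require' name='ss'/>
      <feature policy='require' name='vmx'/>
    </mode>
  </cpu>
</domainCapabilities>"""

print(host_model_cpu(SAMPLE))
```

The resulting string can be written to domcaps.xml and fed to `virsh cpu-baseline --features` as in comment #4.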

Comment 8 Kashyap Chamarthy 2021-01-27 18:46:56 UTC
Created attachment 1751345 [details]
hostcaps.xml from L0

The hostcaps.xml is generated by running `virsh capabilities` on the L0 host.

               - - -

(With guest.xml, domcaps.xml, and hostcaps.xml, you can now reconstruct the `virsh cpu-baseline` results and the `diff`s on any machine based on what Dan describes in comment#4.)

Comment 9 Jiri Denemark 2021-01-27 22:41:12 UTC
Oops, I forgot to send my comment and Daniel explained it already. The only
thing I have to add is a link to bug 1611845 in which we were discussing the
same issue and switching to the *HypervisorCPU APIs was suggested.

Comment 10 Kashyap Chamarthy 2021-01-28 16:40:02 UTC
For what it's worth, I've got a patch here that removes the compareCPU() check on the destination and lets libvirt do the right thing:

https://review.opendev.org/c/openstack/nova/+/772917 — [WIP] libvirt: Remove compareCPU() check on the destination

Comment 12 Lukas Bezdicka 2021-01-29 22:14:13 UTC
With the patch I have failure on source:

2021-01-29 22:00:26.661 9 ERROR nova.virt.libvirt.driver [-] [instance: d84c2601-201e-4e06-9cb7-debf06c66ed7] Live Migration failure: operation failed: guest CPU doesn't match specification: missing features: hle,rtm: libvirt.libvirtError: operation failed: guest CPU doesn't match specification: missing features: hle,rtm

Comment 13 Jiri Denemark 2021-02-01 10:13:10 UTC
(In reply to Lukas Bezdicka from comment #12)
> libvirt.libvirtError: operation failed: guest CPU doesn't match
> specification: missing features: hle,rtm

This is most likely caused by trying to migrate a domain from host with TSX
enabled to a host with TSX disabled.

Comment 14 Kashyap Chamarthy 2021-02-01 11:08:48 UTC
(In reply to Lukas Bezdicka from comment #12)
> With the patch I have failure on source:
> 
> 2021-01-29 22:00:26.661 9 ERROR nova.virt.libvirt.driver [-] [instance:
> d84c2601-201e-4e06-9cb7-debf06c66ed7] Live Migration failure: operation
> failed: guest CPU doesn't match specification: missing features: hle,rtm:
> libvirt.libvirtError: operation failed: guest CPU doesn't match
> specification: missing features: hle,rtm

Okay, that error is unrelated to the original problem.


As for the error you're seeing, I just learnt from KVM developers that what you're hitting is due to a different reason -- RHEL-8.3 has disabled TSX (which disables those two CPU features: 'hle', and 'rtm').

https://bugzilla.redhat.com/show_bug.cgi?id=1828642 — kernel: Disable Intel TSX by default on newer CPUs 

The only workaround in this case (where you're using 'host-model' — Nova defaults to this) is to temporarily turn on TSX on the RHEL-8.3 kernel command-line, in /etc/default/grub

    GRUB_CMDLINE_LINUX_DEFAULT="[...] tsx=on"
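
To verify whether the workaround took effect after a reboot, something like the following Python sketch could be used. Both helper names are hypothetical. The sketch assumes the kernel parameter is spelled `tsx=<value>` as shown above, and that the `hle` and `rtm` flags appear on the `flags` line of /proc/cpuinfo when TSX is enabled.

```python
def tsx_mode(cmdline):
    """Return the value of the last tsx= kernel parameter, or None."""
    mode = None
    for token in cmdline.split():
        if token.startswith("tsx="):
            mode = token[len("tsx="):]
    return mode


def cpu_has_tsx(cpuinfo_text):
    """True if the first 'flags' line lists both 'hle' and 'rtm'."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            return {"hle", "rtm"} <= flags
    return False


# Illustrative strings; on a real host, read /proc/cmdline and
# /proc/cpuinfo instead.
print(tsx_mode("BOOT_IMAGE=/vmlinuz ro quiet tsx=on"))  # -> on
print(cpu_has_tsx("flags\t\t: fpu vmx hle rtm avx2"))   # -> True
```

On the destination host, the inputs would be `open("/proc/cmdline").read()` and `open("/proc/cpuinfo").read()`.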

Comment 15 Paolo Bonzini 2021-02-01 11:23:37 UTC
Note that host-model only provides *safety* for live migration: if migration succeeds, the guest will run correctly and will have no ABI change.  Host-passthrough does not provide safety; that is, it's up to the administrator to ensure that the source and destination hosts are identical in hardware, kernel/QEMU version, microcode version, and configuration.

However, even for host-model there is no guarantee that migration succeeds if there are differences between source and destination hosts (again, for any of hardware, kernel version, microcode version and configuration).  _Usually_ old->new works, but not always.  In the past Intel has disabled features in microcode updates (which you could get just by updating your destination host to a more recent RHEL minor release).  More rarely features were disabled in newer processor generations.  Regarding software and configuration, migrating to a newer QEMU should always be safe, but bug 1828642 is one case where a newer kernel version changes the defaults and makes it impossible to migrate to a newer kernel (without manually reverting the configuration changes).

Comment 16 Daniel Berrangé 2021-02-01 11:30:36 UTC
(In reply to Paolo Bonzini from comment #15)
> Note that host-model only provides *safety* for live migration: if migration
> succeeds, the guest will run correctly and will have no ABI change. 
> Host-passthrough does not provide safety; that is, it's up to the
> administrator to ensure that the source and destination hosts are
> identical in hardware, kernel/QEMU version, microcode version, and
> configuration.
> 
> However, even for host-model there is no guarantee that migration succeeds
> if there are differences between source and destination hosts (again, for
> any of hardware, kernel version, microcode version and configuration). 

Note that host-model is just syntactic sugar around a named CPU model.

So essentially this is saying that there is no guarantee of forwards
migration for any CPU model. This rather compromises the main point
of using a named CPU model / host-model.

Obviously if the hardware has disabled a feature (due to microcode update)
then there's usually nothing the software stack can do to fix that.

If the hardware still supports the feature (because the user intentionally
didn't install the microcode which breaks compat), then IMHO it is unreasonable
for the kernel to then intentionally break forwards compatibility out of the
box within a minor y-stream update.

Comment 18 Kashyap Chamarthy 2021-02-01 13:23:44 UTC
For what it's worth, I've filed this:

    https://bugzilla.redhat.com/show_bug.cgi?id=1923118
    — [kernel] "redhat/configs: Change Intel TSX default to off" breaks live migration of KVM guests

I'm not expecting a revert here; but I filed it for the sake of discussion.

Comment 19 Kashyap Chamarthy 2021-02-02 15:37:58 UTC
I completely forgot: last year, we _had_ a similar report in upstream Nova of live migration failing with a Cascade Lake CPU as the destination, and we went with this band-aid:

    https://review.opendev.org/c/openstack/nova/+/757577
    — Handle disabled CPU features to fix live migration failures


commit eeeca4ceff576beaa8558360c8a6a165d716f996
Author: Andrew Bonney <andrew.bonney.uk>
Date:   Tue Oct 6 14:42:38 2020 +0100

    Handle disabled CPU features to fix live migration failures
    
    When performing a live migration between hypervisors running
    libvirt, where one or more CPU features are disabled, nova does
    not take account of these. This results in migration failures
    as none of the available hypervisor targets appear compatible.
    
    This patch ensures that the libvirt 'disable' policy is taken
    account of, at least in a basic sense, by explicitly ignoring
    items flagged in this way when enumerating CPU features.
    
    Closes-Bug: #1898715
    Change-Id: Iaf14ca97cfac99dd280d1114123f2d4bb6292b63
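
The idea in that commit message can be illustrated with a short Python sketch. This is a simplified illustration and not the code from the Nova change: the function name `enabled_feature_names` is hypothetical, and the sample XML is an abbreviated host-model definition modelled on the RHEL 8.3 domcapabilities output in the description.

```python
import xml.etree.ElementTree as ET


def enabled_feature_names(cpu_xml):
    """Collect CPU feature names, skipping any with policy='disable'."""
    root = ET.fromstring(cpu_xml)
    return {
        feat.get("name")
        for feat in root.iter("feature")
        if feat.get("policy") != "disable"
    }


# Abbreviated, illustrative host-model definition.
HOST_MODEL = """<cpu>
  <model fallback='forbid'>Cascadelake-Server</model>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='vmx'/>
  <feature policy='disable' name='hle'/>
  <feature policy='disable' name='rtm'/>
  <feature policy='disable' name='mpx'/>
</cpu>"""

print(sorted(enabled_feature_names(HOST_MODEL)))  # -> ['ss', 'vmx']
```

Without the policy filter, 'hle' and 'rtm' would be counted as features the host offers even though they are explicitly disabled.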

Comment 36 errata-xmlrpc 2021-09-15 07:11:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform (RHOSP) 16.2 enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2021:3483

