Bug 2122283

Summary: On Intel nodes we are advertising that the node is AMD CPU compatible
Product: Container Native Virtualization (CNV)
Reporter: Akriti Gupta <akrgupta>
Component: Virtualization
Assignee: Barak <bmordeha>
Status: CLOSED ERRATA
QA Contact: Kedar Bidarkar <kbidarka>
Severity: medium
Priority: medium
Version: 4.11.0
CC: dshchedr, jdenemar, mprivozn, ryasharz, sgott, vromanso
Flags: bmordeha: needinfo? (vromanso)
Target Release: 4.14.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2023-07-25 15:00:46 UTC
Type: Bug

Description Akriti Gupta 2022-08-29 17:27:59 UTC
Description of problem: On Intel nodes, we see
cpu-model.node.kubevirt.io/Opteron_G1=true and
cpu-model.node.kubevirt.io/Opteron_G2=true, which are AMD CPU models.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. oc describe node <node-name>

Actual results:
cpu-model.node.kubevirt.io/Opteron_G1=true and
cpu-model.node.kubevirt.io/Opteron_G2=true

Expected results:
We should not see Opteron_G1 and Opteron_G2 in the common CPU model list.

Additional info: Along with the Intel CPU models, we are also listing Opteron CPU models.

Comment 1 Kedar Bidarkar 2022-08-29 18:59:51 UTC
[kbidarka@localhost auth]$ oc describe node node-11.redhat.com | grep "cpu-model.node" 
                    cpu-model.node.kubevirt.io/Broadwell-noTSX=true
                    cpu-model.node.kubevirt.io/Broadwell-noTSX-IBRS=true
                    cpu-model.node.kubevirt.io/Cascadelake-Server-noTSX=true
                    cpu-model.node.kubevirt.io/Haswell-noTSX=true
                    cpu-model.node.kubevirt.io/Haswell-noTSX-IBRS=true
                    cpu-model.node.kubevirt.io/IvyBridge=true
                    cpu-model.node.kubevirt.io/IvyBridge-IBRS=true
                    cpu-model.node.kubevirt.io/Nehalem=true
                    cpu-model.node.kubevirt.io/Nehalem-IBRS=true
                    cpu-model.node.kubevirt.io/Opteron_G1=true
                    cpu-model.node.kubevirt.io/Opteron_G2=true
                    cpu-model.node.kubevirt.io/Penryn=true
                    cpu-model.node.kubevirt.io/SandyBridge=true
                    cpu-model.node.kubevirt.io/SandyBridge-IBRS=true
                    cpu-model.node.kubevirt.io/Skylake-Client-noTSX-IBRS=true
                    cpu-model.node.kubevirt.io/Skylake-Server-noTSX-IBRS=true
                    cpu-model.node.kubevirt.io/Westmere=true
                    cpu-model.node.kubevirt.io/Westmere-IBRS=true
[kbidarka@localhost auth]$ oc describe node node-11.redhat.com | grep "Intel"
                    cpu-vendor.node.kubevirt.io/Intel=true


Opteron appears to be an AMD CPU: https://en.wikipedia.org/wiki/Opteron
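To spot this kind of mismatch across many nodes, the node labels shown above can be checked programmatically. This is a minimal sketch, assuming the labels are available as a plain dict (for example the `metadata.labels` map from `oc get node <node-name> -o json`). The `AMD_MODEL_PREFIXES` table is a hypothetical, incomplete vendor mapping and not something KubeVirt provides.

```python
"""Flag AMD-looking CPU model labels advertised on an Intel node (sketch)."""

CPU_MODEL_PREFIX = "cpu-model.node.kubevirt.io/"
CPU_VENDOR_PREFIX = "cpu-vendor.node.kubevirt.io/"

# Hypothetical and incomplete: model-name prefixes assumed to be AMD models.
AMD_MODEL_PREFIXES = ("Opteron_", "EPYC")


def labelled_names(labels: dict[str, str], prefix: str) -> list[str]:
    """Return the suffixes of all labels with the given prefix set to "true"."""
    return sorted(
        key[len(prefix):]
        for key, value in labels.items()
        if key.startswith(prefix) and value == "true"
    )


def amd_models_on_intel_node(labels: dict[str, str]) -> list[str]:
    """On a node labelled as Intel, return advertised models that look like AMD ones."""
    if "Intel" not in labelled_names(labels, CPU_VENDOR_PREFIX):
        return []
    return [
        model
        for model in labelled_names(labels, CPU_MODEL_PREFIX)
        if model.startswith(AMD_MODEL_PREFIXES)
    ]


if __name__ == "__main__":
    # A subset of the labels from the output above.
    node_labels = {
        "cpu-vendor.node.kubevirt.io/Intel": "true",
        "cpu-model.node.kubevirt.io/Nehalem": "true",
        "cpu-model.node.kubevirt.io/Opteron_G1": "true",
        "cpu-model.node.kubevirt.io/Opteron_G2": "true",
        "cpu-model.node.kubevirt.io/Westmere": "true",
    }
    print(amd_models_on_intel_node(node_labels))  # ['Opteron_G1', 'Opteron_G2']
```

A prefix table is crude; it only illustrates the check and would need a proper model-to-vendor source in practice.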

Comment 3 Kedar Bidarkar 2022-09-01 11:00:58 UTC
If we set cpu.model = "Opteron_G2", we see the error message below.


[kbidarka@localhost auth]$ oc get vmi -A
NAMESPACE                                              NAME                                                 AGE     PHASE       IP            NODENAME                                         READY
virt-migration-and-maintenance-test-node-maintenance   rhel8-template-node-maintenance-1661787579-7115862   3m45s   Scheduled   xx.yy.zz.aa   node-13.redhat.com   False

[kbidarka@localhost auth]$ oc describe vmi rhel8-template-node-maintenance-1661787579-7115862 -n virt-migration-and-maintenance-test-node-maintenance | grep "Events"
Events:
  Type     Reason            Age                  From                         Message
  ----     ------            ----                 ----                         -------
  Normal   SuccessfulCreate  109s                 disruptionbudget-controller  Created PodDisruptionBudget kubevirt-disruption-budget-skxs8
  Normal   SuccessfulCreate  108s                 virtualmachine-controller    Created virtual machine pod virt-launcher-rhel8-template-node-maintenance-1661787579-7fjb94
  Warning  SyncFailed        16s (x16 over 100s)  virt-handler                 server error. command SyncVMI failed: "LibvirtError(Code=91, Domain=31, Message='the CPU is incompatible with host CPU: Host CPU does not provide required features: svm')"

Comment 4 Barak 2022-09-06 14:17:03 UTC
Hey @Jiri,

I looked into this bug and I think that it's related to libvirt/QEMU.
KubeVirt gets the usable CPU models through:

`virsh domcapabilities --machine q35 --arch x86_64 --virttype kvm`

In the `custom` list I saw:

...
    </mode>
    <mode name='custom' supported='yes'>
...
      <model usable='yes'>Opteron_G2</model>
...

However, `cat /usr/share/libvirt/cpu_map/x86_Opteron_G2.xml`
shows:
<cpus>
  <model name='Opteron_G2'>
...
    <feature name='svm'/>
...
  </model>
</cpus>


and my host doesn't have `svm`.
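The mismatch described here can be reproduced offline by comparing the two XML documents. This is a minimal sketch, assuming the domcapabilities output and the cpu_map file have been saved as strings. Both samples below are abbreviated (the real Opteron_G2 definition lists many more features), and the host feature set is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated excerpt of `virsh domcapabilities` output.
DOMCAPS = """<domainCapabilities><cpu>
  <mode name='custom' supported='yes'>
    <model usable='yes'>Opteron_G2</model>
  </mode>
</cpu></domainCapabilities>"""

# Abbreviated excerpt of /usr/share/libvirt/cpu_map/x86_Opteron_G2.xml.
CPU_MAP = """<cpus>
  <model name='Opteron_G2'>
    <feature name='cx16'/>
    <feature name='nx'/>
    <feature name='sse3'/>
    <feature name='svm'/>
  </model>
</cpus>"""

# Hypothetical feature set of an Intel host (no svm).
HOST_FEATURES = {"cx16", "nx", "sse3", "vmx"}


def usable_custom_models(domcaps_xml):
    """Names of models in the 'custom' mode that are marked usable='yes'."""
    root = ET.fromstring(domcaps_xml)
    return [m.text for m in root.findall("./cpu/mode[@name='custom']/model")
            if m.get("usable") == "yes"]


def cpu_map_features(cpu_map_xml, model):
    """Feature names listed for `model` in a cpu_map file."""
    root = ET.fromstring(cpu_map_xml)
    node = root.find(f"./model[@name='{model}']")
    if node is None:
        return set()
    return {f.get("name") for f in node.findall("feature")}


def usable_but_missing(domcaps_xml, cpu_map_xml, host_features):
    """Models marked usable='yes' whose cpu_map definition needs absent features."""
    result = {}
    for model in usable_custom_models(domcaps_xml):
        missing = cpu_map_features(cpu_map_xml, model) - set(host_features)
        if missing:
            result[model] = sorted(missing)
    return result


print(usable_but_missing(DOMCAPS, CPU_MAP, HOST_FEATURES))  # {'Opteron_G2': ['svm']}
```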

I saw in:
https://github.com/libvirt/libvirt/blob/3aa7c75fec9d92eb77cf900930503a057e1310cb/src/qemu/qemu_monitor_json.c#L4867

that libvirt uses the `query-cpu-definitions` command to check which features, if any, are missing.

After running:

qemu-system-x86_64 -qmp tcp:127.0.0.1:12345,server,nowait  &

nc localhost 12345

{ "execute": "qmp_capabilities" }

{ "execute": "query-cpu-definitions", "arguments": {} }


I got:
...{"name": "Opteron_G2", "typename": "Opteron_G2-x86_64-cpu", "unavailable-features": [], "alias-of": "Opteron_G2-v1", "static": false, "migration-safe": true, "deprecated": false}...

As you can see, the unavailable-features list is empty although my host doesn't have the `svm` feature, which is required according to `/usr/share/libvirt/cpu_map/x86_Opteron_G2.xml` but not according to domcapabilities.
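Extracting that field for every model makes the comparison easier than reading the raw reply. This is a minimal sketch, assuming the full QMP reply to `query-cpu-definitions` (a JSON object with a `return` array) has been captured as a string; the second model entry in the sample is made up for illustration.

```python
import json
from typing import Dict, List, Optional


def unavailable_features(qmp_reply: str) -> Dict[str, Optional[List[str]]]:
    """Map each CPU model name to its 'unavailable-features' list."""
    reply = json.loads(qmp_reply)
    # The field is optional in QMP, so keep None for "not reported"
    # instead of treating a missing field as "nothing is missing".
    return {d["name"]: d.get("unavailable-features") for d in reply["return"]}


def usable_models(qmp_reply: str) -> List[str]:
    """Models for which QEMU explicitly reports no unavailable features."""
    return sorted(name for name, missing in unavailable_features(qmp_reply).items()
                  if missing == [])


SAMPLE = json.dumps({"return": [
    {"name": "Opteron_G2", "typename": "Opteron_G2-x86_64-cpu",
     "unavailable-features": [], "alias-of": "Opteron_G2-v1",
     "static": False, "migration-safe": True, "deprecated": False},
    # Illustrative entry, not taken from the output above.
    {"name": "Example-Model", "unavailable-features": ["avx512f"],
     "static": False},
]})

print(usable_models(SAMPLE))  # ['Opteron_G2']
```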

I think that there is a synchronization problem between libvirt and QEMU.
Do we really need svm for Opteron_G2?
I saw that the same thing happens for other CPU models that require `svm`, and I heard that there is a similar problem with `rdtscp`.

Am I missing something?

Comment 5 Barak 2022-09-06 14:17:32 UTC

Comment 6 Jiri Denemark 2022-09-07 11:04:49 UTC
The CPU definitions in cpu_map XML are only used for checking
compatibility of a domain XML with a host CPU. We cannot change the
definitions in cpu_map because we need all libvirt versions to share the
same definitions; otherwise migration between non-matching libvirt
releases might break.

But the cpu_map definitions don't really matter when a guest is actually
started. So the following XML

    <cpu mode='custom' match='exact' check='partial'>
        <model>Opteron_G2</model>
    </cpu>

will translate to -cpu Opteron_G2 passed to QEMU, and since QEMU dropped
svm from its CPU models, svm won't be enabled. But that's not an issue,
as libvirt will detect it and change the live XML of the domain to
contain

    <feature name='svm' policy='disable'/>

to make sure we don't expect svm to be enabled during migration. The same
applies to all features QEMU does not enable when asked for
Opteron_G2.

It would be an issue with check='full', since that tells libvirt not to
accept any disabled or enabled features compared to the domain XML, and this
comparison uses the CPU model definitions from cpu_map. But it's
generally discouraged to use check='full' when starting a domain (once started,
libvirt will change the check attribute to 'full' anyway to make sure the CPU
does not change during migration).

But looking at the earlier comments I see this is happening on an Intel host
which does not provide svm and thus the libvirt compatibility check which uses
the cpu_map definition will fail. A workaround would be to use check='none' in
the domain XML, which is safe as long as a CPU model marked as usable='yes' is
used.

The actual problem here, though, is that Opteron_G2 is an AMD CPU model and it
is marked as usable on an Intel host. This is caused by QEMU removing the svm
feature, which would otherwise make it incompatible with any Intel host. The
question is what to do now. QEMU correctly says Opteron_G2 is usable on the
Intel host as it can provide all the required CPU features (although mixing CPU
vendors can sometimes have strange consequences, especially when migrating from
one vendor to another). I guess we could add a vendor='...' attribute to the
domain capabilities XML so that users can choose to filter usable CPU models
based on their vendor. Alternatively, we could add a flag telling libvirt to do
this filtering, but I think the additional attribute is actually better.

Comment 7 Barak 2022-09-07 12:38:35 UTC

Hey Jiri,
Thanks for the clarification.

But I think that it would be a problem for us to set check='partial'.

For instance:
If we start a virtual machine with cpuModel: `Opteron_G2`
and it lands on a node with `svm`, then we won't be able to migrate it
to another node that doesn't have `svm` (please correct me if I'm wrong).

We would prefer to stick with a well-defined set of features for named models,
as that is the "safe" configuration for a virtual machine in KubeVirt
(although the default configuration is host-model).
When users set a named model, we avoid taking individual features into
consideration during migration (scheduling).

Is it possible to tell libvirt to avoid enabling features that aren't
required by the custom list in `virsh domcapabilities --machine q35 --arch x86_64 --virttype kvm`
even if the host supports them?

Comment 8 Jiri Denemark 2022-09-07 15:14:54 UTC
Well, if you want to use check='full' you cannot use just a named model; you'd
need to list additional features (either disabled or enabled) to make sure
QEMU does not enable or disable anything unexpected. That's the only way to
avoid this automatic behavior if you actually want the domain to start.
Otherwise check='full' will of course not allow QEMU to enable or disable
anything unexpected, but we can only enforce that by reporting an error when
it happens.

That said, check='partial' or even check='none' for CPU models listed as
usable='yes' is generally safe because the features QEMU will enable or
disable depend on the machine type. So as long as you use the same machine
type, you get the same guest CPU regardless of which host you start it on. But
'svm' seems to be different as it is included in QEMU's definition of CPU
models and disabled at runtime (I don't really know the exact conditions,
though). So what you could do to avoid this behavior is to always disable svm
(if you don't need it, of course):

    <cpu mode='custom' match='exact' check='partial'>
      <model fallback='forbid'>....</model>
      <feature name='svm' policy='disable'/>
    </cpu>

This will make sure this particular problematic feature will never be enabled
even if QEMU would otherwise do so.
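When the domain XML is generated programmatically, a CPU element of this shape can be produced with any XML library. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `named_cpu_xml` and its defaults are hypothetical, and this is not KubeVirt's actual code.

```python
import xml.etree.ElementTree as ET


def named_cpu_xml(model, disabled_features=("svm",), check="partial"):
    """Build a <cpu> element for a named model with some features forced off."""
    cpu = ET.Element("cpu", {"mode": "custom", "match": "exact", "check": check})
    model_el = ET.SubElement(cpu, "model", {"fallback": "forbid"})
    model_el.text = model
    for name in disabled_features:
        # Explicitly disabled, so the feature is never enabled for this model
        # even if QEMU's own model definition would turn it on.
        ET.SubElement(cpu, "feature", {"name": name, "policy": "disable"})
    return ET.tostring(cpu, encoding="unicode")


print(named_cpu_xml("Opteron_G2"))
```

Keeping the disabled list as a parameter makes it easy to add other problematic features later, but each one should only be disabled if the guest does not need it.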

Comment 9 Barak 2022-09-08 07:30:29 UTC
@jdenemar 
So just to make sure I understood you correctly:

If the `custom` list output of
`virsh domcapabilities --machine q35 --arch x86_64 --virttype kvm`

contains SomeModel with <model usable='yes'>SomeModel</model>,
this means that the host has all the required features for SomeModel
that we must have to run the domain.

But QEMU might enable more features by default (like `svm`) than just
the features that we must have.
And libvirt will disable them automatically if we set
check='partial' or check='none', but not with check='full'.

All the features that we must have + the features that QEMU will
enable by default are in `/usr/share/libvirt/cpu_map/SomeModel.xml`,
which is used only with host-model.

Libvirt cannot change this behavior because of backward compatibility reasons.

Did I get it right?

Comment 10 Jiri Denemark 2022-09-08 13:06:03 UTC
(In reply to Barak from comment #9)
> if the `custom` list output of
> `virsh domcapabilities --machine q35 --arch x86_64 --virttype kvm`
> 
> contain SomeModel with <model usable='yes'>SomeModel</model> 
> this mean that the host has all the required features for SomeModel
> that we must have to run the domain.

Right.

> But QEMU might enable more features by default(like `svm`) than just 
> the features that we must have.  

Not really. The features added or disabled by QEMU are mostly caused by the
difference between the CPU model definitions in QEMU and libvirt. In other
words if libvirt's definition of model 'A' has 'foo' feature and doesn't have
'bar' feature, but the definition in QEMU has 'bar' and not 'foo' you would
see 'foo' as disabled and 'bar' as enabled once the domain starts. QEMU would
also automatically disable any feature that is included in the selected model
but unavailable on the host, but this should not happen when usable='yes' as
this means the list of required unavailable features is empty.
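The 'foo'/'bar' example above can be written down as set arithmetic. This is a simplified model, assuming features are plain names and ignoring machine-type-specific changes; `live_cpu_diff` is a hypothetical helper, not a libvirt function.

```python
def live_cpu_diff(libvirt_model, qemu_model, host_features):
    """Differences libvirt would record relative to its own model definition.

    QEMU builds the vCPU from its own definition and drops anything the
    host cannot provide; libvirt then compares that result with its
    cpu_map definition of the same model.
    """
    created = set(qemu_model) & set(host_features)
    enabled = sorted(created - set(libvirt_model))
    disabled = sorted(set(libvirt_model) - created)
    return enabled, disabled


# Model 'A': libvirt has 'foo', QEMU has 'bar'; the host offers both.
print(live_cpu_diff({"foo"}, {"bar"}, {"foo", "bar"}))  # (['bar'], ['foo'])
```

Passing QEMU's effective feature set for Opteron_G2 without 'svm' yields `([], ['svm'])`, which corresponds to the `<feature name='svm' policy='disable'/>` entry libvirt records in the live XML.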

'svm' is strange. AMD CPU models in QEMU are defined with 'svm' enabled, but QEMU
apparently disables it. Not sure if it's based on a machine type, host
availability, or something else. For example, on my AMD host I get 'svm'
enabled with "-cpu host" (mode='host-passthrough' equivalent), but it is
disabled with "-cpu Opteron_G2" even though the definition of this model in
QEMU includes 'svm'.

> And libvirt will disable them automatically if we will set
> check='partial'\'none' but not with check='full'.

Libvirt doesn't disable anything. We just pass whatever you ask for in domain
XML to QEMU and check the result. With check='partial' we first check that the
given CPU configuration is runnable on the host using our definition of the
selected CPU model (this is the reason Opteron_G2 fails to start on an Intel
host). But with check='none' we don't do any checks before starting the
domain. In both cases if QEMU starts successfully we check what virtual CPU it
created and record any differences from the domain XML using our definition of
the CPU model. The check='full' means that libvirt will abort the start in
case the difference between virtual CPU provided by QEMU and domain XML is
non-empty.
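The three check modes described above can be summarized as a small decision procedure. This is a toy model, assuming features are plain names; it encodes only the behavior stated in this comment (in particular, whether 'full' also runs the pre-start check is not modelled), and all names in it are hypothetical.

```python
from enum import Enum


class Check(Enum):
    NONE = "none"
    PARTIAL = "partial"
    FULL = "full"


class CPUCheckError(Exception):
    """Raised where libvirt would refuse to start the domain."""


def start_with_check(check, libvirt_model, host_features, qemu_created):
    """Toy model of the check modes; returns the differences libvirt records."""
    libvirt_model, qemu_created = set(libvirt_model), set(qemu_created)
    if check is Check.PARTIAL:
        # Pre-start check against libvirt's own (cpu_map) model definition.
        missing = libvirt_model - set(host_features)
        if missing:
            raise CPUCheckError("Host CPU does not provide required features: "
                                + ", ".join(sorted(missing)))
    # QEMU has started: compare the vCPU it created with libvirt's definition.
    diff = {"enabled": sorted(qemu_created - libvirt_model),
            "disabled": sorted(libvirt_model - qemu_created)}
    if check is Check.FULL and (diff["enabled"] or diff["disabled"]):
        raise CPUCheckError(f"guest CPU differs from the domain XML: {diff}")
    return diff


# Opteron_G2 on an Intel host (abbreviated, hypothetical feature sets).
print(start_with_check(Check.NONE, {"sse3", "svm"}, {"sse3", "vmx"}, {"sse3"}))
```

With the same inputs, `Check.PARTIAL` raises the "does not provide required features: svm" error seen in comment 3, while `Check.NONE` starts and records 'svm' as disabled.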

> All the feature that we must have + the features that QEMU will 
> enable by default are in `/usr/share/libvirt/cpu_map/SomeModel.xml`
> that is being used only with host-model.

The CPU definition in our cpu_map is identical to that of QEMU at the time we
added the CPU model. Since then, QEMU might have changed its definition
(for new machine types to keep compatibility with existing configs), but we
cannot change it because we need to make sure all libvirt releases have the
same definition of each CPU model (there are rare exceptions, but that's not
important here) otherwise our checks could provide incorrect results.

But all features in each CPU model are required. The observed behavior of
extra features being enabled is caused by the difference in model definitions.
On the other hand QEMU will happily disable any requested feature (either via
named CPU model or explicitly) if it's not available on the host. And we would
tolerate this or not depending on the value of the 'check' attribute.

> Libvirt cannot change this behavior because of backward compatibility
> reasons.

Right, you don't want your domains to suddenly stop working when libvirt is
upgraded while the host configuration and domain XML stayed the same. And even
worse, we absolutely have to maintain ABI compatibility when migrating a
domain to a different host as the guest could easily crash otherwise.

Comment 11 Jiri Denemark 2022-10-04 14:39:35 UTC
Libvirt patches introducing vendor='...' attribute for <model> elements in
domain capabilities XML have just been sent upstream for review:

https://listman.redhat.com/archives/libvir-list/2022-October/234661.html

The new attribute can be used to filter usable CPU models based on host CPU
vendor. In case you'd find these patches useful and want to have them
backported to a particular RHEL release (they should automatically appear in
RHEL 9.2), please file a libvirt BZ.
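A consumer such as KubeVirt could then drop usable models whose vendor differs from the host's. This is a minimal sketch, assuming the host vendor is already known (for example from the `cpu-vendor.node.kubevirt.io/Intel` label in comment 1) and that models not tied to a vendor are reported as vendor='unknown'. The XML sample and its vendor values are illustrative, and `usable_models_for_vendor` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Illustrative domcapabilities excerpt carrying the new vendor attribute.
DOMCAPS = """<domainCapabilities><cpu>
  <mode name='custom' supported='yes'>
    <model usable='yes' vendor='Intel'>Nehalem</model>
    <model usable='yes' vendor='AMD'>Opteron_G2</model>
    <model usable='yes' vendor='unknown'>qemu64</model>
    <model usable='no' vendor='Intel'>Icelake-Server</model>
  </mode>
</cpu></domainCapabilities>"""


def usable_models_for_vendor(domcaps_xml, host_vendor, keep_unknown=True):
    """Usable custom models whose vendor matches the host vendor."""
    root = ET.fromstring(domcaps_xml)
    result = []
    for m in root.findall("./cpu/mode[@name='custom']/model"):
        if m.get("usable") != "yes":
            continue
        vendor = m.get("vendor")  # None with libvirt releases older than 8.9.0
        if vendor == host_vendor or (keep_unknown and vendor in (None, "unknown")):
            result.append(m.text)
    return result


print(usable_models_for_vendor(DOMCAPS, "Intel"))  # ['Nehalem', 'qemu64']
```

Whether vendor-neutral models should be kept is a policy choice, which is why it is exposed as the `keep_unknown` flag.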

Comment 12 Michal Privoznik 2022-11-22 12:54:04 UTC
And they landed in the repo too:

190486519a NEWS: Document CPU reporting improvements
ce8d025be8 virsh: Add completer for hypervisor-cpu-baseline --model
268a2708c4 virsh: Add --model option for hypervisor-cpu-baseline
b0ff3af412 qemu_capabilities: Translate CPU blockers
b9db1ec17d Document specifics of virConnectBaselineHypervisorCPU
d4975a98b6 docs: Enhance documentation of CPU models in domain caps
ed51d2b606 cpu_arm: Don't implement virCPUGetVendorForModel
e8efe42409 cpu_ppc64: Implement virCPUGetVendorForModel
311e21ad32 cpu_x86: Implement virCPUGetVendorForModel
bbd2d9cb40 Introduce virCPUGetVendorForModel and use it in QEMU driver
2784a83907 domain_capabilities: Add vendor attribute for CPU models
6f927dce93 qemu: Do not pass qemuCaps to virQEMUCapsCPUFeature{To,From}QEMU
0cc8e87520 cpu_ppc64: Avoid repeated loading of CPU map
f0554d88fb conf: virDomainCapsCPUModelsAdd never fails

They are contained in the libvirt-8.9.0 release.

Comment 21 errata-xmlrpc 2023-07-25 15:00:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Virtualization 4.11.5 Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:4271