Bug 1765445 - Cmd "virsh hypervisor-cpu-compare" outputs wrong result with VM's active dumpxml as input because of topoext [NEEDINFO]
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: libvirt
Version: 9.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Jiri Denemark
QA Contact: yalzhang@redhat.com
Reported: 2019-10-25 06:32 UTC by jiyan
Modified: 2023-04-05 23:23 UTC (History)

Type: Bug
jdenemar: needinfo? (dgilbert)



Description jiyan 2019-10-25 06:32:01 UTC
Description of problem:
The command "virsh hypervisor-cpu-compare" reports a wrong result when given a VM's active dumpxml as input, because of topoext

Version-Release number of selected component (if applicable):
qemu-kvm-4.1.0-10.module+el8.1.0+4234+33aa4f57.x86_64
libvirt-5.6.0-5.virtcov.el8.x86_64
kernel-4.18.0-141.el8.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Check domcapabilities; the host-model CPU does not list the topoext feature
# virsh domcapabilities
...
  <cpu>
    <mode name='host-passthrough' supported='yes'/>
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>EPYC-IBPB</model>
      <vendor>AMD</vendor>
      <feature policy='require' name='x2apic'/>
      <feature policy='require' name='tsc-deadline'/>
      <feature policy='require' name='hypervisor'/>
      <feature policy='require' name='tsc_adjust'/>
      <feature policy='require' name='arch-capabilities'/>
      <feature policy='require' name='cmp_legacy'/>
      <feature policy='require' name='perfctr_core'/>
      <feature policy='require' name='invtsc'/>
      <feature policy='require' name='virt-ssbd'/>
      <feature policy='require' name='skip-l1dfl-vmentry'/>
      <feature policy='disable' name='monitor'/>
      <feature policy='disable' name='svm'/>
    </mode>

2. Prepare a shut-off VM with the following CPU configuration
# virsh domstate test_1 
shut off

# virsh dumpxml test_1 --inactive |grep "<cpu" -A2
  <cpu mode='host-model' check='partial'>
    <model fallback='allow'/>
  </cpu>

4. Start the VM and check its active dumpxml; topoext is enabled here
# virsh start test_1
Domain test_1 started

# virsh dumpxml test_1  |grep "<cpu" -A20
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>EPYC-IBPB</model>
    <vendor>AMD</vendor>
    <feature policy='require' name='x2apic'/>
    <feature policy='require' name='tsc-deadline'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='tsc_adjust'/>
    <feature policy='require' name='arch-capabilities'/>
    <feature policy='require' name='cmp_legacy'/>
    <feature policy='require' name='perfctr_core'/>
    <feature policy='require' name='virt-ssbd'/>
    <feature policy='require' name='skip-l1dfl-vmentry'/>
    <feature policy='disable' name='monitor'/>
    <feature policy='disable' name='svm'/>
    <feature policy='require' name='topoext'/>
  </cpu>

5. Use the active VM's dumpxml as input to "virsh hypervisor-cpu-compare"
# virsh dumpxml test_1 > test_1.xml

# virsh hypervisor-cpu-compare test_1.xml 
CPU described in test_1.xml is incompatible with the CPU provided by hypervisor on the host

Actual results:
As step-5 shows

Expected results:
Since the VM starts successfully on this host, its CPU configuration should not be reported as incompatible.

Additional info:
1> This issue can be reproduced on both RHEL-8.1 AV and RHEL-8.1.
2> The steps above are for RHEL-8.1 AV; the steps for RHEL-8.1 are in this link: https://bugzilla.redhat.com/show_bug.cgi?id=1619798#c9

Comment 3 yalzhang@redhat.com 2021-04-08 02:56:57 UTC
The guest cannot start with <cpu mode='host-model' check='full'/> on an EPYC-IBPB system
# rpm -q libvirt-libs qemu-kvm
libvirt-libs-7.0.0-12.module+el8.4.0+10596+32ba7df3.x86_64
qemu-kvm-5.2.0-14.module+el8.4.0+10425+ad586fa5.x86_64

# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              128
On-line CPU(s) list: 0-127
Thread(s) per core:  2
Core(s) per socket:  32
Socket(s):           2
NUMA node(s):        8
Vendor ID:           AuthenticAMD
BIOS Vendor ID:      AMD
CPU family:          23
Model:               1
Model name:          AMD EPYC 7601 32-Core Processor
BIOS Model name:     AMD EPYC 7601 32-Core Processor   
...
2. Prepare a VM with the following CPU setting:
<cpu mode='host-model' check='full'/>

3. Try to start the VM; it fails to start
# virsh start rhel
error: Failed to start domain 'rhel'
error: operation failed: guest CPU doesn't match specification: extra features: topoext
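
A rough sketch of what check='full' appears to enforce in this case: the set of features QEMU actually enabled must match the requested definition, so a feature QEMU turns on by itself is reported as extra. This is only a Python model of that comparison, not libvirt's code. The function name `verify_full` and the shortened feature sets are hypothetical; the message wording is copied from the error above.

```python
def verify_full(requested, enabled):
    """Model of a strict CPU check (hypothetical helper, not a libvirt API).

    Returns an error string when the features enabled in the guest differ
    from the requested ones, or None when they match exactly.
    """
    missing = sorted(requested - enabled)  # asked for, but not enabled
    extra = sorted(enabled - requested)    # enabled, but never asked for
    problems = []
    if missing:
        problems.append("missing features: " + ",".join(missing))
    if extra:
        problems.append("extra features: " + ",".join(extra))
    if not problems:
        return None
    return "guest CPU doesn't match specification: " + "; ".join(problems)


# Shortened, illustrative feature sets (assumption: the real lists are longer).
requested = {"x2apic", "hypervisor", "virt-ssbd"}
enabled = requested | {"topoext"}  # topoext added without being requested

print(verify_full(requested, enabled))
print(verify_full(requested, requested))
```

With check='partial', as in the description, the same guest starts, which is consistent with the extra feature only being rejected by the stricter check.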

Comment 4 John Ferlan 2021-09-08 13:30:23 UTC
Bulk update: Move RHEL-AV bugs to RHEL9. If necessary to resolve in RHEL8, then clone to the current RHEL8 release.

Comment 5 yalzhang@redhat.com 2022-04-22 09:01:50 UTC
I tested the latest libvirt-8.2.0-1.el9.x86_64 on a host with an EPYC-Milan CPU; the issue in comment 0 can still be reproduced.

Comment 6 Jiri Denemark 2022-11-16 12:50:45 UTC
So apparently -cpu host does not enable topoext and thus libvirt does not show
it in the host-model definition in domain capabilities. But once libvirt
starts QEMU with that cpu-model, topoext is enabled by QEMU. Most likely
because the QEMU definition of EPYC-IBPB contains topoext. Thus it appears in
the live XML as

    <feature policy='require' name='topoext'/>

Checking this CPU definition with hypervisor-cpu-compare fails because libvirt
thinks the host cannot provide topoext (as it was not reported as enabled by
QEMU for -cpu host).

So it looks like topoext is just another magic feature which is only enabled
in some cases? But I'm not sure why -cpu host would refuse to enable it when a
named CPU model enables it by itself without an explicit request.
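
The mismatch can be made visible by diffing the required features in the two XML fragments pasted in the description. The snippet below is a minimal sketch using Python's standard XML parser. The helper name `required_features` is made up for illustration, and treating the host-model feature list as "what the host can provide" is a simplification, not how libvirt computes compatibility.

```python
import xml.etree.ElementTree as ET

# host-model CPU as reported by "virsh domcapabilities" in the description
HOST_MODEL_XML = """
<mode name='host-model' supported='yes'>
  <model fallback='forbid'>EPYC-IBPB</model>
  <vendor>AMD</vendor>
  <feature policy='require' name='x2apic'/>
  <feature policy='require' name='tsc-deadline'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='tsc_adjust'/>
  <feature policy='require' name='arch-capabilities'/>
  <feature policy='require' name='cmp_legacy'/>
  <feature policy='require' name='perfctr_core'/>
  <feature policy='require' name='invtsc'/>
  <feature policy='require' name='virt-ssbd'/>
  <feature policy='require' name='skip-l1dfl-vmentry'/>
  <feature policy='disable' name='monitor'/>
  <feature policy='disable' name='svm'/>
</mode>
"""

# <cpu> element from the active dumpxml in the description
LIVE_CPU_XML = """
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>EPYC-IBPB</model>
  <vendor>AMD</vendor>
  <feature policy='require' name='x2apic'/>
  <feature policy='require' name='tsc-deadline'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='tsc_adjust'/>
  <feature policy='require' name='arch-capabilities'/>
  <feature policy='require' name='cmp_legacy'/>
  <feature policy='require' name='perfctr_core'/>
  <feature policy='require' name='virt-ssbd'/>
  <feature policy='require' name='skip-l1dfl-vmentry'/>
  <feature policy='disable' name='monitor'/>
  <feature policy='disable' name='svm'/>
  <feature policy='require' name='topoext'/>
</cpu>
"""


def required_features(xml_text):
    """Return the names of all <feature policy='require'> elements."""
    root = ET.fromstring(xml_text)
    return {
        feature.get("name")
        for feature in root.iter("feature")
        if feature.get("policy") == "require"
    }


host_required = required_features(HOST_MODEL_XML)
live_required = required_features(LIVE_CPU_XML)

only_live = live_required - host_required
only_host = host_required - live_required

print("required only in live XML:", sorted(only_live))
print("required only in host-model:", sorted(only_host))
```

For the fragments above, topoext is the only feature required by the live XML but absent from the host-model list, which matches the feature named in the explanation.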

Comment 7 yalzhang@redhat.com 2023-04-03 05:32:10 UTC
The bug can still be reproduced on RHEL 9.2 with libvirt-9.0.0-10.el9_2.x86_64 and qemu-kvm-7.2.0-11.el9_2.x86_64.
Since there is no explicit resolution for it, extending the stale date by 6 months.

