Bug 1850614 - hypervisor-cpu-compare succeeds on invalid cpu definition
Summary: hypervisor-cpu-compare succeeds on invalid cpu definition
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.3
Hardware: All
OS: Linux
medium
high
Target Milestone: rc
: 8.4
Assignee: Tim Wiederhake
QA Contact: Jing Qi
URL:
Whiteboard:
Depends On: 1660904
Blocks: 1796871
TreeView+ depends on / blocked
 
Reported: 2020-06-24 14:49 UTC by smitterl
Modified: 2021-05-25 06:42 UTC (History)
13 users (show)

Fixed In Version: libvirt-6.10.0-1.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-25 06:42:15 UTC
Type: Bug
Target Upstream Version: 6.9.0
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
IBM Linux Technology Center 186438 0 None None None 2020-06-25 09:52:54 UTC
Red Hat Bugzilla 1607287 1 None None None 2021-01-20 06:05:38 UTC
Red Hat Bugzilla 1660904 1 None None None 2023-06-12 10:37:24 UTC

Internal Links: 1850680

Description smitterl 2020-06-24 14:49:43 UTC
Description of problem:
<cpu/> is parsed incorrectly by libvirt

Version-Release number of selected component (if applicable):
libvirt-client-6.0.0-24.module+el8.3.0+7105+0cc49779.x86_64
libvirt-client-6.0.0-24.module+el8.3.0+7105+0cc49779.s390x

How reproducible:
100%


Steps to Reproduce:
1. virsh domcapabilties > domcaps.xml
2a. copy //cpu from domcaps.xml to cpu.xml (sample xml attached)
OR
2b. create cpu.xml with "<cpu><featur name="aes"/></cpu>" (invalid element "featur")
3. virsh hypervisor-cpu-compare cpu.xml

Actual results:
>>
The CPU provided by hypervisor on the host is a superset of CPU described in cpu_from_domcaps.xml


Expected results:
>>
error: File 'cpu.xml' does not contain a valid <cpu> element or valid domain XML, host capabilities XML, or domain capabilities XML

Additional info:
On s390x, the command correctly fails; however the error message isn't very clear:
error: Failed to compare hypervisor CPU with cpu.xml
error: internal error: unable to execute QEMU command 'query-cpu-model-comparison': Invalid parameter type for 'modelb.name', expected: string

Comment 1 smitterl 2020-06-24 15:37:48 UTC
(In reply to smitterl from comment #0)
> Description of problem:
> 2a. copy //cpu from domcaps.xml to cpu.xml (sample xml attached)
> OR
> 2b. create cpu.xml with "<cpu><featur name="aes"/></cpu>" (invalid element

Actually, there's a 2c.:
virsh dumpxml vm > cpu.xml
virsh hypervisor-cpu-compare cpu.xml
The CPU provided by hypervisor on the host is a superset of CPU described in /root/avocado-vt-vm1

That is, there must not even be a toplevel <cpu> if I understand correctly.

Comment 2 smitterl 2020-06-24 15:38:57 UTC
> The CPU provided by hypervisor on the host is a superset of CPU described in
> /root/avocado-vt-vm1
c&p error, should say "The CPU provided by hypervisor on the host is a superset of CPU described in cpu.xml"

Comment 3 IBM Bug Proxy 2020-06-25 22:40:48 UTC
------- Comment From Collin.Walling 2020-06-25 18:32 EDT-------
The XML to CPU definition parser in libvirt seems to ignore any erroneous tags, instead of reporting an error.

When copying the CPU definition from domcapabilities, then modifying one of the feature tags, the parser will ignore that erroneous feature tag. When comparing this CPU to the hypervisor/host CPU, the result will be reported as superset.

As for the s390x error message observed, I have proposed a fix for https://bugzilla.redhat.com/show_bug.cgi?id=1850680. With this fix in place, I now see the same message as observed on x86.

Comment 4 Tim Wiederhake 2020-09-16 13:39:41 UTC
Proposed patch https://www.redhat.com/archives/libvir-list/2020-September/msg00933.html

Comment 5 IBM Bug Proxy 2020-09-17 08:31:49 UTC
------- Comment From Collin.Walling 2020-09-16 15:01 EDT-------
Thanks for pointing out those patches.

It looks like the root cause with libvirt not catching an erroneous value within the user-provided XML file is due to the fact that the file is not ran against an XML schema. A schema is ran against commands such as virsh define and edit, which can catch any unrecognised elements (such as a typo).

This might be beyond what IBM can fix, as this is something the libvirt team would have to implement -- hopefully the conversation on the list steers toward that direction.

I'll keep an eye on the progress.

(Please note that the patches I mentioned in a previous comment here relate to a different bugzilla. Apologies for the confusion)

Comment 6 Tim Wiederhake 2020-09-23 07:32:35 UTC
Proposed patch https://www.redhat.com/archives/libvir-list/2020-September/msg01073.html

Comment 8 Tim Wiederhake 2020-10-07 10:32:04 UTC
Merged upstream: https://gitlab.com/libvirt/libvirt/-/commit/9702659807157648a45c5928626a363ae3489cc9

$ cat cpu.xml
<cpu><featur name="aes"/></cpu>
$ ./run tools/virsh hypervisor-cpu-compare cpu.xml
The CPU provided by hypervisor on the host is a superset of CPU described in cpu.xml
$ ./run tools/virsh hypervisor-cpu-compare cpu.xml --validate
error: Failed to compare hypervisor CPU with cpu.xml
error: XML document failed to validate against schema: Unable to validate doc against /home/twiederh/git/libvirt/docs/schemas/cpu.rng
Element cpu has extra content: featur
Did not expect element featur there

Comment 10 Jing Qi 2020-10-20 06:43:10 UTC
Tested with upstream libvirt -
build/src/libvirtd (libvirt) 6.9.0

Test result as below -

Scenario 1:
# cat cpu.xml
<cpu><feature name="aes"/></cpu>

#./build/run build/tools/virsh hypervisor-cpu-compare /cpu.xml --validate
2020-10-20 06:32:22.688+0000: 148270: error : virXMLValidatorValidate:1290 : XML document failed to validate against schema: Unable to validate doc against /root/libvirt/docs/schemas/cpu.rng
Element feature failed to validate attributes
Extra element feature in interleave
Element cpu failed to validate content
Did not expect element feature there

Scenario 2:

#cat /cpu1.xml
<cpu mode='host-model'>
      <model fallback='forbid'>EPYC-IBPB</model>
      <vendor>AMD</vendor>
      <feature policy='require' name='x2apic'/>
      <feature policy='require' name='pschange-mc-no'/>
      <feature policy='disable' name='monitor'/>
      <feature policy='disable' name='svm'/>
</cpu>

#./build/run build/tools/virsh hypervisor-cpu-compare /cpu1.xml --validate
The CPU provided by hypervisor on the host is a superset of CPU described in /cpu1.xml

Scenario 3-                   
#cat /cpu2.xml
<cpu mode='host-model'>
      <model fallback='forbid'>EPYC-IBPB</model>
      <vendor>AMD</vendor>
      <feature policy='require' name='x2apic'/>
      <feature policy='require' name='pschange-mc-no'/>
      <feature policy='disable' name='monitor'/>
    
</cpu>

#./build/run build/tools/virsh hypervisor-cpu-compare /cpu2.xml --validate
CPU described in /cpu2.xml is incompatible with the CPU provided by hypervisor on the host


For Scenario 1, is it expected? | From Scenario 2 to Scenario 3, there is only one feature removed, but the message in scenario3  seems not valid. Can you help to confirm it?

Comment 11 Tim Wiederhake 2020-10-20 09:15:13 UTC
Hi,

Scenario 1:

$ cat cpu.xml
<cpu><feature name="aes"/></cpu>

"cpu.xml" describes a host CPU. Host CPUs are required to have elements "arch"
and "model". The error is expected.

$ cat cpu_archmodel.xml
<cpu><arch>x86_64</arch><model>EPYC-IBPB</model><feature name="aes"/></cpu>
$ ./run tools/virsh hypervisor-cpu-compare cpu_archmodel.xml --validate
CPU described in cpu_archmodel.xml is incompatible with the CPU provided by hypervisor on the host



Scenario 2 and 3:

$ diff -u cpu{1,2}.xml
--- cpu1.xml	2020-10-20 10:23:01.189440953 +0200
+++ cpu2.xml	2020-10-20 11:04:51.720036649 +0200
@@ -4,5 +4,4 @@
       <feature policy='require' name='x2apic'/>
       <feature policy='require' name='pschange-mc-no'/>
       <feature policy='disable' name='monitor'/>
-      <feature policy='disable' name='svm'/>
 </cpu>

Feature "svm" is enabled by default for this model, i.e. removing the
"policy='disable'" line effectively sets it to "require". To use this feature,
nested virtualization must be enabled and supported on the host. If this is not
the case, the message in scenario 3 is correct.

I hope I could help,
Tim

Comment 12 Jing Qi 2020-11-19 04:30:55 UTC
Tim,
Thanks for your answer. Can you please help to check if the patch has been added in the RHEL 8.4 branch? 

I checked the libvirt version - libvirt-daemon-6.0.0-30.module+el8.4.0+8705+34397d87.x86_64 -
Ran the command "virsh hypervisor-cpu-compare cpu_archmodel.xml --validate", it still reported 
error: command 'hypervisor-cpu-compare' doesn't support option --validate.

Comment 14 Jiri Denemark 2020-11-24 09:35:02 UTC
(In reply to Jing Qi from comment #12)
> Tim,
> Thanks for your answer. Can you please help to check if the patch has been
> added in the RHEL 8.4 branch?

No, the bug is in POST, which means the patches were prepared, but not built
yet. Once included in a RHEL build the bug will change to MODIFIED and "Fixed
in Version" field will contain the exact package release which contains the
patches.

Comment 15 Jiri Denemark 2020-11-24 14:16:56 UTC
Downstream reviewer was concerned about the risk of the backport compared to
the benefits. The backport refactors XML schema files, which could not be
cherry-picked cleanly without causing conflict that had to be resolved
manually. On the other hand, the benefit of seeing possible mistakes in the
input XML passed to CPU comparison APIs needs to be explicitly turned on by
the user with a special flag.

So the question is, whether customer really needs this functionality (earlier
than after the next rebase of RHEL on RHEL-AV) and would actually be using it
since changes on their side are needed to consume it.

Thomas, can you please explain this to them and provide their expectations?

Comment 16 Thomas Huth 2020-11-24 15:17:42 UTC
(In reply to Jiri Denemark from comment #15)
> So the question is, whether customer really needs this functionality (earlier
> than after the next rebase of RHEL on RHEL-AV) and would actually be using it
> since changes on their side are needed to consume it.
> 
> Thomas, can you please explain this to them and provide their expectations?

The "customer" is Sebastian in this case (he reported the bug) ... so let's ask him... Sebastian, IMHO it should be ok if you check whether this problem is fixed in the rebased AV builds, and if yes, let's close this as DEFERRED or UPSTREAM? What do you think?

Comment 18 smitterl 2020-11-25 14:25:46 UTC
(In reply to Thomas Huth from comment #16)
> (In reply to Jiri Denemark from comment #15)
> > So the question is, whether customer really needs this functionality (earlier
> > than after the next rebase of RHEL on RHEL-AV) and would actually be using it
> > since changes on their side are needed to consume it.
> > 
> > Thomas, can you please explain this to them and provide their expectations?
> 
> The "customer" is Sebastian in this case (he reported the bug) ... so let's
> ask him... Sebastian, IMHO it should be ok if you check whether this problem
> is fixed in the rebased AV builds, and if yes, let's close this as DEFERRED
> or UPSTREAM? What do you think?

I agree with you that RHEL-AV should be good enough. What I'm not sure of is if we want to close this one as I can imagine we should have one minimal automated validation test on system level.

Luyao, I believe you're QE feature owner, what's your opinion? Is it OK to move this to RHEL AV and arch x86_64? Would you want to close this one as UPSTREAM or rather leave it open?

Comment 19 Luyao Huang 2020-11-26 02:42:47 UTC
(In reply to smitterl from comment #18)
> (In reply to Thomas Huth from comment #16)
> > (In reply to Jiri Denemark from comment #15)
> > > So the question is, whether customer really needs this functionality (earlier
> > > than after the next rebase of RHEL on RHEL-AV) and would actually be using it
> > > since changes on their side are needed to consume it.
> > > 
> > > Thomas, can you please explain this to them and provide their expectations?
> > 
> > The "customer" is Sebastian in this case (he reported the bug) ... so let's
> > ask him... Sebastian, IMHO it should be ok if you check whether this problem
> > is fixed in the rebased AV builds, and if yes, let's close this as DEFERRED
> > or UPSTREAM? What do you think?
> 
> I agree with you that RHEL-AV should be good enough. What I'm not sure of is
> if we want to close this one as I can imagine we should have one minimal
> automated validation test on system level.
> 
> Luyao, I believe you're QE feature owner, what's your opinion? Is it OK to
> move this to RHEL AV and arch x86_64? Would you want to close this one as
> UPSTREAM or rather leave it open?

In my opinion, we can move this bug to RHEL-AV. Because most of up-layer projects are using RHEL-AV libvirt, and even user hit this problem with RHEL libvirt, it is not a critical problem.

Thanks,
Luyao

Comment 20 IBM Bug Proxy 2020-12-02 13:00:24 UTC
------- Comment From cborntra.com 2020-12-02 07:51 EDT-------
Redhat:
FWIW, moving this to AV will move s390x in the cold until 8.6 or so.
Not sure if this is the right thing to do.

I will discuss this with the redhat KVM/s390 team and come back.

Comment 21 Jing Qi 2020-12-08 09:31:32 UTC
Verified in libvirt-daemon-6.10.0-1.module+el8.4.0+8898+a84e86e1.x86_64

# virsh hypervisor-cpu-compare hostcpu.xml --validate
setlocale: No such file or directory
error: Failed to compare hypervisor CPU with hostcpu.xml
error: XML document failed to validate against schema: Unable to validate doc against /usr/share/libvirt/schemas/cpu.rng
Element cpu has extra content: arch
Invalid attribute dies for element topology

hostcpu.xml -
<cpu>
      <arch>x86_64</arch>
      <model>Cascadelake-Server-noTSX</model>
      <vendor>Intel</vendor>
      <microcode version='83898371'/>
      <counter name='tsc' frequency='2194842000' scaling='yes'/>
      <topology sockets='1' dies='1' cores='12' threads='2'/>
      <feature name='ds'/>
      <feature name='acpi'/>
      <feature name='ss'/>
      <feature name='ht'/>
      <feature name='tm'/>
      <feature name='pbe'/>
      <feature name='dtes64'/>
      <feature name='monitor'/>
      <feature name='ds_cpl'/>
      <feature name='vmx'/>
      <feature name='smx'/>
      <feature name='est'/>
      <feature name='tm2'/>
      <feature name='xtpr'/>
      <feature name='pdcm'/>
      <feature name='dca'/>
      <feature name='osxsave'/>
      <feature name='tsc_adjust'/>
      <feature name='cmt'/>
      <feature name='intel-pt'/>
      <feature name='pku'/>
      <feature name='ospke'/>
      <feature name='md-clear'/>
      <feature name='stibp'/>
      <feature name='arch-capabilities'/>
      <feature name='xsaves'/>
      <feature name='mbm_total'/>
      <feature name='mbm_local'/>
      <feature name='invtsc'/>
      <feature name='rdctl-no'/>
      <feature name='ibrs-all'/>
      <feature name='skip-l1dfl-vmentry'/>
      <feature name='mds-no'/>
      <feature name='tsx-ctrl'/>
      <pages unit='KiB' size='4'/>
      <pages unit='KiB' size='2048'/>
      <pages unit='KiB' size='1048576'/>
    </cpu>

From comment 11, the arch is supported. Can you please help to confirm it? Thanks

Comment 22 Jing Qi 2020-12-08 09:32:57 UTC
Verified in libvirt-daemon-6.10.0-1.module+el8.4.0+8898+a84e86e1.x86_64

# virsh hypervisor-cpu-compare hostcpu.xml --validate
setlocale: No such file or directory
error: Failed to compare hypervisor CPU with hostcpu.xml
error: XML document failed to validate against schema: Unable to validate doc against /usr/share/libvirt/schemas/cpu.rng
Element cpu has extra content: arch
Invalid attribute dies for element topology

hostcpu.xml -
<cpu>
      <arch>x86_64</arch>
      <model>Cascadelake-Server-noTSX</model>
      <vendor>Intel</vendor>
      <microcode version='83898371'/>
      <counter name='tsc' frequency='2194842000' scaling='yes'/>
      <topology sockets='1' dies='1' cores='12' threads='2'/>
      <feature name='ds'/>
      <feature name='acpi'/>
      <feature name='ss'/>
      <feature name='ht'/>
      <feature name='tm'/>
      <feature name='pbe'/>
      <feature name='dtes64'/>
      <feature name='monitor'/>
      <feature name='ds_cpl'/>
      <feature name='vmx'/>
      <feature name='smx'/>
      <feature name='est'/>
      <feature name='tm2'/>
      <feature name='xtpr'/>
      <feature name='pdcm'/>
      <feature name='dca'/>
      <feature name='osxsave'/>
      <feature name='tsc_adjust'/>
      <feature name='cmt'/>
      <feature name='intel-pt'/>
      <feature name='pku'/>
      <feature name='ospke'/>
      <feature name='md-clear'/>
      <feature name='stibp'/>
      <feature name='arch-capabilities'/>
      <feature name='xsaves'/>
      <feature name='mbm_total'/>
      <feature name='mbm_local'/>
      <feature name='invtsc'/>
      <feature name='rdctl-no'/>
      <feature name='ibrs-all'/>
      <feature name='skip-l1dfl-vmentry'/>
      <feature name='mds-no'/>
      <feature name='tsx-ctrl'/>
      <pages unit='KiB' size='4'/>
      <pages unit='KiB' size='2048'/>
      <pages unit='KiB' size='1048576'/>
    </cpu>

From comment 11, the arch is supported. Can you please help to confirm it? Thanks

Comment 23 Tim Wiederhake 2020-12-14 11:02:51 UTC
The error message is misleading, but expected.

The actual problem is the "<counter name='tsc' frequency='2194842000' scaling='yes'/>" element, that is not defined in the schema. As the user may supply either a host cpu definition or a guest cpu definition, the validator will first try to validate the user supplied xml under the assumption that it is a host cpu definition, fail over the unexpected "counter" element, re-try under the assumption that the user must have supplied a guest cpu definition, and then fails for the (for guest cpu definitions) unexpected "arch" element.

Comment 24 Jing Qi 2020-12-15 01:45:19 UTC
Thanks Tim for your answer. 
Please help to see what's the problem in below scenario which is tested in dell-per740xd machine and 
libvirt-daemon-6.10.0-1.module+el8.4.0+8898+a84e86e1.x86_64

1. Run virsh capabilities & get the cpu part and save it to cpu.xml -

#virsh capabilities 

<cpu>
      <arch>x86_64</arch>
      <model>Cascadelake-Server-noTSX</model>
      <vendor>Intel</vendor>
      <microcode version='83898371'/>
      <counter name='tsc' frequency='2194842000' scaling='yes'/>
      <topology sockets='1' dies='1' cores='12' threads='2'/>
      <feature name='ds'/>
      <feature name='acpi'/>
      <feature name='ss'/>
      <feature name='ht'/>
      <feature name='tm'/>
      <feature name='pbe'/>
      <feature name='dtes64'/>
      <feature name='monitor'/>
      <feature name='ds_cpl'/>
      <feature name='vmx'/>
      <feature name='smx'/>
      <feature name='est'/>
      <feature name='tm2'/>
      <feature name='xtpr'/>
      <feature name='pdcm'/>
      <feature name='dca'/>
      <feature name='osxsave'/>
      <feature name='tsc_adjust'/>
      <feature name='cmt'/>
      <feature name='intel-pt'/>
      <feature name='pku'/>
      <feature name='ospke'/>
      <feature name='md-clear'/>
      <feature name='stibp'/>
      <feature name='arch-capabilities'/>
      <feature name='xsaves'/>
      <feature name='mbm_total'/>
      <feature name='mbm_local'/>
      <feature name='invtsc'/>
      <feature name='rdctl-no'/>
      <feature name='ibrs-all'/>
      <feature name='skip-l1dfl-vmentry'/>
      <feature name='mds-no'/>
      <feature name='tsx-ctrl'/>
      <pages unit='KiB' size='4'/>
      <pages unit='KiB' size='2048'/>
      <pages unit='KiB' size='1048576'/>
    </cpu>
2. remove the counter element and then run  hyperviseor-cpu-compare validate.

# virsh hypervisor-cpu-compare cpu.xml --validate
error: Failed to compare hypervisor CPU with cpu.xml
error: XML document failed to validate against schema: Unable to validate doc against /usr/share/libvirt/schemas/cpu.rng
Element cpu has extra content: arch
Invalid attribute dies for element topology

3. remove the "dies='1'" and rerun 

# virsh hypervisor-cpu-compare cpu.xml --validate
CPU described in cpu.xml is incompatible with the CPU provided by hypervisor on the host

Question:
1. If the command support to validate the host cpu definition, why it still failed with removing the invalid attributes?
2. Will these invalid attributes be supported in the future?

Thanks

Comment 25 Tim Wiederhake 2020-12-15 12:21:57 UTC
Hi Jing,

Regarding your first question: The host cpu definition in cpu.xml contains features that cannot be virtualized, e.g. "monitor". The output is correct and expected. Compare your results to "virsh cpu-compare cpu.xml".

Regarding your second question: I just sent out patches, see https://www.redhat.com/archives/libvir-list/2020-December/msg00703.html.

Comment 33 Jing Qi 2020-12-16 02:07:03 UTC
(In reply to Tim Wiederhake from comment #25)
> Hi Jing,
> 
> Regarding your first question: The host cpu definition in cpu.xml contains
> features that cannot be virtualized, e.g. "monitor". The output is correct
> and expected. Compare your results to "virsh cpu-compare cpu.xml".
> 

>>> 
Hi Tim,

The "monitor" is in the cpu part of output of "virsh capabilities". How can I know what other features also cannot be
 "virtualized"? Since I removed "monitor" feature and reran the command, it still reports "incompatible".  How to get the compatible cpu xml file for the host? 

# virsh hypervisor-cpu-compare cpu.xml --validate
CPU described in cpu.xml is incompatible with the CPU provided by hypervisor on the host

Thanks,
Jing Qi

Comment 34 Tim Wiederhake 2020-12-17 15:45:58 UTC
"virsh hypervisor-cpu-compare cpu.xml --validate --error" should give you the list of features in cpu.xml that cannot be virtualized on the current hypervisor.

If you want the description of a cpu that can be virtualized ("is compatible") with your physical cpu, get the hostmodel:

$ virsh domcapabilities > domcapabilities.xml 
$ virsh hypervisor-cpu-baseline domcapabilities.xml > hostmodel.xml
$ virsh hypervisor-cpu-compare hostmodel.xml --validate
CPU described in hostmodel.xml is identical to the CPU provided by hypervisor on the host

Comment 35 Jing Qi 2020-12-21 08:16:53 UTC
Verified with libvirt-daemon-6.10.0-1.module+el8.4.0+8898+a84e86e1.x86_64 & qemu-kvm-5.2.0-0.module+el8.4.0+8855+a9e237a9.x86_64
S1: Verify the cpu description from domcapabilities is compatible in hypervisor-cpu-compare.
1. # virsh domcapabilities > domcapabilities.xml 
2. # virsh hypervisor-cpu-baseline domcapabilities.xml > hostmodel.xml
3. #virsh hypervisor-cpu-compare hostmodel.xml --validate
   CPU described in hostmodel.xml is identical to the CPU provided by hypervisor on the host

Comment 37 errata-xmlrpc 2021-05-25 06:42:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2098


Note You need to log in before you can comment on or make changes to this bug.