Description of problem: <cpu/> is parsed incorrectly by libvirt Version-Release number of selected component (if applicable): libvirt-client-6.0.0-24.module+el8.3.0+7105+0cc49779.x86_64 libvirt-client-6.0.0-24.module+el8.3.0+7105+0cc49779.s390x How reproducible: 100% Steps to Reproduce: 1. virsh domcapabilties > domcaps.xml 2a. copy //cpu from domcaps.xml to cpu.xml (sample xml attached) OR 2b. create cpu.xml with "<cpu><featur name="aes"/></cpu>" (invalid element "featur") 3. virsh hypervisor-cpu-compare cpu.xml Actual results: >> The CPU provided by hypervisor on the host is a superset of CPU described in cpu_from_domcaps.xml Expected results: >> error: File 'cpu.xml' does not contain a valid <cpu> element or valid domain XML, host capabilities XML, or domain capabilities XML Additional info: On s390x, the command correctly fails; however the error message isn't very clear: error: Failed to compare hypervisor CPU with cpu.xml error: internal error: unable to execute QEMU command 'query-cpu-model-comparison': Invalid parameter type for 'modelb.name', expected: string
(In reply to smitterl from comment #0) > Description of problem: > 2a. copy //cpu from domcaps.xml to cpu.xml (sample xml attached) > OR > 2b. create cpu.xml with "<cpu><featur name="aes"/></cpu>" (invalid element Actually, there's a 2c.: virsh dumpxml vm > cpu.xml virsh hypervisor-cpu-compare cpu.xml The CPU provided by hypervisor on the host is a superset of CPU described in /root/avocado-vt-vm1 That is, there must not even be a toplevel <cpu> if I understand correctly.
> The CPU provided by hypervisor on the host is a superset of CPU described in > /root/avocado-vt-vm1 c&p error, should say "The CPU provided by hypervisor on the host is a superset of CPU described in cpu.xml"
------- Comment From Collin.Walling 2020-06-25 18:32 EDT------- The XML to CPU definition parser in libvirt seems to ignore any erroneous tags, instead of reporting an error. When copying the CPU definition from domcapabilities, then modifying one of the feature tags, the parser will ignore that erroneous feature tag. When comparing this CPU to the hypervisor/host CPU, the result will be reported as superset. As for the s390x error message observed, I have proposed a fix for https://bugzilla.redhat.com/show_bug.cgi?id=1850680. With this fix in place, I now see the same message as observed on x86.
Proposed patch https://www.redhat.com/archives/libvir-list/2020-September/msg00933.html
------- Comment From Collin.Walling 2020-09-16 15:01 EDT------- Thanks for pointing out those patches. It looks like the root cause with libvirt not catching an erroneous value within the user-provided XML file is due to the fact that the file is not ran against an XML schema. A schema is ran against commands such as virsh define and edit, which can catch any unrecognised elements (such as a typo). This might be beyond what IBM can fix, as this is something the libvirt team would have to implement -- hopefully the conversation on the list steers toward that direction. I'll keep an eye on the progress. (Please note that the patches I mentioned in a previous comment here relate to a different bugzilla. Apologies for the confusion)
Proposed patch https://www.redhat.com/archives/libvir-list/2020-September/msg01073.html
Merged upstream: https://gitlab.com/libvirt/libvirt/-/commit/9702659807157648a45c5928626a363ae3489cc9 $ cat cpu.xml <cpu><featur name="aes"/></cpu> $ ./run tools/virsh hypervisor-cpu-compare cpu.xml The CPU provided by hypervisor on the host is a superset of CPU described in cpu.xml $ ./run tools/virsh hypervisor-cpu-compare cpu.xml --validate error: Failed to compare hypervisor CPU with cpu.xml error: XML document failed to validate against schema: Unable to validate doc against /home/twiederh/git/libvirt/docs/schemas/cpu.rng Element cpu has extra content: featur Did not expect element featur there
Tested with upstream libvirt - build/src/libvirtd (libvirt) 6.9.0 Test result as below - Scenario 1: # cat cpu.xml <cpu><feature name="aes"/></cpu> #./build/run build/tools/virsh hypervisor-cpu-compare /cpu.xml --validate 2020-10-20 06:32:22.688+0000: 148270: error : virXMLValidatorValidate:1290 : XML document failed to validate against schema: Unable to validate doc against /root/libvirt/docs/schemas/cpu.rng Element feature failed to validate attributes Extra element feature in interleave Element cpu failed to validate content Did not expect element feature there Scenario 2: #cat /cpu1.xml <cpu mode='host-model'> <model fallback='forbid'>EPYC-IBPB</model> <vendor>AMD</vendor> <feature policy='require' name='x2apic'/> <feature policy='require' name='pschange-mc-no'/> <feature policy='disable' name='monitor'/> <feature policy='disable' name='svm'/> </cpu> #./build/run build/tools/virsh hypervisor-cpu-compare /cpu1.xml --validate The CPU provided by hypervisor on the host is a superset of CPU described in /cpu1.xml Scenario 3- #cat /cpu2.xml <cpu mode='host-model'> <model fallback='forbid'>EPYC-IBPB</model> <vendor>AMD</vendor> <feature policy='require' name='x2apic'/> <feature policy='require' name='pschange-mc-no'/> <feature policy='disable' name='monitor'/> </cpu> #./build/run build/tools/virsh hypervisor-cpu-compare /cpu2.xml --validate CPU described in /cpu2.xml is incompatible with the CPU provided by hypervisor on the host For Scenario 1, is it expected? | From Scenario 2 to Scenario 3, there is only one feature removed, but the message in scenario3 seems not valid. Can you help to confirm it?
Hi, Scenario 1: $ cat cpu.xml <cpu><feature name="aes"/></cpu> "cpu.xml" describes a host CPU. Host CPUs are required to have elements "arch" and "model". The error is expected. $ cat cpu_archmodel.xml <cpu><arch>x86_64</arch><model>EPYC-IBPB</model><feature name="aes"/></cpu> $ ./run tools/virsh hypervisor-cpu-compare cpu_archmodel.xml --validate CPU described in cpu_archmodel.xml is incompatible with the CPU provided by hypervisor on the host Scenario 2 and 3: $ diff -u cpu{1,2}.xml --- cpu1.xml 2020-10-20 10:23:01.189440953 +0200 +++ cpu2.xml 2020-10-20 11:04:51.720036649 +0200 @@ -4,5 +4,4 @@ <feature policy='require' name='x2apic'/> <feature policy='require' name='pschange-mc-no'/> <feature policy='disable' name='monitor'/> - <feature policy='disable' name='svm'/> </cpu> Feature "svm" is enabled by default for this model, i.e. removing the "policy='disable'" line effectively sets it to "require". To use this feature, nested virtualization must be enabled and supported on the host. If this is not the case, the message in scenario 3 is correct. I hope I could help, Tim
Tim, Thanks for your answer. Can you please help to check if the patch has been added in the RHEL 8.4 branch? I checked the libvirt version - libvirt-daemon-6.0.0-30.module+el8.4.0+8705+34397d87.x86_64 - Ran the command "virsh hypervisor-cpu-compare cpu_archmodel.xml --validate", it still reported error: command 'hypervisor-cpu-compare' doesn't support option --validate.
(In reply to Jing Qi from comment #12) > Tim, > Thanks for your answer. Can you please help to check if the patch has been > added in the RHEL 8.4 branch? No, the bug is in POST, which means the patches were prepared, but not built yet. Once included in a RHEL build the bug will change to MODIFIED and "Fixed in Version" field will contain the exact package release which contains the patches.
Downstream reviewer was concerned about the risk of the backport compared to the benefits. The backport refactors XML schema files, which could not be cherry-picked cleanly without causing conflict that had to be resolved manually. On the other hand, the benefit of seeing possible mistakes in the input XML passed to CPU comparison APIs needs to be explicitly turned on by the user with a special flag. So the question is, whether customer really needs this functionality (earlier than after the next rebase of RHEL on RHEL-AV) and would actually be using it since changes on their side are needed to consume it. Thomas, can you please explain this to them and provide their expectations?
(In reply to Jiri Denemark from comment #15) > So the question is, whether customer really needs this functionality (earlier > than after the next rebase of RHEL on RHEL-AV) and would actually be using it > since changes on their side are needed to consume it. > > Thomas, can you please explain this to them and provide their expectations? The "customer" is Sebastian in this case (he reported the bug) ... so let's ask him... Sebastian, IMHO it should be ok if you check whether this problem is fixed in the rebased AV builds, and if yes, let's close this as DEFERRED or UPSTREAM? What do you think?
(In reply to Thomas Huth from comment #16) > (In reply to Jiri Denemark from comment #15) > > So the question is, whether customer really needs this functionality (earlier > > than after the next rebase of RHEL on RHEL-AV) and would actually be using it > > since changes on their side are needed to consume it. > > > > Thomas, can you please explain this to them and provide their expectations? > > The "customer" is Sebastian in this case (he reported the bug) ... so let's > ask him... Sebastian, IMHO it should be ok if you check whether this problem > is fixed in the rebased AV builds, and if yes, let's close this as DEFERRED > or UPSTREAM? What do you think? I agree with you that RHEL-AV should be good enough. What I'm not sure of is if we want to close this one as I can imagine we should have one minimal automated validation test on system level. Luyao, I believe you're QE feature owner, what's your opinion? Is it OK to move this to RHEL AV and arch x86_64? Would you want to close this one as UPSTREAM or rather leave it open?
(In reply to smitterl from comment #18) > (In reply to Thomas Huth from comment #16) > > (In reply to Jiri Denemark from comment #15) > > > So the question is, whether customer really needs this functionality (earlier > > > than after the next rebase of RHEL on RHEL-AV) and would actually be using it > > > since changes on their side are needed to consume it. > > > > > > Thomas, can you please explain this to them and provide their expectations? > > > > The "customer" is Sebastian in this case (he reported the bug) ... so let's > > ask him... Sebastian, IMHO it should be ok if you check whether this problem > > is fixed in the rebased AV builds, and if yes, let's close this as DEFERRED > > or UPSTREAM? What do you think? > > I agree with you that RHEL-AV should be good enough. What I'm not sure of is > if we want to close this one as I can imagine we should have one minimal > automated validation test on system level. > > Luyao, I believe you're QE feature owner, what's your opinion? Is it OK to > move this to RHEL AV and arch x86_64? Would you want to close this one as > UPSTREAM or rather leave it open? In my opinion, we can move this bug to RHEL-AV. Because most of up-layer projects are using RHEL-AV libvirt, and even user hit this problem with RHEL libvirt, it is not a critical problem. Thanks, Luyao
------- Comment From cborntra.com 2020-12-02 07:51 EDT------- Redhat: FWIW, moving this to AV will move s390x in the cold until 8.6 or so. Not sure if this is the right thing to do. I will discuss this with the redhat KVM/s390 team and come back.
Verified in libvirt-daemon-6.10.0-1.module+el8.4.0+8898+a84e86e1.x86_64 # virsh hypervisor-cpu-compare hostcpu.xml --validate setlocale: No such file or directory error: Failed to compare hypervisor CPU with hostcpu.xml error: XML document failed to validate against schema: Unable to validate doc against /usr/share/libvirt/schemas/cpu.rng Element cpu has extra content: arch Invalid attribute dies for element topology hostcpu.xml - <cpu> <arch>x86_64</arch> <model>Cascadelake-Server-noTSX</model> <vendor>Intel</vendor> <microcode version='83898371'/> <counter name='tsc' frequency='2194842000' scaling='yes'/> <topology sockets='1' dies='1' cores='12' threads='2'/> <feature name='ds'/> <feature name='acpi'/> <feature name='ss'/> <feature name='ht'/> <feature name='tm'/> <feature name='pbe'/> <feature name='dtes64'/> <feature name='monitor'/> <feature name='ds_cpl'/> <feature name='vmx'/> <feature name='smx'/> <feature name='est'/> <feature name='tm2'/> <feature name='xtpr'/> <feature name='pdcm'/> <feature name='dca'/> <feature name='osxsave'/> <feature name='tsc_adjust'/> <feature name='cmt'/> <feature name='intel-pt'/> <feature name='pku'/> <feature name='ospke'/> <feature name='md-clear'/> <feature name='stibp'/> <feature name='arch-capabilities'/> <feature name='xsaves'/> <feature name='mbm_total'/> <feature name='mbm_local'/> <feature name='invtsc'/> <feature name='rdctl-no'/> <feature name='ibrs-all'/> <feature name='skip-l1dfl-vmentry'/> <feature name='mds-no'/> <feature name='tsx-ctrl'/> <pages unit='KiB' size='4'/> <pages unit='KiB' size='2048'/> <pages unit='KiB' size='1048576'/> </cpu> From comment 11, the arch is supported. Can you please help to confirm it? Thanks
The error message is misleading, but expected. The actual problem is the "<counter name='tsc' frequency='2194842000' scaling='yes'/>" element, that is not defined in the schema. As the user may supply either a host cpu definition or a guest cpu definition, the validator will first try to validate the user supplied xml under the assumption that it is a host cpu definition, fail over the unexpected "counter" element, re-try under the assumption that the user must have supplied a guest cpu definition, and then fails for the (for guest cpu definitions) unexpected "arch" element.
Thanks Tim for your answer. Please help to see what's the problem in below scenario which is tested in dell-per740xd machine and libvirt-daemon-6.10.0-1.module+el8.4.0+8898+a84e86e1.x86_64 1. Run virsh capabilities & get the cpu part and save it to cpu.xml - #virsh capabilities <cpu> <arch>x86_64</arch> <model>Cascadelake-Server-noTSX</model> <vendor>Intel</vendor> <microcode version='83898371'/> <counter name='tsc' frequency='2194842000' scaling='yes'/> <topology sockets='1' dies='1' cores='12' threads='2'/> <feature name='ds'/> <feature name='acpi'/> <feature name='ss'/> <feature name='ht'/> <feature name='tm'/> <feature name='pbe'/> <feature name='dtes64'/> <feature name='monitor'/> <feature name='ds_cpl'/> <feature name='vmx'/> <feature name='smx'/> <feature name='est'/> <feature name='tm2'/> <feature name='xtpr'/> <feature name='pdcm'/> <feature name='dca'/> <feature name='osxsave'/> <feature name='tsc_adjust'/> <feature name='cmt'/> <feature name='intel-pt'/> <feature name='pku'/> <feature name='ospke'/> <feature name='md-clear'/> <feature name='stibp'/> <feature name='arch-capabilities'/> <feature name='xsaves'/> <feature name='mbm_total'/> <feature name='mbm_local'/> <feature name='invtsc'/> <feature name='rdctl-no'/> <feature name='ibrs-all'/> <feature name='skip-l1dfl-vmentry'/> <feature name='mds-no'/> <feature name='tsx-ctrl'/> <pages unit='KiB' size='4'/> <pages unit='KiB' size='2048'/> <pages unit='KiB' size='1048576'/> </cpu> 2. remove the counter element and then run hyperviseor-cpu-compare validate. # virsh hypervisor-cpu-compare cpu.xml --validate error: Failed to compare hypervisor CPU with cpu.xml error: XML document failed to validate against schema: Unable to validate doc against /usr/share/libvirt/schemas/cpu.rng Element cpu has extra content: arch Invalid attribute dies for element topology 3. remove the "dies='1'" and rerun # virsh hypervisor-cpu-compare cpu.xml --validate CPU described in cpu.xml is incompatible with the CPU provided by hypervisor on the host Question: 1. If the command support to validate the host cpu definition, why it still failed with removing the invalid attributes? 2. Will these invalid attributes be supported in the future? Thanks
Hi Jing, Regarding your first question: The host cpu definition in cpu.xml contains features that cannot be virtualized, e.g. "monitor". The output is correct and expected. Compare your results to "virsh cpu-compare cpu.xml". Regarding your second question: I just sent out patches, see https://www.redhat.com/archives/libvir-list/2020-December/msg00703.html.
(In reply to Tim Wiederhake from comment #25) > Hi Jing, > > Regarding your first question: The host cpu definition in cpu.xml contains > features that cannot be virtualized, e.g. "monitor". The output is correct > and expected. Compare your results to "virsh cpu-compare cpu.xml". > >>> Hi Tim, The "monitor" is in the cpu part of output of "virsh capabilities". How can I know what other features also cannot be "virtualized"? Since I removed "monitor" feature and reran the command, it still reports "incompatible". How to get the compatible cpu xml file for the host? # virsh hypervisor-cpu-compare cpu.xml --validate CPU described in cpu.xml is incompatible with the CPU provided by hypervisor on the host Thanks, Jing Qi
"virsh hypervisor-cpu-compare cpu.xml --validate --error" should give you the list of features in cpu.xml that cannot be virtualized on the current hypervisor. If you want the description of a cpu that can be virtualized ("is compatible") with your physical cpu, get the hostmodel: $ virsh domcapabilities > domcapabilities.xml $ virsh hypervisor-cpu-baseline domcapabilities.xml > hostmodel.xml $ virsh hypervisor-cpu-compare hostmodel.xml --validate CPU described in hostmodel.xml is identical to the CPU provided by hypervisor on the host
Verified with libvirt-daemon-6.10.0-1.module+el8.4.0+8898+a84e86e1.x86_64 & qemu-kvm-5.2.0-0.module+el8.4.0+8855+a9e237a9.x86_64 S1: Verify the cpu description from domcapabilities is compatible in hypervisor-cpu-compare. 1. # virsh domcapabilities > domcapabilities.xml 2. # virsh hypervisor-cpu-baseline domcapabilities.xml > hostmodel.xml 3. #virsh hypervisor-cpu-compare hostmodel.xml --validate CPU described in hostmodel.xml is identical to the CPU provided by hypervisor on the host
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:2098