Bug 1804224
| Summary: | libvirtd: error : virCPUx86UpdateLive:3110 : operation failed: guest CPU doesn't match specification: missing features: fxsr_opt | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Riccardo Pittau <rpittau> |
| Component: | libvirt | Assignee: | Jiri Denemark <jdenemar> |
| Status: | CLOSED ERRATA | QA Contact: | jiyan <jiyan> |
| Severity: | high | Priority: | high |
| Version: | 8.1 | CC: | dyuan, jdenemar, lhuang, lmen, mtessun, virt-maint, xuzhang |
| Target Milestone: | rc | Keywords: | FutureFeature, Regression, Triaged, ZStream |
| Hardware: | Unspecified | Flags: | pm-rhel: mirror+ |
| OS: | Unspecified | Type: | Feature Request |
| Fixed In Version: | libvirt-4.5.0-41.el8 | Doc Type: | If docs needed, set a value |
| Clones: | 1809510 | Bug Blocks: | 1809510 |
| Last Closed: | 2020-04-28 15:33:47 UTC | | |
Description
Riccardo Pittau
2020-02-18 13:43:11 UTC
Created attachment 1663747
libvirt cpu map
Created attachment 1663748
libvirtd logs
Created attachment 1663749
ls cpu output
Created attachment 1663750
testvm1 xml dump
Created attachment 1663752
libvirt cpu map
Created attachment 1663754
libvirtd logs
Created attachment 1663755
ls cpu output
Created attachment 1663756
testvm1 xml dump
Created attachment 1663758
testvm1 logs
Created attachment 1663759
virsh capabilities
Created attachment 1663760
virsh domcapabilities
Oh, you're creating a TCG domain, i.e., of type 'qemu'. This is not supported on RHEL. We only support KVM domains (type='kvm'). I guess TCG is used because you're doing all this in a VM, where you would normally use nested KVM, but that first-level VM was created with a very poor CPU model.
Since the libvirt log does not contain anything useful, could you please enable debug logs (see https://wiki.libvirt.org/page/DebugLogs for guidance) and reproduce the issue again? The testvm1_dump.xml describes the already running domain. Could you also provide the XML used to initially create the domain? And what does "restart the vm" mean here? If you use virsh or the libvirt API, could you provide the exact commands you use to reproduce this bug?
Created attachment 1664083
libvirt debug logs
Created attachment 1664084
libvirt template
Hey Jiri, I attached the libvirt debug logs and the libvirt domain template used to create the domain. About the commands, we define the VM using the attached template and then start it using virtualbmc (i.e., through the libvirt API with libvirt-python) with 'vbmc start testvm1' (roughly equivalent to 'virsh start [domain]'). Thank you again for your help.
It doesn't seem libvirt_template.xml is actually used for starting the domain.
At least not after you "restart" it (whatever that means). The error suggests
there must be a check='full' in the inactive domain XML. BTW, it will always
be shown in the XML describing the running domain (i.e., in the "virsh dumpxml
testvm1" output while the domain is running).
Looking at the debug logs I can see the domain was defined with a host-model
CPU and started. The real CPU definition which was used as host-model is not
visible in the logs, but we can infer it from the QEMU command line:
/usr/libexec/qemu-kvm \
-name guest=testvm1,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-testvm1/master-key.aes \
-machine pc-i440fx-rhel7.6.0,accel=tcg,usb=off,dump-guest-core=off \
-cpu EPYC,acpi=on,ss=on,hypervisor=on,erms=on,mpx=on,pcommit=on,clwb=on,\
pku=on,ospke=on,la57=on,3dnowext=on,3dnow=on,vme=off,fma=off,avx=off,\
f16c=off,rdrand=off,avx2=off,rdseed=off,sha-ni=off,xsavec=off,\
misalignsse=off,3dnowprefetch=off,osvw=off,topoext=off \
-m 3072 \
...
which would translate to
<cpu match='exact'>
<model>EPYC</model>
<feature name='acpi' policy='require'/>
<feature name='ss' policy='require'/>
<feature name='hypervisor' policy='require'/>
<feature name='erms' policy='require'/>
<feature name='mpx' policy='require'/>
<feature name='pcommit' policy='require'/>
<feature name='clwb' policy='require'/>
<feature name='pku' policy='require'/>
<feature name='ospke' policy='require'/>
<feature name='la57' policy='require'/>
<feature name='3dnowext' policy='require'/>
<feature name='3dnow' policy='require'/>
<feature name='vme' policy='disable'/>
<feature name='fma' policy='disable'/>
<feature name='avx' policy='disable'/>
<feature name='f16c' policy='disable'/>
<feature name='rdrand' policy='disable'/>
<feature name='avx2' policy='disable'/>
<feature name='rdseed' policy='disable'/>
<feature name='sha-ni' policy='disable'/>
<feature name='xsavec' policy='disable'/>
<feature name='misalignsse' policy='disable'/>
<feature name='3dnowprefetch' policy='disable'/>
<feature name='osvw' policy='disable'/>
<feature name='topoext' policy='disable'/>
</cpu>
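As a rough illustration of this translation, the following Python sketch maps a QEMU `-cpu` argument onto a libvirt-style `<cpu>` element, treating `feature=on` as `policy='require'` and `feature=off` as `policy='disable'`. The helper name `cpu_arg_to_xml` is hypothetical, and copying feature names verbatim is a simplifying assumption: libvirt spells some features differently from QEMU (for example `fxsr_opt` vs. `fxsr-opt`).

```python
import xml.etree.ElementTree as ET


def cpu_arg_to_xml(cpu_arg: str) -> ET.Element:
    """Translate 'MODEL,feat=on,feat=off,...' into a libvirt-style <cpu>.

    Hypothetical helper: feature names are copied verbatim, although real
    libvirt spells some of them differently from QEMU.
    """
    model, *features = cpu_arg.split(",")
    cpu = ET.Element("cpu", match="exact")
    ET.SubElement(cpu, "model").text = model
    for item in features:
        name, _, value = item.partition("=")
        # 'on' -> the guest must get the feature; 'off' -> it must not.
        policy = {"on": "require", "off": "disable"}[value]
        ET.SubElement(cpu, "feature", name=name, policy=policy)
    return cpu


arg = "EPYC,acpi=on,ss=on,hypervisor=on,vme=off,topoext=off"
print(ET.tostring(cpu_arg_to_xml(arg), encoding="unicode"))
```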
The fxsr-opt feature was not explicitly disabled and when QEMU starts, we can
detect that fxsr-opt (enabled implicitly by asking for EPYC model) could not
be enabled. This would result in
<feature policy="disable" name="fxsr_opt"/>
being added to the active domain XML (in addition to setting the cpu/@check
attribute to 'full').
A bit later it seems the domain is redefined using its active definition,
i.e., the host-model CPU is replaced with
<cpu check="full" match="exact" mode="custom">
<model fallback="forbid">EPYC</model>
<vendor>AMD</vendor>
<feature name="acpi" policy="require" />
<feature name="ss" policy="require" />
<feature name="hypervisor" policy="require" />
<feature name="erms" policy="require" />
<feature name="mpx" policy="require" />
<feature name="pcommit" policy="require" />
<feature name="clwb" policy="require" />
<feature name="pku" policy="require" />
<feature name="ospke" policy="require" />
<feature name="la57" policy="require" />
<feature name="3dnowext" policy="require" />
<feature name="3dnow" policy="require" />
<feature name="vme" policy="disable" />
<feature name="fma" policy="disable" />
<feature name="avx" policy="disable" />
<feature name="f16c" policy="disable" />
<feature name="rdrand" policy="disable" />
<feature name="avx2" policy="disable" />
<feature name="rdseed" policy="disable" />
<feature name="sha-ni" policy="disable" />
<feature name="xsavec" policy="disable" />
<feature name="misalignsse" policy="disable" />
<feature name="3dnowprefetch" policy="disable" />
<feature name="osvw" policy="disable" />
<feature name="topoext" policy="disable" />
<feature name="fxsr_opt" policy="disable" />
</cpu>
This happens twice in 10 seconds. A bit later the domain is shut down, and then
libvirt is asked to start the domain. In other words, the answer to my
original question ("And what does 'restart the vm' mean here?") is: redefining
the domain using its active definition, stopping the domain and starting it
again with the following QEMU command line:
/usr/libexec/qemu-kvm \
-name guest=testvm1,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-testvm1/master-key.aes \
-machine pc-i440fx-rhel7.6.0,accel=tcg,usb=off,dump-guest-core=off \
-cpu EPYC,acpi=on,ss=on,hypervisor=on,erms=on,mpx=on,pcommit=on,clwb=on,\
pku=on,ospke=on,la57=on,3dnowext=on,3dnow=on,vme=off,fma=off,avx=off,\
f16c=off,rdrand=off,avx2=off,rdseed=off,sha-ni=off,xsavec=off,\
misalignsse=off,3dnowprefetch=off,osvw=off,topoext=off \
-m 3072 \
...
However, fxsr-opt=off is not passed to the -cpu option even though we know for
sure the domain XML contains
<feature name="fxsr_opt" policy="disable"/>
Thus, when we check what features could not be provided by QEMU, we get
fxsr_opt again as it wasn't explicitly disabled. The cpu/@check attribute is
set to 'full', which turns this situation into a fatal error.
This makes me think that even the CPU definition used to replace the original
host-model CPU when the domain was first started explicitly disabled the
fxsr_opt feature. But since the cpu/@check attribute was set to 'partial' at
that point, the domain started just fine.
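The sequence described above (a silent fix-up on the first start with check='partial', a fatal error on the next start with check='full') can be modelled with a toy Python function. This is not libvirt's virCPUx86UpdateLive: the function `update_live`, its arguments, and the simplified set arithmetic are assumptions made for illustration, and all feature names use libvirt spelling.

```python
def update_live(model_features, cmdline_disabled, enabled_by_qemu, check):
    """Toy model of the post-start CPU check (hypothetical, simplified).

    model_features:   features implied by the CPU model (e.g. EPYC)
    cmdline_disabled: features passed as 'name=off' on the -cpu option
    enabled_by_qemu:  features QEMU actually enabled for the guest
    check:            value of the cpu/@check attribute
    Returns the features that get added with policy='disable'.
    """
    expected = set(model_features) - set(cmdline_disabled)
    missing = sorted(expected - set(enabled_by_qemu))
    if missing and check == "full":
        raise RuntimeError(
            "guest CPU doesn't match specification: missing features: "
            + ",".join(missing))
    # With check='partial' the active definition is updated instead (and,
    # per the analysis above, its cpu/@check becomes 'full').
    return missing


epyc = {"fxsr_opt", "avx", "sse2"}
# First start: fxsr_opt is not disabled on the command line, TCG lacks it.
print(update_live(epyc, {"avx"}, {"sse2"}, "partial"))
# A later start that also omits fxsr-opt=off but has check='full' would
# raise RuntimeError mentioning fxsr_opt.
```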
And indeed, even asking libvirt to create a QEMU command line for the
following simple XML reveals the issue.
# cat test.xml
<domain type='kvm'>
<name>test</name>
<memory>4096</memory>
<os>
<type arch='x86_64'>hvm</type>
</os>
<cpu mode='custom' match='exact' check='none'>
<model>EPYC</model>
<feature name='fxsr_opt' policy='disable'/>
</cpu>
</domain>
# virsh domxml-to-native qemu-argv --xml test.xml
...
/usr/libexec/qemu-kvm \
-name guest=test,debug-threads=on \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain--1-test/master-key.aes \
-machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off \
-cpu EPYC \
-m 4 \
...
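The discrepancy between the XML and the generated command line can be checked mechanically. This Python sketch (hypothetical helper `disabled_but_not_passed`) lists features that the domain XML disables but that do not appear as `name=off` in the `-cpu` argument. The `<memory>` and `<os>` elements of test.xml are omitted, and the one-entry name table is an assumption covering only `fxsr_opt`.

```python
import xml.etree.ElementTree as ET
from typing import List

TEST_XML = """<domain type='kvm'>
  <name>test</name>
  <cpu mode='custom' match='exact' check='none'>
    <model>EPYC</model>
    <feature name='fxsr_opt' policy='disable'/>
  </cpu>
</domain>"""

# Hypothetical libvirt -> QEMU name table; only the entry needed here.
LIBVIRT_TO_QEMU = {"fxsr_opt": "fxsr-opt"}


def disabled_but_not_passed(domain_xml: str, cpu_arg: str) -> List[str]:
    """Return libvirt feature names disabled in the XML but missing as
    'name=off' from the QEMU -cpu argument."""
    root = ET.fromstring(domain_xml)
    disabled = [f.get("name") for f in root.iter("feature")
                if f.get("policy") == "disable"]
    # Skip the model name (first item) and keep only 'name=off' entries.
    passed_off = {item[:-len("=off")] for item in cpu_arg.split(",")[1:]
                  if item.endswith("=off")}
    return [name for name in disabled
            if LIBVIRT_TO_QEMU.get(name, name) not in passed_off]


print(disabled_but_not_passed(TEST_XML, "EPYC"))
print(disabled_but_not_passed(TEST_XML, "EPYC,fxsr-opt=off"))
```

With the affected build's `-cpu EPYC` the first call reports `fxsr_opt`; a command line that includes `fxsr-opt=off` yields an empty list.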
Upstream and even RHEL-AV-8.1.1 work fine, which means this bug is limited to
the 4.5.0 based version of libvirt used in RHEL-8.1.
Bisecting pointed to the following patch backported for libvirt-4.5.0-35.1.el8
(RHEL-8.1.0.z) and libvirt-4.5.0-36.el8 (RHEL-8.2.0):
commit ac34e141596fab70fbe91a396311f80db6cb57c5
Refs: v5.9.0-145-gac34e14159
Author: Jiri Denemark <jdenemar>
AuthorDate: Fri Oct 18 14:33:19 2019 +0200
Commit: Jiri Denemark <jdenemar>
CommitDate: Tue Nov 12 20:14:16 2019 +0100
qemu: Drop disabled CPU features unknown to QEMU
When a CPU definition wants to explicitly disable some features that are
unknown to QEMU, we can safely drop them from the definition before
starting QEMU. Naturally QEMU won't enable such features implicitly.
Signed-off-by: Jiri Denemark <jdenemar>
Reviewed-by: Daniel P. Berrangé <berrange>
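The idea of this commit can be sketched in a few lines of Python. The helper `drop_unknown_disabled` is hypothetical and is not libvirt's C code. The example assumes the set of QEMU-known features uses QEMU's spelling (`fxsr-opt`), which is the name mismatch explained in the rest of this comment, so the libvirt name `fxsr_opt` looks unknown and is silently dropped.

```python
from typing import Dict, Set


def drop_unknown_disabled(features: Dict[str, str],
                          qemu_known: Set[str]) -> Dict[str, str]:
    """Keep every required feature, but drop disabled features whose
    names are not in the set QEMU reported as known.

    features maps a libvirt feature name to 'require' or 'disable'.
    """
    return {name: policy for name, policy in features.items()
            if policy != "disable" or name in qemu_known}


requested = {"acpi": "require", "topoext": "disable", "fxsr_opt": "disable"}

# Assumed: the known-feature set uses QEMU's spelling 'fxsr-opt'.
print(drop_unknown_disabled(requested, {"acpi", "topoext", "fxsr-opt"}))
# {'acpi': 'require', 'topoext': 'disable'}
```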
However, this commit works as expected upstream, because the function used for
gathering CPU features supported by QEMU (virQEMUCapsGetCPUFeatures) was fixed
upstream earlier by:
commit 1fd28a2e79692babd63d6b8e9eea90168dd0897e
Refs: v5.5.0-333-g1fd28a2e79
Author: Jiri Denemark <jdenemar>
AuthorDate: Thu Jul 25 10:27:45 2019 +0200
Commit: Jiri Denemark <jdenemar>
CommitDate: Fri Jul 26 16:37:30 2019 +0200
qemu: Translate features in virQEMUCapsGetCPUFeatures
Starting with QEMU 4.1 qemuMonitorCPUModelInfo structure in virQEMUCaps
stores only canonical feature names which may differ from the name used
by libvirt. We need translate these canonical names into libvirt names
for further consumption.
This fixes a bug in qemuConnectBaselineHypervisorCPU which would remove
all features for which libvirt's spelling differs from the QEMU's
preferred name. For example, the following result of
qemuConnectBaselineHypervisorCPU on my host with QEMU 4.1 is wrong:
<cpu mode='custom' match='exact'>
<model fallback='forbid'>Skylake-Client</model>
<vendor>Intel</vendor>
<feature policy='require' name='ss'/>
<feature policy='require' name='vmx'/>
<feature policy='require' name='hypervisor'/>
<feature policy='require' name='clflushopt'/>
<feature policy='require' name='umip'/>
<feature policy='require' name='arch-capabilities'/>
<feature policy='require' name='xsaves'/>
<feature policy='require' name='pdpe1gb'/>
<feature policy='require' name='invtsc'/>
<feature policy='disable' name='pclmuldq'/>
<feature policy='disable' name='lahf_lm'/>
</cpu>
The 'pclmuldq' and 'lahf_lm' should not be disabled in the baseline CPU
as they are supported by QEMU on this host.
Signed-off-by: Jiri Denemark <jdenemar>
Reviewed-by: Andrea Bolognani <abologna>
Unfortunately, this earlier fix was never backported downstream.
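The effect of translating names before filtering can be illustrated with a hypothetical Python sketch. The one-entry `QEMU_TO_LIBVIRT` table and the helper names are assumptions; libvirt's real translation covers many more features.

```python
from typing import Dict, Set

# Hypothetical excerpt of a QEMU -> libvirt feature-name table.
QEMU_TO_LIBVIRT: Dict[str, str] = {"fxsr-opt": "fxsr_opt"}


def qemu_features_in_libvirt_names(qemu_names: Set[str]) -> Set[str]:
    """Translate QEMU canonical feature names into libvirt spellings,
    leaving names spelled the same in both unchanged."""
    return {QEMU_TO_LIBVIRT.get(name, name) for name in qemu_names}


def disabled_to_pass(features: Dict[str, str], known: Set[str]) -> Set[str]:
    """Disabled features that are still passed to QEMU as 'name=off'."""
    return {name for name, policy in features.items()
            if policy == "disable" and name in known}


requested = {"topoext": "disable", "fxsr_opt": "disable"}
reported = {"acpi", "topoext", "fxsr-opt"}

# Without translation fxsr_opt looks unknown and is not passed.
print(sorted(disabled_to_pass(requested, reported)))
# With translation both disabled features reach the -cpu option.
print(sorted(disabled_to_pass(requested,
                              qemu_features_in_libvirt_names(reported))))
```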
This is a regression since RHEL-8.1.0 GA (in both 8.2.0 and 8.1.0.z).
Reproduced this bug with libvirt-4.5.0-35.2.module+el8.1.0+5256+4b9ab730.x86_64.
Verified this bug with libvirt-4.5.0-35.3.module+el8.1.0+5931+8897e7e1.x86_64.
Version:
libvirt-4.5.0-35.2.module+el8.1.0+5256+4b9ab730.x86_64
qemu-kvm-2.12.0-88.module+el8.1.0+5708+85d8e057.3.x86_64
kernel-4.18.0-147.el8.x86_64
Steps:
1. Prepare a shut-off VM with the following configuration:
# virsh domstate test81
shut off
# virsh dumpxml test81 --inactive |grep "<domain t"
<domain type='qemu'>
# virsh dumpxml test81 --inactive |grep "<cpu" -A29
<cpu mode='custom' match='exact' check='full'>
<model fallback='forbid'>EPYC</model>
<vendor>AMD</vendor>
<feature policy='require' name='acpi'/>
<feature policy='require' name='ss'/>
<feature policy='require' name='hypervisor'/>
<feature policy='require' name='erms'/>
<feature policy='require' name='mpx'/>
<feature policy='require' name='pcommit'/>
<feature policy='require' name='clwb'/>
<feature policy='require' name='pku'/>
<feature policy='require' name='ospke'/>
<feature policy='require' name='la57'/>
<feature policy='require' name='3dnowext'/>
<feature policy='require' name='3dnow'/>
<feature policy='disable' name='vme'/>
<feature policy='disable' name='fma'/>
<feature policy='disable' name='avx'/>
<feature policy='disable' name='f16c'/>
<feature policy='disable' name='rdrand'/>
<feature policy='disable' name='avx2'/>
<feature policy='disable' name='rdseed'/>
<feature policy='disable' name='sha-ni'/>
<feature policy='disable' name='xsavec'/>
<feature policy='disable' name='misalignsse'/>
<feature policy='disable' name='3dnowprefetch'/>
<feature policy='disable' name='osvw'/>
<feature policy='disable' name='topoext'/>
<feature policy='disable' name='fxsr_opt'/>
</cpu>
2. Start VM and check the log
# virsh start test81
error: Failed to start domain test81
error: operation failed: guest CPU doesn't match specification: missing features: fxsr_opt
# tail -f /var/log/libvirt/qemu/test81.log
2020-03-09T02:23:15.501270Z qemu-kvm: warning: TCG doesn't support requested feature: CPUID.80000001H:EDX.fxsr-opt [bit 25]
2020-03-09T02:23:15.501943Z qemu-kvm: warning: TCG doesn't support requested feature: CPUID.80000001H:EDX.fxsr-opt [bit 25]
3. Update libvirt and restart libvirtd
# yum update libvirt* -y
# systemctl restart libvirtd
# rpm -qa libvirt
libvirt-4.5.0-35.3.module+el8.1.0+5931+8897e7e1.x86_64
4. Start the VM again and check the QEMU command line and the active domain XML
# virsh start test81
Domain test81 started
# ps -ef |grep test81
-cpu EPYC,acpi=on,ss=on,hypervisor=on,erms=on,mpx=on,pcommit=on,clwb=on,pku=on,ospke=on,la57=on,3dnowext=on,3dnow=on,vme=off,fma=off,avx=off,f16c=off,rdrand=off,avx2=off,rdseed=off,sha-ni=off,xsavec=off,misalignsse=off,3dnowprefetch=off,osvw=off,topoext=off,fxsr-opt=off
# virsh dumpxml test81 | grep "<cpu" -A30
<cpu mode='custom' match='exact' check='full'>
<model fallback='forbid'>EPYC</model>
<vendor>AMD</vendor>
<feature policy='require' name='acpi'/>
<feature policy='require' name='ss'/>
<feature policy='require' name='hypervisor'/>
<feature policy='require' name='erms'/>
<feature policy='require' name='mpx'/>
<feature policy='require' name='pcommit'/>
<feature policy='require' name='clwb'/>
<feature policy='require' name='pku'/>
<feature policy='require' name='ospke'/>
<feature policy='require' name='la57'/>
<feature policy='require' name='3dnowext'/>
<feature policy='require' name='3dnow'/>
<feature policy='disable' name='vme'/>
<feature policy='disable' name='fma'/>
<feature policy='disable' name='avx'/>
<feature policy='disable' name='f16c'/>
<feature policy='disable' name='rdrand'/>
<feature policy='disable' name='avx2'/>
<feature policy='disable' name='rdseed'/>
<feature policy='disable' name='sha-ni'/>
<feature policy='disable' name='xsavec'/>
<feature policy='disable' name='misalignsse'/>
<feature policy='disable' name='3dnowprefetch'/>
<feature policy='disable' name='osvw'/>
<feature policy='disable' name='topoext'/>
<feature policy='disable' name='fxsr_opt'/>
</cpu>
Sorry for the wrong update to this bug; the previous comment was meant for Bug 1809510. I will update the correct versions and steps for this bug later.
Reproduced this bug with libvirt-4.5.0-40.module+el8.2.0+5761+d16d25e7.x86_64
Verified this bug with libvirt-4.5.0-41.module+el8.2.0+5928+db9eea38.x86_64
Version:
libvirt-4.5.0-40.module+el8.2.0+5761+d16d25e7.x86_64
qemu-kvm-2.12.0-99.module+el8.2.0+5827+8c39933c.x86_64
kernel-4.18.0-185.el8.x86_64
Steps:
1. Prepare a shut-off VM with the following configuration:
# virsh domstate test82
shut off
# virsh dumpxml test82 --inactive | grep "<domain t"
<domain type='qemu'>
# virsh dumpxml test82 --inactive | grep "<cpu" -A29
<cpu mode='custom' match='exact' check='full'>
<model fallback='forbid'>EPYC</model>
<vendor>AMD</vendor>
<feature policy='require' name='acpi'/>
<feature policy='require' name='ss'/>
<feature policy='require' name='hypervisor'/>
<feature policy='require' name='erms'/>
<feature policy='require' name='mpx'/>
<feature policy='require' name='pcommit'/>
<feature policy='require' name='clwb'/>
<feature policy='require' name='pku'/>
<feature policy='require' name='ospke'/>
<feature policy='require' name='la57'/>
<feature policy='require' name='3dnowext'/>
<feature policy='require' name='3dnow'/>
<feature policy='disable' name='vme'/>
<feature policy='disable' name='fma'/>
<feature policy='disable' name='avx'/>
<feature policy='disable' name='f16c'/>
<feature policy='disable' name='rdrand'/>
<feature policy='disable' name='avx2'/>
<feature policy='disable' name='rdseed'/>
<feature policy='disable' name='sha-ni'/>
<feature policy='disable' name='xsavec'/>
<feature policy='disable' name='misalignsse'/>
<feature policy='disable' name='3dnowprefetch'/>
<feature policy='disable' name='osvw'/>
<feature policy='disable' name='topoext'/>
<feature policy='disable' name='fxsr_opt'/>
</cpu>
2. Start VM
# virsh start test82
error: Failed to start domain test82
error: operation failed: guest CPU doesn't match specification: missing features: fxsr_opt
# tail -f /var/log/libvirt/qemu/test82.log
2020-03-09T03:06:06.780420Z qemu-kvm: warning: TCG doesn't support requested feature: CPUID.80000001H:EDX.fxsr-opt [bit 25]
2020-03-09T03:06:06.781249Z qemu-kvm: warning: TCG doesn't support requested feature: CPUID.80000001H:EDX.fxsr-opt [bit 25]
3. Update libvirt and restart libvirtd
# yum update libvirt* -y
# systemctl restart libvirtd
# rpm -qa libvirt
libvirt-4.5.0-41.module+el8.2.0+5928+db9eea38.x86_64
4. Start VM again
# virsh start test82
Domain test82 started
# ps -ef |grep test82
...cpu EPYC,acpi=on,ss=on,hypervisor=on,erms=on,mpx=on,pcommit=on,clwb=on,pku=on,ospke=on,la57=on,3dnowext=on,3dnow=on,vme=off,fma=off,avx=off,f16c=off,rdrand=off,avx2=off,rdseed=off,sha-ni=off,xsavec=off,misalignsse=off,3dnowprefetch=off,osvw=off,topoext=off,fxsr-opt=off
# virsh dumpxml test82 | grep "<cpu" -A30
<cpu mode='custom' match='exact' check='full'>
<model fallback='forbid'>EPYC</model>
<vendor>AMD</vendor>
<feature policy='require' name='acpi'/>
<feature policy='require' name='ss'/>
<feature policy='require' name='hypervisor'/>
<feature policy='require' name='erms'/>
<feature policy='require' name='mpx'/>
<feature policy='require' name='pcommit'/>
<feature policy='require' name='clwb'/>
<feature policy='require' name='pku'/>
<feature policy='require' name='ospke'/>
<feature policy='require' name='la57'/>
<feature policy='require' name='3dnowext'/>
<feature policy='require' name='3dnow'/>
<feature policy='disable' name='vme'/>
<feature policy='disable' name='fma'/>
<feature policy='disable' name='avx'/>
<feature policy='disable' name='f16c'/>
<feature policy='disable' name='rdrand'/>
<feature policy='disable' name='avx2'/>
<feature policy='disable' name='rdseed'/>
<feature policy='disable' name='sha-ni'/>
<feature policy='disable' name='xsavec'/>
<feature policy='disable' name='misalignsse'/>
<feature policy='disable' name='3dnowprefetch'/>
<feature policy='disable' name='osvw'/>
<feature policy='disable' name='topoext'/>
<feature policy='disable' name='fxsr_opt'/>
</cpu>
All test results are as expected; moving this bug to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:1587