Bug 1815572
Summary: | VM live migration fails: the CPU is incompatible with host CPU: Host CPU does not provide required features: virt-ssbd | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Oliver Freyermuth <o.freyermuth> |
Component: | libvirt | Assignee: | Jiri Denemark <jdenemar> |
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 7.7 | CC: | fjin, hu.zhou, jdenemar, jsuchane, lmen, wienemann, xuzhang, yalzhang |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | libvirt-4.5.0-34.el7 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-09-29 20:29:01 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Oliver Freyermuth
2020-03-20 15:39:08 UTC
Since this is a migration between two hosts, could you please tell us the versions of kernel, qemu-kvm, and libvirt on both the source and the destination host?

What happens if you create a new VM with <cpu mode='host-model'/> on kvm010.example.com? Is "<feature policy='require' name='virt-ssbd'/>" also present in the output of virsh dumpxml THE_NEW_DOMAIN?

Of course:

kvm010 (source):
Kernel: 3.10.0-1062.18.1.el7.x86_64
qemu-kvm-1.5.3-167.el7_7.4.x86_64
libvirt-4.5.0-23.el7_7.6.x86_64

kmv009 (target):
Kernel: 3.10.0-1062.18.1.el7.x86_64
qemu-kvm-1.5.3-167.el7_7.4.x86_64
libvirt-4.5.0-23.el7_7.6.x86_64

Both nodes were rebooted a few days ago, so they are running the installed versions.

After creating a new VM on kvm010, the issue persists:

[root@kvm010 ~]# grep host-model /etc/libvirt/qemu/broken-vm.example.com.xml -A2
  <cpu mode='host-model' check='partial'>
    <model fallback='allow'/>
  </cpu>
[root@kvm010 ~]# virsh dumpxml broken-vm.example.com | grep ssbd
    <feature policy='require' name='virt-ssbd'/>

OK, so both hosts correctly enable virt-ssbd (it's emulated by the virtualization stack, hence the "virt-" prefix), but something's wrong with the host CPU compatibility check. Could you please provide the complete domain XML of the running domain (or at least its <cpu> element including children)?
Here's the CPU element of the running domain:
----------------
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>Opteron_G4</model>
  <vendor>AMD</vendor>
  <feature policy='require' name='vme'/>
  <feature policy='disable' name='ht'/>
  <feature policy='disable' name='monitor'/>
  <feature policy='disable' name='osxsave'/>
  <feature policy='require' name='mmxext'/>
  <feature policy='require' name='fxsr_opt'/>
  <feature policy='require' name='cmp_legacy'/>
  <feature policy='disable' name='extapic'/>
  <feature policy='require' name='cr8legacy'/>
  <feature policy='require' name='osvw'/>
  <feature policy='disable' name='ibs'/>
  <feature policy='disable' name='skinit'/>
  <feature policy='disable' name='wdt'/>
  <feature policy='disable' name='nodeid_msr'/>
  <feature policy='disable' name='topoext'/>
  <feature policy='disable' name='perfctr_core'/>
  <feature policy='disable' name='perfctr_nb'/>
  <feature policy='require' name='ibpb'/>
  <feature policy='require' name='virt-ssbd'/>
  <feature policy='require' name='x2apic'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='disable' name='rdtscp'/>
  <feature policy='disable' name='svm'/>
</cpu>
----------------
In case it is also of interest, here are the <cpu> elements from the capabilities commands:
---------------
# virsh capabilities
...
<cpu>
  <arch>x86_64</arch>
  <model>Opteron_G4</model>
  <vendor>AMD</vendor>
  <microcode version='100664894'/>
  <counter name='tsc' frequency='2299999000'/>
  <topology sockets='1' cores='64' threads='1'/>
  <feature name='vme'/>
  <feature name='ht'/>
  <feature name='monitor'/>
  <feature name='osxsave'/>
  <feature name='mmxext'/>
  <feature name='fxsr_opt'/>
  <feature name='cmp_legacy'/>
  <feature name='extapic'/>
  <feature name='cr8legacy'/>
  <feature name='osvw'/>
  <feature name='ibs'/>
  <feature name='skinit'/>
  <feature name='wdt'/>
  <feature name='lwp'/>
  <feature name='nodeid_msr'/>
  <feature name='topoext'/>
  <feature name='perfctr_core'/>
  <feature name='perfctr_nb'/>
  <feature name='invtsc'/>
  <feature name='ibpb'/>
  <pages unit='KiB' size='4'/>
  <pages unit='KiB' size='2048'/>
  <pages unit='KiB' size='1048576'/>
</cpu>
...
---------------
# virsh domcapabilities
...
<cpu>
  <mode name='host-passthrough' supported='yes'/>
  <mode name='host-model' supported='yes'>
    <model fallback='allow'>Opteron_G4</model>
    <vendor>AMD</vendor>
    <feature policy='require' name='vme'/>
    <feature policy='require' name='ht'/>
    <feature policy='require' name='monitor'/>
    <feature policy='require' name='osxsave'/>
    <feature policy='require' name='mmxext'/>
    <feature policy='require' name='fxsr_opt'/>
    <feature policy='require' name='cmp_legacy'/>
    <feature policy='require' name='extapic'/>
    <feature policy='require' name='cr8legacy'/>
    <feature policy='require' name='osvw'/>
    <feature policy='require' name='ibs'/>
    <feature policy='require' name='skinit'/>
    <feature policy='require' name='wdt'/>
    <feature policy='require' name='nodeid_msr'/>
    <feature policy='require' name='topoext'/>
    <feature policy='require' name='perfctr_core'/>
    <feature policy='require' name='perfctr_nb'/>
    <feature policy='require' name='invtsc'/>
    <feature policy='require' name='ibpb'/>
  </mode>
  <mode name='custom' supported='yes'>
    <model usable='unknown'>EPYC-IBPB</model>
    ...
...
---------------
And the actual flags the CPU reports:
---------------
# cat /proc/cpuinfo | grep flags | head -n1
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl nonstop_tsc extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt fma4 nodeid_msr topoext perfctr_core perfctr_nb cpb hw_pstate retpoline_amd ssbd ibpb vmmcall arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
---------------
Let me know if there's anything else of interest.

This bug can be easily reproduced even on a single host using save/restore:

1. define a domain with <cpu mode='host-model'/>
2. virsh start $DOM
3. virsh managedsave $DOM
4. virsh start $DOM
   error: Failed to start domain $DOM
   error: the CPU is incompatible with host CPU: Host CPU does not provide required features: virt-ssbd

The bug is triggered by a RHEL-7-only hack for bug 1745181, which adds the virt-ssbd feature to all host-model CPUs on AMD hosts. Depending on the state of virt-ssbd in the freshly started VM, it may or may not appear in the domain definition.

For compatibility with old QEMU, the domain XML used for migration and save/restore contains the original CPU definition (the one used when starting the domain) with check='partial', while the real CPU definition (modified to match the actual CPU created by QEMU) with check='full' is stored separately in a cookie. With QEMU 1.5.3, domains are started with the CPU definition from the domain XML and the CPU definition in the cookie is ignored. Since the original CPU definition is stored after we add virt-ssbd (for bug 1745181), the domain XML will contain this feature.
However, neither the host CPU capabilities nor the domain capabilities contain virt-ssbd (QEMU 1.5.3 is too old to report support for it), so libvirt complains that the host CPU does not support virt-ssbd when it checks the compatibility of a CPU definition containing virt-ssbd.

Reproduced this issue with libvirt-4.5.0-33.el7.x86_64.

Version:
libvirt-4.5.0-33.el7.x86_64
qemu-kvm-1.5.3-174.el7.x86_64
kernel-3.10.0-1144.el7.x86_64

Steps:
1. Prepare a shut-off VM with the following configuration
# virsh domstate test79
shut off

# virsh dumpxml test79 --inactive | grep "<cpu" -A2
  <cpu mode='host-model' check='partial'>
    <model fallback='allow'/>
  </cpu>

2. Start the VM and check the active dumpxml
# virsh start test79
Domain test79 started

# virsh dumpxml test79 | grep "<cpu" -A23
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>EPYC-IBPB</model>
    <vendor>AMD</vendor>
    <feature policy='disable' name='ht'/>
    <feature policy='disable' name='osxsave'/>
    <feature policy='require' name='cmp_legacy'/>
    <feature policy='disable' name='extapic'/>
    <feature policy='disable' name='skinit'/>
    <feature policy='disable' name='wdt'/>
    <feature policy='disable' name='tce'/>
    <feature policy='disable' name='topoext'/>
    <feature policy='disable' name='perfctr_core'/>
    <feature policy='disable' name='perfctr_nb'/>
    <feature policy='require' name='virt-ssbd'/> ******
    <feature policy='disable' name='monitor'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='disable' name='arat'/>
    <feature policy='disable' name='svm'/>
  </cpu>

3. Managedsave the VM and then start it
# virsh managedsave test79
Domain test79 state saved by libvirt

# virsh start test79
error: Failed to start domain test79
error: the CPU is incompatible with host CPU: Host CPU does not provide required features: virt-ssbd

Verified this issue with libvirt-4.5.0-36.el7.x86_64.

1. Upgrade libvirt and restart the libvirtd service, continuing from the previous comment
# yum update libvirt* -y
# rpm -qa libvirt
libvirt-4.5.0-36.el7.x86_64
# systemctl restart libvirtd

2. Prepare a shut-off VM with the following configuration
# virsh domstate test79-new
shut off

# virsh dumpxml test79-new --inactive | grep "<cpu" -A2
  <cpu mode='host-model' check='partial'>
    <model fallback='allow'/>
  </cpu>

3. Start the VM, then check the active dumpxml, the qemu command line, and the guest CPU flags
# virsh start test79-new
Domain test79-new started

# virsh dumpxml test79-new | grep "<cpu" -A18
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>EPYC-IBPB</model>
    <vendor>AMD</vendor>
    <feature policy='disable' name='ht'/>
    <feature policy='disable' name='osxsave'/>
    <feature policy='require' name='cmp_legacy'/>
    <feature policy='disable' name='extapic'/>
    <feature policy='disable' name='skinit'/>
    <feature policy='disable' name='wdt'/>
    <feature policy='disable' name='tce'/>
    <feature policy='disable' name='topoext'/>
    <feature policy='disable' name='perfctr_core'/>
    <feature policy='disable' name='perfctr_nb'/>
    ** <feature policy='require' name='virt-ssbd'/> **
    <feature policy='disable' name='monitor'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='disable' name='arat'/>
    <feature policy='disable' name='svm'/>
  </cpu>

# ps -ef | grep test79-new
-cpu EPYC-IBPB,+ht,+osxsave,+cmp_legacy,+extapic,+skinit,+wdt,+tce,+topoext,+perfctr_core,+perfctr_nb,** +virt-ssbd **

# virsh console test79-new
Connected to domain test79-new
Escape character is ^]

Red Hat Enterprise Linux Server 7.9 Beta (Maipo)
Kernel 3.10.0-1136.el7.x86_64 on an x86_64

localhost login: root
Password:
[root@localhost ~]# lscpu | grep ssbd
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm art rep_good nopl extd_apicid eagerfpu pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw retpoline_amd ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 ** virt_ssbd ** arat

4. Managedsave the VM and then start it
# virsh managedsave test79-new
Domain test79-new state saved by libvirt

# virsh start test79-new
Domain test79-new started

# virsh console test79-new
Connected to domain test79-new
Escape character is ^]

[root@localhost ~]# lscpu | grep ssbd
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm art rep_good nopl extd_apicid eagerfpu pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw retpoline_amd ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 ** virt_ssbd ** arat

All the test results are as expected; moving this bug to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: libvirt security and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:4000

Sadly, the originally reported issue does not seem to be fixed, or it now shows up in a different way. While "managedsave" is fixed, live migration still fails!
Starting a fresh VM shows this in /var/log/libvirt/qemu/FQDN.log:

-cpu Opteron_G4,+vme,+ht,+monitor,+osxsave,+mmxext,+fxsr_opt,+cmp_legacy,+extapic,+cr8legacy,+osvw,+ibs,+skinit,+wdt,+nodeid_msr,+topoext,+perfctr_core,+perfctr_nb,+ibpb,+virt-ssbd \

However, migrating it to another hypervisor reveals this when qemu is started on the destination during migration:

-cpu Opteron_G4,+vme,+ht,+monitor,+osxsave,+mmxext,+fxsr_opt,+cmp_legacy,+extapic,+cr8legacy,+osvw,+ibs,+skinit,+wdt,+nodeid_msr,+topoext,+perfctr_core,+perfctr_nb,+ibpb \

Both nodes are running 7.9 (i.e. the updated packages). The migrated VM freezes a few seconds after migration. Should I report this in a new issue, since the symptoms are different?

I have reported this as a new issue here:
https://bugzilla.redhat.com/show_bug.cgi?id=1897948
following the statement "If the solution does not work for you, open a new bug report.".
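
The two QEMU command lines quoted in the last follow-up differ in a single feature, which is easy to miss by eye. Below is a minimal sketch, not part of libvirt or of the original report, that diffs two `-cpu` values copied from the per-domain log. The helper names `parse_cpu_arg` and `diff_cpu_args` are hypothetical, and the sketch assumes the simple `MODEL,+feature,-feature` syntax seen in this thread.

```python
def parse_cpu_arg(arg):
    """Split a QEMU -cpu value into (model, enabled, disabled).

    Assumes the "MODEL,+feat,-feat" form seen in the logs above; a trailing
    shell line-continuation backslash from the log is tolerated.
    """
    cleaned = arg.rstrip(" \\")
    model, *flags = [part.strip() for part in cleaned.split(",") if part.strip()]
    enabled = {flag[1:] for flag in flags if flag.startswith("+")}
    disabled = {flag[1:] for flag in flags if flag.startswith("-")}
    return model, enabled, disabled


def diff_cpu_args(source_arg, dest_arg):
    """Report how the destination -cpu value differs from the source one."""
    src_model, src_on, _ = parse_cpu_arg(source_arg)
    dst_model, dst_on, _ = parse_cpu_arg(dest_arg)
    return {
        "model_changed": src_model != dst_model,
        "missing_on_destination": sorted(src_on - dst_on),
        "extra_on_destination": sorted(dst_on - src_on),
    }


# The two values quoted in the report (source host, then migration target).
SOURCE = (
    "Opteron_G4,+vme,+ht,+monitor,+osxsave,+mmxext,+fxsr_opt,+cmp_legacy,"
    "+extapic,+cr8legacy,+osvw,+ibs,+skinit,+wdt,+nodeid_msr,+topoext,"
    "+perfctr_core,+perfctr_nb,+ibpb,+virt-ssbd"
)
DEST = (
    "Opteron_G4,+vme,+ht,+monitor,+osxsave,+mmxext,+fxsr_opt,+cmp_legacy,"
    "+extapic,+cr8legacy,+osvw,+ibs,+skinit,+wdt,+nodeid_msr,+topoext,"
    "+perfctr_core,+perfctr_nb,+ibpb"
)

if __name__ == "__main__":
    print(diff_cpu_args(SOURCE, DEST))
```

For the two values above, the only difference reported is `virt-ssbd` under `missing_on_destination`, which matches the symptom described: the guest is started with the feature on the source but not on the migration target.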