Environment
-----------

This is a nested KVM environment:

- L1 (F32), launched with KVM 'host-passthrough'
    libvirt-daemon-kvm-6.1.0-4.fc32.x86_64
    qemu-system-x86-4.2.1-1.fc32.x86_64

- L0 (F32), the baremetal server
    Model: "Intel(R) Xeon(R) CPU E5-2609 v3 @ 1.90GHz"
    libvirt-daemon-kvm-7.0.0-1.fc32.x86_64
    qemu-system-x86-5.2.0-4.fc32.x86_64

Reproducer
----------

(1) Given an L2 KVM guest with the below CPU configuration:

    [root@l1-vm ~]# virsh dumpxml cvm1 --inactive | xpath -q -e '//cpu/'
    <cpu mode="custom" match="exact" check="full">
      <model fallback="forbid">Nehalem-IBRS</model>
      <feature policy="require" name="md-clear" />
      <feature policy="require" name="pcid" />
      <feature policy="require" name="ssbd" />
    </cpu>

(2) And start it: `virsh start cvm1`

Actual result
-------------

It fails to start with:

    [root@l1-vm ~]# virsh start cvm1
    error: Failed to start domain cvm1
    error: operation failed: guest CPU doesn't match specification: extra features: vme,x2apic,hypervisor

Expected result
---------------

The L2 guest should start. I did not ask for the extra features "vme", "x2apic", and "hypervisor". I only asked it to start with a usable CPU model, Nehalem-IBRS, plus the supported CPU flags 'md-clear', 'pcid', and 'ssbd'.
From L1's `virsh capabilities`:
-----------------------------------------------------------------------
[root@l1-vm ~]# virsh capabilities | fgrep 'feature name'
      <feature name='vme'/>
      <feature name='ss'/>
      <feature name='pclmuldq'/>
      <feature name='vmx'/>
      <feature name='fma'/>
      <feature name='pdcm'/>
      <feature name='pcid'/>
      <feature name='x2apic'/>
      <feature name='movbe'/>
      <feature name='tsc-deadline'/>
      <feature name='xsave'/>
      <feature name='osxsave'/>
      <feature name='avx'/>
      <feature name='f16c'/>
      <feature name='rdrand'/>
      <feature name='hypervisor'/>
      <feature name='arat'/>
      <feature name='fsgsbase'/>
      <feature name='tsc_adjust'/>
      <feature name='bmi1'/>
      <feature name='avx2'/>
      <feature name='smep'/>
      <feature name='bmi2'/>
      <feature name='erms'/>
      <feature name='invpcid'/>
      <feature name='umip'/>
      <feature name='md-clear'/>
      <feature name='stibp'/>
      <feature name='arch-capabilities'/>
      <feature name='ssbd'/>
      <feature name='xsaveopt'/>
      <feature name='pdpe1gb'/>
      <feature name='rdtscp'/>
      <feature name='abm'/>
      <feature name='ibpb'/>
      <feature name='amd-ssbd'/>
      <feature name='skip-l1dfl-vmentry'/>
-----------------------------------------------------------------------

From L1's `virsh domcapabilities`:
-----------------------------------------------------------------------
[root@l1-vm ~]# virsh domcapabilities | fgrep 'feature policy'
      <feature policy='require' name='vme'/>
      <feature policy='require' name='ss'/>
      <feature policy='require' name='vmx'/>
      <feature policy='require' name='pdcm'/>
      <feature policy='require' name='f16c'/>
      <feature policy='require' name='rdrand'/>
      <feature policy='require' name='hypervisor'/>
      <feature policy='require' name='arat'/>
      <feature policy='require' name='tsc_adjust'/>
      <feature policy='require' name='umip'/>
      <feature policy='require' name='md-clear'/>
      <feature policy='require' name='stibp'/>
      <feature policy='require' name='arch-capabilities'/>
      <feature policy='require' name='ssbd'/>
      <feature policy='require' name='xsaveopt'/>
      <feature policy='require' name='pdpe1gb'/>
      <feature policy='require' name='abm'/>
      <feature policy='require' name='ibpb'/>
      <feature policy='require' name='amd-ssbd'/>
      <feature policy='require' name='skip-l1dfl-vmentry'/>
      <feature policy='disable' name='aes'/>
-----------------------------------------------------------------------
From L0's `virsh capabilities`:
-----------------------------------------------------------------------
[root@l0-baremetal ~]# virsh capabilities | fgrep 'feature name'
      <feature name='vme'/>
      <feature name='ds'/>
      <feature name='acpi'/>
      <feature name='ss'/>
      <feature name='ht'/>
      <feature name='tm'/>
      <feature name='pbe'/>
      <feature name='pclmuldq'/>
      <feature name='dtes64'/>
      <feature name='monitor'/>
      <feature name='ds_cpl'/>
      <feature name='vmx'/>
      <feature name='smx'/>
      <feature name='est'/>
      <feature name='tm2'/>
      <feature name='fma'/>
      <feature name='xtpr'/>
      <feature name='pdcm'/>
      <feature name='pcid'/>
      <feature name='dca'/>
      <feature name='x2apic'/>
      <feature name='movbe'/>
      <feature name='tsc-deadline'/>
      <feature name='xsave'/>
      <feature name='osxsave'/>
      <feature name='avx'/>
      <feature name='f16c'/>
      <feature name='rdrand'/>
      <feature name='arat'/>
      <feature name='fsgsbase'/>
      <feature name='tsc_adjust'/>
      <feature name='bmi1'/>
      <feature name='avx2'/>
      <feature name='smep'/>
      <feature name='bmi2'/>
      <feature name='erms'/>
      <feature name='invpcid'/>
      <feature name='cmt'/>
      <feature name='md-clear'/>
      <feature name='stibp'/>
      <feature name='ssbd'/>
      <feature name='xsaveopt'/>
      <feature name='pdpe1gb'/>
      <feature name='rdtscp'/>
      <feature name='abm'/>
      <feature name='invtsc'/>
-----------------------------------------------------------------------

From L0's `virsh domcapabilities`:
-----------------------------------------------------------------------
[root@l0-baremetal ~]# virsh domcapabilities | fgrep 'feature policy'
      <feature policy='require' name='vme'/>
      <feature policy='require' name='ss'/>
      <feature policy='require' name='vmx'/>
      <feature policy='require' name='pdcm'/>
      <feature policy='require' name='f16c'/>
      <feature policy='require' name='rdrand'/>
      <feature policy='require' name='hypervisor'/>
      <feature policy='require' name='arat'/>
      <feature policy='require' name='tsc_adjust'/>
      <feature policy='require' name='umip'/>
      <feature policy='require' name='md-clear'/>
      <feature policy='require' name='stibp'/>
      <feature policy='require' name='arch-capabilities'/>
      <feature policy='require' name='ssbd'/>
      <feature policy='require' name='xsaveopt'/>
      <feature policy='require' name='pdpe1gb'/>
      <feature policy='require' name='abm'/>
      <feature policy='require' name='invtsc'/>
      <feature policy='require' name='ibpb'/>
      <feature policy='require' name='amd-stibp'/>
      <feature policy='require' name='amd-ssbd'/>
      <feature policy='require' name='skip-l1dfl-vmentry'/>
      <feature policy='require' name='pschange-mc-no'/>
      <feature policy='disable' name='aes'/>
-----------------------------------------------------------------------
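The two `virsh domcapabilities` listings can be compared as sets. This shows which host-model features L0 reports that L1 does not. The following is a minimal Python sketch. The feature names are copied by hand from the `fgrep` output, and only the policy='require' entries are included ('aes' is policy='disable' on both hosts). The variable names are illustrative and not part of any libvirt tool.

```python
# Feature names copied by hand from the `virsh domcapabilities | fgrep 'feature policy'`
# output on each host (policy='require' entries only; 'aes' is 'disable' on both).
L1_DOMCAPS = {
    "vme", "ss", "vmx", "pdcm", "f16c", "rdrand", "hypervisor", "arat",
    "tsc_adjust", "umip", "md-clear", "stibp", "arch-capabilities", "ssbd",
    "xsaveopt", "pdpe1gb", "abm", "ibpb", "amd-ssbd", "skip-l1dfl-vmentry",
}
L0_DOMCAPS = {
    "vme", "ss", "vmx", "pdcm", "f16c", "rdrand", "hypervisor", "arat",
    "tsc_adjust", "umip", "md-clear", "stibp", "arch-capabilities", "ssbd",
    "xsaveopt", "pdpe1gb", "abm", "invtsc", "ibpb", "amd-stibp", "amd-ssbd",
    "skip-l1dfl-vmentry", "pschange-mc-no",
}

# Features present in one listing but not the other, sorted for stable output.
only_l0 = sorted(L0_DOMCAPS - L1_DOMCAPS)
only_l1 = sorted(L1_DOMCAPS - L0_DOMCAPS)

print("only in L0 domcapabilities:", ", ".join(only_l0) or "(none)")
print("only in L1 domcapabilities:", ", ".join(only_l1) or "(none)")
```

With these lists, L1 reports a strict subset of L0's host-model features. The missing entries are amd-stibp, invtsc and pschange-mc-no.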
Guest QEMU command-line:
-----------------------------------------------------------------------
2021-02-12 12:38:35.090+0000: starting up libvirt version: 6.1.0, package: 4.fc32 (Fedora Project, 2020-06-02-17:50:10, ), qemu version: 4.2.1qemu-4.2.1-1.fc32, kernel: 5.10.13-100.fc32.x86_64
[...]
/usr/bin/qemu-system-x86_64 \
-name guest=cvm1,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-8-cvm1/master-key.aes \
-machine pc-q35-4.2,accel=kvm,usb=off,dump-guest-core=off \
-cpu Broadwell-IBRS,md-clear=on,pcid=on,ssbd=on \
-m 512 \
-overcommit mem-lock=off \
-smp 2,sockets=2,cores=1,threads=1 \
-uuid 65be8765-9c9b-4b1c-80e8-6a60bd891310 \
-display none \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=40,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-global ICH9-LPC.disable_s3=1 \
-global ICH9-LPC.disable_s4=1 \
-boot strict=on \
-device pcie-root-port,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1 \
-device pcie-root-port,port=0x9,chassis=2,id=pci.2,bus=pcie.0,addr=0x1.0x1 \
-device pcie-root-port,port=0xa,chassis=3,id=pci.3,bus=pcie.0,addr=0x1.0x2 \
-device pcie-root-port,port=0xb,chassis=4,id=pci.4,bus=pcie.0,addr=0x1.0x3 \
-device pcie-root-port,port=0xc,chassis=5,id=pci.5,bus=pcie.0,addr=0x1.0x4 \
-device pcie-root-port,port=0xd,chassis=6,id=pci.6,bus=pcie.0,addr=0x1.0x5 \
-device pcie-root-port,port=0xe,chassis=7,id=pci.7,bus=pcie.0,addr=0x1.0x6 \
-device qemu-xhci,p2=15,p3=15,id=usb,bus=pci.2,addr=0x0 \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.3,addr=0x0 \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/cirros-0.5.1.qcow2","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
-device virtio-blk-pci,scsi=off,bus=pci.4,addr=0x0,drive=libvirt-1-format,id=virtio-disk0,bootindex=1 \
-netdev tap,fd=42,id=hostnet0,vhost=on,vhostfd=43 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:c7:a1:b7,bus=pci.1,addr=0x0 \
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-chardev socket,id=charchannel0,fd=44,server,nowait \
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 \
-device virtio-balloon-pci,id=balloon0,bus=pci.5,addr=0x0 \
-object rng-random,id=objrng0,filename=/dev/urandom \
-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.6,addr=0x0 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/4 (label charserial0)
2021-02-12T12:38:35.445881Z qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.01H:ECX.aes [bit 25]
2021-02-12T12:38:35.446008Z qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EBX.hle [bit 4]
2021-02-12T12:38:35.446020Z qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EBX.rtm [bit 11]
2021-02-12T12:38:35.446030Z qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EBX.rdseed [bit 18]
2021-02-12T12:38:35.446041Z qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EBX.adx [bit 19]
2021-02-12T12:38:35.446051Z qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EBX.smap [bit 20]
2021-02-12T12:38:35.446062Z qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.80000001H:ECX.3dnowprefetch [bit 8]
2021-02-12T12:38:35.454373Z qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.01H:ECX.aes [bit 25]
2021-02-12T12:38:35.454402Z qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EBX.hle [bit 4]
2021-02-12T12:38:35.454413Z qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EBX.rtm [bit 11]
2021-02-12T12:38:35.454423Z qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EBX.rdseed [bit 18]
2021-02-12T12:38:35.454433Z qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EBX.adx [bit 19]
2021-02-12T12:38:35.454443Z qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EBX.smap [bit 20]
2021-02-12T12:38:35.454456Z qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.80000001H:ECX.3dnowprefetch [bit 8]
2021-02-12 12:38:37.668+0000: shutting down, reason=failed
2021-02-12T12:38:37.671141Z qemu-system-x86_64: terminating on signal 15 from pid 244485 (/usr/sbin/libvirtd)
-----------------------------------------------------------------------
This behavior would be better described as "libvirt is now telling you more about what is actually exposed to the guest (and always was)". "x2apic" and "hypervisor" were always enabled in KVM guests by QEMU. Maybe libvirt was not including them explicitly in the domain XML, but now it is. I will investigate "vme", but my guess is that it is a similar situation: libvirt's view of the CPU model probably doesn't include "vme", but QEMU enables it. libvirt seems to be updating the domain XML to reflect more precisely what the guest actually sees.
Yeah, Nehalem (and Nehalem-IBRS) do include VME:

v4.2.1:target/i386/cpu.c-        .name = "Nehalem",
v4.2.1:target/i386/cpu.c-        .level = 11,
v4.2.1:target/i386/cpu.c-        .vendor = CPUID_VENDOR_INTEL,
v4.2.1:target/i386/cpu.c-        .family = 6,
v4.2.1:target/i386/cpu.c-        .model = 26,
v4.2.1:target/i386/cpu.c-        .stepping = 3,
v4.2.1:target/i386/cpu.c-        .features[FEAT_1_EDX] =
v4.2.1:target/i386/cpu.c:            CPUID_VME | CPUID_SSE2 | CPUID_SSE | CPUID_FXSR | CPUID_MMX |

So libvirt is not including something that was not asked for. 'vme', 'x2apic' and 'hypervisor' are all part of what "Nehalem-IBRS" means for QEMU.

Now, that doesn't explain the "guest CPU doesn't match specification" error. Is it a side effect of the match='exact' attribute?
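The "extra features" in the error can be pictured as a set difference. On one side are the features QEMU enables for the model; on the other are the features libvirt's view of the model, plus the requested ones, accounts for. The following is a minimal Python sketch under loud assumptions. The feature sets are heavily trimmed and hypothetical, not libvirt's or QEMU's real model definitions. `compare_cpu` is a hypothetical helper, not a libvirt function. As suggested above, it assumes QEMU's Nehalem-IBRS enables vme, x2apic and hypervisor while libvirt's view of the model lacks them.

```python
# Hypothetical, trimmed feature sets for illustration only; these are NOT the
# real libvirt or QEMU definitions of Nehalem-IBRS.
LIBVIRT_MODEL_VIEW = {"sse", "sse2", "fxsr", "mmx"}                  # assumed: lacks vme/x2apic/hypervisor
QEMU_MODEL = LIBVIRT_MODEL_VIEW | {"vme", "x2apic", "hypervisor"}    # assumed: what QEMU enables
REQUESTED = {"md-clear", "pcid", "ssbd"}                             # the <feature policy="require"> entries


def compare_cpu(expected: set, provided: set) -> tuple:
    """Hypothetical helper: return (missing, extra) as sorted lists.

    missing: features libvirt expected but the guest CPU lacks.
    extra:   features the guest CPU has that libvirt did not expect.
    """
    missing = sorted(expected - provided)
    extra = sorted(provided - expected)
    return missing, extra


expected = LIBVIRT_MODEL_VIEW | REQUESTED   # what libvirt thinks was asked for
provided = QEMU_MODEL | REQUESTED           # what QEMU actually exposes
missing, extra = compare_cpu(expected, provided)
print("missing:", ",".join(missing) or "(none)")
print("extra:", ",".join(extra))
```

Nothing is missing here; the three features QEMU adds to the model show up only as "extra". The sorted order in the output is the sketch's own choice and need not match libvirt's message.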
(In reply to Eduardo Habkost from comment #5)
> Yeah, Nehalem (and Nehalem-IBRS) do include VME:
>
> v4.2.1:target/i386/cpu.c-        .name = "Nehalem",
> v4.2.1:target/i386/cpu.c-        .level = 11,
> v4.2.1:target/i386/cpu.c-        .vendor = CPUID_VENDOR_INTEL,
> v4.2.1:target/i386/cpu.c-        .family = 6,
> v4.2.1:target/i386/cpu.c-        .model = 26,
> v4.2.1:target/i386/cpu.c-        .stepping = 3,
> v4.2.1:target/i386/cpu.c-        .features[FEAT_1_EDX] =
> v4.2.1:target/i386/cpu.c:            CPUID_VME | CPUID_SSE2 | CPUID_SSE |
> CPUID_FXSR | CPUID_MMX |
>
> So libvirt is not including something that was not asked for. 'vme',
> 'x2apic' and 'hypervisor' are all part of what "Nehalem-IBRS" means for QEMU.

I see; thanks for the investigation.

> Now, that doesn't explain the "guest CPU doesn't match specification" error.
> Is it a side effect of the match='exact' attribute?

I think so (but I did not add the match='exact' manually); perhaps Jiri can confirm.
You asked for check='full' (rather than check='partial'), which means libvirt will produce the "guest CPU doesn't match specification" error if the virtual CPU is not exactly what you asked for. Because this check is based on libvirt's internal definition of CPU models, you need to be careful. In most cases check='full' will fail unless you copy the model and features from domcapabilities, or otherwise make sure the list of features matches what QEMU thinks the CPU model contains. For example, once a domain is started, libvirt automatically changes the live definition to contain check='full', because it knows exactly what the virtual CPU looks like. You can copy that definition and use it when defining another domain.
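One option consistent with the comment above, when exact verification is not needed, is to switch the domain's `<cpu>` element from check='full' to check='partial'. The following is a minimal Python sketch using only the standard library's `xml.etree.ElementTree`. `set_cpu_check` and `SAMPLE_DOMAIN` are hypothetical names, and the sample is a trimmed stand-in for the real cvm1 definition. The edited XML would still have to be loaded back, e.g. with `virsh define <file>`. This has not been tested against this particular setup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed stand-in for `virsh dumpxml cvm1 --inactive`.
SAMPLE_DOMAIN = """<domain type="kvm">
  <name>cvm1</name>
  <cpu mode="custom" match="exact" check="full">
    <model fallback="forbid">Nehalem-IBRS</model>
    <feature policy="require" name="md-clear" />
    <feature policy="require" name="pcid" />
    <feature policy="require" name="ssbd" />
  </cpu>
</domain>"""

# Values libvirt documents for the <cpu check='...'> attribute.
ALLOWED_CHECKS = ("none", "partial", "full")


def set_cpu_check(domain_xml: str, check: str = "partial") -> str:
    """Hypothetical helper: return domain_xml with <cpu check='...'> replaced."""
    if check not in ALLOWED_CHECKS:
        raise ValueError(f"unexpected check value: {check!r}")
    root = ET.fromstring(domain_xml)
    cpu = root.find("cpu")  # <cpu> is a direct child of <domain>
    if cpu is None:
        raise ValueError("domain XML has no <cpu> element")
    cpu.set("check", check)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_cpu_check(SAMPLE_DOMAIN))
```

ElementTree discards XML comments and does not preserve the original formatting exactly, so review the output before redefining the domain. The alternative from the comment above keeps check='full' but copies the model and feature list from domcapabilities instead.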