Description
-----------

[This is a nested KVM environment.]

Trying to import a disk image into libvirt, as a nested guest (L2), with the guest hypervisor's (L1) CPU mode as either 'host-passthrough' or 'host-model', results in libvirt (on L1) incorrectly trying to enable the 'INVPCID' CPU instruction, and failing to import the nested guest.

The bare metal host has the 'INVPCID' instruction enabled:

    "invpcid": true

And the level-1 guest does not have it enabled:

    "invpcid": false

Confirmed it by running the 'query-cpu-model-expansion' QMP command (thanks: Eduardo Habkost) on both L0 & L1:

    $ qemu-system-x86_64 -machine pc-i440fx-2.9,accel=kvm -display none \
        -nodefconfig -nodefaults -m 512 \
        -device virtio-scsi-pci,id=scsi -device virtio-serial-pci \
        -blockdev node-name=foo,driver=qcow2,file.driver=file,file.filename=./cirros-0.3.5.qcow2 \
        -qmp-pretty stdio
    {"execute": "query-cpu-model-expansion", "arguments": {"model": {"name": "max"}, "type": "full"}}

Version
-------

- L0 and L1 libvirt & QEMU:

    $ rpm -q libvirt-daemon-kvm qemu-system-x86
    libvirt-daemon-kvm-3.2.0-1.fc25.x86_64
    qemu-system-x86-2.9.0-0.1.rc3.fc25.x86_64

- L0 kernel: 4.9.14-200.fc25.x86_64
- L1 kernel: 4.10.8-200.fc25.x86_64

Steps to reproduce
------------------

(1) Boot a level-1 Fedora guest, and set its CPU mode to either 'host-passthrough' or 'host-model' (requires a reboot of the level-1 guest):

    $ virt-xml guest-hyp \
        --edit \
        --cpu host-passthrough,clearxml=yes

(2) On the level-1 guest hypervisor (which should now have /dev/kvm exposed), try to import a Fedora 25 disk image:

    $ virt-install --name f25-l2 --ram 512 \
        --disk path=./Fedora-Cloud-Base-25-1.3.x86_64.qcow2 \
        --nographics --import --os-variant fedora25

Actual result
-------------

Import of the nested guest fails with:

    $ virt-install --name f25-l2 --ram 512 \
        --disk path=./Fedora-Cloud-Base-25-1.3.x86_64.qcow2 \
        --nographics --import --os-variant fedora25
    Starting install...
    ERROR    the CPU is incompatible with host CPU: Host CPU does not provide required features: invpcid

Expected result
---------------

Libvirt on the guest hypervisor, when using 'host-passthrough' / 'host-model', should not try to enable the 'INVPCID' CPU instruction when QEMU reports it as unavailable.

Additional info
---------------

Using a named CPU model (like '--cpu IvyBridge') with `virt-install` succeeds.
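The reply to the 'query-cpu-model-expansion' command above is a JSON object whose "props" map CPU feature names to booleans. A minimal sketch of checking a feature in such a reply; the JSON snippets below are abbreviated, illustrative excerpts, not full captures from the attachments:

```python
import json

# Abbreviated, illustrative query-cpu-model-expansion replies; a real
# reply contains hundreds of properties under "props".
reply_l0 = json.loads(
    '{"return": {"model": {"name": "max", "props": {"invpcid": true, "pcid": true}}}}')
reply_l1 = json.loads(
    '{"return": {"model": {"name": "max", "props": {"invpcid": false, "pcid": true}}}}')

def has_feature(reply, name):
    """Return the boolean value of a CPU feature in an expansion reply."""
    return reply["return"]["model"]["props"].get(name, False)

print("L0 invpcid:", has_feature(reply_l0, "invpcid"))  # True
print("L1 invpcid:", has_feature(reply_l1, "invpcid"))  # False
```

This is how the "invpcid": true / "invpcid": false values quoted above were pulled out of the two replies.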
Created attachment 1269492 [details] Output of 'query-cpu-model-expansion' QMP command from bare metal
Created attachment 1269493 [details] Output of 'query-cpu-model-expansion' QMP command from L1 guest
Would you mind attaching the domain XMLs for both guest-hyp and f25-l2 as well as virsh capabilities and virsh domcapabilities from both the host and guest-hyp?
Created attachment 1269610 [details] Guest hypervisor ('l1-f25') libvirt XML
Created attachment 1269611 [details] Nested guest ('l2-f25') libvirt XML - extracted from libvirt debug log with log filters
Created attachment 1269613 [details] `virsh capabilities` output from bare metal host (L0)
Created attachment 1269617 [details] `virsh capabilities` output from L1 guest
Looking at the v3.2.0 libvirt code, the error seems to come from line 1707 in the function virCPUx86Compare(), in src/cpu/cpu_x86.c:

[...]
1699             if (failIncompatible) {
1700                 ret = VIR_CPU_COMPARE_ERROR;
1701                 if (message) {
1702                     if (noTSX) {
1703                         virReportError(VIR_ERR_CPU_INCOMPATIBLE,
1704                                        _("%s; try using '%s-noTSX' CPU model"),
1705                                        message, cpu->model);
1706                     } else {
1707                         virReportError(VIR_ERR_CPU_INCOMPATIBLE, "%s", message);
1708                     }
1709                 } else {
1710                     if (noTSX) {
1711                         virReportError(VIR_ERR_CPU_INCOMPATIBLE,
1712                                        _("try using '%s-noTSX' CPU model"),
1713                                        cpu->model);
1714                     } else {
1715                         virReportError(VIR_ERR_CPU_INCOMPATIBLE, NULL);
1716                     }
1717                 }
1718             }
1719         }
[...]

$ git show 7f127ded --stat
commit 7f127ded657b24e0e55cd5f3539ef5b2dc935908
Author: Jiri Denemark <jdenemar>
Date:   Tue Aug 9 13:26:53 2016 +0200

    cpu: Rework cpuCompare* APIs

    Both cpuCompare* APIs are renamed to virCPUCompare*. And they should now
    work for any guest CPU definition, i.e., even for host-passthrough
    (trivial) and host-model CPUs. The implementation in x86 driver is
    enhanced to provide a hint about -noTSX Broadwell and Haswell models
    when appropriate.

    Signed-off-by: Jiri Denemark <jdenemar>

 src/cpu/cpu.c            | 42 ++++++++++++++++++++++++++----------------
 src/cpu/cpu.h            | 21 +++++++++++----------
 src/cpu/cpu_arm.c        |  8 ++++----
 src/cpu/cpu_ppc64.c      | 15 +++++++++++++--
 src/cpu/cpu_x86.c        | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------
 src/libvirt_private.syms |  4 ++--
 src/libxl/libxl_driver.c | 14 ++------------
 src/qemu/qemu_driver.c   | 14 ++------------
 tests/cputest.c          |  4 ++--
 9 files changed, 126 insertions(+), 80 deletions(-)
Created attachment 1269663 [details] `virsh domcapabilities` output from bare metal host (L0)
Created attachment 1269665 [details] `virsh domcapabilities` output from L1 guest
L1's QEMU command-line, generated by libvirt:

----
qemu 13920 1 2 Apr06 ? 00:29:40 /usr/bin/qemu-system-x86_64 -machine accel=kvm -name guest=l1-f25,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-7-l1-f25/master-key.aes -machine pc-i440fx-2.7,accel=kvm,usb=off,dump-guest-core=off -cpu host -m 8192 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid f8ba0b36-0121-4b86-9fc5-78ec297bd90a -display none -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-7-l1-f25/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 -drive file=/home/kashyapc/vmimages/l1-f25.raw,format=raw,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=29 -device virtio-net-pci,host_mtu=1500,netdev=hostnet0,id=net0,mac=52:54:00:3e:c7:0f,bus=pci.0,addr=0x2 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-7-l1-f25/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -object rng-random,id=objrng0,filename=/dev/urandom -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x7 -msg timestamp=on
----
Interesting, so in other words, the host CPU supports invpcid and both QEMU and libvirt agree:

- libvirt detected the host CPU as Haswell-noTSX (which contains invpcid)
- QEMU reports "invpcid": true for the "max" CPU model, and libvirt correctly parses it, as can be seen in the domcapabilities XML

But when we try all that in the L1 guest, the situation changes:

- libvirt detects the L1 CPU as Haswell-noTSX, which means the invpcid CPUID bit should be set
- but QEMU reports "invpcid": false for the "max" CPU, and libvirt correctly parses it and adds <feature policy='disable' name='invpcid'/> in the domcapabilities XML

The question is why QEMU doesn't want to enable invpcid for L2.

So, could you please check a few more things?

- check /proc/cpuinfo in L1 (it should list invpcid)
- run "qemu-system-x86_64 -machine pc,accel=kvm -cpu Haswell-noTSX,enforce" in the L1 guest and see if it complains about unsupported invpcid

BTW, the IvyBridge CPU model works because it doesn't enable invpcid.
(In reply to Jiri Denemark from comment #12)
> Interesting, so in other words, the host CPU supports invpcid and both
> QEMU and libvirt agree:
>
> - libvirt detected host CPU as Haswell-noTSX (which contains invpcid)
> - QEMU reports "invpcid": true for "max" CPU model and libvirt correctly
>   parses it as can be seen in domcapabilities XML
>
> But when we try all that in the L1 guest, situation changes:
>
> - libvirt detects L1 CPU as Haswell-noTSX which means invpcid CPUID bit
>   should be set
> - but QEMU reports "invpcid": false for "max" CPU and libvirt correctly
>   parses it and adds <feature policy='disable' name='invpcid'/> in
>   domcapabilities XML
>
> The question is why QEMU doesn't want to enable invpcid for L2.
>
> So, could you please check a few more things?
>
> - check /proc/cpuinfo in L1 (it should list invpcid)
> - run "qemu-system-x86_64 -machine pc,accel=kvm -cpu Haswell-noTSX,enforce"
>   in L1 guest and see if it complains about unsupported invpcid

(1) Yes, /proc/cpuinfo on L1 _does_ list 'invpcid':

[l1-f25] $ cat /proc/cpuinfo
[...]
model name : Intel(R) Core(TM) i5-4670T CPU @ 2.30GHz
[...]
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt arat
[...]
I also ran `vmxcap` on L0 and L1:

`vmxcap` on L0:

    Enable INVPCID    yes

`vmxcap` on L1:

    Enable INVPCID    no

(2) Also yes, QEMU does complain about unsupported 'invpcid':

    $ qemu-system-x86_64 -machine pc,accel=kvm -cpu Haswell-noTSX,enforce
    Unable to init server: Could not connect: Connection refused
    warning: host doesn't support requested feature: CPUID.07H:EBX.invpcid [bit 10]
    qemu-system-x86_64: Host doesn't support requested features

> BTW, IvyBridge CPU model works because it doesn't enable invpcid.

Yeah, I noticed it later :-) Thanks for confirming.
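To see exactly which features differ between the two 'query-cpu-model-expansion' replies, a quick diff over the "props" maps works; this is a sketch over hand-copied excerpts, not the full attachments:

```python
# Hand-copied excerpts of the "props" maps from the L0 and L1
# query-cpu-model-expansion replies; the real maps are much larger.
l0_props = {"invpcid": True, "pcid": True, "erms": True, "hle": False}
l1_props = {"invpcid": False, "pcid": True, "erms": True, "hle": False}

# Features the L0 QEMU can enable but the L1 QEMU cannot.
filtered_in_l1 = sorted(feat for feat, on in l0_props.items()
                        if on and not l1_props.get(feat, False))
print(filtered_in_l1)  # ['invpcid']
```

Run against the full attachments, this kind of diff is what pinpoints invpcid as the feature that nested KVM refuses to expose.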
(In reply to Kashyap Chamarthy from comment #13)
> I also ran `vmxcap` on L0 and L1:
>
> `vmxcap` on L0:
>
>     Enable INVPCID    yes
>
> `vmxcap` on L1:
>
>     Enable INVPCID    no

Note that this bit is required to be able to virtualize invpcid, so KVM+QEMU really should report it as unavailable. Probably the L0 host doesn't have the ability to emulate this VMX capability yet.
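For reference, the "Enable INVPCID" line from `vmxcap` is derived from the IA32_VMX_PROCBASED_CTLS2 MSR (0x48B): the high 32 bits are the allowed-1 settings of the secondary processor-based VM-execution controls, where "Enable INVPCID" is bit 12 per the Intel SDM. A minimal sketch of that decoding, using made-up MSR values rather than real readings from the machines in this bug:

```python
INVPCID_BIT = 12  # "Enable INVPCID" in the secondary proc-based controls

def invpcid_allowed(msr_0x48b):
    """True if the allowed-1 settings (high dword of the MSR) permit
    setting the Enable INVPCID VM-execution control."""
    allowed_1 = msr_0x48b >> 32
    return bool(allowed_1 & (1 << INVPCID_BIT))

# Illustrative values only (not readings from the bug's L0/L1 hosts):
l0_msr = 1 << (32 + INVPCID_BIT)  # bit 12 allowed to be 1 -> "yes"
l1_msr = 0                        # bit 12 stuck at 0      -> "no"

print(invpcid_allowed(l0_msr))  # True
print(invpcid_allowed(l1_msr))  # False
```

When L0's nested-VMX code doesn't emulate this control, L1's MSR reads back with the bit clear, which matches the "Enable INVPCID: no" output above.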
So this is caused by the changes which aimed to fix host-model CPUs. Libvirt used to check host CPU features itself and used the result both for host-model and for checking guest/host CPU compatibility.

Currently we ask QEMU for the host CPU features, so that the CPU we use for host-model matches what QEMU can do on the host. And the CPU specs we get from QEMU are used for checking guest/host CPU compatibility. This is more correct, but it introduces a regression: when a host CPU supports some feature which QEMU/KVM will filter out, current libvirt will report an error when someone tries to enable it for a guest, while older libvirt would happily pass it to QEMU, which would filter it out, i.e., the guest would start, but would not get the feature.

We can't really go back to what old libvirt was doing, since QEMU/KVM can even enable some features the host does not support, and we don't want to refuse to start domains which want these features. I think we need to use the CPU from QEMU for host-model, and a union of the CPU from QEMU and the CPU we probed for checking whether a given guest CPU can run on the host.

BTW, you could work around this bug by adding a check='none' attribute to the L2 domain XML:

  <cpu mode="custom" match="exact" check='none'>
    <model>Haswell-noTSX</model>
  </cpu>
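The "union" idea above can be sketched roughly as follows; the feature maps and the function name are illustrative, not libvirt's actual data structures or APIs:

```python
def union_features(qemu_cpu, probed_cpu):
    """Combine two feature maps: a feature counts as available if either
    source reports it.

    Rationale, per the comment above: QEMU may filter out features the host
    CPU has, and may also enable features the host CPU lacks, so neither
    set alone is the right baseline for guest/host compatibility checks.
    """
    names = set(qemu_cpu) | set(probed_cpu)
    return {n: qemu_cpu.get(n, False) or probed_cpu.get(n, False)
            for n in names}

# Illustrative: QEMU filtered out invpcid, but CPUID probing saw it;
# conversely x2apic is QEMU-provided even though probing missed it.
qemu_cpu   = {"pcid": True, "invpcid": False, "x2apic": True}
probed_cpu = {"pcid": True, "invpcid": True,  "x2apic": False}

combined = union_features(qemu_cpu, probed_cpu)
print(combined["invpcid"])  # True: a guest asking for it is not rejected
print(combined["x2apic"])   # True: QEMU-enabled features also pass
```

With the union as the baseline, a guest requesting invpcid is no longer refused at the libvirt level; QEMU then filters the feature at start-up as it did before the regression.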
(In reply to Jiri Denemark from comment #16)
> So this is caused by the changes which aimed to fix host-model CPUs.
> Libvirt used to check host CPU features itself and used the result for
> both host-model and checking guest/host CPU compatibility.

I see. I think it's this series:

    https://www.redhat.com/archives/libvir-list/2017-February/msg01295.html
    [PATCH v3 00/28] qemu: Detect host CPU model by asking QEMU on x86_64

> Currently we ask QEMU for the host CPU features so that the CPU we use
> for host-model matches what QEMU could do on the host. And the CPU specs
> we get from QEMU is used for checking guest/host CPU compatibility. This
> is more correct, but it introduces a regression: when a host CPU supports
> some feature which QEMU/KVM will filter out, current libvirt will report
> an error when someone tries to enable it for a guest while older libvirt
> would happily pass it to QEMU which would filter it out, i.e., the guest
> would start, but would not get the feature.

Interesting, thanks for the explanation.

[Just noting for my own edification here, from our conversation on IRC]:

When you write above "[...] when a host CPU supports some feature which QEMU/KVM will filter out [...]", the possible _reasons_ why QEMU/KVM could filter it out are:

- QEMU/KVM doesn't _yet_ support the said feature

Or:

- The CPU does not support something else which is needed to virtualize the feature (which is what seems to have happened with "INVPCID")

> We can't really get back to what old libvirt was doing since QEMU/KVM can
> even enable some features the host does not support and we don't want to
> refuse to start domains which want these features. I think we need to use
> the CPU from QEMU for host-model and a union of the CPU from QEMU and the
> CPU we probed for checking whether a given guest CPU can run on the host.
> BTW, you could work around this bug by adding check='none' attribute to
> the L2 domain XML:
>
>   <cpu mode="custom" match="exact" check='none'>
>     <model>Haswell-noTSX</model>
>   </cpu>

Oh, I see. The check='none' attribute fixes it because the behavior is to leave the checking to QEMU, which will start the guest _anyway_ (its default behavior), even if the requested CPU feature is not available.

From http://libvirt.org/formatdomain.html#elementsCPU:

    "Libvirt does no checking and it is up to the hypervisor to refuse to
    start the domain if it cannot provide the requested CPU. With QEMU this
    means no checking is done at all since the default behavior of QEMU is
    to emit warnings, but start the domain anyway."

Thanks!
This is now fixed by:

commit 5b4a6adb5ca24a6cb91cdc55c31506fb278d3a91
Refs: v3.2.0-197-g5b4a6adb5
Author:     Jiri Denemark <jdenemar>
AuthorDate: Tue Apr 11 20:46:05 2017 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Wed Apr 19 16:36:38 2017 +0200

    qemu: Use more data for comparing CPUs

    With QEMU older than 2.9.0 libvirt uses the CPUID instruction to
    determine what CPU features are supported on the host. This was later
    used when checking compatibility of guest CPUs. Since QEMU 2.9.0 we
    ask QEMU for the host CPU data. But the two methods we use usually
    provide disjoint sets of CPU features because QEMU/KVM does not
    support all features provided by the host CPU and on the other hand
    it can enable some features even if the host CPU does not support
    them.

    So if there is a domain which requires a CPU feature disabled by
    QEMU/KVM, libvirt will refuse to start it with QEMU > 2.9.0 as its
    guest CPU is incompatible with the host CPU data we got from QEMU.
    But such a domain would happily start on older QEMU (of course, the
    features would be missing from the guest CPU).

    To fix this regression, we need to combine both CPU feature sets
    when checking guest CPU compatibility.

    https://bugzilla.redhat.com/show_bug.cgi?id=1439933

    Signed-off-by: Jiri Denemark <jdenemar>