Bug 1439933 - With 'host-passthrough' (or 'host-model'), libvirt incorrectly tries to enable 'INVPCID' CPU instruction
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libvirt
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Jiri Denemark
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-04-06 22:05 UTC by Kashyap Chamarthy
Modified: 2017-04-19 14:51 UTC
5 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2017-04-19 14:51:51 UTC
Embargoed:


Attachments
Output of 'query-cpu-model-expansion' QMP command from bare metal (8.31 KB, text/plain)
2017-04-06 22:11 UTC, Kashyap Chamarthy
Output of 'query-cpu-model-expansion' QMP command from L1 guest (8.10 KB, text/plain)
2017-04-06 22:18 UTC, Kashyap Chamarthy
Guest hypervisor ('l1-f25') libvirt XML (4.17 KB, text/plain)
2017-04-07 08:31 UTC, Kashyap Chamarthy
Nested guest ('l2-f25') libvirt XML - extracted from libvirt debug log with log filters (1.64 KB, text/plain)
2017-04-07 08:36 UTC, Kashyap Chamarthy
`virsh capabilities` output from bare metal host (L0) (10.51 KB, text/plain)
2017-04-07 08:39 UTC, Kashyap Chamarthy
`virsh capabilities` output from L1 guest (9.91 KB, text/plain)
2017-04-07 08:44 UTC, Kashyap Chamarthy
`virsh domcapabilities` output from bare metal host (L0) (3.85 KB, text/plain)
2017-04-07 09:58 UTC, Kashyap Chamarthy
`virsh domcapabilities` output from L1 guest (3.75 KB, text/plain)
2017-04-07 10:03 UTC, Kashyap Chamarthy

Description Kashyap Chamarthy 2017-04-06 22:05:53 UTC
Description
-----------

[This is a nested KVM environment.]

Trying to import a disk image into libvirt, as a nested guest (L2), with
the guest hypervisor's (L1) CPU mode as either 'host-passthrough' or
'host-model', results in libvirt (on L1) incorrectly trying to enable
the 'INVPCID' CPU instruction, and failing to import the nested guest.

The bare metal host has the 'INVPCID' instruction enabled:

        "invpcid": true

And the level-1 guest does not have it enabled:

        "invpcid": false

Confirmed it by running the 'query-cpu-model-expansion' QMP command (thanks:
Eduardo Habkost) on both L0 & L1:

        $ qemu-system-x86_64 -machine pc-i440fx-2.9,accel=kvm -display none \
            -nodefconfig -nodefaults -m 512 -device virtio-scsi-pci,id=scsi \
            -device virtio-serial-pci \
            -blockdev node-name=foo,driver=qcow2,file.driver=file,file.filename=./cirros-0.3.5.qcow2 \
            -qmp-pretty stdio

        {"execute": "query-cpu-model-expansion", "arguments": {"model": {"name": "max"}, "type": "full"}}
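For scripted checks, the JSON reply to that command can be parsed directly.
A minimal Python sketch (the reply below is abbreviated and illustrative,
not the full output from the attachments):

```python
import json

# Abbreviated, illustrative 'query-cpu-model-expansion' reply; the real
# reply (see the attachments) contains many more properties.
reply = json.loads("""
{
  "return": {
    "model": {
      "name": "max",
      "props": {
        "invpcid": false,
        "vmx": true
      }
    }
  }
}
""")

props = reply["return"]["model"]["props"]
# Expected: True on L0, False on L1 -- the asymmetry behind this bug.
print("invpcid:", props["invpcid"])
```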

Version
-------

- L0 and L1 libvirt & QEMU:

    $ rpm -q libvirt-daemon-kvm qemu-system-x86
    libvirt-daemon-kvm-3.2.0-1.fc25.x86_64
    qemu-system-x86-2.9.0-0.1.rc3.fc25.x86_64

- L0 Kernel: 4.9.14-200.fc25.x86_64
- L1 Kernel: 4.10.8-200.fc25.x86_64


Steps to reproduce
------------------

(1) Boot a level-1 Fedora guest and set its CPU mode to either
    'host-passthrough' or 'host-model' (requires a reboot of the
    level-1 guest):

    $ virt-xml guest-hyp \
    --edit \
    --cpu host-passthrough,clearxml=yes

        
(2) On the level-1 guest hypervisor (which should now have /dev/kvm
    exposed), try to import a Fedora 25 disk image:

    $ virt-install --name f25-l2 --ram 512 \
        --disk path=./Fedora-Cloud-Base-25-1.3.x86_64.qcow2 \
        --nographics --import --os-variant fedora25

Actual result
-------------

Import of the nested guest fails with:

    $ virt-install --name f25-l2 --ram 512 \
        --disk path=./Fedora-Cloud-Base-25-1.3.x86_64.qcow2 \
        --nographics --import --os-variant fedora25
    Starting install...
    ERROR    the CPU is incompatible with host CPU: Host CPU does not provide required features: invpcid

Expected result
---------------

Libvirt on the guest hypervisor, when using 'host-passthrough' /
'host-model', should not try to enable the 'INVPCID' CPU instruction when
QEMU/KVM on the level-1 guest cannot provide it.


Additional info
---------------

Using a named CPU model (like '--cpu IvyBridge') with `virt-install` succeeds.

Comment 1 Kashyap Chamarthy 2017-04-06 22:11:34 UTC
Created attachment 1269492 [details]
Output of 'query-cpu-model-expansion' QMP command from bare metal

Comment 2 Kashyap Chamarthy 2017-04-06 22:18:07 UTC
Created attachment 1269493 [details]
Output of 'query-cpu-model-expansion' QMP command from L1 guest

Comment 3 Jiri Denemark 2017-04-07 06:31:27 UTC
Would you mind attaching the domain XMLs for both guest-hyp and f25-l2 as well as virsh capabilities and virsh domcapabilities from both the host and guest-hyp?

Comment 4 Kashyap Chamarthy 2017-04-07 08:31:48 UTC
Created attachment 1269610 [details]
Guest hypervisor ('l1-f25') libvirt XML

Comment 5 Kashyap Chamarthy 2017-04-07 08:36:00 UTC
Created attachment 1269611 [details]
Nested guest ('l2-f25') libvirt XML - extracted from libvirt debug log with log filters

Comment 6 Kashyap Chamarthy 2017-04-07 08:39:44 UTC
Created attachment 1269613 [details]
`virsh capabilities` output from bare metal host (L0)

Comment 7 Kashyap Chamarthy 2017-04-07 08:44:34 UTC
Created attachment 1269617 [details]
`virsh capabilities` output from L1 guest

Comment 8 Kashyap Chamarthy 2017-04-07 08:54:34 UTC
Looking at the libvirt v3.2.0 code, the error seems to come from line 1707 in the function virCPUx86Compare() in src/cpu/cpu_x86.c:

   [...]
   1699         if (failIncompatible) {
   1700             ret = VIR_CPU_COMPARE_ERROR;
   1701             if (message) {
   1702                 if (noTSX) {
   1703                     virReportError(VIR_ERR_CPU_INCOMPATIBLE,
   1704                                    _("%s; try using '%s-noTSX' CPU model"),
   1705                                    message, cpu->model);
   1706                 } else {
   1707                     virReportError(VIR_ERR_CPU_INCOMPATIBLE, "%s", message);
   1708                 }
   1709             } else {
   1710                 if (noTSX) {
   1711                     virReportError(VIR_ERR_CPU_INCOMPATIBLE,
   1712                                    _("try using '%s-noTSX' CPU model"),
   1713                                    cpu->model);
   1714                 } else {
   1715                     virReportError(VIR_ERR_CPU_INCOMPATIBLE, NULL);
   1716                 }
   1717             }
   1718         }
   1719     }
   [...]


$ git show 7f127ded --stat
commit 7f127ded657b24e0e55cd5f3539ef5b2dc935908
Author: Jiri Denemark <jdenemar>
Date:   Tue Aug 9 13:26:53 2016 +0200

    cpu: Rework cpuCompare* APIs
    
    Both cpuCompare* APIs are renamed to virCPUCompare*. And they should now
    work for any guest CPU definition, i.e., even for host-passthrough
    (trivial) and host-model CPUs. The implementation in x86 driver is
    enhanced to provide a hint about -noTSX Broadwell and Haswell models
    when appropriate.
    
    Signed-off-by: Jiri Denemark <jdenemar>

 src/cpu/cpu.c            | 42 ++++++++++++++++++++++++++----------------
 src/cpu/cpu.h            | 21 +++++++++++----------
 src/cpu/cpu_arm.c        |  8 ++++----
 src/cpu/cpu_ppc64.c      | 15 +++++++++++++--
 src/cpu/cpu_x86.c        | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------------------
 src/libvirt_private.syms |  4 ++--
 src/libxl/libxl_driver.c | 14 ++------------
 src/qemu/qemu_driver.c   | 14 ++------------
 tests/cputest.c          |  4 ++--
 9 files changed, 126 insertions(+), 80 deletions(-)

Comment 9 Kashyap Chamarthy 2017-04-07 09:58:10 UTC
Created attachment 1269663 [details]
`virsh domcapabilities` output from bare metal host (L0)

Comment 10 Kashyap Chamarthy 2017-04-07 10:03:17 UTC
Created attachment 1269665 [details]
`virsh domcapabilities` output from L1 guest

Comment 11 Kashyap Chamarthy 2017-04-07 10:12:11 UTC
L1's QEMU command-line, generated by libvirt:

----
qemu     13920     1  2 Apr06 ?        00:29:40 /usr/bin/qemu-system-x86_64 -machine accel=kvm -name guest=l1-f25,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-7-l1-f25/master-key.aes -machine pc-i440fx-2.7,accel=kvm,usb=off,dump-guest-core=off -cpu host -m 8192 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid f8ba0b36-0121-4b86-9fc5-78ec297bd90a -display none -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-7-l1-f25/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 -drive file=/home/kashyapc/vmimages/l1-f25.raw,format=raw,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=29 -device virtio-net-pci,host_mtu=1500,netdev=hostnet0,id=net0,mac=52:54:00:3e:c7:0f,bus=pci.0,addr=0x2 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-7-l1-f25/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -object rng-random,id=objrng0,filename=/dev/urandom -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x7 -msg timestamp=on
----

Comment 12 Jiri Denemark 2017-04-07 10:45:36 UTC
Interesting, so in other words, the host CPU supports invpcid and both QEMU
and libvirt agree:

    - libvirt detected host CPU as Haswell-noTSX (which contains invpcid)
    - QEMU reports "invpcid": true for "max" CPU model and libvirt correctly
      parses it as can be seen in domcapabilities XML

But when we try all that in the L1 guest, the situation changes:

    - libvirt detects L1 CPU as Haswell-noTSX which means invpcid CPUID bit
      should be set
    - but QEMU reports "invpcid": false for "max" CPU and libvirt correctly
      parses it and adds <feature policy='disable' name='invpcid'/> in
      domcapabilities XML

The question is why QEMU doesn't want to enable invpcid for L2.

So, could you please check a few more things?

    - check /proc/cpuinfo in L1 (it should list invpcid)
    - run "qemu-system-x86_64 -machine pc,accel=kvm -cpu Haswell-noTSX,enforce"
      in L1 guest and see if it complains about unsupported invpcid


BTW, IvyBridge CPU model works because it doesn't enable invpcid.
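The domcapabilities difference described above can be checked mechanically.
A minimal Python sketch (the XML fragment is illustrative, trimmed down from
a real host-model element of `virsh domcapabilities` output):

```python
import xml.etree.ElementTree as ET

# Illustrative fragment of the <cpu><mode name='host-model'> element
# as reported by `virsh domcapabilities` on the L1 guest.
mode = ET.fromstring("""
<mode name='host-model' supported='yes'>
  <model fallback='forbid'>Haswell-noTSX</model>
  <feature policy='require' name='vme'/>
  <feature policy='disable' name='invpcid'/>
</mode>
""")

# Features QEMU/KVM filters out on this host appear with policy='disable'.
disabled = [f.get('name') for f in mode.findall('feature')
            if f.get('policy') == 'disable']
print(disabled)
```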

Comment 13 Kashyap Chamarthy 2017-04-07 12:05:11 UTC
(In reply to Jiri Denemark from comment #12)
> Interesting, so in other words, the host CPU supports invpcid and both QEMU
> and libvirt agree:
> 
>     - libvirt detected host CPU as Haswell-noTSX (which contains invpcid)
>     - QEMU reports "invpcid": true for "max" CPU model and libvirt correctly
>       parses it as can be seen in domcapabilities XML
> 
> But when we try all that in the L1 guest, situation changes:
> 
>     - libvirt detects L1 CPU as Haswell-noTSX which means invpcid CPUID bit
>       should be set
>     - but QEMU reports "invpcid": false for "max" CPU and libvirt correctly
>       parses it and adds <feature policy='disable' name='invpcid'/> in
>       domcapabilities XML
> 
> The question is why QEMU doesn't want to enable invpcid for L2.

> 
> So, could you please check a few more things?
> 
>     - check /proc/cpuinfo in L1 (it should list invpcid)
>     - run "qemu-system-x86_64 -machine pc,accel=kvm -cpu
> Haswell-noTSX,enforce"
>       in L1 guest and see if it complains about unsupported invpcid

(1) Yes, /proc/cpuinfo on L1 _does_ list 'invpcid':

[l1-f25] $ cat /proc/cpuinfo
[...]
model name      : Intel(R) Core(TM) i5-4670T CPU @ 2.30GHz
[...]
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt arat
[...]

I also ran the `vmxcap` on L0 and L1:

`vmxcap` on L0:

    Enable INVPCID yes

`vmxcap` on L1:

    Enable INVPCID  no

(2) Also yes, QEMU does complain about unsupported 'invpcid':

  $ qemu-system-x86_64 -machine pc,accel=kvm -cpu Haswell-noTSX,enforce
  Unable to init server: Could not connect: Connection refused
  warning: host doesn't support requested feature: CPUID.07H:EBX.invpcid [bit 10]
  qemu-system-x86_64: Host doesn't support requested features


> BTW, IvyBridge CPU model works because it doesn't enable invpcid.

Yeah, I noticed it later :-)  Thanks for confirming.

Comment 14 Eduardo Habkost 2017-04-07 12:11:09 UTC
(In reply to Kashyap Chamarthy from comment #13)
> I also ran the `vmxcap` on L0 and L1:
> 
> `vmxcap` on L0:
> 
>     Enable INVPCID yes
> 
> `vmxcap` on L1:
> 
>     Enable INVPCID  no

Note that this bit is required to be able to virtualize invpcid, so KVM+QEMU really should report it as unavailable. Probably the L0 host doesn't have the ability to emulate this VMX capability yet.

Comment 16 Jiri Denemark 2017-04-07 12:30:29 UTC
So this is caused by the changes which aimed to fix host-model CPUs. Libvirt used to check host CPU features itself and used the result for both host-model and checking guest/host CPU compatibility.

Currently we ask QEMU for the host CPU features so that the CPU we use for host-model matches what QEMU can do on the host. And the CPU specs we get from QEMU are used for checking guest/host CPU compatibility. This is more correct, but it introduces a regression: when the host CPU supports some feature which QEMU/KVM will filter out, current libvirt will report an error when someone tries to enable it for a guest, while older libvirt would happily pass it to QEMU, which would filter it out, i.e., the guest would start but would not get the feature.

We can't really get back to what old libvirt was doing since QEMU/KVM can even enable some features the host does not support and we don't want to refuse to start domains which want these features. I think we need to use the CPU from QEMU for host-model and a union of the CPU from QEMU and the CPU we probed for checking whether a given guest CPU can run on the host.
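The proposed union logic can be sketched as a set operation (the feature
names below are a hypothetical mix chosen for illustration; this is not
libvirt's actual data model):

```python
# Features libvirt probed directly via CPUID (the old method).
probed = {"invpcid", "pcid", "erms", "smep"}
# Features QEMU reports as usable for the host CPU model.
from_qemu = {"pcid", "erms", "smep", "x2apic"}

# host-model should use only what QEMU can actually provide.
host_model = from_qemu

# Compatibility checks use the union, so a guest requiring a feature
# known to only one of the two sources is not rejected outright.
usable_for_check = probed | from_qemu

guest_requires = {"invpcid", "pcid"}
print(guest_requires <= usable_for_check)
```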

BTW, you could work around this bug by adding check='none' attribute to the L2 domain XML:

  <cpu mode="custom" match="exact" check='none'>
    <model>Haswell-noTSX</model>
  </cpu>
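If editing the XML by hand (e.g. via `virsh edit f25-l2`) is inconvenient
to script, the attribute can also be patched in programmatically; a minimal
Python sketch on an illustrative <cpu> fragment:

```python
import xml.etree.ElementTree as ET

# Illustrative guest <cpu> element before applying the workaround.
cpu = ET.fromstring(
    '<cpu mode="custom" match="exact"><model>Haswell-noTSX</model></cpu>')

# Tell libvirt to skip its own guest/host CPU compatibility check.
cpu.set('check', 'none')

print(ET.tostring(cpu, encoding='unicode'))
```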

Comment 17 Kashyap Chamarthy 2017-04-07 14:18:52 UTC
(In reply to Jiri Denemark from comment #16)
> So this is caused by the changes which aimed to fix host-model CPUs. Libvirt
> used to check host CPU features itself and used the result for both
> host-model and checking guest/host CPU compatibility.

I see.  I think it's this series: 

  https://www.redhat.com/archives/libvir-list/2017-February/msg01295.html
  [PATCH v3 00/28] qemu: Detect host CPU model by asking QEMU on x86_64

> Currently we ask QEMU for the host CPU features so that the CPU we use for
> host-model matches what QEMU could do on the host. And the CPU specs we get
> from QEMU is used for checking guest/host CPU compatibility. This is more
> correct, but it introduces a regression: when a host CPU supports some
> feature which QEMU/KVM will filter out, current libvirt will report an error
> when someone tries to enable it for a guest while older libvirt would
> happily pass it to QEMU which would filter it out, i.e., the guest would
> start, but would not get the feature.

Interesting, thanks for the explanation.

[Just noting for my own edification here, from our conversation from IRC]:

When you write above "[...] when a host CPU supports some feature which QEMU/KVM will filter out [...]", the possible _reasons_ why QEMU/KVM could filter a feature out are:

  - QEMU/KVM will filter it out because it doesn't _yet_ support
    the feature in question

Or:

  - The CPU does not support something else which is needed to virtualize 
    the feature (which is what seems to have happened with the "INVPCID")

> We can't really get back to what old libvirt was doing since QEMU/KVM can
> even enable some features the host does not support and we don't want to
> refuse to start domains which want these features. I think we need to use
> the CPU from QEMU for host-model and a union of the CPU from QEMU and the
> CPU we probed for checking whether a given guest CPU can run on the host.

> BTW, you could work around this bug by adding check='none' attribute to the
> L2 domain XML:
> 
>   <cpu mode="custom" match="exact" check='none'>
>     <model>Haswell-noTSX</model>
>   </cpu>

Oh, I see, the check='none' attribute fixes it because checking is left entirely to QEMU, which by default will start the guest _anyway_, even if a requested CPU feature is not available:

From http://libvirt.org/formatdomain.html#elementsCPU:

"Libvirt does no checking and it is up to the hypervisor to refuse to start the domain if it cannot provide the requested CPU. With QEMU this means no checking is done at all since the default behavior of QEMU is to emit warnings, but start the domain anyway."

Thanks!

Comment 18 Jiri Denemark 2017-04-19 14:51:51 UTC
This is now fixed by

commit 5b4a6adb5ca24a6cb91cdc55c31506fb278d3a91
Refs: v3.2.0-197-g5b4a6adb5
Author:     Jiri Denemark <jdenemar>
AuthorDate: Tue Apr 11 20:46:05 2017 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Wed Apr 19 16:36:38 2017 +0200

    qemu: Use more data for comparing CPUs

    With QEMU older than 2.9.0 libvirt uses CPUID instruction to determine
    what CPU features are supported on the host. This was later used when
    checking compatibility of guest CPUs. Since QEMU 2.9.0 we ask QEMU for
    the host CPU data. But the two methods we use usually provide disjoint
    sets of CPU features because QEMU/KVM does not support all features
    provided by the host CPU and on the other hand it can enable some
    feature even if the host CPU does not support them.

    So if there is a domain which requires a CPU feature disabled by
    QEMU/KVM, libvirt will refuse to start it with QEMU >= 2.9.0 as its
    guest CPU is incompatible with the host CPU data we got from QEMU. But
    such a domain would happily start on older QEMU (of course, the
    features would be missing from the guest CPU). To fix this regression,
    we need to combine both CPU feature sets when checking guest CPU
    compatibility.

    https://bugzilla.redhat.com/show_bug.cgi?id=1439933

    Signed-off-by: Jiri Denemark <jdenemar>

