Bug 1474874

Summary: libvirt qemu cache needs to be cleared to notice kvm module nested= setting change
Product: Fedora
Component: libvirt
Version: 26
Hardware: Unspecified
OS: Unspecified
Status: CLOSED EOL
Severity: urgent
Priority: unspecified
Reporter: jniederm
Assignee: Libvirt Maintainers <libvirt-maint>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: agedosier, berrange, clalancette, crobinso, itamar, laine, libvirt-maint, phrdina, pkotas, veillard, virt-maint
Target Milestone: ---
Target Release: ---
Last Closed: 2018-05-29 12:12:25 UTC
Type: Bug
Attachments: log_and_xml_dump.tar.gz

Description jniederm 2017-07-25 14:30:36 UTC
Created attachment 1304262 [details]
log_and_xml_dump.tar.gz

Description of problem:
Nested virtualization doesn't work: the virtualized host doesn't get the 'vmx' CPU flag.

Version-Release number of selected component (if applicable):
libvirt-daemon-config-network-3.5.0-1.fc26.x86_64
libvirt-daemon-driver-storage-scsi-3.5.0-1.fc26.x86_64
libvirt-daemon-driver-network-3.5.0-1.fc26.x86_64
libvirt-gconfig-1.0.0-2.fc26.x86_64
libvirt-libs-3.5.0-1.fc26.x86_64
libvirt-daemon-driver-nodedev-3.5.0-1.fc26.x86_64
libvirt-glib-1.0.0-2.fc26.x86_64
libvirt-daemon-driver-storage-3.5.0-1.fc26.x86_64
libvirt-daemon-driver-storage-core-3.5.0-1.fc26.x86_64
libvirt-daemon-driver-storage-iscsi-3.5.0-1.fc26.x86_64
libvirt-daemon-kvm-3.5.0-1.fc26.x86_64
libvirt-daemon-driver-storage-rbd-3.5.0-1.fc26.x86_64
libvirt-daemon-driver-nwfilter-3.5.0-1.fc26.x86_64
libvirt-gobject-1.0.0-2.fc26.x86_64
libvirt-daemon-driver-qemu-3.5.0-1.fc26.x86_64
libvirt-daemon-driver-storage-disk-3.5.0-1.fc26.x86_64
libvirt-daemon-3.5.0-1.fc26.x86_64
libvirt-daemon-driver-storage-logical-3.5.0-1.fc26.x86_64
libvirt-daemon-driver-interface-3.5.0-1.fc26.x86_64
libvirt-daemon-driver-secret-3.5.0-1.fc26.x86_64
libvirt-daemon-driver-storage-gluster-3.5.0-1.fc26.x86_64
libvirt-daemon-driver-storage-mpath-3.5.0-1.fc26.x86_64
libvirt-python-3.5.0-2.fc26.x86_64
libvirt-client-3.5.0-1.fc26.x86_64
libvirt-daemon-driver-storage-sheepdog-3.5.0-1.fc26.x86_64
qemu-img-2.9.0-7.fc26.x86_64
qemu-block-ssh-2.9.0-7.fc26.x86_64
qemu-system-x86-2.9.0-7.fc26.x86_64
qemu-common-2.9.0-7.fc26.x86_64
qemu-block-iscsi-2.9.0-7.fc26.x86_64
qemu-guest-agent-2.9.0-7.fc26.x86_64
qemu-block-rbd-2.9.0-7.fc26.x86_64
qemu-system-x86-core-2.9.0-7.fc26.x86_64
qemu-block-dmg-2.9.0-7.fc26.x86_64
ipxe-roms-qemu-20161108-2.gitb991c67.fc26.noarch
qemu-block-curl-2.9.0-7.fc26.x86_64
libvirt-daemon-driver-qemu-3.5.0-1.fc26.x86_64
qemu-kvm-2.9.0-7.fc26.x86_64
qemu-block-nfs-2.9.0-7.fc26.x86_64
qemu-block-gluster-2.9.0-7.fc26.x86_64


How reproducible:
100%

Steps to Reproduce:
1. Install virt-manager
2. Make sure virtualization is enabled (`grep vmx /proc/cpuinfo` finds something)
3. Make sure nested virtualization is enabled (`cat /sys/module/kvm_intel/parameters/nested` prints 'Y'; see https://fedoraproject.org/wiki/How_to_enable_nested_virtualization_in_KVM and the command sketch after these steps)
4. Create a VM in virt-manager, run it, and install the latest CentOS in it
5. Check that the CPU flag 'vmx' is available inside the VM: `grep vmx /proc/cpuinfo`
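
For reference, a minimal sketch of steps 2 and 3 on an Intel host, following the linked wiki page (the /etc/modprobe.d/kvm.conf file name is the usual convention; AMD hosts use kvm_amd and the 'svm' flag instead):

  # step 2: the host CPU must advertise VT-x
  grep -c vmx /proc/cpuinfo
  # step 3: check whether nested virtualization is currently enabled
  cat /sys/module/kvm_intel/parameters/nested
  # enable it persistently and reload the module (no VMs may be running)
  echo "options kvm_intel nested=1" | sudo tee /etc/modprobe.d/kvm.conf
  sudo modprobe -r kvm_intel
  sudo modprobe kvm_intel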

Actual results:
grep finds nothing

Expected results:
CPU of the VM has a 'vmx' flag

Additional info:
libvirt XML of the stopped and of the running VM is attached

Comment 1 Daniel Berrangé 2017-07-25 14:35:15 UTC
The CPU model shown is

  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>Haswell-noTSX</model>
    <vendor>Intel</vendor>
    <feature policy='require' name='vme'/>
    <feature policy='require' name='ss'/>
    <feature policy='require' name='f16c'/>
    <feature policy='require' name='rdrand'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='arat'/>
    <feature policy='require' name='tsc_adjust'/>
    <feature policy='require' name='xsaveopt'/>
    <feature policy='require' name='pdpe1gb'/>
    <feature policy='require' name='abm'/>
  </cpu>


The Haswell-noTSX model does not include 'vmx', nor is it listed explicitly. So this is simply a configuration problem.

I don't know whether virt-manager explicitly leaves out vmx or not.
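
A quick way to check which CPU features libvirt itself believes the host offers (and therefore what host-model will expand to) is to query it with virsh; a short sketch, with the caveat that the exact output format varies between libvirt versions:

  # host capabilities as probed/cached by libvirt
  virsh capabilities | grep -i vmx
  # guest CPU model and features libvirt would use on this hypervisor
  virsh domcapabilities | grep -i vmx

If vmx is present in /proc/cpuinfo but missing from both of these, libvirt's view of the host is out of date.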

Comment 2 jniederm 2017-07-25 14:51:54 UTC
Is it really a problem of virt-manager, when the cpu tag looks like

  <cpu mode='host-model' check='partial'>
    <model fallback='allow'/>
  </cpu>

before the VM is started, and is changed to the block mentioned in comment 1 only while the VM is running? Or am I missing something?

Comment 3 jniederm 2017-07-25 15:05:05 UTC
It looks like replacing the block mentioned in comment 2 with

  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>Haswell-noTSX</model>
    <vendor>Intel</vendor>
    <feature policy='require' name='vme'/>
    <feature policy='require' name='vmx'/>
    <feature policy='require' name='ss'/>
    <feature policy='require' name='f16c'/>
    <feature policy='require' name='rdrand'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='arat'/>
    <feature policy='require' name='tsc_adjust'/>
    <feature policy='require' name='xsaveopt'/>
    <feature policy='require' name='pdpe1gb'/>
    <feature policy='require' name='abm'/>
  </cpu>

while the VM is shut down is a workaround (for Haswell-noTSX CPUs).
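
To apply that workaround, the domain XML can be edited while the guest is shut off; a minimal sketch, using a hypothetical domain name "myvm":

  virsh shutdown myvm    # make sure the guest is fully stopped
  virsh edit myvm        # replace the <cpu> element with the block above
  virsh start myvm
  # inside the guest, the flag should now be visible:
  # grep vmx /proc/cpuinfo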

Comment 4 Pavel Hrdina 2017-07-25 15:11:24 UTC
Moving back to libvirt since this has nothing to do with virt-manager.  Libvirt is the component that replaces the

  <cpu mode='host-model' check='partial'>
    <model fallback='allow'/>
  </cpu>

with

  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>Haswell-noTSX</model>
    <vendor>Intel</vendor>
    <feature policy='require' name='vme'/>
    <feature policy='require' name='ss'/>
    <feature policy='require' name='f16c'/>
    <feature policy='require' name='rdrand'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='arat'/>
    <feature policy='require' name='tsc_adjust'/>
    <feature policy='require' name='xsaveopt'/>
    <feature policy='require' name='pdpe1gb'/>
    <feature policy='require' name='abm'/>
  </cpu>

Comment 5 Pavel Hrdina 2017-07-25 15:19:23 UTC
What happens with libvirt is that you probably started libvirtd before
nested virtualization was enabled.  To speed up libvirtd startup, it caches
capabilities in "/var/cache/libvirt/qemu/capabilities/", and these cached
capabilities stay valid until you update/downgrade libvirt/qemu.

Even though you've enabled nested virtualization, the cached capabilities are
still considered valid because the libvirt/qemu binaries have not changed, so
libvirt keeps using the cached capabilities, in which the "vmx" feature is
marked as unavailable.

To fix this issue you can simply remove all files inside the cache folder
"/var/cache/libvirt/qemu/capabilities/" and restart libvirtd.  After this
step,

  <cpu mode='host-model' check='partial'>
    <model fallback='allow'/>
  </cpu>

should work as expected.
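
In concrete terms, a minimal sketch of that fix (run as root; the exact file names inside the cache directory vary, so just remove them all):

  rm -f /var/cache/libvirt/qemu/capabilities/*
  systemctl restart libvirtd
  # verify that the refreshed capabilities now advertise vmx
  virsh domcapabilities | grep -i vmx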

Comment 6 Petr Kotas 2017-07-25 15:43:06 UTC
(In reply to Pavel Hrdina from comment #5)
> To fix this issue you can simply remove all files inside the cache folder
> "/var/cache/libvirt/qemu/capabilities/" and restart libvirtd.

I had the same issue and can confirm that clearing the cache and restarting libvirtd fixed it for me.

Comment 7 jniederm 2017-07-25 20:31:55 UTC
Deleting files in "/var/cache/libvirt/qemu/capabilities/" and restarting libvirtd as mentioned in comment 5 works for me as well.

Comment 8 Cole Robinson 2017-08-03 21:12:26 UTC
By 'enabling nested', are you talking about the /etc/modprobe.d/kvm.conf kvm_intel nested=1 bit? If so, this seems likely to be a recurring problem that it would be nice to fix, but I'm not sure how we would know to re-cache CPU data when a module parameter changes...

Comment 9 jniederm 2017-08-03 21:38:07 UTC
Hi Cole, yes, 'enabling nested' refers to re-inserting the kvm_intel module with the 'nested' bit set, provided the host CPU already has virtualization enabled in the BIOS. A solution could be to check '/sys/module/kvm_intel/parameters/nested' and invalidate the cache when it changes.

In any case, it is worth documenting that one needs to delete the cache once the module is reinserted; that is quite hard to anticipate.
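
Until libvirt does this itself, an admin-side sketch of such a check could look like the following (run as root); the stamp file path is made up for illustration:

  # compare the current nested= value with the value seen on the previous run
  current=$(cat /sys/module/kvm_intel/parameters/nested)
  stamp=/var/lib/libvirt/.nested-stamp               # hypothetical marker file
  if [ "$current" != "$(cat "$stamp" 2>/dev/null)" ]; then
      rm -f /var/cache/libvirt/qemu/capabilities/*   # force libvirt to re-probe qemu
      systemctl restart libvirtd
      echo "$current" > "$stamp"
  fi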

Comment 10 Fedora End Of Life 2018-05-03 07:51:16 UTC
This message is a reminder that Fedora 26 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 26. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '26'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue, and we are sorry that we were not
able to fix it before Fedora 26 reached end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 11 Fedora End Of Life 2018-05-29 12:12:25 UTC
Fedora 26 changed to end-of-life (EOL) status on 2018-05-29. Fedora 26
is no longer maintained, which means that it will not receive any
further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.