Red Hat Bugzilla – Bug 998713
Using Nested KVM freezes L1 host
Last modified: 2014-04-10 09:22:48 EDT
When I try to use nested KVM with Fedora 19, I'm experiencing a hang of the L1
vm. I eventually have to do a "Force Off" from virt-manager on L0 since the L1
vm is completely unresponsive.
Both my L0 and L1 are F19, x86_64. L2 doesn't really seem to matter as I can't
even get the guest to start before L1 completely locks up. Even just
attempting to define an L2 guest that uses KVM virtualization locks up L1.
Here are some more details.
On L0 (I've rebooted L0 after enabling the nested parameter):
[root@dublin ~]# uname -a
Linux dublin 3.10.7-200.fc19.x86_64 #1 SMP Thu Aug 15 23:19:45 UTC 2013 x86_64
x86_64 x86_64 GNU/Linux
[root@dublin ~]# cat /sys/module/kvm_intel/parameters/nested
[root@dublin ~]# rpm -q libvirt
[root@dublin ~]# lsmod | grep kvm
kvm_intel 138528 3
kvm 422809 1 kvm_intel
cpu section of libvirt xml for L1 vm:
<cpu mode='custom' match='exact'>
<feature policy='require' name='vmx'/>
</cpu>
On L1:
[root@localhost jslagle]# uname -a
Linux localhost.localdomain 3.10.7-200.fc19.x86_64 #1 SMP Thu Aug 15 23:19:45
UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost jslagle]# rpm -q libvirt
[root@localhost jslagle]# lsmod | grep kvm
kvm_intel 138528 0
kvm 422809 1 kvm_intel
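One more check that may be worth running inside L1 is whether the vmx CPU flag actually reaches the guest, since the `<feature policy='require' name='vmx'/>` line above only requests it. A minimal sketch, assuming a POSIX shell; the function name `has_vmx` is made up for illustration, and the optional file argument exists only so the check can be pointed at a saved copy of cpuinfo:

```shell
#!/bin/sh
# Sketch: report whether the CPU flags in a cpuinfo file include vmx.
# has_vmx is a hypothetical helper name, not part of any tool.
has_vmx() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # Keep only the "flags" lines, then look for vmx as a whole word so
    # that longer flag names containing "vmx" do not count as a match.
    grep -E '^flags' "$cpuinfo" | grep -qw vmx
}
```

Running `has_vmx && echo "vmx visible" || echo "vmx missing"` inside L1 would then show whether the flag is visible there.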
I've looked in dmesg output, /var/log/messages, and libvirt logs on both L0 and
L1 and can't see any errors.
I'd be happy to help debug this further. Is this a known issue, or does nested KVM work for other folks on F19?
The main Fedora kernel maintainers don't test nested KVM. You might want to discuss it on the fedora virtualization list and/or with the upstream KVM maintainers.
I asked on the virt list; the general consensus is that my L0 physical hardware is old (it's a Nehalem), so it's not much of a surprise that I'm having issues.
Feel free to close the bug WONTFIX or whatever is appropriate.
Here's the fedora virt thread:
Some discussion there but no culprit tracked down.
Gleb, I know you've been working on nested VMX recently, any suggestions for further debugging? Should James take this upstream?
(In reply to Cole Robinson from comment #3)
> Here's the fedora virt thread:
> Some discussion there but no culprit tracked down.
> Gleb, I know you've been working on nested VMX recently, any suggestions for
> further debugging? Should James take this upstream?
Wasn't CCed on this bug so missed that.
Nehalem is not old. I do most of my development on it. Nested VMX is still
experimental though, so it is always a good idea to try the latest upstream. In your case, adding the emulate_invalid_guest_state=0 parameter to the kvm_intel module may help.
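For anyone who wants to try the suggestion, here is a minimal sketch of making the option persistent and reading back the live values. It assumes the parameter goes on L0 (the comment above does not say which level) and that the file name kvm-nested.conf, which is arbitrary, is acceptable under /etc/modprobe.d. The function names are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: persist kvm_intel options and read back the live values.
# Both defaults are assumptions and can be overridden from the environment.
CONF_FILE="${CONF_FILE:-/etc/modprobe.d/kvm-nested.conf}"
PARAM_DIR="${PARAM_DIR:-/sys/module/kvm_intel/parameters}"

# Write a single modprobe "options" line (needs root for the default path).
write_kvm_intel_options() {
    printf 'options kvm_intel nested=1 emulate_invalid_guest_state=0\n' > "$CONF_FILE"
}

# Print name=value for one module parameter, or a marker if it is unreadable.
show_kvm_intel_param() {
    if [ -r "$PARAM_DIR/$1" ]; then
        printf '%s=%s\n' "$1" "$(cat "$PARAM_DIR/$1")"
    else
        printf '%s=<not available>\n' "$1"
    fi
}
```

After calling `write_kvm_intel_options` as root, the module has to be reloaded with no guests running (`modprobe -r kvm_intel && modprobe kvm_intel`) or the host rebooted; `show_kvm_intel_param emulate_invalid_guest_state` should then show whether the new value took effect.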
*********** MASS BUG UPDATE **************
We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.
Fedora 19 has now been rebased to 3.11.1-200.fc19. Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel.
If you experience different issues, please open a new bug report for those.
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 2 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.
FYI: I experienced pretty much the same behavior as described in the bug. I've added the emulate_invalid_guest_state=0 parameter to the kvm_intel module as suggested above, and it now works with no VM hangs.
Probably experiencing the same issue here. It happened last Friday, resulting in a total freeze of the system. No more caps-lock (LED) response; only a hard reset could revive the system. The system just froze again, but I'm not on site so I don't have all the details (yet).
Intel(R) Core(TM) i7-2600 CPU
Last Friday we started playing with (virtualized) oVirt; before that the system was rock solid.
Some more details: Yesterday's freeze was the same as Friday's freeze: total lockup, no caps-lock (LED) response, completely black screen.
However, the situation is different from the initial report: L0 (host) and L1 and L2 (guests) run all happy for a while, but at some point L0 completely locks up. We have no clear indication what's the trigger.
Closed the bug again. We have an L0 issue, so I created a specific bug for that: bug 1085895.