998713 – Using Nested KVM freezes L1 host

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	19
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2013-08-19 20:30 UTC by James Slagle
Modified:	2014-04-10 13:22 UTC
CC List:	10 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2014-04-09 15:23:50 UTC
Type:	Bug
Embargoed:
Dependent Products:

Description James Slagle 2013-08-19 20:30:52 UTC

When I try to use nested KVM with Fedora 19, I'm experiencing a hang of the L1
vm.  I eventually have to do a "Force Off" from virt-manager on L0 since the L1
vm is completely unresponsive.

Both my L0 and L1 are F19, x86_64.  L2 doesn't really seem to matter as I can't
even get the guest to start before L1 completely locks up.  Even just
attempting to define a L2 guest that uses kvm virtualization locks up L1
completely.

Here are some more details.

On L0 (I've rebooted L0 after enabling the nested parameter):
[root@dublin ~]# uname -a
Linux dublin 3.10.7-200.fc19.x86_64 #1 SMP Thu Aug 15 23:19:45 UTC 2013 x86_64
x86_64 x86_64 GNU/Linux
[root@dublin ~]# cat /sys/module/kvm_intel/parameters/nested
Y
[root@dublin ~]# rpm -q libvirt
libvirt-1.0.5.5-1.fc19.x86_64
[root@dublin ~]# lsmod | grep kvm
kvm_intel             138528  3
kvm                   422809  1 kvm_intel

cpu section of libvirt xml for L1 vm:
  <cpu mode='custom' match='exact'>
    <model fallback='allow'>Nehalem</model>
    <feature policy='require' name='vmx'/>
  </cpu>


On L1:
[root@localhost jslagle]# uname -a
Linux localhost.localdomain 3.10.7-200.fc19.x86_64 #1 SMP Thu Aug 15 23:19:45
UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost jslagle]# rpm -q libvirt
libvirt-1.0.5.5-1.fc19.x86_64
[root@localhost jslagle]# lsmod | grep kvm
kvm_intel             138528  0
kvm                   422809  1 kvm_intel


I've looked in dmesg output, /var/log/messages, and libvirt logs on both L0 and
L1 and can't see any errors.

Would be happy to help debug this further.  Wondering if this is a known issue or if nested kvm works for other folks on F19?
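
The setup checks shown in this description can be scripted. The following is a minimal sketch, not part of the original report: the function names and the two overridable path variables are hypothetical, and the default paths are the ones used in the commands above plus /proc/cpuinfo.

```shell
# Sketch: check the nested KVM prerequisites described in this report.
# NESTED_PARAM and CPUINFO are hypothetical variable names; they can be
# overridden to point at sample files. The defaults are the real interfaces.
NESTED_PARAM="${NESTED_PARAM:-/sys/module/kvm_intel/parameters/nested}"
CPUINFO="${CPUINFO:-/proc/cpuinfo}"

# Run on L0: succeeds if kvm_intel reports nesting as enabled ("Y" or "1").
nested_enabled() {
    [ -r "$NESTED_PARAM" ] || return 1
    case "$(cat "$NESTED_PARAM")" in
        Y|1) return 0 ;;
        *)   return 1 ;;
    esac
}

# Run on L1: succeeds if the virtual CPU advertises the vmx flag,
# i.e. the <feature policy='require' name='vmx'/> setting took effect.
vmx_exposed() {
    grep -qw vmx "$CPUINFO"
}
```

For example, `nested_enabled && echo "nesting enabled on L0"` on the physical host, and `vmx_exposed && echo "vmx visible in L1"` inside the L1 guest. Passing both checks only confirms the configuration; it says nothing about the hang itself.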

Comment 1 Josh Boyer 2013-08-20 13:08:58 UTC

The main Fedora kernel maintainers don't test nested KVM.  You might want to discuss it on the fedora virtualization list and/or with the upstream KVM maintainers.

Comment 2 James Slagle 2013-08-20 14:01:58 UTC

I asked on the virt list; the general consensus is that my L0 physical hardware is old (it's a Nehalem), so it's not much of a surprise that I'm having issues.

Feel free to close the bug WONTFIX or whatever is appropriate.

Comment 3 Cole Robinson 2013-08-20 14:43:26 UTC

Here's the fedora virt thread:

https://lists.fedoraproject.org/pipermail/virt/2013-August/003768.html

Some discussion there but no culprit tracked down.

Gleb, I know you've been working on nested VMX recently, any suggestions for further debugging? Should James take this upstream?

Comment 4 Gleb Natapov 2013-09-09 06:42:13 UTC

(In reply to Cole Robinson from comment #3)
> Here's the fedora virt thread:
> 
> https://lists.fedoraproject.org/pipermail/virt/2013-August/003768.html
> 
> Some discussion there but no culprit tracked down.
> 
> Gleb, I know you've been working on nested VMX recently, any suggestions for
> further debugging? Should James take this upstream?

Wasn't CCed on this bug so missed that.

Nehalem is not old. I do most of my development on it. Nested VMX is still
experimental though, so it is always a good idea to try the latest upstream. In your case, adding the emulate_invalid_guest_state=0 parameter to the kvm_intel module may help.
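
One way to apply that suggestion is sketched below. This is an illustration, not something posted in the bug: the comment does not say whether the option belongs on L0 or L1, the file name under /etc/modprobe.d is an arbitrary choice, and the function names are hypothetical. Only the module name and the parameter come from the comment above.

```shell
# Sketch only: persist the suggested kvm_intel option and reload the module.
# CONF_FILE is an assumed file name; modprobe reads any *.conf file
# under /etc/modprobe.d.
CONF_FILE="${CONF_FILE:-/etc/modprobe.d/kvm_intel.conf}"

# Print the modprobe.d line carrying the suggested parameter.
kvm_intel_options_line() {
    printf 'options kvm_intel emulate_invalid_guest_state=0\n'
}

# Write the option file, reload kvm_intel, and show the resulting value.
# Run as root with all guests on this machine shut down; otherwise
# "modprobe -r" fails because the module is still in use.
apply_kvm_intel_options() {
    kvm_intel_options_line > "$CONF_FILE" &&
    modprobe -r kvm_intel &&
    modprobe kvm_intel &&
    cat /sys/module/kvm_intel/parameters/emulate_invalid_guest_state
}
```

Calling `apply_kvm_intel_options` performs the change. The parameter can also be tried for a single module load with `modprobe kvm_intel emulate_invalid_guest_state=0` before making it permanent.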

Comment 5 Josh Boyer 2013-09-18 20:33:20 UTC

*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.11.1-200.fc19.  Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

Comment 6 Josh Boyer 2013-10-08 16:13:57 UTC

This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 2 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

Comment 7 Ihar Hrachyshka 2013-12-09 08:52:45 UTC

FYI: I experienced pretty much the same behavior as described in this bug. I added the emulate_invalid_guest_state=0 parameter to the kvm_intel module as suggested above, and it now works with no VM hangs.

Comment 8 Rolf Fokkens 2014-04-07 17:03:04 UTC

Probably experiencing the same issue here. It happened last Friday, resulting in a total freeze of the system. No more caps-lock (LED) response; only a hard reset could revive the system. The system just froze again, but I'm not on site so I don't have all the details (yet).

Some details:

Fedora 19
kernel-3.13.7-100.fc19.x86_64
Intel(R) Core(TM) i7-2600 CPU

Last Friday we started playing with (virtualized) oVirt; before that the system was rock solid.

Comment 9 Rolf Fokkens 2014-04-08 12:49:23 UTC

Some more details: Yesterday's freeze was the same as Friday's freeze: total lockup, no caps-lock (LED) response, completely black screen.

However, the situation is different from the initial report: L0 (host) and L1 and L2 (guests) all run happily for a while, but at some point L0 completely locks up. We have no clear indication of what the trigger is.

Comment 10 Rolf Fokkens 2014-04-09 15:23:50 UTC

Closed the bug again. We have an L0 issue, so I created a specific bug for that: bug 1085895