Bug 1121939 - soft lockups and abrupt reboots in L1 guests on a nested KVM environment [NEEDINFO]
Summary: soft lockups and abrupt reboots in L1 guests on a nested KVM environment
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 20
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: fedora-kernel-kvm
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-07-22 07:58 UTC by Ken Sugawara
Modified: 2014-12-10 14:58 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-12-10 14:58:07 UTC
jforbes: needinfo?



Description Ken Sugawara 2014-07-22 07:58:16 UTC
Description of problem:
I have a nested KVM environment where the host runs Fedora 20 and the L1/L2 guests run RHEL6 (the L1/L2 guests are part of a RHEV-M hosted engine setup). After I updated the host kernel to kernel-3.15.3-200.fc20 or later (including the latest, 3.15.6-200.fc20), I get many "soft lockup" errors in the L1 guest, after which the L1 guest's CPU usage becomes excessively high and the L2 guest is unusable. If I keep the L1 guest running despite the errors, it sometimes reboots (

If I boot 3.14.9-200.fc20, the L1/L2 guests work without a hitch.


Version-Release number of selected component (if applicable):
host kernel: kernel-3.15.3-200.fc20, 3.15.4-200.fc20, or 3.15.6-200.fc20
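For anyone bisecting this regression, the versions above narrow it to the window between 3.14.9 (works) and 3.15.3 (first reported bad). The Python sketch below is a hypothetical triage helper, not part of the original report: it parses a Fedora release string such as 3.15.6-200.fc20 and labels it against those reported data points. It assumes the X.Y.Z-N.fcNN release format, and the 3.15.1 input in the usage lines is purely illustrative.

```python
import re

# Hypothetical helper: label a Fedora kernel release against the data points
# in this report. Assumes the "X.Y.Z-N.fcNN" release format.
RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-\d+\.fc\d+$")

KNOWN_GOOD = (3, 14, 9)                        # 3.14.9-200.fc20 works
FIRST_BAD, LAST_BAD = (3, 15, 3), (3, 15, 6)   # "3.15.3 and later, incl. 3.15.6"


def parse_release(release: str) -> tuple:
    """Return (major, minor, patch) from a string like '3.15.6-200.fc20'."""
    m = RELEASE_RE.match(release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part) for part in m.groups())


def reported_status(release: str) -> str:
    """Classify a release as good, bad, bisect (regression window) or untested."""
    version = parse_release(release)
    if version == KNOWN_GOOD:
        return "good"
    if FIRST_BAD <= version <= LAST_BAD:
        return "bad"
    if KNOWN_GOOD < version < FIRST_BAD:
        return "bisect"  # between last known good and first reported bad
    return "untested"


# Usage: 3.15.1 is a hypothetical input, not a version tested in this bug.
for rel in ["3.14.9-200.fc20", "3.15.1-200.fc20", "3.15.4-200.fc20", "3.17.2-200.fc20"]:
    print(rel, reported_status(rel))
```

Tuple comparison keeps the version ordering numeric, so 3.15.10 would correctly sort after 3.15.9, which a plain string comparison would get wrong.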


How reproducible:
Always


Steps to Reproduce:
1. On a Fedora 20 system running a pre-3.15 kernel, install an L1 guest with the --cpu=host option
2. In the L1 guest, install an L2 guest
3. Update the host kernel to 3.15.*-200.fc20, and start L1 and L2 guests

Actual results:
L1 guest gets "soft lockup" errors and sometimes reboots. L2 guest becomes unresponsive.

Expected results:
L1 and L2 guests continue to run without the soft lockup errors or reboots.

Additional info:
Jul 22 12:34:35 hehost2 kernel: BUG: soft lockup - CPU#2 stuck for 67s! [qemu-kvm:4324]
Jul 22 12:34:59 hehost2 kernel: BUG: soft lockup - CPU#1 stuck for 67s! [qemu-kvm:4151]
Jul 22 12:35:59 hehost2 kernel: BUG: soft lockup - CPU#2 stuck for 67s! [qemu-kvm:4324]
Jul 22 12:36:23 hehost2 kernel: BUG: soft lockup - CPU#1 stuck for 67s! [qemu-kvm:4151]
Jul 22 12:37:23 hehost2 kernel: BUG: soft lockup - CPU#2 stuck for 67s! [qemu-kvm:4324]
Jul 22 12:37:47 hehost2 kernel: BUG: soft lockup - CPU#1 stuck for 67s! [qemu-kvm:4151]
Jul 22 12:38:47 hehost2 kernel: BUG: soft lockup - CPU#2 stuck for 67s! [qemu-kvm:4324]
Jul 22 12:39:11 hehost2 kernel: BUG: soft lockup - CPU#1 stuck for 67s! [qemu-kvm:4151]
Jul 22 12:40:11 hehost2 kernel: BUG: soft lockup - CPU#2 stuck for 68s! [qemu-kvm:4324]
Jul 22 12:40:35 hehost2 kernel: BUG: soft lockup - CPU#1 stuck for 67s! [qemu-kvm:4151]
Jul 22 12:41:35 hehost2 kernel: BUG: soft lockup - CPU#2 stuck for 67s! [qemu-kvm:4324]
Jul 22 12:41:59 hehost2 kernel: BUG: soft lockup - CPU#1 stuck for 67s! [qemu-kvm:4151]
Jul 22 12:42:59 hehost2 kernel: BUG: soft lockup - CPU#2 stuck for 67s! [qemu-kvm:4324]
Jul 22 12:43:23 hehost2 kernel: BUG: soft lockup - CPU#1 stuck for 67s! [qemu-kvm:4151]

The processes associated with the above errors are usually qemu-kvm, ksmd, or vhost kernel threads.
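To tally these messages by task and CPU (for example, from lines read out of the L1 guest's /var/log/messages), a minimal Python sketch follows. The regular expression is an assumption fitted only to the message format quoted above; other kernel versions may word the watchdog message differently.

```python
import re
from collections import Counter

# Pattern fitted to the message format quoted in this report, e.g.
#   "BUG: soft lockup - CPU#2 stuck for 67s! [qemu-kvm:4324]"
# Assumption: other kernel versions may phrase this message differently.
# The task name may itself contain ':' (e.g. "kworker/0:1"), so the pid is
# taken from the last ":<digits>]" inside the brackets.
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]+):(?P<pid>\d+)\]"
)


def summarize_lockups(lines):
    """Count soft-lockup reports per task and per CPU; track the longest stall."""
    per_task = Counter()  # (command, pid) -> number of reports
    per_cpu = Counter()   # CPU number -> number of reports
    longest = 0           # longest reported stall, in seconds
    for line in lines:
        m = LOCKUP_RE.search(line)
        if m is None:
            continue  # not a soft-lockup line
        per_task[(m["comm"], int(m["pid"]))] += 1
        per_cpu[int(m["cpu"])] += 1
        longest = max(longest, int(m["secs"]))
    return per_task, per_cpu, longest


# Usage with three of the lines quoted above.
sample = [
    "Jul 22 12:34:35 hehost2 kernel: BUG: soft lockup - CPU#2 stuck for 67s! [qemu-kvm:4324]",
    "Jul 22 12:34:59 hehost2 kernel: BUG: soft lockup - CPU#1 stuck for 67s! [qemu-kvm:4151]",
    "Jul 22 12:40:11 hehost2 kernel: BUG: soft lockup - CPU#2 stuck for 68s! [qemu-kvm:4324]",
]
per_task, per_cpu, longest = summarize_lockups(sample)
print(per_task.most_common(), dict(per_cpu), longest)
```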

Comment 1 Ken Sugawara 2014-07-22 08:50:39 UTC
It was SoftDog that was responsible for rebooting the L1 guest. Here is an excerpt from the serial console:

hehost2.localdomain login: SoftDog: Unexpected close, not stopping watchdog!
BUG: soft lockup - CPU#0 stuck for 67s! [qemu-kvm:3869]
BUG: soft lockup - CPU#3 stuck for 67s! [qemu-kvm:3977]
SoftDog: Initiating system reboot.
Press any key to continue.
Press any key to continue.
Press any key to continue.
Press any key to continue.
Press any key to continue.
Press any key to enter the menu


Booting Red Hat Enterprise Linux (2.6.32-431.el6.x86_64) in 0 seconds...
Welcome to Red Hat Enterprise Linux Server
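A rough way to check a serial console capture for the same sequence (soft lockups followed by a SoftDog-initiated reboot) is sketched below in Python. It is a hypothetical helper that only matches the literal message strings shown in the excerpt above and assumes the capture covers a single boot.

```python
def lockups_before_softdog_reboot(console_lines):
    """Return the soft-lockup lines printed before SoftDog's reboot message.

    Returns None when the capture contains no SoftDog-initiated reboot.
    Matching uses literal substrings taken from the console excerpt in this bug.
    """
    lockups = []
    for line in console_lines:
        if "BUG: soft lockup" in line:
            lockups.append(line.strip())
        elif "SoftDog: Initiating system reboot" in line:
            return lockups  # reboot found: report what preceded it
    return None  # no SoftDog reboot in this capture


# Usage with the console lines quoted above.
capture = """\
hehost2.localdomain login: SoftDog: Unexpected close, not stopping watchdog!
BUG: soft lockup - CPU#0 stuck for 67s! [qemu-kvm:3869]
BUG: soft lockup - CPU#3 stuck for 67s! [qemu-kvm:3977]
SoftDog: Initiating system reboot.
""".splitlines()

print(lockups_before_softdog_reboot(capture))
```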

Comment 4 Justin M. Forbes 2014-11-13 15:57:10 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 20 kernel bugs.

Fedora 20 has now been rebased to 3.17.2-200.fc20.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 21, and are still experiencing this issue, please change the version to Fedora 21.

If you experience different issues, please open a new bug report for those.

Comment 5 Justin M. Forbes 2014-12-10 14:58:07 UTC
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in over 3 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

