Description of problem:
I have a nested KVM environment where the host runs Fedora 20 and the L1/L2 guests run RHEL 6 (the L1/L2 guests are for a RHEV-M hosted engine setup). After I updated the host kernel to 3.15.3-200.fc20 and later (including the latest, 3.15.6-200.fc20), I get many "soft lockup" errors in the L1 guest, after which the L1 guest's CPU usage is excessively high and the L2 guest is unusable. If I keep the L1 guest running with the errors, it sometimes reboots. If I boot 3.14.9-200.fc20, the L1/L2 guests work without a hitch.

Version-Release number of selected component (if applicable):
host kernel: kernel-3.15.3-200.fc20, 3.15.4-200.fc20, or 3.15.6-200.fc20

How reproducible:
Always

Steps to Reproduce:
1. On a Fedora 20 system running a pre-3.15 kernel, install an L1 guest with the --cpu=host option
2. In the L1 guest, install an L2 guest
3. Update the host kernel to 3.15.*-200.fc20, and start the L1 and L2 guests

Actual results:
The L1 guest gets "soft lockup" errors and sometimes reboots. The L2 guest becomes unresponsive.

Expected results:
The L1 and L2 guests continue to run without soft lockup errors or reboots.

Additional info:
Jul 22 12:34:35 hehost2 kernel: BUG: soft lockup - CPU#2 stuck for 67s! [qemu-kvm:4324]
Jul 22 12:34:59 hehost2 kernel: BUG: soft lockup - CPU#1 stuck for 67s! [qemu-kvm:4151]
Jul 22 12:35:59 hehost2 kernel: BUG: soft lockup - CPU#2 stuck for 67s! [qemu-kvm:4324]
Jul 22 12:36:23 hehost2 kernel: BUG: soft lockup - CPU#1 stuck for 67s! [qemu-kvm:4151]
Jul 22 12:37:23 hehost2 kernel: BUG: soft lockup - CPU#2 stuck for 67s! [qemu-kvm:4324]
Jul 22 12:37:47 hehost2 kernel: BUG: soft lockup - CPU#1 stuck for 67s! [qemu-kvm:4151]
Jul 22 12:38:47 hehost2 kernel: BUG: soft lockup - CPU#2 stuck for 67s! [qemu-kvm:4324]
Jul 22 12:39:11 hehost2 kernel: BUG: soft lockup - CPU#1 stuck for 67s! [qemu-kvm:4151]
Jul 22 12:40:11 hehost2 kernel: BUG: soft lockup - CPU#2 stuck for 68s! [qemu-kvm:4324]
Jul 22 12:40:35 hehost2 kernel: BUG: soft lockup - CPU#1 stuck for 67s! [qemu-kvm:4151]
Jul 22 12:41:35 hehost2 kernel: BUG: soft lockup - CPU#2 stuck for 67s! [qemu-kvm:4324]
Jul 22 12:41:59 hehost2 kernel: BUG: soft lockup - CPU#1 stuck for 67s! [qemu-kvm:4151]
Jul 22 12:42:59 hehost2 kernel: BUG: soft lockup - CPU#2 stuck for 67s! [qemu-kvm:4324]
Jul 22 12:43:23 hehost2 kernel: BUG: soft lockup - CPU#1 stuck for 67s! [qemu-kvm:4151]

The processes associated with the above errors are usually qemu-kvm, ksmd, or a vhost kthread.
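To see at a glance which processes and CPUs the soft lockup messages implicate, the log can be tallied with a short script. This is only a minimal sketch and not part of the original report: `summarize_lockups` and `LOCKUP_RE` are hypothetical names, and the regular expression assumes the exact message format shown in the log lines above.

```python
import re
from collections import Counter

# Assumed message format, taken from the log lines in this report:
#   ... kernel: BUG: soft lockup - CPU#2 stuck for 67s! [qemu-kvm:4324]
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]+):(?P<pid>\d+)\]"
)


def summarize_lockups(lines):
    """Count soft lockup reports per (process name, PID, CPU)."""
    counts = Counter()
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            key = (match["comm"], int(match["pid"]), int(match["cpu"]))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Three lines copied from the report; in practice, pass the lines of
    # the L1 guest's /var/log/messages instead.
    sample = [
        "Jul 22 12:34:35 hehost2 kernel: BUG: soft lockup - CPU#2 stuck for 67s! [qemu-kvm:4324]",
        "Jul 22 12:34:59 hehost2 kernel: BUG: soft lockup - CPU#1 stuck for 67s! [qemu-kvm:4151]",
        "Jul 22 12:35:59 hehost2 kernel: BUG: soft lockup - CPU#2 stuck for 67s! [qemu-kvm:4324]",
    ]
    for (comm, pid, cpu), n in summarize_lockups(sample).most_common():
        print(f"{comm} (pid {pid}) on CPU#{cpu}: {n} report(s)")
```

Grouping by PID as well as process name keeps the two qemu-kvm threads in the log above distinct, since each appears to be stuck on its own CPU.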
It was SoftDog that was responsible for rebooting the L1 guest. Here is an excerpt from the serial console:

hehost2.localdomain login: SoftDog: Unexpected close, not stopping watchdog!
BUG: soft lockup - CPU#0 stuck for 67s! [qemu-kvm:3869]
BUG: soft lockup - CPU#3 stuck for 67s! [qemu-kvm:3977]
SoftDog: Initiating system reboot.
Press any key to continue.
Press any key to continue.
Press any key to continue.
Press any key to continue.
Press any key to continue.
Press any key to enter the menu
Booting Red Hat Enterprise Linux (2.6.32-431.el6.x86_64) in 0 seconds...
Welcome to Red Hat Enterprise Linux Server
*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 20 kernel bugs.

Fedora 20 has now been rebased to 3.17.2-200.fc20. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 21 and are still experiencing this issue, please change the version to Fedora 21.

If you experience different issues, please open a new bug report for those.
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in over 3 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.