Red Hat Bugzilla – Bug 592446
[RHEL6] qemu-kvm BUG: NMI Watchdog detected LOCKUP on CPU6
Last modified: 2013-01-09 17:34:55 EST
Description of problem:
While running tests with the latest RHEL6 kernel, we hit "BUG: NMI Watchdog detected LOCKUP on CPU6".
Version-Release number of selected component (if applicable):
BUG: NMI Watchdog detected LOCKUP on CPU6, ip 7fff6f5ff850, registers:
Modules linked in: tun(U) nls_utf8(U) nfs(U) lockd(U) fscache(U) nfs_acl(U) auth_rpcgss(U) ipt_MASQUERADE(U) iptable_nat(U) nf_nat(U) autofs4(U) sunrpc(U) cpufreq_ondemand(U) acpi_cpufreq(U) freq_table(U) bridge(U) stp(U) llc(U) ipv6(U) dm_mirror(U) dm_region_hash(U) dm_log(U) kvm_intel(U) kvm(U) ibmpex(U) ibmaem(U) ipmi_msghandler(U) bnx2(U) i5k_amb(U) hwmon(U) ics932s401(U) serio_raw(U) iTCO_wdt(U) iTCO_vendor_support(U) i5000_edac(U) ioatdma(U) edac_core(U) shpchp(U) sg(U) ses(U) sr_mod(U) cdrom(U) enclosure(U) i2c_i801(U) e1000e(U) ixgbe(U) dca(U) mdio(U) ext4(U) mbcache(U) jbd2(U) dm_multipath(U) sd_mod(U) crc_t10dif(U) ata_generic(U) pata_acpi(U) lpfc(U) scsi_transport_fc(U) ata_piix(U) aacraid(U) scsi_tgt(U) radeon(U) ttm(U) drm_kms_helper(U) drm(U) i2c_algo_bit(U) i2c_core(U) dm_mod(U) [last unloaded: scsi_wait_scan]
Pid: 9572, comm: qemu-kvm Not tainted 2.6.32-26.el6.x86_64 #1 IBM System x3650 -[7979AC1]-
RIP: 0033:[<00007fff6f5ff850>] [<00007fff6f5ff850>] 0x7fff6f5ff850
RSP: 002b:00007fff6f563b00 EFLAGS: 00000212
RAX: 27ae055a179d8618 RBX: 00007fff6f563b50 RCX: 0000000000000000
RDX: 00000000a116940f RSI: 000000004bedbfd8 RDI: 00007fff6f563b50
RBP: 00007fff6f563b10 R08: 00007fff6f563a70 R09: 0000000000002564
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000a15fe8
R13: 0000000000d27c88 R14: 0000000000000001 R15: 0000000000000001
FS: 00007fd16d7c3740(0000) GS:ffff880028380000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fa88f149000 CR3: 0000000852d61000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process qemu-kvm (pid: 9572, threadinfo ffff88085d404000, task ffff880821324b30)
---[ end trace 64c3b6d72cb86555 ]---
Kernel panic - not syncing: Non maskable interrupt
Pid: 9572, comm: qemu-kvm Tainted: G D 2.6.32-26.el6.x86_64 #1
<NMI> [<ffffffff814c7fb5>] panic+0x78/0x137
[<ffffffff81066f73>] ? print_oops_end_marker+0x23/0x30
[drm:drm_fb_helper_panic] *ERROR* panic occurred, switching back to text console
While the test was running, I noticed the guests were not making any progress. I ssh'd into the system to look at a few things, launched virt-manager, and then lost the connection. I then looked at the host's serial console and saw the output above.
Adding my notes to this bug after trying to debug it a little:
The RIP value looked a little strange for an NMI watchdog report, but Dave A. pointed out that it falls within the VDSO of the guest.
Normally, when I see an NMI lockup message from a userspace app, I just assume the NMI watchdog is broken. But because it is hung in the VDSO, I wouldn't be surprised if KVM played some tricks here to speed up guest/HV communication. The one open question is: can a guest disable interrupts from userspace, or does it have to pass that request to the HV and have the HV do it?
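Whether the interrupted code was kernel or user code can be read directly off the host's register dump. A minimal Python sketch, assuming standard x86-64 Linux conventions: the low two bits of the saved CS selector give the privilege level (3 means user mode), and user-space addresses lie below 0x0000800000000000. `decode_context` is a hypothetical helper; the input values are copied from the `RIP: 0033:[<00007fff6f5ff850>]` line of the dump. (The separate `CS: 0010` line further down appears to be the kernel's own code segment at the time the dump was printed, not the interrupted context.)

```python
# Hypothetical helper: classify the CPU context captured in an x86-64
# Linux oops/NMI register dump from its saved CS selector and RIP.
USER_SPACE_TOP = 0x0000_8000_0000_0000  # end of the lower canonical half


def decode_context(cs: int, rip: int) -> dict:
    """Return the privilege level and address half implied by CS:RIP."""
    cpl = cs & 0x3  # low two bits of CS = current privilege level
    mode = "user" if cpl == 3 else "kernel" if cpl == 0 else "other"
    return {"cpl": cpl, "mode": mode, "user_half": rip < USER_SPACE_TOP}


# Values from the "RIP: 0033:[<00007fff6f5ff850>]" line of the host dump.
print(decode_context(0x0033, 0x00007FFF6F5FF850))
# -> {'cpl': 3, 'mode': 'user', 'user_half': True}
```

Since this dump was printed by the host kernel, a ring-3 CS:RIP here describes a host user process (the `comm: qemu-kvm` task), not code executing inside the guest.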
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Major release. This request is not yet committed for inclusion.
(In reply to comment #2)
> Just to add my notes to this bug after I tried debugging it a little bit
> The RIP instruction looked a little strange for an NMI watchdog, but Dave A.
> pointed out that this fits within the VDSO of the guest.
RIP is from the host.
> Now normally when I see an NMI lockup message from a userspace app, I just
> assume the nmi watchdog is broken. But because it is hung in the VDSO, I
> wouldn't be surprised if kvm played some tricks here to speed up guest/HV
No. The RIP is from the qemu-kvm process.
> The only thing though is can a guest disable interrupts from
> userspace? or does it have to pass that message to the HV and have the HV do
It can't. The guest interruptibility state is separate from the host's and only affects interrupt/NMI injection into the guest.
NMIs are never blocked by it: they always cause an immediate vmexit, so the host can handle them.
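The host's own interrupt state at the moment of the NMI is also visible in the dump: `EFLAGS: 00000212` is the saved flags register of the interrupted qemu-kvm code, and bit 9 of EFLAGS is IF, the interrupt-enable flag. A minimal Python sketch; `decode_eflags` is a hypothetical helper that only names the low status/control bits:

```python
# Hypothetical helper: list the named flags set in an x86 EFLAGS value.
# Only the low status/control bits are covered; bit 1 is reserved and
# always reads as 1.
EFLAGS_BITS = {
    0: "CF", 1: "reserved(1)", 2: "PF", 4: "AF", 6: "ZF", 7: "SF",
    8: "TF", 9: "IF", 10: "DF", 11: "OF",
}


def decode_eflags(value: int) -> list:
    """Return the names of the set flags, lowest bit first."""
    return [name for bit, name in sorted(EFLAGS_BITS.items())
            if (value >> bit) & 1]


# Value from the "EFLAGS: 00000212" field of the host dump.
print(decode_eflags(0x212))
# -> ['reserved(1)', 'AF', 'IF']
```

With IF set, the interrupted host userspace code was running with host interrupts enabled, which fits the point above that guest interruptibility state does not carry over to the host.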
Reassigning to you, as this report seems to show the NMI watchdog firing while the CPU was in userspace.
I can't see anything wrong with KVM. I will consider this a spurious NMI watchdog warning.
Please reopen bug if this is reproducible.
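For context on why a userspace RIP can show up in a lockup report: in x86 kernels of this era, the NMI watchdog roughly works by checking, on each watchdog NMI, whether the CPU's timer-interrupt count has moved since the previous NMI. If it stays unchanged for about five seconds' worth of NMIs, the kernel reports "NMI Watchdog detected LOCKUP" with whatever RIP that NMI happened to interrupt (and panics if configured to, as in the "Kernel panic - not syncing: Non maskable interrupt" line above). The following is a simplified Python model under those assumptions; the class, its parameters, and the `nmi_hz`/5-second defaults are illustrative, not the kernel's actual code.

```python
from typing import Optional


class LockupDetector:
    """Simplified model of a per-CPU NMI-watchdog lockup heuristic.

    Assumption: the watchdog NMI fires nmi_hz times per second, and a
    lockup is declared once the timer-interrupt count has not changed
    for threshold_s seconds' worth of consecutive NMIs.
    """

    def __init__(self, nmi_hz: int = 1, threshold_s: int = 5):
        self.limit = nmi_hz * threshold_s
        self.last_irq_sum = None
        self.alert_counter = 0

    def on_nmi(self, timer_irqs: int, rip: int) -> Optional[str]:
        """Called on every watchdog NMI; returns a report if a lockup is declared."""
        if timer_irqs == self.last_irq_sum:
            # No timer interrupts since the previous NMI: count toward a lockup.
            self.alert_counter += 1
            if self.alert_counter == self.limit:
                # The reported ip is simply wherever this NMI interrupted the CPU.
                return f"BUG: NMI Watchdog detected LOCKUP, ip {rip:x}"
        else:
            # Progress was made: remember the new count and reset the counter.
            self.last_irq_sum = timer_irqs
            self.alert_counter = 0
        return None


# A CPU whose timer-interrupt count is stuck at 100 while the NMIs keep
# landing in the same userspace address.
det = LockupDetector(nmi_hz=1, threshold_s=5)
reports = [det.on_nmi(timer_irqs=100, rip=0x7FFF6F5FF850) for _ in range(6)]
print(reports[-1])
# -> BUG: NMI Watchdog detected LOCKUP, ip 7fff6f5ff850
```

Because the heuristic only tracks whether timer interrupts are being counted, the printed ip is wherever the final NMI landed; a userspace RIP does not by itself show that the userspace code caused the stall.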