Bug 592446 - [RHEL6] qemu-kvm BUG: NMI Watchdog detected LOCKUP on CPU6
Summary: [RHEL6] qemu-kvm BUG: NMI Watchdog detected LOCKUP on CPU6
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.0
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Don Zickus
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-05-14 21:10 UTC by Jeff Burke
Modified: 2013-01-09 22:34 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-07-06 15:24:29 UTC
Target Upstream Version:
Embargoed:



Description Jeff Burke 2010-05-14 21:10:27 UTC
Description of problem:
 While running tests with the latest RHEL 6 kernel, we hit "BUG: NMI Watchdog detected LOCKUP on CPU6".

Version-Release number of selected component (if applicable):
 2.6.32-26.el6.x86_64

How reproducible:
 Unknown

Actual results:

BUG: NMI Watchdog detected LOCKUP on CPU6, ip 7fff6f5ff850, registers:
CPU 6 
Modules linked in: tun(U) nls_utf8(U) nfs(U) lockd(U) fscache(U) nfs_acl(U) auth_rpcgss(U) ipt_MASQUERADE(U) iptable_nat(U) nf_nat(U) autofs4(U) sunrpc(U) cpufreq_ondemand(U) acpi_cpufreq(U) freq_table(U) bridge(U) stp(U) llc(U) ipv6(U) dm_mirror(U) dm_region_hash(U) dm_log(U) kvm_intel(U) kvm(U) ibmpex(U) ibmaem(U) ipmi_msghandler(U) bnx2(U) i5k_amb(U) hwmon(U) ics932s401(U) serio_raw(U) iTCO_wdt(U) iTCO_vendor_support(U) i5000_edac(U) ioatdma(U) edac_core(U) shpchp(U) sg(U) ses(U) sr_mod(U) cdrom(U) enclosure(U) i2c_i801(U) e1000e(U) ixgbe(U) dca(U) mdio(U) ext4(U) mbcache(U) jbd2(U) dm_multipath(U) sd_mod(U) crc_t10dif(U) ata_generic(U) pata_acpi(U) lpfc(U) scsi_transport_fc(U) ata_piix(U) aacraid(U) scsi_tgt(U) radeon(U) ttm(U) drm_kms_helper(U) drm(U) i2c_algo_bit(U) i2c_core(U) dm_mod(U) [last unloaded: scsi_wait_scan]
Pid: 9572, comm: qemu-kvm Not tainted 2.6.32-26.el6.x86_64 #1 IBM System x3650 -[7979AC1]-
RIP: 0033:[<00007fff6f5ff850>]  [<00007fff6f5ff850>] 0x7fff6f5ff850
RSP: 002b:00007fff6f563b00  EFLAGS: 00000212
RAX: 27ae055a179d8618 RBX: 00007fff6f563b50 RCX: 0000000000000000
RDX: 00000000a116940f RSI: 000000004bedbfd8 RDI: 00007fff6f563b50
RBP: 00007fff6f563b10 R08: 00007fff6f563a70 R09: 0000000000002564
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000a15fe8
R13: 0000000000d27c88 R14: 0000000000000001 R15: 0000000000000001
FS:  00007fd16d7c3740(0000) GS:ffff880028380000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fa88f149000 CR3: 0000000852d61000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process qemu-kvm (pid: 9572, threadinfo ffff88085d404000, task ffff880821324b30)

---[ end trace 64c3b6d72cb86555 ]---
Kernel panic - not syncing: Non maskable interrupt
Pid: 9572, comm: qemu-kvm Tainted: G      D    2.6.32-26.el6.x86_64 #1
Call Trace:
 <NMI>  [<ffffffff814c7fb5>] panic+0x78/0x137
 [<ffffffff81066f73>] ? print_oops_end_marker+0x23/0x30
 [<ffffffff814cc13c>] die_nmi+0xfc/0x100
 [<ffffffff814cc6ea>] nmi_watchdog_tick+0x1aa/0x200
 [<ffffffff814cbc73>] do_nmi+0x1a3/0x2d0
 [<ffffffff814cb550>] nmi+0x20/0x30
 <<EOE>> 
[drm:drm_fb_helper_panic] *ERROR* panic occurred, switching back to text console
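The "RIP: 0033:" and "RSP: 002b:" prefixes in the dump above are segment selectors, and the low two bits of the CS selector give the privilege level the CPU was running at. The following is a minimal sketch, assuming the standard x86_64 Linux layout (0x33 is the 64-bit user code segment, and user-space addresses lie below 0x0000800000000000); decode_rip is a hypothetical helper, not an existing tool. Applied to this trace, it indicates the host CPU was executing user-space code (ring 3) when the watchdog NMI arrived, which is consistent with the qemu-kvm process named in the "Pid:" line.

```python
import re

# Matches the "RIP: 0033:[<00007fff6f5ff850>]" form printed in an x86_64 oops.
RIP_RE = re.compile(r"RIP:\s*([0-9a-fA-F]{4}):\[<([0-9a-fA-F]+)>\]")

# Assumption: on x86_64 Linux, user-space addresses lie below this boundary.
USER_SPACE_LIMIT = 0x0000_8000_0000_0000


def decode_rip(line):
    """Return the CS selector, its privilege level, and whether RIP is a user address."""
    m = RIP_RE.search(line)
    if m is None:
        raise ValueError("no RIP field found")
    cs = int(m.group(1), 16)
    rip = int(m.group(2), 16)
    # Low two bits of a selector; for CS this equals the current privilege level.
    rpl = cs & 0x3
    return {
        "cs": cs,
        "rpl": rpl,
        "user_mode": rpl == 3,
        "rip": rip,
        "user_address": rip < USER_SPACE_LIMIT,
    }
```

For comparison, decode_rip("RIP: 0010:[<ffffffff814c7fb5>]") reports privilege level 0 and a kernel address, which is what a lockup inside kernel code would normally look like.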

Additional info:
 While the test was running, I noticed that the guests were not making any progress. I ssh'd into the system to investigate, launched virt-manager, and then lost the connection. The host's serial console showed the output above.
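The oops header changes from "Not tainted" in the first dump to "Tainted: G      D" in the panic output. A minimal sketch of decoding that fixed-width flag field follows, assuming the upstream 2.6.32 print_tainted() column order (RHEL 6 kernels may define additional letters); decode_taint and TAINT_COLUMNS are hypothetical names. Here the only flag set is D, which is consistent with the NMI oops having been recorded before panic() was called.

```python
# Column order assumed from the upstream 2.6.32 print_tainted(); RHEL 6 kernels
# may define additional letters. Column 0 prints 'P' (proprietary module loaded)
# or 'G' (only GPL modules); every other column is blank when its flag is unset.
TAINT_COLUMNS = [
    ("P", "proprietary module loaded"),
    ("F", "module was force-loaded"),
    ("S", "SMP kernel on CPUs not designed for SMP"),
    ("R", "module was force-unloaded"),
    ("M", "machine check exception occurred"),
    ("B", "bad page referenced"),
    ("U", "taint requested from user space"),
    ("D", "kernel died recently (oops or BUG)"),
    ("A", "ACPI table overridden"),
    ("W", "kernel warning issued"),
    ("C", "staging driver loaded"),
]


def decode_taint(flags):
    """Decode the fixed-width field printed after 'Tainted: ' in an oops header."""
    reasons = []
    for pos, ch in enumerate(flags):
        if ch == " " or (pos == 0 and ch == "G"):
            continue  # blank column, or 'G' in column 0: no taint from this column
        if pos >= len(TAINT_COLUMNS) or TAINT_COLUMNS[pos][0] != ch:
            raise ValueError(f"unexpected flag {ch!r} in column {pos}")
        letter, meaning = TAINT_COLUMNS[pos]
        reasons.append(f"{letter}: {meaning}")
    return reasons
```

With the field from this report, decode_taint("G      D") returns ["D: kernel died recently (oops or BUG)"].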

Comment 2 Don Zickus 2010-05-19 14:06:36 UTC
Adding my notes to this bug after trying to debug it a little:

The RIP value looked a little strange for an NMI watchdog report, but Dave A. pointed out that it falls within the VDSO of the guest.

Normally, when I see an NMI lockup message from a userspace app, I just assume the NMI watchdog is broken. But because it is hung in the VDSO, I wouldn't be surprised if KVM played some tricks here to speed up guest/HV communication. The open question is: can a guest disable interrupts from userspace, or does it have to pass that request to the HV and have the HV do it?

Cheers,
Don
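Comment 2 places the RIP inside the VDSO. On a live system, the mapping an address belongs to can be read from /proc/<pid>/maps, whose lines have the form "start-end perms offset dev inode [pathname]". A minimal sketch, with find_mapping as a hypothetical helper name:

```python
def find_mapping(maps_text, addr):
    """Return (start, end, perms, pathname) of the mapping containing addr, or None.

    maps_text is the text of /proc/<pid>/maps for the process of interest;
    pathname is "" for anonymous mappings and e.g. "[vdso]" or "[stack]" for
    the kernel-provided ones.
    """
    for line in maps_text.splitlines():
        # At most 6 fields: the pathname may itself contain spaces.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start_s, end_s = fields[0].split("-")
        start, end = int(start_s, 16), int(end_s, 16)
        if start <= addr < end:  # the end address is exclusive
            pathname = fields[5].strip() if len(fields) == 6 else ""
            return start, end, fields[1], pathname
    return None
```

It would be called as find_mapping(open(f"/proc/{pid}/maps").read(), 0x7fff6f5ff850). Because of address-space randomization, this only answers the question for the same process while it is still running; after a panic like this one, the mappings would have to be recovered from a crash dump instead.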

Comment 3 RHEL Program Management 2010-05-28 10:55:41 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux major release. This request is not yet committed for inclusion.

Comment 4 Marcelo Tosatti 2010-06-04 21:20:58 UTC
Don,

(In reply to comment #2)
> Just to add my notes to this bug after I tried debugging it a little bit
> 
> The RIP instruction looked a little strange for an NMI watchdog, but Dave A.
> pointed out that this fits within the VDSO of the guest.

RIP is from the host.

> Now normally when I see an NMI lockup message from a userspace app, I just
> assume the nmi watchdog is broken.  But because it is hung in the VDSO, I
> wouldn't be surprised if kvm played some tricks here to speed up guest/HV
> communication.  

No. The RIP is from the qemu-kvm process.

> The only thing though is can a guest disable interrupts from
> userspace? or does it have to pass that message to the HV and have the HV do
> it?

It can't. The guest interruptibility state is separate from the host's and only affects interrupt/NMI injection into the guest.

NMIs are never blocked and always cause an immediate VM exit so the host can handle them.

Reassigning to you, as this report seems to indicate an NMI watchdog lockup reported while in userspace.

Comment 5 Marcelo Tosatti 2010-07-06 15:24:29 UTC
I can't see anything wrong with KVM. I will consider this a spurious NMI watchdog warning.

Please reopen the bug if this is reproducible.
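The call trace shows the report coming from nmi_watchdog_tick() via die_nmi(). To see why a userspace RIP makes the report look spurious, it helps to recall what the watchdog checks: on each watchdog NMI it compares the CPU's local timer interrupt count with the value seen at the previous NMI, and only reports a lockup after the count has stalled for a threshold number of NMIs. The RIP printed is simply wherever the CPU happened to be when that NMI arrived. The class below is a simplified model under stated assumptions, loosely following the 2.6.32-era logic as understood here; NmiWatchdogModel and its threshold parameter are hypothetical, and the real code derives the threshold from nmi_hz and also honours touch_nmi_watchdog().

```python
class NmiWatchdogModel:
    """Simplified per-CPU model of the timer-count check made on each watchdog NMI.

    Assumption: loosely modeled on the 2.6.32-era nmi_watchdog_tick(); not the
    kernel's actual implementation.
    """

    def __init__(self, threshold):
        self.threshold = threshold  # consecutive stalled NMIs before reporting
        self.last_irq_sum = None    # timer interrupt count seen at the previous NMI
        self.alert_counter = 0

    def tick(self, timer_irqs, touched=False):
        """Handle one watchdog NMI; return True when a lockup would be reported."""
        if not touched and timer_irqs == self.last_irq_sum:
            # The local timer interrupt count has not moved since the last NMI.
            self.alert_counter += 1
            return self.alert_counter == self.threshold
        # Progress was made (or the watchdog was touched): reset the stall counter.
        self.last_irq_sum = timer_irqs
        self.alert_counter = 0
        return False
```

In this model a report requires the timer count to stop advancing. Since userspace code normally runs with interrupts enabled, a stalled count while RIP points into qemu-kvm userspace is consistent with the Comment 2 remark that a lockup reported from a userspace app usually points at the watchdog itself rather than the process.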

