Bug 437273 - NMI appears to be stuck
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.2
Hardware: All Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: 5.5
Assigned To: Justin M. Forbes
QA Contact: Martin Jenner
Depends On:
Blocks: 492568
Reported: 2008-03-13 06:42 EDT by Matthew Booth
Modified: 2009-07-27 10:56 EDT (History)

Doc Type: Bug Fix
Last Closed: 2009-07-27 10:56:59 EDT


Description Matthew Booth 2008-03-13 06:42:19 EDT
Description of problem:
After upgrading to RHEL 5.2 beta, on every boot I get the following in the logs:
Testing NMI watchdog ... <4>WARNING: CPU#0: NMI appears to be stuck (0->0)!

[root@mbooth ~]# cat /proc/cmdline 
ro root=/dev/vg_local/root crashkernel=64M@16M audit=1 nmi_watchdog=2

This is on a Lenovo X60s laptop.
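
For anyone triaging similar reports, here is a minimal Python sketch that pulls the relevant values out of the log line and the kernel command line quoted above. The function names are hypothetical, and reading the two numbers in "(0->0)" as the NMI count before and after the watchdog test is my assumption about the message format, not something stated in this report.

```python
import re

# Matches the warning quoted above, e.g. "CPU#0: NMI appears to be stuck (0->0)!"
WARNING_RE = re.compile(r"CPU#(\d+): NMI appears to be stuck \((\d+)->(\d+)\)")


def parse_stuck_warning(line):
    """Return the CPU number and the two counts from a warning line, or None."""
    match = WARNING_RE.search(line)
    if match is None:
        return None
    cpu, before, after = (int(group) for group in match.groups())
    # Assumption: the two numbers are the NMI count before and after the test.
    return {"cpu": cpu, "before": before, "after": after}


def cmdline_options(cmdline):
    """Split a /proc/cmdline string into a dict of options.

    Bare flags such as "ro" map to True. If a key is repeated
    (for example two console= entries), the last value wins.
    """
    options = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        options[key] = value if sep else True
    return options
```

With the command line from this report, cmdline_options(...)["nmi_watchdog"] gives "2", which selects the local APIC watchdog mode (1 would select the I/O APIC mode).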

Version-Release number of selected component (if applicable):
2.6.18-84.el5xen

Comment 1 Matthew Booth 2008-03-13 06:55:45 EDT
Thinking about it, this might be related to bug 437274.
Comment 2 Bill Burns 2008-04-16 21:01:00 EDT
It may well be related. Please try the latest kernel, -90, and let us know
whether that fixes the issue. Thanks.
Comment 4 Bill Burns 2009-01-14 14:07:02 EST
Another report of this message in the RHTS log at:
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=5596480
Comment 5 Jeff Burke 2009-01-15 09:09:02 EST
Using the RHEL5.3 kernel 2.6.18-128.el5, this message still gets printed when booting.

I see that this BZ is filed against kernel-xen. I see this issue on an HVM guest running the bare metal kernel.


Command Line = ro root=/dev/VolGroup00/LogVol00 console=tty0 console=ttyS0,115200
Comment 6 Chris Lalancette 2009-01-15 09:22:20 EST
(In reply to comment #5)
> Using the RHEL5.3 kernel 2.6.18-128.el5, this message still gets printed when
> booting.
> 
> I see that this BZ is filed against kernel-xen. I see this issue on an HVM guest
> running the bare metal kernel.
> 
> 
> Command Line = ro root=/dev/VolGroup00/LogVol00 console=tty0
> console=ttyS0,115200

I think the original report is probably a different issue from the HVM message.  If I remember correctly (and I'd have to go back to the code to check), with HVM the hypervisor only "fake" emulates the PERFCTR registers that the NMI watchdog uses.  What this means is that the HVM guest programs the perfctr registers, but Xen just throws away the data.  The guest domain then waits for the PERFCTR register to overflow and generate an interrupt (which is how the NMI watchdog works), but this never happens since nothing was actually programmed.

The issue on bare metal is completely different; the dom0 should be able to program those MSRs successfully, so something else is going on.

Chris Lalancette
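
To make the symptom described above concrete: if the counter never overflows, no NMIs are delivered and the per-CPU NMI count stays where it was. The following is a rough user-space Python sketch of that kind of before/after comparison, working on /proc/interrupts-style text. It is not the kernel's actual check; the function names and the min_delta threshold are hypothetical.

```python
def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counts from /proc/interrupts-style text."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for field in fields[1:]:
                # Stop at the first non-numeric column (a trailing description).
                if not field.isdigit():
                    break
                counts.append(int(field))
            return counts
    return []


def stuck_cpus(before, after, min_delta=1):
    """Return (cpu, before, after) for CPUs whose count rose by less than min_delta.

    min_delta is a hypothetical threshold chosen for illustration.
    """
    return [
        (cpu, b, a)
        for cpu, (b, a) in enumerate(zip(before, after))
        if a - b < min_delta
    ]
```

On a live system one would read /proc/interrupts twice with a short sleep in between and pass both snapshots through nmi_counts(). A CPU reported as (0, 0, 0) corresponds to the "CPU#0 ... (0->0)" pattern in the warning.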
Comment 8 Chris Lalancette 2009-07-07 04:29:35 EDT
Matt, additional fixes went into 5.4 to address the NMI stuck messages.  If you get a chance, could you give the 5.4 kernel a whirl on your laptop and see whether it helps with the issue?

Thanks,
Chris Lalancette
Comment 9 Matthew Booth 2009-07-20 04:43:23 EDT
Chris,

Realistically that's unlikely to happen at this stage as I've been running Fedora for a while now. I can confirm that the message doesn't appear in Fedora. Otherwise, don't keep this bug open on my account.

Thanks,

Matt
Comment 10 Chris Lalancette 2009-07-27 10:56:59 EDT
(In reply to comment #9)
> Chris,
> 
> Realistically that's unlikely to happen at this stage as I've been running
> Fedora for a while now. I can confirm that the message doesn't appear in
> Fedora. Otherwise, don't keep this bug open on my account.

OK, thanks for the info.  I'll close this out for now, and if other users see it, we'll just have them open up new bugs.

Thanks,
Chris Lalancette
