Bug 488018 - NMI appears to be stuck (460) - NMI received for unknown reason 21
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.7.z
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Aristeu Rozanski
QA Contact: Red Hat Kernel QE team
Keywords: Regression
Duplicates: 488269
Depends On: 458859
Blocks:
Reported: 2009-03-02 02:41 EST by CAI Qian
Modified: 2010-10-23 03:59 EDT
Doc Type: Bug Fix
Last Closed: 2009-05-18 15:15:06 EDT

Attachments: None

Description CAI Qian 2009-03-02 02:41:08 EST
Description of problem:
It looks like a recent 4.7.z update introduced a regression on an x86-64 machine. The boot log now contains the following error messages:

 ...
 AMD Opteron(tm) Processor 846 stepping 08
 Total of 4 processors activated (15942.96 BogoMIPS).
 Using local APIC timer interrupts.
 Detected 12.452 MHz APIC timer.
 checking TSC synchronization across 4 CPUs: passed.
 Brought up 4 CPUs
 Disabling vsyscall due to use of PM timer
 time.c: Using PM based timekeeping.
 testing NMI watchdog ... CPU#1: NMI appears to be stuck (460)!
 checking if image is initramfs... it is
 NET: Registered protocol family 16
 PCI: Using configuration type 1
 mtrr: v2.0 (20020519)
 Uhhuh. NMI received for unknown reason 21.
 Dazed and confused, but trying to continue
 Do you have a strange power saving mode enabled?
 ACPI: Subsystem revision 20040816
 ACPI: Interpreter enabled
 ACPI: Using IOAPIC for interrupt routing
 ...

The previously released kernel, 2.6.9-78.0.13.EL, does not have this problem. I have seen it on both the smp and largesmp kernels.
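
For readers wondering what "unknown reason 21" refers to, here is a minimal sketch of how such a value could be decoded. It assumes the kernel prints the NMI status byte in hexadecimal and that the byte follows the conventional PC/AT layout of I/O port 0x61. Neither assumption comes from this report, and the names decode_nmi_reason and is_known_hardware_error are hypothetical.

```python
# Sketch only: the bit names follow the conventional PC/AT port 0x61 layout,
# which is an assumption and is not stated anywhere in this bug report.
PORT_61_BITS = {
    0x80: "memory parity / SERR# error",
    0x40: "I/O channel check (IOCHK#) error",
    0x20: "timer 2 output",
    0x10: "refresh toggle",
    0x08: "I/O channel check disabled",
    0x04: "parity check disabled",
    0x02: "speaker data enable",
    0x01: "timer 2 gate",
}


def decode_nmi_reason(hex_text):
    """Parse the two-digit value from the log line and list the bits that are set."""
    reason = int(hex_text, 16)
    flags = [name for mask, name in PORT_61_BITS.items() if reason & mask]
    return reason, flags


def is_known_hardware_error(reason):
    """True when either of the two error bits (0x80 or 0x40) is set."""
    return bool(reason & 0xC0)


reason, flags = decode_nmi_reason("21")
print(hex(reason), flags, is_known_hardware_error(reason))
```

Under these assumptions neither error bit is set in 0x21, which would be consistent with the kernel reporting the NMI as having an unknown source.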

Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-78.0.15.EL
kernel-largesmp-2.6.9-78.0.15.EL

How reproducible:
always

Steps to Reproduce:
1. reserve bigisis.rhts.bos.redhat.com (x86-64) from RHTS
2. boot kernel-smp-2.6.9-78.0.15.EL
3. grep -i nmi /var/log/dmesg
  
Actual results:
testing NMI watchdog ... CPU#1: NMI appears to be stuck (460)!
Uhhuh. NMI received for unknown reason 21.

Expected results:
No such messages in the boot log.
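
The manual grep in step 3 can also be automated. The following is a minimal sketch, not part of the RHTS test: it scans dmesg text for the two failure messages quoted under "Actual results". The function name find_nmi_problems and the regular expressions are assumptions made for illustration.

```python
import re

# Patterns modelled on the two messages quoted in this report.
STUCK = re.compile(r"CPU#(\d+): NMI appears to be stuck \((\d+)\)!")
UNKNOWN = re.compile(r"NMI received for unknown reason ([0-9a-fA-F]+)")


def find_nmi_problems(dmesg_text):
    """Return a list of (kind, line) tuples for every suspicious NMI line."""
    problems = []
    for line in dmesg_text.splitlines():
        if STUCK.search(line):
            problems.append(("stuck", line.strip()))
        elif UNKNOWN.search(line):
            problems.append(("unknown-reason", line.strip()))
    return problems


# Example usage on a live system (path taken from step 3 above):
#   with open("/var/log/dmesg") as f:
#       for kind, line in find_nmi_problems(f.read()):
#           print(kind, line)
```

An empty result corresponds to the expected outcome, while a boot of the affected kernel should yield one entry of each kind.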

Additional info:
RHTS link:
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=7040616

Bad dmesg:
dmesg.2.6.9-78.0.15.ELsmp

Good dmesg (kernel-smp-2.6.9-78.0.13.EL):
dmesg.orig
Comment 1 CAI Qian 2009-03-02 02:51:20 EST
The same problem has been present since the 2.6.9-78.0.14.EL kernel.
Comment 3 Don Howard 2009-03-02 19:11:01 EST
Should this be marked as a duplicate of bz 479184?
Comment 11 CAI Qian 2009-03-04 22:05:03 EST
Thanks, Aristeu. Is kernel-smp-2.6.9-82.EL.488018_2.x86_64.rpm supposed to contain the patch from comment #9? If so, I confirm that it solves the problem on that machine.
Comment 12 Aristeu Rozanski 2009-03-05 08:14:23 EST
CAI, yes, that kernel contains the patch attached to this BZ.
Comment 14 Prarit Bhargava 2009-03-10 11:16:14 EDT
*** Bug 488269 has been marked as a duplicate of this bug. ***
Comment 17 RHEL Product and Program Management 2009-03-16 11:38:40 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 19 Vivek Goyal 2009-03-17 11:01:44 EDT
Committed in 84.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/
Comment 22 Luo Fei 2009-04-20 03:52:47 EDT
The tests passed for 2.6.9-88.EL (up, smp, largesmp) on bigisis.rhts.bos.redhat.com (x86_64):
[root@bigisis ~]# grep -i nmi /var/log/dmesg
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
testing NMI watchdog ... OK.
Comment 25 errata-xmlrpc 2009-05-18 15:15:06 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html
