Bug 488018

Summary: NMI appears to be stuck (460) - NMI received for unknown reason 21
Product: Red Hat Enterprise Linux 4
Reporter: Qian Cai <qcai>
Component: kernel
Assignee: Aristeu Rozanski <arozansk>
Status: CLOSED ERRATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Docs Contact:
Priority: high
Version: 4.7.z
CC: dhoward, fluo, mgahagan, syeghiay, tao, vmayatsk
Target Milestone: rc
Keywords: Regression
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-05-18 19:15:06 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 458859
Bug Blocks:

Description Qian Cai 2009-03-02 07:41:08 UTC
Description of problem:
It looks like a recent 4.7.z update introduced a regression on an x86-64 machine. The boot log now contains these NMI error messages:

 ...
 AMD Opteron(tm) Processor 846 stepping 08
 Total of 4 processors activated (15942.96 BogoMIPS).
 Using local APIC timer interrupts.
 Detected 12.452 MHz APIC timer.
 checking TSC synchronization across 4 CPUs: passed.
 Brought up 4 CPUs
 Disabling vsyscall due to use of PM timer
 time.c: Using PM based timekeeping.
 testing NMI watchdog ... CPU#1: NMI appears to be stuck (460)!
 checking if image is initramfs... it is
 NET: Registered protocol family 16
 PCI: Using configuration type 1
 mtrr: v2.0 (20020519)
 Uhhuh. NMI received for unknown reason 21.
 Dazed and confused, but trying to continue
 Do you have a strange power saving mode enabled?
 ACPI: Subsystem revision 20040816
 ACPI: Interpreter enabled
 ACPI: Using IOAPIC for interrupt routing
 ...

The previously released kernel, 2.6.9-78.0.13.EL, does not have this problem. I have seen it on both the smp and largesmp kernels.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-78.0.15.EL
kernel-largesmp-2.6.9-78.0.15.EL

How reproducible:
always

Steps to Reproduce:
1. reserve bigisis.rhts.bos.redhat.com (x86-64) from RHTS
2. boot kernel-smp-2.6.9-78.0.15.EL
3. grep -i nmi /var/log/dmesg
  
Actual results:
testing NMI watchdog ... CPU#1: NMI appears to be stuck (460)!
Uhhuh. NMI received for unknown reason 21.

Expected results:
No such information
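
The check in step 3 could also be scripted so that a test harness can flag the regression automatically. The following is only a minimal sketch, not part of the original report: the function name check_nmi and the result keys are hypothetical, and the patterns are taken from the two messages quoted under "Actual results" plus the "testing NMI watchdog ... OK." line a healthy boot prints.

```python
import re

# Patterns copied from the log lines quoted in this report.
STUCK_RE = re.compile(r"CPU#(\d+): NMI appears to be stuck \((\d+)\)")
UNKNOWN_RE = re.compile(r"NMI received for unknown reason ([0-9a-fA-F]+)")
OK_RE = re.compile(r"testing NMI watchdog \.\.\. OK\.")


def check_nmi(dmesg_text):
    """Summarise NMI watchdog messages found in a dmesg dump.

    check_nmi and its result keys are hypothetical names used for
    illustration only; they are not part of any Red Hat tool.
    """
    # (cpu, count) pairs from "CPU#N: NMI appears to be stuck (COUNT)!"
    stuck = [(int(cpu), int(count)) for cpu, count in STUCK_RE.findall(dmesg_text)]
    # Reason codes are kept as strings exactly as they appear in the log.
    unknown = UNKNOWN_RE.findall(dmesg_text)
    return {
        "watchdog_ok": bool(OK_RE.search(dmesg_text)),
        "stuck": stuck,
        "unknown_reasons": unknown,
        "regressed": bool(stuck or unknown),
    }
```

To use it, read /var/log/dmesg into a string, pass it to check_nmi(), and treat a true "regressed" value as a failure.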

Additional info:
RHTS links,
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=7040616

Bad dmesg,
dmesg.2.6.9-78.0.15.ELsmp

Good dmesg using kernel-smp-2.6.9-78.0.13.EL,
dmesg.orig

Comment 1 Qian Cai 2009-03-02 07:51:20 UTC
The same problem has been seen since the 2.6.9-78.0.14.EL kernel.

Comment 3 Don Howard 2009-03-03 00:11:01 UTC
Should this be marked as a duplicate of bz 479184?

Comment 11 Qian Cai 2009-03-05 03:05:03 UTC
Thanks Aristeu. Is kernel-smp-2.6.9-82.EL.488018_2.x86_64.rpm supposed to contain the patch in comment #9? If so, I confirm that it solves the problem on that machine.

Comment 12 Aristeu Rozanski 2009-03-05 13:14:23 UTC
Cai, yes, that kernel contains the patch attached to this BZ.

Comment 14 Prarit Bhargava 2009-03-10 15:16:14 UTC
*** Bug 488269 has been marked as a duplicate of this bug. ***

Comment 17 RHEL Program Management 2009-03-16 15:38:40 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 19 Vivek Goyal 2009-03-17 15:01:44 UTC
Committed in 84.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/

Comment 22 Luo Fei 2009-04-20 07:52:47 UTC
The tests passed for 2.6.9-88.EL(up,smp,largesmp) on bigisis.rhts.bos.redhat.com(x86_64)...
[root@bigisis ~]# grep -i nmi /var/log/dmesg
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
testing NMI watchdog ... OK.

Comment 25 errata-xmlrpc 2009-05-18 19:15:06 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html