Bug 479184

Summary: RHEL 4.7: unknown NMI errors on x86_64 on DL580 G5
Product: Red Hat Enterprise Linux 4
Reporter: RHEL Program Management <pm-rhel>
Component: kernel
Assignee: Vitaly Mayatskikh <vmayatsk>
Status: CLOSED ERRATA
QA Contact: Martin Jenner <mjenner>
Severity: urgent
Priority: urgent
Version: 4.7
CC: ahecox, arozansk, dhoward, duck, fluo, james.brown, jtluka, nagananda.chumbalkar, peterm, pm-eus, qcai, sfolkwil, tao, tcamuso, tumeya, vgoyal
Target Milestone: rc
Keywords: Regression, ZStream
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2009-04-30 21:25:30 UTC
Bug Depends On: 458859, 497330    
Attachments:
Description                                  Flags
dmesg for kernel-largesmp-78.0.20.EL         none
dmesg for kernel-largesmp-2.6.9-89.EL        none

Description RHEL Program Management 2009-01-07 19:39:34 UTC
This bug has been copied from bug #458859 and has been proposed
to be backported to 4.7 z-stream (EUS).

Comment 2 RHEL Program Management 2009-01-07 20:12:58 UTC
This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being proposed as a blocker for this release.  

Please resolve ASAP.

Comment 5 Vitaly Mayatskikh 2009-01-21 14:17:13 UTC
Committed in 78.0.14.EL

Comment 10 Vitaly Mayatskikh 2009-03-25 07:57:11 UTC
Committed in 78.0.18.EL

Comment 12 Qian Cai 2009-04-23 09:19:56 UTC
Created attachment 340909
dmesg for kernel-largesmp-78.0.20.EL

Comment 13 Qian Cai 2009-04-23 09:21:52 UTC
Created attachment 340911
dmesg for kernel-largesmp-2.6.9-89.EL

Comment 14 Qian Cai 2009-04-23 09:31:50 UTC
This problem also occurs with the UP kernel.

[root@hp-dl580g5-01 ~]# grep -i nmi /var/log/dmesg
ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
activating NMI Watchdog ... done.
testing NMI watchdog ... CPU#0: NMI appears to be stuck (1)!
NMI watchdog: disabling NMI delivery on LINT0 for all CPUs
[root@hp-dl580g5-01 ~]# uname -ra
Linux hp-dl580g5-01.rhts.bos.redhat.com 2.6.9-78.0.20.EL #1 Thu Apr 16 13:37:56 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

Comment 16 Qian Cai 2009-04-23 10:44:33 UTC
After a while, the following messages appeared on the console with the RHEL 4.7.z kernel:

hp-dl580g5-01.rhts.bos.redhat.com login: warning: many lost ticks.
Your time source seems to be instable or some driver is hogging interupts
rip __do_softirq+0x4d/0xd0
Falling back to HPET

Comment 17 Vitaly Mayatskikh 2009-04-23 11:44:43 UTC
I'm bisecting to find what is missing from 4.7.z.

Comment 18 Vitaly Mayatskikh 2009-04-23 12:09:55 UTC
We need the patch linux-2.6.9-nehalem-ex-support.patch from bz 491338; with this patch applied, the 4.7.z kernel works as expected.

Comment 19 Jiri Skrabal 2009-04-23 12:29:39 UTC
The patch from BZ 491338 has been approved for 4.7.z. See bug #497330.

Comment 23 errata-xmlrpc 2009-04-30 21:25:30 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0459.html