Bug 458133 - edac_mc reporting errors on Intel 5000 based systems
Status: CLOSED DUPLICATE of bug 471933
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: i686 Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Aristeu Rozanski
QA Contact: Martin Jenner
Duplicates: 450737
Reported: 2008-08-06 12:21 EDT by Brian C. Lane
Modified: 2008-12-10 10:51 EST

Doc Type: Bug Fix
Last Closed: 2008-11-21 13:31:06 EST

Attachments
patch 1/2 (9.00 KB, patch)
2008-10-23 11:02 EDT, Aristeu Rozanski
patch 2/2 (1.74 KB, patch)
2008-10-23 11:02 EDT, Aristeu Rozanski

Description Brian C. Lane 2008-08-06 12:21:05 EDT
Description of problem:
EDAC kernel module is reporting errors:


EDAC i5000 MC0: FATAL ERRORS Found!!! 1st FATAL Err Reg= 0x4
EDAC i5000 MC0: >Tmid Thermal event with intelligent throttling disabled
EDAC MC0: UE row 2, channel-a= 0 channel-b= 1 labels "-": (Branch=0 DRAM-Bank=3 RDWR=Read RAS=156 CAS=0 FATAL Err=0x4)
EDAC i5000 MC0: NON-FATAL ERRORS Found!!! 1st NON-FATAL Err Reg= 0x20000
EDAC i5000:     NORTHBOUND CRC  Error, bits= 0x20000
EDAC i5000 MC0: FATAL ERRORS Found!!! 1st FATAL Err Reg= 0x4
EDAC i5000 MC0: >Tmid Thermal event with intelligent throttling disabled
EDAC MC0: UE row 3, channel-a= 1 channel-b= 2 labels "-": (Branch=0 DRAM-Bank=0 RDWR=Read RAS=1204 CAS=0 FATAL Err=0x4)
EDAC i5000 MC0: NON-FATAL ERRORS Found!!! 1st NON-FATAL Err Reg= 0x20000
EDAC i5000:     NORTHBOUND CRC  Error, bits= 0x20000
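
Editor's note: when triaging logs like the one above, it can help to pull the fields out of the "UE row" lines. The following is a minimal sketch, assuming the line format is exactly as pasted here. The names `UE_LINE`, `SEEN_BITS`, and `parse_ue_line` are hypothetical, and the register-bit table contains only the two values visible in this log, not the driver's full list.

```python
import re

# Matches the "EDAC MC0: UE row ..." summary line as it appears in this report.
UE_LINE = re.compile(
    r"EDAC MC(?P<mc>\d+): UE row (?P<row>\d+), "
    r"channel-a= (?P<channel_a>\d+) channel-b= (?P<channel_b>\d+) .*?"
    r"\(Branch=(?P<branch>\d+) DRAM-Bank=(?P<bank>\d+) RDWR=(?P<rdwr>\w+) "
    r"RAS=(?P<ras>\d+) CAS=(?P<cas>\d+) FATAL Err=(?P<err>0x[0-9a-fA-F]+)\)"
)

# Register values and descriptions copied from the log lines in this report.
# This is NOT the complete i5000 error table, only what the log shows.
SEEN_BITS = {
    ("FATAL", 0x4): ">Tmid Thermal event with intelligent throttling disabled",
    ("NON-FATAL", 0x20000): "NORTHBOUND CRC Error",
}


def parse_ue_line(line):
    """Return the fields of a 'UE row' line as a dict, or None if it does not match."""
    match = UE_LINE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Everything except the read/write flag and the hex register is a decimal integer.
    record = {k: int(v) for k, v in fields.items() if k not in ("rdwr", "err")}
    record["rdwr"] = fields["rdwr"]
    record["err"] = int(fields["err"], 16)
    return record


if __name__ == "__main__":
    sample = ('EDAC MC0: UE row 2, channel-a= 0 channel-b= 1 labels "-": '
              '(Branch=0 DRAM-Bank=3 RDWR=Read RAS=156 CAS=0 FATAL Err=0x4)')
    record = parse_ue_line(sample)
    print(record)
    print(SEEN_BITS.get(("FATAL", record["err"]), "unknown bit"))
```

Both "UE" lines in this report carry FATAL Err=0x4, which the driver itself labels as the >Tmid thermal event, so grouping parsed records by `err` is a quick way to see whether anything other than the thermal event is being reported.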

Version-Release number of selected component (if applicable):

Linux sp-49.etelos.com 2.6.18-92.1.6.el5 #1 SMP Wed Jun 25 13:49:24 EDT 2008 i686 i686 i386 GNU/Linux

How reproducible:

Intermittent.

Comment 1 Jarod Wilson 2008-08-06 13:28:26 EDT
Summary from IRC conversation with Aris:

This is a BIOS thing, and it's not critical. Technical explanation: memory can be throttled to keep it from getting too hot. You can do the throttling yourself or let the chipset do it. This message appears because the BIOS initialized it to do the throttling itself, and the temperature just got past the midpoint threshold (the ">Tmid" in the log). It's a known issue that is being addressed upstream.

Per Aris, an sosreport would be nice to have, but per Brian, the reporter (from another IRC conversation), it might not be possible to get the whole thing out due to policy and whatnot. If specific info is needed that can be easily sanitised, we can probably get that, though.
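
Editor's note: if a full sosreport is off the table, the EDAC error counters are one small, easily sanitised piece of information. The following is a minimal sketch, assuming the kernel exposes the usual EDAC sysfs layout under /sys/devices/system/edac/mc with `mc_name`, `ce_count`, and `ue_count` attributes per controller. The function name `read_edac_counters` is hypothetical, and the layout on a given RHEL 5 kernel should be checked before relying on it.

```python
from pathlib import Path


def read_edac_counters(base="/sys/devices/system/edac/mc"):
    """Return {controller: {attribute: value}} for each mc* directory under base."""
    result = {}
    root = Path(base)
    if not root.is_dir():
        # EDAC module not loaded, or the sysfs layout differs from the assumption.
        return result
    for mc_dir in sorted(root.glob("mc[0-9]*")):
        attrs = {}
        for name in ("mc_name", "ce_count", "ue_count"):
            attr = mc_dir / name
            if attr.is_file():
                text = attr.read_text().strip()
                # Counters are plain decimal numbers; the name is kept as a string.
                attrs[name] = int(text) if text.isdigit() else text
        result[mc_dir.name] = attrs
    return result


if __name__ == "__main__":
    for controller, attrs in read_edac_counters().items():
        print(controller, attrs)
```

The output contains only a controller name and two counters, so it should need little or no sanitising before being attached to a bug.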
Comment 5 Aristeu Rozanski 2008-10-23 11:02:12 EDT
Created attachment 321300
patch 1/2
Comment 6 Aristeu Rozanski 2008-10-23 11:02:41 EDT
Created attachment 321301
patch 2/2
Comment 7 Aristeu Rozanski 2008-10-23 16:53:16 EDT
Test packages are available at
http://people.redhat.com/arozansk/bz458133/
Comment 8 Aristeu Rozanski 2008-10-23 16:53:44 EDT
Please test and tell me how it goes.
Comment 9 Brian C. Lane 2008-10-23 17:23:39 EDT
I no longer have access to the affected systems. Hopefully someone else can give this a try. Thanks!
Comment 10 Aristeu Rozanski 2008-11-21 13:31:06 EST

*** This bug has been marked as a duplicate of bug 471933 ***
Comment 11 Aristeu Rozanski 2008-12-09 10:43:51 EST
*** Bug 450737 has been marked as a duplicate of this bug. ***
