Bug 138192 - gart errors when using 2.4.21-20.EL on HP DL585
Summary: gart errors when using 2.4.21-20.EL on HP DL585
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Jim Paradis
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 156320
 
Reported: 2004-11-05 15:02 UTC by Chris Williams
Modified: 2007-11-30 22:07 UTC

Fixed In Version: RHSA-2005-663
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-09-28 14:31:50 UTC
Target Upstream Version:
Embargoed:


Links
System ID: Red Hat Product Errata RHSA-2005:663
Private: 0
Priority: qe-ready
Status: SHIPPED_LIVE
Summary: Important: Updated kernel packages available for Red Hat Enterprise Linux 3 Update 6
Last Updated: 2005-09-28 04:00:00 UTC

Description Chris Williams 2004-11-05 15:02:04 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3) Gecko/20040922

Description of problem:
HP reports seeing these MCEs (machine check exceptions) under heavy I/O.
Originally reported in Bug 131029.


Sep 24 11:35:45 localhost kernel: CPU 1: Silent Northbridge MCE
Sep 24 11:35:45 localhost kernel: Northbridge status a60000010005001b
Sep 24 11:35:45 localhost kernel:     GART TLB error generic level generic
Sep 24 11:35:45 localhost kernel:     extended error gart error
Sep 24 11:35:45 localhost kernel:     link number 0
Sep 24 11:35:45 localhost kernel:     err cpu1
Sep 24 11:35:45 localhost kernel:     processor context corrupt
Sep 24 11:35:45 localhost kernel:     error address valid
Sep 24 11:35:45 localhost kernel:     error uncorrected
Sep 24 11:35:45 localhost kernel:     previous error lost
Sep 24 11:35:45 localhost kernel:     error address 0000000037ff0048
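
For reference, the status word in the log above can be decoded by hand. The
following is a minimal sketch, not the kernel's own decoder. It assumes the
architectural x86 machine-check status layout (flag bits 63 to 57, error code
in bits 15:0, TLB error codes of the form 0000 0000 0001 TTLL) and, as further
assumptions chosen to agree with the log, an extended error code in bits 19:16
and a link number in bits 38:36. The names FLAG_BITS, TT, LL and decode_status
are made up for this example.

```python
# Sketch only: decode a 64-bit machine-check status word.
# Bit positions are assumptions (see the text above), not taken from this report.

FLAG_BITS = {
    63: "VAL",    # status register holds a valid error
    62: "OVER",   # overflow: an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # reporting was enabled for this error
    59: "MISCV",  # miscellaneous register valid
    58: "ADDRV",  # error address register valid
    57: "PCC",    # processor context corrupt
}
TT = {0: "instruction", 1: "data", 2: "generic"}  # transaction type
LL = {0: "L0", 1: "L1", 2: "L2", 3: "generic"}    # cache/TLB level


def decode_status(status):
    """Return the decoded fields of a status word as a dict."""
    flags = [name for bit, name in sorted(FLAG_BITS.items(), reverse=True)
             if (status >> bit) & 1]
    code = status & 0xFFFF
    result = {
        "flags": flags,
        "error_code": code,
        "extended_error": (status >> 16) & 0xF,  # assumed field position
        "link": (status >> 36) & 0x7,            # assumed field position
    }
    # TLB error codes have the form 0000 0000 0001 TTLL.
    if code & 0xFFF0 == 0x0010:
        result["tlb"] = (TT.get((code >> 2) & 3, "reserved"), LL[code & 3])
    return result


if __name__ == "__main__":
    print(decode_status(0xA60000010005001B))
```

For the logged value a60000010005001b this yields a TLB error with transaction
type and level both "generic", extended error code 5 and link 0, with the
uncorrected, address-valid and context-corrupt flags set, which agrees with
the corresponding lines in the log. Under this layout bit 62 (overflow) is
clear, so the sketch does not reproduce the "previous error lost" line; how
the RHEL 3 kernel derives that line is not shown in this report.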

Version-Release number of selected component (if applicable):


How reproducible:
Sometimes

Steps to Reproduce:
Load RHEL 3 U3 and subject the machine to heavy I/O.
    

Actual Results:  MCE

Expected Results:  no MCE

Additional info:

Comment 1 Radovan Balcar 2004-12-20 15:21:56 UTC
Hi all,
Is anybody working on this case? Any estimate of an ETA?

Comment 4 Christopher P Johnson 2005-03-09 01:08:07 UTC
Note that this is related to speculative TLB reloading:
when it is disabled in the BIOS, the error does not occur.

Comment 8 Christopher P Johnson 2005-04-14 18:30:03 UTC
We have seen incidents of disk data corruption when running the rhr2 CORE memory
tests (tests fail with "binary file differs"), associated with this error message,
whether the BIOS option for speculative TLB load was enabled or disabled.

After moving to RHEL 3 Update 4, the disk corruption errors were no longer seen
with speculative TLB load disabled in the BIOS.

We'd like to know what was fixed that resolved the disk corruption error, and if
there are any additional error scenarios associated with the message, so we can
instruct our customers.

Is it truly only an informational error message at this point?


Comment 14 Jim Paradis 2005-06-07 13:30:46 UTC
See Bug 131029, comments 29 and 30.  This "bug" is actually a misreporting for
which fixes were made upstream.  The patch is supplied in that bug report.


*** This bug has been marked as a duplicate of 131029 ***

Comment 15 Brian Maly 2005-06-09 20:29:53 UTC
Patch posted for review on 2005-06-09.

Comment 16 Brian Maly 2005-06-10 18:55:30 UTC
devel ACK for U6

Comment 19 Ernie Petrides 2005-07-12 01:01:43 UTC
A fix for this problem has just been committed to the RHEL3 U6
patch pool this evening (in kernel version 2.4.21-32.10.EL).
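
To check whether a given machine is running a kernel at or after the build
named above, something like the following may help. It is a rough sketch: it
compares only the numeric parts of the release string and is not RPM's own
version comparison, so suffixes such as "EL" or "ELsmp" are ignored. The names
FIXED_IN, release_key and has_fix are made up for this example.

```python
import re

# Kernel build named in the comment above as containing the fix.
FIXED_IN = "2.4.21-32.10.EL"


def release_key(release):
    """Turn '2.4.21-32.10.EL' into (2, 4, 21, 32, 10); non-numeric parts are ignored."""
    return tuple(int(n) for n in re.findall(r"\d+", release))


def has_fix(release, fixed_in=FIXED_IN):
    """True if the release sorts at or after the fixed build (numeric parts only)."""
    return release_key(release) >= release_key(fixed_in)
```

On a running system the release string can be taken from `uname -r`, or from
platform.release() in Python, and passed to has_fix().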


Comment 20 Red Hat Bugzilla 2005-09-28 14:31:50 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-663.html


