Bug 138192 - gart errors when using 2.4.21-20.EL on HP DL585
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Hardware: x86_64 Linux
Priority: medium  Severity: medium
Assigned To: Jim Paradis
Depends On:
Blocks: 156320
Reported: 2004-11-05 10:02 EST by Chris Williams
Modified: 2007-11-30 17:07 EST (History)
Fixed In Version: RHSA-2005-663
Doc Type: Bug Fix
Last Closed: 2005-09-28 10:31:50 EDT
Description Chris Williams 2004-11-05 10:02:04 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3)

Description of problem:
HP reports that they are seeing these MCEs under heavy I/O. Originally
reported in Bug 131029.

Sep 24 11:35:45 localhost kernel: CPU 1: Silent Northbridge MCE
Sep 24 11:35:45 localhost kernel: Northbridge status a60000010005001b
Sep 24 11:35:45 localhost kernel:     GART TLB error generic level generic
Sep 24 11:35:45 localhost kernel:     extended error gart error
Sep 24 11:35:45 localhost kernel:     link number 0
Sep 24 11:35:45 localhost kernel:     err cpu1
Sep 24 11:35:45 localhost kernel:     processor context corrupt
Sep 24 11:35:45 localhost kernel:     error address valid
Sep 24 11:35:45 localhost kernel:     error uncorrected
Sep 24 11:35:45 localhost kernel:     previous error lost
Sep 24 11:35:45 localhost kernel:     error address 0000000037ff0048
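
The status value in the log can be unpacked by hand. The following is a minimal sketch, not the kernel's decoder: it assumes the architectural x86 machine-check status layout (flag bits 63..57), the K8 northbridge extended error code in bits 19:16, the link number in bits 38:36, and the TLB error code pattern 0000 0000 0001 TTLL. The function and table names are hypothetical.

```python
# Sketch only: field positions are assumptions based on the generic x86 MCA
# status layout and the K8 northbridge encoding, not taken from this report.

STATUS_FLAGS = {
    63: "valid",
    62: "overflow",
    61: "uncorrected",
    60: "enabled",
    59: "misc_valid",
    58: "addr_valid",
    57: "context_corrupt",
}

# Extended error codes (bits 19:16), indexed by value.
EXT_ERRORS = ["ecc", "crc", "sync", "master abort", "target abort",
              "gart", "rmw", "watchdog", "chipkill ecc"]

# TLB error code sub-fields: transaction type (TT) and cache level (LL).
TT = ["instruction", "data", "generic", "reserved"]
LL = ["L0", "L1", "L2", "generic"]


def decode_nb_status(status):
    """Split a 64-bit northbridge status value into named fields."""
    flags = {name: bool((status >> bit) & 1)
             for bit, name in STATUS_FLAGS.items()}
    ext = (status >> 16) & 0xF
    code = status & 0xFFFF
    result = {
        "flags": flags,
        "extended_error": EXT_ERRORS[ext] if ext < len(EXT_ERRORS) else "unknown",
        "link": (status >> 36) & 0x7,
    }
    # TLB errors use the pattern 0000 0000 0001 TTLL in the low 16 bits.
    if (code & 0xFFF0) == 0x0010:
        result["tlb"] = {"type": TT[(code >> 2) & 3], "level": LL[code & 3]}
    return result


if __name__ == "__main__":
    print(decode_nb_status(0xA60000010005001B))
```

For the value above, this yields a GART extended error, a TLB error with type and level both generic, and link number 0, in line with the log. Under the assumed layout, bit 62 (overflow) is clear, so the sketch does not reproduce the "previous error lost" line; the labels printed by this kernel may come from a different table.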

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
Load RHEL 3 U3, subject machine to heavy IO

Actual Results:  MCE

Expected Results:  no MCE

Additional info:
Comment 1 Radovan Balcar 2004-12-20 10:21:56 EST
hi all,
Is anybody working on this case? Any guess on an ETA?
Comment 4 Christopher P Johnson 2005-03-08 20:08:07 EST
Note that this is related to speculative TLB reloading:
when it is disabled in the BIOS, the error does not occur.
Comment 8 Christopher P Johnson 2005-04-14 14:30:03 EDT
We have seen instances of disk data corruption while running the rhr2 CORE
memory tests (the tests fail with "binary file differs"). The corruption was
associated with this error message whether the BIOS option for speculative TLB
load was enabled or disabled.

After moving to RHEL 3 Update 4, the disk corruption errors were no longer seen
with speculative TLB load disabled in the BIOS.

We'd like to know what was fixed that resolved the disk corruption error, and
whether there are any additional error scenarios associated with the message,
so we can instruct our customers.

Is it truly only an informational error message at this point?
Comment 14 Jim Paradis 2005-06-07 09:30:46 EDT
See Bug 131029, comments 29 and 30.  This "bug" is actually a misreporting for
which fixes were made upstream.  The patch is supplied in that bug report.

*** This bug has been marked as a duplicate of 131029 ***
Comment 15 Brian Maly 2005-06-09 16:29:53 EDT
Patch posted for review on 6/9/2005.
Comment 16 Brian Maly 2005-06-10 14:55:30 EDT
devel ACK for U6
Comment 19 Ernie Petrides 2005-07-11 21:01:43 EDT
A fix for this problem has just been committed to the RHEL3 U6
patch pool this evening (in kernel version 2.4.21-32.10.EL).
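
To check whether a given machine is running a kernel at or past the build named above, a release string can be compared numerically. This is a rough sketch that assumes 2.4.21-32.10.EL is the first build carrying the fix and that the release string has the form printed by `uname -r` without an architecture suffix; it is a naive numeric comparison, not RPM's version-ordering algorithm. The names `release_key` and `has_fix` are hypothetical.

```python
import re

# Assumption: first kernel build containing the fix, per the comment above.
FIXED_RELEASE = "2.4.21-32.10.EL"


def release_key(release):
    """Turn '2.4.21-32.10.EL' into (2, 4, 21, 32, 10), ignoring non-digits."""
    return tuple(int(n) for n in re.findall(r"\d+", release))


def has_fix(release):
    """Return True if the release sorts at or after FIXED_RELEASE."""
    return release_key(release) >= release_key(FIXED_RELEASE)


if __name__ == "__main__":
    for rel in ("2.4.21-20.EL", "2.4.21-32.10.EL"):
        print(rel, has_fix(rel))
```

The kernel from the bug title, 2.4.21-20.EL, compares as older than the fixed build, so `has_fix` returns False for it. Suffixes such as `smp` are ignored by the digit-only parse.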
Comment 20 Red Hat Bugzilla 2005-09-28 10:31:50 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.