Red Hat Bugzilla – Bug 138192
gart errors when using 2.4.21-20.EL on HP DL585
Last modified: 2007-11-30 17:07:04 EST
Description of problem:
HP reports seeing the following machine check exceptions (MCEs) under heavy I/O.
Originally reported in Bug 131029.
Sep 24 11:35:45 localhost kernel: CPU 1: Silent Northbridge MCE
Sep 24 11:35:45 localhost kernel: Northbridge status a60000010005001b
Sep 24 11:35:45 localhost kernel: GART TLB error generic level generic
Sep 24 11:35:45 localhost kernel: extended error gart error
Sep 24 11:35:45 localhost kernel: link number 0
Sep 24 11:35:45 localhost kernel: err cpu1
Sep 24 11:35:45 localhost kernel: processor context corrupt
Sep 24 11:35:45 localhost kernel: error address valid
Sep 24 11:35:45 localhost kernel: error uncorrected
Sep 24 11:35:45 localhost kernel: previous error lost
Sep 24 11:35:45 localhost kernel: error address 0000000037ff0048
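For reference, the Northbridge status value in the log above can be decoded by hand. The following is a minimal sketch, not the kernel's decoder: it assumes the architectural MCA status layout (VAL, OVER, UC, EN, MISCV, ADDRV and PCC in bits 63 down to 57), the extended error code in bits 19:16 as listed in the AMD K8 documentation, and the generic TLB error code format. The function name `decode_nb_status` and the table names are hypothetical. The labels the kernel printed come from its own tables, so this output is not expected to match the log line for line.

```python
# Hypothetical helper: decode an MCA status value using the architectural
# bit layout. This is an illustration, not the code the kernel runs.

STATUS_FLAGS = [
    (63, "VAL"),    # status register holds a valid error
    (62, "OVER"),   # an earlier error was overwritten
    (61, "UC"),     # error was not corrected
    (60, "EN"),     # error reporting was enabled
    (59, "MISCV"),  # MISC register is valid
    (58, "ADDRV"),  # address register is valid
    (57, "PCC"),    # processor context corrupt
]

# Assumed K8 Northbridge extended error codes (status bits 19:16).
K8_NB_EXT_ERRORS = {
    0: "ECC error", 1: "CRC error", 2: "sync error", 3: "master abort",
    4: "target abort", 5: "GART error", 6: "RMW error",
    7: "watchdog error", 8: "ChipKill ECC error",
}

TLB_TYPE = ["instruction", "data", "generic", "reserved"]
CACHE_LEVEL = ["level 0", "level 1", "level 2", "generic"]


def decode_nb_status(status):
    """Return the set flags, extended error and TLB fields of a status word."""
    flags = [name for bit, name in STATUS_FLAGS if (status >> bit) & 1]
    ext = (status >> 16) & 0xF
    code = status & 0xFFFF
    result = {
        "flags": flags,
        "extended": K8_NB_EXT_ERRORS.get(ext, "unknown"),
        "error_code": code,
    }
    # TLB errors use the pattern 0000 0000 0001 TTLL in the low 16 bits.
    if (code & 0xFFF0) == 0x0010:
        result["tlb"] = (TLB_TYPE[(code >> 2) & 3], CACHE_LEVEL[code & 3])
    return result


if __name__ == "__main__":
    print(decode_nb_status(0xA60000010005001B))
```

Under these assumptions, the low 16 bits (0x001b) match the TLB error pattern with both the transaction type and the level reported as generic, and the extended code 5 corresponds to a GART error, which agrees with the "GART TLB error generic level generic" and "extended error gart error" lines in the log.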
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Install RHEL 3 U3, then subject the machine to heavy I/O.
Actual Results: MCE
Expected Results: no MCE
Is anybody working on this case? Any estimate of an ETA?
Note that this is related to speculative TLB reloading:
when it is disabled in the BIOS, the error does not occur.
We have seen instances of disk data corruption when running the rhr2 CORE memory
tests (the tests fail with "binary file differs"), associated with this error message,
regardless of whether the BIOS option for speculative TLB load was enabled or disabled.
After moving to RHEL 3 Update 4, the disk corruption errors were no longer seen
with speculative TLB load disabled in the BIOS.
We'd like to know what was fixed that resolved the disk corruption error, and whether
there are any additional error scenarios associated with the message, so we can
instruct our customers.
Is it truly only an informational error message at this point?
See Bug 131029, comments 29 and 30. This "bug" is actually a misreported error, for
which fixes were made upstream. The patch is supplied in that bug report.
*** This bug has been marked as a duplicate of 131029 ***
Patch posted for review on 6/9/2005.
Devel ACK for U6.
A fix for this problem was committed to the RHEL3 U6
patch pool this evening (in kernel version 2.4.21-32.10.EL).
An advisory has been issued which should help with the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.