From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3) Gecko/20040922

Description of problem:
HP reports they are seeing these MCEs under heavy IO. Originally reported in Bug #: 131029

Sep 24 11:35:45 localhost kernel: CPU 1: Silent Northbridge MCE
Sep 24 11:35:45 localhost kernel: Northbridge status a60000010005001b
Sep 24 11:35:45 localhost kernel: GART TLB error generic level generic
Sep 24 11:35:45 localhost kernel: extended error gart error
Sep 24 11:35:45 localhost kernel: link number 0
Sep 24 11:35:45 localhost kernel: err cpu1
Sep 24 11:35:45 localhost kernel: processor context corrupt
Sep 24 11:35:45 localhost kernel: error address valid
Sep 24 11:35:45 localhost kernel: error uncorrected
Sep 24 11:35:45 localhost kernel: previous error lost
Sep 24 11:35:45 localhost kernel: error address 0000000037ff0048

Version-Release number of selected component (if applicable):

How reproducible:
Sometimes

Steps to Reproduce:
Load RHEL 3 U3, subject machine to heavy IO

Actual Results: MCE

Expected Results: no MCE

Additional info:
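For anyone cross-checking the log by hand, here is a minimal sketch that decodes the "Northbridge status" value using the generic x86 machine-check status layout (flag bits 57 to 63, error code in bits 0 to 15). The function name `decode_status` and the table names are made up for illustration. Reading bits 16 to 19 as the extended error code is an assumption based on the "extended error" line in the log, and this is not the kernel's own decoder.

```python
# Sketch only: decode the architectural fields of a machine-check status value.
# Flag positions follow the generic x86 MCi_STATUS layout; names are illustrative.
STATUS_FLAGS = {
    63: "VAL (status valid)",
    62: "OVER (overflow, an earlier error was lost)",
    61: "UC (error uncorrected)",
    60: "EN (error reporting enabled)",
    59: "MISCV (misc register valid)",
    58: "ADDRV (error address valid)",
    57: "PCC (processor context corrupt)",
}

# TLB error codes have the form 0000 0000 0001 TTLL.
TRANSACTION_TYPE = {0: "instruction", 1: "data", 2: "generic"}
CACHE_LEVEL = {0: "level 0", 1: "level 1", 2: "level 2", 3: "generic"}


def decode_status(status):
    """Return the set flags and the decoded error code of a 64-bit status."""
    flags = [
        name
        for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]
    code = status & 0xFFFF
    result = {
        "flags": flags,
        "error_code": code,
        # Assumption: bits 16-19 carry the extended error code.
        "extended_code": (status >> 16) & 0xF,
    }
    if (code & 0xFFF0) == 0x0010:
        result["kind"] = "TLB"
        result["transaction"] = TRANSACTION_TYPE.get((code >> 2) & 3, "reserved")
        result["level"] = CACHE_LEVEL[code & 3]
    return result


if __name__ == "__main__":
    decoded = decode_status(0xA60000010005001B)
    for key, value in decoded.items():
        print(key, "=", value)
```

Under this layout the logged value has VAL, UC, ADDRV and PCC set, and the error code 0x001b reads as a TLB error with generic transaction type and generic level, which lines up with the "GART TLB error generic level generic" line. Bit 62 (overflow) is clear in this value; the sketch does not settle how that relates to the "previous error lost" line printed by the kernel.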
Hi all, is anybody working on this case? Any guess on an ETA?
Note that this is related to speculative TLB reloading: when it is disabled in the BIOS, the error does not occur.
We have seen incidents of disk data corruption when running the rhr2 CORE memory tests (tests fail with "binary file differs"), associated with this error message, whether the BIOS option for speculative TLB load was enabled or disabled. After moving to RHEL 3 Update 4, the disk corruption errors were no longer seen with speculative TLB load disabled in the BIOS. We'd like to know what was fixed that resolved the disk corruption error, and whether there are any additional error scenarios associated with the message, so we can instruct our customers. Is it truly only an informational error message at this point?
See Bug 131029, comments 29 and 30. This "bug" is actually a misreporting for which fixes were made upstream. The patch is supplied in that bug report. *** This bug has been marked as a duplicate of 131029 ***
patch posted for review 6/9/2005
devel ACK for U6
A fix for this problem has just been committed to the RHEL3 U6 patch pool this evening (in kernel version 2.4.21-32.10.EL).
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2005-663.html