This bug has been copied from bug #569938 and has been proposed to be backported to 5.5 z-stream (EUS).
in 2.6.18-194.1.1.el5
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2010-0398.html
I can't see the original bug report, but the problem reported in the erratum appears not to be fixed in the 2.6.18-194.3.1.el5 kernel, like this (one of many reports on multiple Barcelona nodes in our cluster):

Jun 24 00:32:39 lvgig025 kernel: Northbridge Error, node 1, core: -1
Jun 24 00:32:39 lvgig025 kernel: K8 ECC error.
Jun 24 00:32:39 lvgig025 kernel: EDAC amd64 MC1: CE ERROR_ADDRESS= 0x375a77cc0
Jun 24 00:32:39 lvgig025 kernel: EDAC MC1: CE page 0x375a77, offset 0xcc0, grain 0, syndrome 0x4951, row 7, channel 0, label "": amd64_edac
Jun 24 00:32:39 lvgig025 kernel: EDAC MC1: CE - no information available: amd64_edacError Overflow
The problem is not fixed with 2.6.18-274.7.1.el5 either.

Linux racdbmc1ldv 2.6.18-274.7.1.el5 #1 SMP Mon Oct 17 11:57:14 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

Apr 15 16:05:54 racdbmc1ldv kernel: EDAC amd64 MC0: CE ERROR_ADDRESS= 0x404b9f3a0
Apr 15 16:05:54 racdbmc1ldv kernel: EDAC MC0: CE page 0x404b9f, offset 0x3a0, grain 0, syndrome 0x2b8, row 4, channel 0, label "": amd64_edac
Apr 15 16:05:58 racdbmc1ldv kernel: EDAC amd64 MC0: CE ERROR_ADDRESS= 0x28ea0dd40
Apr 15 16:05:58 racdbmc1ldv kernel: EDAC MC0: CE page 0x28ea0d, offset 0xd40, grain 0, syndrome 0x2b8, row 4, channel 0, label "": amd64_edac
Apr 15 16:06:04 racdbmc1ldv kernel: EDAC amd64 MC0: CE ERROR_ADDRESS= 0x40382bfa0
Apr 15 16:06:04 racdbmc1ldv kernel: EDAC MC0: CE page 0x40382b, offset 0xfa0, grain 0, syndrome 0x2b8, row 4, channel 0, label "": amd64_edac
Apr 15 16:06:18 racdbmc1ldv kernel: EDAC amd64 MC0: CE ERROR_ADDRESS= 0x212788140
Apr 15 16:06:18 racdbmc1ldv kernel: EDAC MC0: CE page 0x212788, offset 0x140, grain 0, syndrome 0x2b8, row 4, channel 0, label "": amd64_edac
Apr 15 16:06:34 racdbmc1ldv kernel: EDAC amd64 MC0: CE ERROR_ADDRESS= 0x2d222c6e0
Apr 15 16:06:34 racdbmc1ldv kernel: EDAC MC0: CE page 0x2d222c, offset 0x6e0, grain 0, syndrome 0x2b8, row 4, channel 0, label "": amd64_edac
Apr 15 16:07:34 racdbmc1ldv kernel: EDAC amd64 MC0: CE ERROR_ADDRESS= 0x3d491c8f0
Apr 15 16:07:34 racdbmc1ldv kernel: EDAC MC0: CE page 0x3d491c, offset 0x8f0, grain 0, syndrome 0x2b8, row 4, channel 0, label "": amd64_edac
Apr 15 16:07:46 racdbmc1ldv kernel: EDAC amd64 MC0: CE ERROR_ADDRESS= 0x330d2d000
Apr 15 16:07:46 racdbmc1ldv kernel: EDAC MC0: CE page 0x330d2d, offset 0x0, grain 0, syndrome 0x2b8, row 4, channel 0, label "": amd64_edac
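For anyone triaging similar output, here is a rough Python sketch (not part of any Red Hat or EDAC tooling) that tallies the CE lines by controller, row, channel and syndrome. In the log above every event lands on MC0, row 4, channel 0 with syndrome 0x2b8. The function names and the regular expression are my own, written only against the message format shown in this report; the page/offset split assumes 4 KiB pages, which matches the values logged here.

```python
import re
from collections import Counter

# Matches the "EDAC MCn: CE page ..." lines as they appear in this report.
# The pattern is an assumption based on these samples only.
CE_LINE = re.compile(
    r"EDAC (MC\d+): CE page (0x[0-9a-f]+), offset (0x[0-9a-f]+), "
    r"grain \d+, syndrome (0x[0-9a-f]+), row (\d+), channel (\d+)"
)


def tally_ce_events(log_text):
    """Count correctable-error events per (controller, row, channel, syndrome)."""
    counts = Counter()
    for match in CE_LINE.finditer(log_text):
        mc, _page, _offset, syndrome, row, channel = match.groups()
        counts[(mc, int(row), int(channel), syndrome)] += 1
    return counts


def split_address(error_address, page_shift=12):
    """Split an ERROR_ADDRESS into (page, offset), assuming 4 KiB pages."""
    offset_mask = (1 << page_shift) - 1
    return error_address >> page_shift, error_address & offset_mask


if __name__ == "__main__":
    import sys

    # Usage: python tally_ce.py < /var/log/messages
    for key, count in tally_ce_events(sys.stdin.read()).most_common():
        print(count, key)
```

If the counts cluster on a single row and channel, as they do here, that may be worth mentioning when reporting, since it narrows down which part of the log is repeating.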
Kernel 2.6.18-398.el5 on RHEL 5.11 shows the same kind of messages.