Red Hat Bugzilla – Bug 239484
EDAC kernel panic with 2.6.9-55.EL (RHEL4U5)
Last modified: 2008-01-07 10:49:05 EST
The kernel panics right after the device modules are loaded by rc.sysinit:
EDAC k8 MC0: uncorrected error
Kernel panic - not syncing: MC1 Uncorrected Error
This is a MICRO-STAR K8T Master2-FAR, Phoenix BIOS 6.00 PG (04/30/2004), with 4x
256MB RAM modules and two Opteron 242 CPUs. It runs in 32-bit mode with a UP kernel
because of stability problems with the 32-bit SMP kernel (the 64-bit SMP kernel was
OK, but the 32-bit SMP kernel was not, so we went to a 32-bit UP system until we
update to RHEL5).
I booted the machine with 'noedac', but now I'm not able to find this option in
the kernel source. Maybe I should schedule a reboot of the system to see what
happens without the 'noedac' option.
The kernel uses edac_mc and k8_edac modules.
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update release.
The machine is able to (re)boot without 'noedac'. So it seems that the
reported kernel panic was a somewhat random issue.
Could you copy and paste the panic you're seeing in this BZ?
I have no more information about the kernel panic beyond what I wrote when I
opened this BZ. There were no oops messages, just the lines I quoted above.
Could you boot with "debug" on the bootline and see if you get any more output?
There seem to be a few reports of this on redhat-list as well.
Any ideas? I'm doing a binary search through kernel versions now.
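For readers following along, the binary search mentioned above can be sketched as below. This is only an illustration: the build labels and the `is_bad` check are hypothetical stand-ins for "install this kernel, boot it, and see whether the EDAC error appears", and the sketch assumes that once a build shows the problem, every later build does too.

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest version for which is_bad() is True.

    Assumes versions are ordered oldest to newest, the newest one is bad,
    and every version after the first bad one is also bad.
    """
    lo, hi = 0, len(versions) - 1  # invariant: versions[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid       # first bad version is at mid or earlier
        else:
            lo = mid + 1   # first bad version is after mid
    return versions[lo]


# Hypothetical build labels, not real kernel package names.
builds = ["build-a", "build-b", "build-c", "build-d", "build-e"]
known_bad = {"build-d", "build-e"}  # stand-in for real boot tests
print(first_bad(builds, lambda v: v in known_bad))
```

Since the original reporter found the panic to be intermittent, a single clean boot may not prove a build is good; in practice `is_bad` would probably need several boots per build before answering.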
Prarit, I'm adding Aris as he worked on the original backport of the EDAC modules.
Milan, please pass "panic_on_ue=0 debug" on the bootline. When your system
boots up, please attach the entire dmesg in this BZ.
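Before collecting the dmesg, it can help to confirm that the requested options actually reached the kernel. The helper below is a minimal sketch, not part of any Red Hat tooling: `parse_cmdline` is a hypothetical name, the sample string is made up apart from the two options requested above, and the parser splits on whitespace only.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}.

    Bare flags such as "debug" map to None. Simplification: tokens are
    split on whitespace only, so quoted values containing spaces are not
    handled.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


# Illustrative command line; the root= value is invented. On a running
# Linux system the real string can be read from /proc/cmdline.
sample = "ro root=/dev/sda1 panic_on_ue=0 debug"
params = parse_cmdline(sample)
print(params.get("panic_on_ue"))
print("debug" in params)
```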
I don't know if this is the same bug, but I'm also seeing EDAC errors since
upgrading to this kernel (2.6.9-55). This is dual processor, dual core AMD 275.
The errors in dmesg look like this:
EDAC k8 MC0: general bus error: participating processor(local node origin),
time-out(no timeout) memory transaction type(generic read), mem or i/o(mem
access), cache level(generic)
MC0: CE page 0xfa77f, offset 0x130, grain 8, syndrome 0x2242, row 1, channel 0,
label "": k8_edac
MC0: CE - no information available: k8_edac Error Overflow set
EDAC k8 MC0: extended error code: ECC chipkill x4 error
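To make the corrected-error (CE) line above easier to read, here is a small sketch that pulls out its fields and combines page and offset into a single address. The regular expression is written against the one line quoted in this report, so other kernels may format the message differently, and the 4 KiB page size is an assumption. `parse_ce` is a hypothetical helper name.

```python
import re

# Pattern based only on the CE line quoted in this report.
CE_PATTERN = re.compile(
    r"MC(?P<mc>\d+): CE page (?P<page>0x[0-9a-fA-F]+), "
    r"offset (?P<offset>0x[0-9a-fA-F]+), grain (?P<grain>\d+), "
    r"syndrome (?P<syndrome>0x[0-9a-fA-F]+), row (?P<row>\d+), "
    r"channel (?P<channel>\d+)"
)

PAGE_SHIFT = 12  # assumption: 4 KiB pages


def parse_ce(line):
    """Return a dict of CE fields, or None if the line does not match."""
    m = CE_PATTERN.search(line)
    if m is None:
        return None
    page = int(m.group("page"), 16)
    offset = int(m.group("offset"), 16)
    return {
        "mc": int(m.group("mc")),
        "row": int(m.group("row")),
        "channel": int(m.group("channel")),
        "grain": int(m.group("grain")),
        "syndrome": int(m.group("syndrome"), 16),
        # Page frame number shifted up, plus the offset within the page.
        "address": (page << PAGE_SHIFT) | offset,
    }


# The wrapped log line from the report, joined back into one line.
line = ('MC0: CE page 0xfa77f, offset 0x130, grain 8, syndrome 0x2242, '
        'row 1, channel 0, label "": k8_edac')
info = parse_ce(line)
print(info["row"], info["channel"], hex(info["address"]))
```

The row and channel values are the useful part when hunting for a faulty module, although how a row maps to a physical slot depends on the board.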
Greg Matthews, please check that your memory modules are OK with a tool like
memtest86+. We introduced EDAC support in RHEL-4.5, and it's possible that there is
a hardware problem you just didn't know about because EDAC wasn't enabled before.
Once you're sure the memory modules are OK, let me know if the error persists.
(In reply to comment #24)
This was indeed a bad memory module. It has now been replaced and the errors are
no longer generated. Thanks for the steer.
As per comment #33, this is NOTABUG.