The kernel panics right after the device modules are loaded by rc.sysinit:
EDAC k8 MC0: uncorrected error
Kernel panic - not syncing: MC1 Uncorrected Error
This is a MICRO-STAR K8T Master2-FAR, Phoenix BIOS 6.00 PG (04/30/2004), with 4x 256MB RAM modules and two Opteron 242 CPUs. It runs in 32-bit mode with a UP kernel because of stability troubles with the 32-bit SMP kernel (the 64-bit SMP kernel was OK, but the 32-bit SMP kernel was not, so we went to a 32-bit UP system until we update to RHEL5).
I booted the machine with 'noedac', but now I'm not able to find this option in the kernel source. Maybe I should schedule a reboot of the system to see what happens without the 'noedac' option. The kernel uses the edac_mc and k8_edac modules.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
The machine is able to (re)boot without 'noedac'. So it seems that the reported kernel panic was a somewhat random issue.
Milan, Could you copy and paste the panic you're seeing in this BZ? Thanks, P.
I have no more info about the kernel panic beyond what I wrote when I opened this BZ. There were no OOPS messages, just the lines I wrote above.
Milan, Could you boot with "debug" on the bootline and see if you get any more output? Thanks, P.
Bhavana, There seem to be a few reports of this on redhat-list as well. https://www.redhat.com/archives/redhat-list/2007-May/msg00108.html Any ideas? I'm doing a binary search through kernel versions now. P.
Prarit, I'm adding Aris as he worked on the original back port of the EDAC modules.
Milan, please pass "panic_on_ue=0 debug" on the bootline. When your system boots up, please attach the entire dmesg in this BZ. Thanks, P.
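Before attaching dmesg, it may help to confirm which of these options the running kernel was actually booted with. The following is only a minimal sketch: it assumes a Linux system that exposes the boot line in /proc/cmdline, and the helper names parse_cmdline and read_cmdline are made up for illustration. It does a plain whitespace split, so quoted values containing spaces are not handled.

```python
def parse_cmdline(cmdline):
    """Split a kernel boot line into a dict.

    'key=value' tokens map to their value as a string;
    bare flags such as 'debug' map to True.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params


def read_cmdline(path="/proc/cmdline"):
    """Return the boot line of the running kernel (Linux only)."""
    with open(path) as f:
        return f.read().strip()


if __name__ == "__main__":
    try:
        params = parse_cmdline(read_cmdline())
    except OSError:
        # Assumption: /proc/cmdline is missing on non-Linux systems.
        params = {}
    # Options mentioned in this BZ; anything absent is reported as "not set".
    for name in ("panic_on_ue", "debug", "noedac"):
        print(name, "=", params.get(name, "not set"))
```

If panic_on_ue shows up as 0 and debug as True, the boot line was picked up as requested.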
I don't know if this is the same bug, but I'm also seeing EDAC errors since upgrading to this kernel (2.6.9-55). This is a dual-processor, dual-core AMD 275 machine. The errors in dmesg look like this:
EDAC k8 MC0: general bus error: participating processor(local node origin), time-out(no timeout) memory transaction type(generic read), mem or i/o(mem access), cache level(generic)
MC0: CE page 0xfa77f, offset 0x130, grain 8, syndrome 0x2242, row 1, channel 0, label "": k8_edac
MC0: CE - no information available: k8_edac Error Overflow set
EDAC k8 MC0: extended error code: ECC chipkill x4 error
Greg Matthews, please check whether your memory modules are OK with a tool like memtest86+. We introduced EDAC support in RHEL-4.5, and it's possible that there is a hardware problem you just didn't know about because EDAC wasn't enabled before. Once you're sure the memory modules are OK, and if the error persists, let me know.
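For anyone collecting these messages from several machines, the "CE page" line in the comment above can be picked apart mechanically. This is a rough sketch, not an official tool: the regular expression is written only against the one line quoted in this BZ, the function name parse_ce_line is hypothetical, and the physical address calculation assumes 4 KiB pages (PAGE_SHIFT = 12).

```python
import re

# Matches the corrected-error (CE) line format quoted in this BZ.
CE_PATTERN = re.compile(
    r'MC(?P<mc>\d+): CE page (?P<page>0x[0-9a-fA-F]+), '
    r'offset (?P<offset>0x[0-9a-fA-F]+), grain (?P<grain>\d+), '
    r'syndrome (?P<syndrome>0x[0-9a-fA-F]+), row (?P<row>\d+), '
    r'channel (?P<channel>\d+), label "(?P<label>[^"]*)"'
)

PAGE_SHIFT = 12  # assumption: 4 KiB pages


def parse_ce_line(line):
    """Return the fields of a CE line as a dict, or None if it does not match."""
    m = CE_PATTERN.search(line)
    if m is None:
        return None
    page = int(m.group("page"), 16)
    offset = int(m.group("offset"), 16)
    return {
        "mc": int(m.group("mc")),
        "page": page,
        "offset": offset,
        "grain": int(m.group("grain")),
        "syndrome": int(m.group("syndrome"), 16),
        "row": int(m.group("row")),
        "channel": int(m.group("channel")),
        "label": m.group("label"),
        # Physical address under the 4 KiB page assumption above.
        "address": (page << PAGE_SHIFT) + offset,
    }


sample = ('MC0: CE page 0xfa77f, offset 0x130, grain 8, syndrome 0x2242, '
          'row 1, channel 0, label "": k8_edac')
record = parse_ce_line(sample)
print("row", record["row"], "channel", record["channel"],
      "address", hex(record["address"]))
```

Repeated hits on the same row and channel would point at one module, which is the kind of pattern a memory test can then confirm.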
(In reply to comment #24) This was indeed a bad memory module. It has now been replaced and the errors are no longer generated. Thanks for the steer.
As per comment #33, this is NOTABUG. P.