Bug 239484 - EDAC kernel panic with 2.6.9-55.EL (RHEL4U5)
Summary: EDAC kernel panic with 2.6.9-55.EL (RHEL4U5)
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.5
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Prarit Bhargava
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-05-08 19:30 UTC by Milan Kerslager
Modified: 2008-01-07 15:49 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-06-11 12:28:29 UTC
Target Upstream Version:
Embargoed:



Description Milan Kerslager 2007-05-08 19:30:49 UTC
The kernel panics right after the device modules are loaded by rc.sysinit:

EDAC k8 MC0: uncorrected error
Kernel panic - not syncing: MC1 Uncorrected Error

The board is a MICRO-STAR K8T Master2-FAR with Phoenix BIOS 6.00 PG (04/30/2004),
4x 256MB RAM chips and two Opteron 242 CPUs. It runs in 32-bit mode with the UP
kernel because of stability problems with the 32-bit SMP kernel (the 64-bit SMP
kernel was OK, but the 32-bit SMP kernel was not, so we switched to a 32-bit UP
system until we update to RHEL5).

Comment 1 Milan Kerslager 2007-05-08 19:44:04 UTC
I booted the machine with 'noedac', but now I cannot find this option in
the kernel source. Maybe I should schedule a reboot of the system to see what
happens without the 'noedac' option.

The kernel uses edac_mc and k8_edac modules.

Comment 3 RHEL Program Management 2007-05-25 16:24:30 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 7 Milan Kerslager 2007-05-26 07:36:37 UTC
The machine is able to (re)boot without 'noedac'. So it seems that the
reported kernel panic was a somewhat random issue.

Comment 8 Prarit Bhargava 2007-05-26 11:55:45 UTC
Milan,

Could you copy and paste the panic you're seeing into this BZ?

Thanks,

P.

Comment 9 Milan Kerslager 2007-05-29 13:19:23 UTC
I have no more information about the kernel panic beyond what I wrote when I
opened this BZ. There were no OOPS messages, just the lines I quoted above.

Comment 10 Prarit Bhargava 2007-05-29 13:28:39 UTC
Milan,

Could you boot with "debug" on the bootline and see if you get any more output?

Thanks,

P.

Comment 12 Prarit Bhargava 2007-05-29 18:43:03 UTC
Bhavna,

There seem to be a few reports of this on redhat-list as well.

https://www.redhat.com/archives/redhat-list/2007-May/msg00108.html

Any ideas?  I'm doing a binary search through kernel versions now.

P.

Comment 13 Bhavna Sarathy 2007-05-29 21:08:21 UTC
Prarit, I'm adding Aris as he worked on the original back port of the EDAC modules.

Comment 16 Prarit Bhargava 2007-05-30 13:25:59 UTC
Milan, please pass "panic_on_ue=0 debug" on the bootline.  When your system
boots up, please attach the entire dmesg to this BZ.

Thanks,

P.

Comment 22 greg matthews 2007-06-05 16:11:51 UTC
I don't know if this is the same bug, but I have also been seeing EDAC errors
since upgrading to this kernel (2.6.9-55). This is a dual-processor, dual-core
AMD 275 system. The errors in dmesg look like this:

EDAC k8 MC0: general bus error: participating processor(local node origin),
time-out(no timeout) memory transaction type(generic read), mem or i/o(mem
access), cache level(generic)
MC0: CE page 0xfa77f, offset 0x130, grain 8, syndrome 0x2242, row 1, channel 0,
label "": k8_edac
MC0: CE - no information available: k8_edac Error Overflow set
EDAC k8 MC0: extended error code: ECC chipkill x4 error
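
For anyone triaging similar output, here is a minimal sketch of how the corrected-error (CE) line above could be pulled apart to find the affected row and channel. The regular expression is derived only from the single line quoted in this comment, not from the k8_edac source, so other kernel versions may print a different layout. The names `CE_RE`, `PAGE_SHIFT` and `parse_ce` are hypothetical, and the physical address calculation assumes 4 KiB pages.

```python
import re

# Pattern built from the one CE line quoted in this comment (an assumption,
# not taken from the driver source).
CE_RE = re.compile(
    r"MC(?P<mc>\d+): CE page 0x(?P<page>[0-9a-fA-F]+), "
    r"offset 0x(?P<offset>[0-9a-fA-F]+), grain (?P<grain>\d+), "
    r"syndrome 0x(?P<syndrome>[0-9a-fA-F]+), row (?P<row>\d+), "
    r"channel (?P<channel>\d+)"
)

PAGE_SHIFT = 12  # assumption: 4 KiB pages


def parse_ce(text):
    """Return a dict of CE fields, or None if no CE line is found."""
    # The dmesg output is wrapped in the report, so collapse whitespace first.
    flat = " ".join(text.split())
    m = CE_RE.search(flat)
    if m is None:
        return None
    rec = {
        "mc": int(m.group("mc")),
        "page": int(m.group("page"), 16),
        "offset": int(m.group("offset"), 16),
        "grain": int(m.group("grain")),
        "syndrome": int(m.group("syndrome"), 16),
        "row": int(m.group("row")),
        "channel": int(m.group("channel")),
    }
    # Physical address of the error = page frame number * page size + offset.
    rec["phys_addr"] = (rec["page"] << PAGE_SHIFT) | rec["offset"]
    return rec


sample = """MC0: CE page 0xfa77f, offset 0x130, grain 8, syndrome 0x2242, row 1, channel 0,
label "": k8_edac"""

rec = parse_ce(sample)
print("MC%d row %d channel %d phys_addr %#x"
      % (rec["mc"], rec["row"], rec["channel"], rec["phys_addr"]))
```

Repeated CE lines that keep naming the same row and channel point at one memory module, which is a reasonable first candidate to test or swap.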


Comment 24 Aristeu Rozanski 2007-06-05 18:14:50 UTC
Greg Matthews, please check whether your memory modules are OK with a tool like
memtest86+. We introduced EDAC support in RHEL-4.5, and it's possible that there is
a hardware problem that you just didn't know about because EDAC wasn't enabled.
Once you're sure the memory modules are OK, let me know if the error persists.


Comment 33 greg matthews 2007-06-08 09:38:58 UTC
(In reply to comment #24)
This was indeed a bad memory module. It has now been replaced and the errors are
no longer generated. Thanks for the steer.


Comment 37 Prarit Bhargava 2007-06-11 12:28:29 UTC
As per comment #33, this is NOTABUG.

P.

