Bug 438571

Summary: [5.2] hp-bl460c-01 not installable since RHEL5.2-Server-20080320.0
Product: Red Hat Enterprise Linux 5
Reporter: Qian Cai <qcai>
Component: kernel
Assignee: Prarit Bhargava <prarit>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Martin Jenner <mjenner>
Severity: high
Docs Contact:
Priority: high
Version: 5.2
CC: arozansk, dzickus, jburke, rpacheco
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2008-03-24 13:23:44 UTC
Attachments:
Description Flags
sosreport none

Description Qian Cai 2008-03-22 03:30:54 UTC
Description of problem:
It is not possible to install hp-bl460c-01.rhts.boston.redhat.com with the
RHEL5.2-Server-20080320.0 tree in RHTS, although it installs fine with the
RHEL5.2-Server-20080313.1 tree.

Running anaconda, the Red Hat Enterprise Linux Server system installer - please
wait...
Probing for video card:   ATI Technologies Inc ES1000
CPU 3: Machine Check Exception: 0000000000000005
CPU 2: Machine Check Exception: 0000000000000004
Uhhuh. NMI received for unknown reason b1 on CPU 0.
You probably have a hardware problem with your RAM chips
Bank 4: b200000000060151
Dazed and confused, but trying to continue
Bank 5: b20000300c000e0f
Kernel panic - not syncing: CPU context corrupt

How reproducible:
Always
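
The status words in the console output above can be unpacked by hand. The following is a minimal sketch, not part of the original report. It assumes the "Bank N:" values are raw IA32_MCi_STATUS registers and that the value after "Machine Check Exception:" is MCG_STATUS, with bit positions taken from the architectural layout in the Intel SDM. The function names are hypothetical, and the model-specific fields are left undecoded.

```python
# Assumed architectural bit layout of IA32_MCi_STATUS (per-bank status).
MCI_STATUS_FLAGS = {
    63: "VAL",    # register contains valid error information
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # reporting of this error was enabled
    59: "MISCV",  # the MISC register holds extra information
    58: "ADDRV",  # the ADDR register holds a valid address
    57: "PCC",    # processor context corrupt
}

# Assumed architectural bit layout of IA32_MCG_STATUS (global status).
MCG_STATUS_FLAGS = {0: "RIPV", 1: "EIPV", 2: "MCIP"}


def decode_flags(value, table):
    """Return the names of all set bits, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True)
            if (value >> bit) & 1]


def decode_bank_status(hex_text):
    """Split a 'Bank N:' hex value into flags and error-code fields."""
    status = int(hex_text, 16)
    return {
        "flags": decode_flags(status, MCI_STATUS_FLAGS),
        "mca_error_code": status & 0xFFFF,              # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }


def decode_mcg_status(hex_text):
    """Decode the value printed after 'Machine Check Exception:'."""
    return decode_flags(int(hex_text, 16), MCG_STATUS_FLAGS)


if __name__ == "__main__":
    for bank, raw in (("4", "b200000000060151"), ("5", "b20000300c000e0f")):
        info = decode_bank_status(raw)
        print("Bank %s: %s mca=0x%04x model=0x%04x" % (
            bank, "|".join(info["flags"]),
            info["mca_error_code"], info["model_specific_code"]))
    print("CPU 3:", decode_mcg_status("0000000000000005"))
    print("CPU 2:", decode_mcg_status("0000000000000004"))
```

Under these assumptions both banks report VAL, UC, EN and PCC. An uncorrected error with the processor context flagged as corrupt is consistent with the "CPU context corrupt" panic message in the log.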

Comment 1 Qian Cai 2008-03-22 03:30:54 UTC
Created attachment 298819 [details]
sosreport

Comment 7 Prarit Bhargava 2008-03-24 13:23:44 UTC
Jeff Burke cannot reproduce this issue.

NOTABUG.

P.

Comment 8 Don Zickus 2008-03-24 14:20:23 UTC
Well, NOTABUG is not true.  This is a bug, just not in the software but probably
in the hardware.  In fact it is probably transient, which is why people can't
reproduce it.  Corrupted memory bits only happen once in a blue moon, so
unless you test this a billion times in a row you may never reproduce the problem.

The odd thing about this problem is that EDAC should have diagnosed and possibly
fixed this issue (catching and handling memory problems is the whole reason it
exists).  Perhaps that is where the software bug is.

I'll talk to Aris about this.  But unfortunately I don't expect much to come out
of it.  

Cai, please continue to file these types of reports because problems like this
are trappable by the kernel and should be recoverable too, I think.

Cheers,
Don


Comment 9 Prarit Bhargava 2008-03-24 14:33:14 UTC
Really this should have been closed as INSUFFICIENT_DATA -- I was being lazy ;)

P.

Comment 11 Tony Camuso 2008-03-24 15:05:45 UTC
Please try re-seating the memory DIMMs.

This system was working fine when I installed it and for at least a few days
afterwards. 

I don't think the different kernel versions matter as much as what may be an
intermittent contact on the DIMMs or even a flaky DIMM.