Bug 438571 - [5.2] hp-bl460c-01 not installable since RHEL5.2-Server-20080320.0
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: x86_64 Linux
Priority: high
Severity: high
Target Milestone: rc
Assigned To: Prarit Bhargava
QA Contact: Martin Jenner
Reported: 2008-03-21 23:30 EDT by CAI Qian
Modified: 2008-03-24 11:05 EDT (History)
Doc Type: Bug Fix
Last Closed: 2008-03-24 09:23:44 EDT

Attachments:
sosreport (2.15 MB, application/octet-stream), 2008-03-21 23:30 EDT, CAI Qian

Description CAI Qian 2008-03-21 23:30:54 EDT
Description of problem:
It is not possible to install hp-bl460c-01.rhts.boston.redhat.com with the
RHEL5.2-Server-20080320.0 tree in RHTS, although installation works with
RHEL5.2-Server-20080313.1.

Running anaconda, the Red Hat Enterprise Linux Server system installer - please
wait...
Probing for video card:   ATI Technologies Inc ES1000
CPU 3: Machine Check Exception: 0000000000000005
CPU 2: Machine Check Exception: 0000000000000004
Uhhuh. NMI received for unknown reason b1 on CPU 0.
You probably have a hardware problem with your RAM chips
Bank 4: b200000000060151
Dazed and confused, but trying to continue
Bank 5: b20000300c000e0f
Kernel panic - not syncing: CPU context corrupt

How reproducible:
Always
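
The status words in the console output above can be decoded by hand. The following is a minimal sketch, assuming the standard x86 machine-check architecture bit layout (IA32_MCi_STATUS for the "Bank N:" values) and assuming the number after "Machine Check Exception:" is the IA32_MCG_STATUS value. The function and table names are illustrative and are not taken from the kernel source.

```python
# Sketch only: decode machine-check status words copied from the panic log.
# Assumption: bit positions follow the architectural x86 MCA layout.

# High flag bits of an IA32_MCi_STATUS ("Bank N:") word.
BANK_FLAGS = {
    63: "VAL",    # register contains valid error information
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # error reporting was enabled for this bank
    59: "MISCV",  # the MISC register holds extra information
    58: "ADDRV",  # the ADDR register holds a valid address
    57: "PCC",    # processor context corrupt
}

# Low bits of IA32_MCG_STATUS (assumed to be the value printed
# after "Machine Check Exception:").
MCG_FLAGS = {0: "RIPV", 1: "EIPV", 2: "MCIP"}


def set_flags(value, table):
    """Return the names of all flags set in value, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True)
            if (value >> bit) & 1]


def decode_bank_status(status):
    """Split a bank status word into flags and error-code fields."""
    return {
        "flags": set_flags(status, BANK_FLAGS),
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_error_code": status & 0xFFFF,
    }


if __name__ == "__main__":
    # Values copied verbatim from the console output in this report.
    for bank, raw in ((4, 0xB200000000060151), (5, 0xB20000300C000E0F)):
        decoded = decode_bank_status(raw)
        print("Bank %d: flags=%s mca=0x%04x model=0x%04x" % (
            bank, ",".join(decoded["flags"]),
            decoded["mca_error_code"], decoded["model_specific_code"]))
    for cpu, mcg in ((3, 0x5), (2, 0x4)):
        print("CPU %d: mcg flags=%s" % (cpu, ",".join(set_flags(mcg, MCG_FLAGS))))
```

Both bank values decode to VAL, UC, EN and PCC with no valid address recorded. The PCC (processor context corrupt) flag is consistent with the "CPU context corrupt" panic message. The sketch does not interpret the MCA error code fields, whose meaning depends on the processor model.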
Comment 1 CAI Qian 2008-03-21 23:30:54 EDT
Created attachment 298819 [details]
sosreport
Comment 7 Prarit Bhargava 2008-03-24 09:23:44 EDT
Jeff Burke cannot reproduce this issue.

NOTABUG.

P.
Comment 8 Don Zickus 2008-03-24 10:20:23 EDT
Well, NOTABUG is not true.  This is a bug, just not in the software but probably
in the hardware.  In fact, it is probably transient, which is why people can't
reproduce it.  Corrupted memory bits only happen once in a blue moon, so
unless you test this a billion times in a row you may never reproduce the problem.

The odd thing about this problem is that EDAC should have diagnosed and possibly
fixed this issue (the whole reason for its existence is to catch and handle
memory problems).  Perhaps that is where the software bug is.

I'll talk to Aris about this.  But unfortunately I don't expect much to come out
of it.  

Cai, please continue to file these types of reports because problems like this
are trappable by the kernel and should be recoverable too, I think.

Cheers,
Don
Comment 9 Prarit Bhargava 2008-03-24 10:33:14 EDT
Really this should have been closed as INSUFFICIENT_DATA -- I was being lazy ;)

P.
Comment 11 Tony Camuso 2008-03-24 11:05:45 EDT
Please try re-seating the memory DIMMs.

This system was working fine when I installed it and for at least a few days
afterwards. 

I don't think the different kernel versions matter as much as what may be an
intermittent contact on the DIMMs or even a flaky DIMM.
