Bug 455310 - LS21 locks up booting MRG RT kernel
Summary: LS21 locks up booting MRG RT kernel
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: realtime-kernel
Version: 1.0
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Red Hat Real Time Maintenance
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-07-14 19:29 UTC by Clark Williams
Modified: 2008-08-14 21:31 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-08-14 21:31:18 UTC
Target Upstream Version:
Embargoed:


Attachments
Boot log for LS21 lockup (4.74 KB, text/plain)
2008-07-14 19:29 UTC, Clark Williams

Description Clark Williams 2008-07-14 19:29:41 UTC
Description of problem:
The RT kernel fails to boot on blade1 of the HSV bladecenter (blade2 boots the
kernel and runs fine). The failing blade (blade1) boots and runs the RHEL5.2
kernel.

Version-Release number of selected component (if applicable):

kernel-rt-2.6.24.7-72.el5rt

How reproducible:

Every time.

Steps to Reproduce:
1. Install RHEL5.2
2. Install MRG RT kernel
3. Boot RT kernel
  
Actual results:

Kernel hangs after reporting amount of memory available (see attached console
output).


Expected results:

Running kernel

Additional info:

Debugging printk()s indicate that the hang is occurring in
calibrate_delay_direct(). Jiffies are not incrementing, so the calibration loop
never terminates.
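
To illustrate the failure mode described above, here is a minimal sketch in
Python. It is not the kernel's calibrate_delay_direct() code. It only models
the general pattern of a loop that spins until a tick counter changes. The
names wait_for_tick, FakeClock and max_polls are hypothetical and exist only
for this illustration.

```python
class FakeClock:
    """Hypothetical stand-in for a tick counter such as jiffies."""

    def __init__(self, reads_per_tick):
        # reads_per_tick == 0 models a counter that never advances.
        self.reads_per_tick = reads_per_tick
        self.reads = 0

    def jiffies(self):
        self.reads += 1
        if self.reads_per_tick == 0:
            return 0
        return self.reads // self.reads_per_tick


def wait_for_tick(read_jiffies, max_polls=None):
    """Spin until the counter changes.

    Returns the number of polls that saw an unchanged counter, or None
    if max_polls was reached first. With max_polls=None and a counter
    that never advances, this loop would spin forever, which is the
    kind of hang reported here.
    """
    start = read_jiffies()
    polls = 0
    while read_jiffies() == start:
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return None
    return polls
```

With a counter that advances, wait_for_tick(FakeClock(5).jiffies) returns
after a few polls. With FakeClock(0), the counter never changes, so only the
max_polls guard lets the call return.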

Comment 1 Clark Williams 2008-07-14 19:29:41 UTC
Created attachment 311759
Boot log for LS21 lockup

Comment 2 Clark Williams 2008-07-14 21:17:04 UTC
I swapped the two LS21s that were in slots 1 and 2. The failing blade
(formerly in slot 1) then reported double-bit errors on DIMM slots 5 and 6,
disabled the two slots, and booted normally.

Here's a cut-n-paste from the web interface to the event log:

1  E  BLADE_02  07/14/08, 21:07:55  (SN#YK10A269W03L) DIMM number 5 failed.
2  E  BLADE_02  07/14/08, 21:07:55  (SN#YK10A269W03L) POSTBIOS: 289 Board 1 DIMM Pair 3 Double Bit Error.
3  E  BLADE_02  07/14/08, 21:07:54  (SN#YK10A269W03L) DIMM number 6 failed.
4  E  BLADE_02  07/14/08, 21:07:54  (SN#YK10A269W03L) POSTBIOS: 289 Board 1 DIMM Pair 3 Double Bit Error.
5  I  BLADE_02  07/14/08, 21:07:25  (SN#YK10A269W03L) System Reboot



Comment 3 Clark Williams 2008-08-14 21:31:18 UTC
Closing due to a confirmed hardware error.

