Bug 455310

Summary: LS21 locks up booting MRG RT kernel
Product: Red Hat Enterprise MRG Reporter: Clark Williams <williams>
Component: realtime-kernel Assignee: Red Hat Real Time Maintenance <rt-maint>
Status: CLOSED NOTABUG QA Contact:
Severity: low Docs Contact:
Priority: low    
Version: 1.0 CC: bhu
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2008-08-14 21:31:18 UTC Type: ---
Attachments:
Boot log for LS21 lockup (flags: none)

Description Clark Williams 2008-07-14 19:29:41 UTC
Description of problem:
The RT kernel fails to boot on blade1 of the HSV BladeCenter (blade2 boots the
same kernel and runs fine). The same blade (blade1) boots and runs the RHEL5.2
kernel.

Version-Release number of selected component (if applicable):

kernel-rt-2.6.24.7-72.el5rt

How reproducible:

Every time.

Steps to Reproduce:
1. Install RHEL5.2
2. Install MRG RT kernel
3. Boot RT kernel
  
Actual results:

The kernel hangs after reporting the amount of memory available (see attached
console output).


Expected results:

Running kernel

Additional info:

Debugging printks indicate that the hang is occurring in
calibrate_delay_direct(). Jiffies are not incrementing, so the calibration loop
never terminates.
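
To illustrate the failure mode described above, here is a minimal Python model of a loop that waits for a tick counter to advance. It is a sketch, not the kernel source: wait_for_tick, ticking_source, stuck_source, and the max_polls bound are all hypothetical names introduced for this illustration. The bound exists only so the demonstration can return; the report describes the actual loop as never terminating.

```python
import itertools


def wait_for_tick(read_jiffies, max_polls=None):
    """Poll until the tick counter advances past its starting value.

    Returns the number of polls that saw no change, or None if the
    hypothetical max_polls bound was reached first. With max_polls=None
    and a counter that never advances, this loop would spin forever,
    which models the hang described in the report.
    """
    start = read_jiffies()
    polls = 0
    while read_jiffies() <= start:
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return None
    return polls


def ticking_source(polls_per_tick):
    """Simulated counter that advances once every polls_per_tick reads."""
    reads = itertools.count()
    return lambda: next(reads) // polls_per_tick


def stuck_source():
    """Simulated counter that never advances (no timer ticks delivered)."""
    return lambda: 0


if __name__ == "__main__":
    print("healthy counter:", wait_for_tick(ticking_source(5), max_polls=1000))
    print("stuck counter:  ", wait_for_tick(stuck_source(), max_polls=1000))
```

With a healthy counter the wait ends as soon as the next tick arrives. With a stuck counter only the artificial bound ends it, which matches the symptom of a boot that stops making progress during delay calibration.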

Comment 1 Clark Williams 2008-07-14 19:29:41 UTC
Created attachment 311759
Boot log for LS21 lockup

Comment 2 Clark Williams 2008-07-14 21:17:04 UTC
I swapped the two LS21s that were in slots 1 & 2. The failing blade (formerly
in slot 1) reported double-bit errors on DIMM slots 5 & 6, disabled the two
slots, and then booted up.

Here's a cut-n-paste from the web interface to the event log:

1  E  BLADE_02 	 07/14/08, 21:07:55 	(SN#YK10A269W03L) DIMM number 5 failed.
2  E  BLADE_02 	 07/14/08, 21:07:55 	(SN#YK10A269W03L) POSTBIOS: 289 Board 1 DIMM Pair 3 Double Bit Error.
3  E  BLADE_02 	 07/14/08, 21:07:54 	(SN#YK10A269W03L) DIMM number 6 failed.
4  E  BLADE_02 	 07/14/08, 21:07:54 	(SN#YK10A269W03L) POSTBIOS: 289 Board 1 DIMM Pair 3 Double Bit Error.
5  I  BLADE_02 	 07/14/08, 21:07:25 	(SN#YK10A269W03L) System Reboot



Comment 3 Clark Williams 2008-08-14 21:31:18 UTC
Closing due to confirmed h/w error