Bug 174991 - Kernel hangs when booting on a system with more than 8 logical processors.
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Brian Maly
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2005-12-05 17:03 UTC by Paul Waterman
Modified: 2007-11-30 22:07 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-04-19 19:54:57 UTC
Target Upstream Version:
Embargoed:


Description Paul Waterman 2005-12-05 17:03:14 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915

Description of problem:
Kernel hangs when booting on a system with more than 8 logical processors.

When booting RHEL 3.0 on an x86_64 system with more than 8 logical processors, the kernel will hang. This occurs for both the boot kernel and the default kernel.

This problem was observed on an S3E3143 model Intel Software Development Platform, which has four dual-core Intel Xeon (Paxville MP) processors.

When running with hyperthreading turned off, this system has only eight logical processors, and the kernel boots and runs fine.

When running with hyperthreading turned on, this system has sixteen logical processors, and the hang on boot is observed.

The following is output of an attempted PXE-based kickstart with hyperthreading turned on:

---begin---
Broadcom UNDI PXE-2.1 v7.7.5
Copyright (C) 2000-2004 Broadcom Corporation
Copyright (C) 1997-2000 Intel Corporation
All rights reserved.
booting x86_64 kernel...
CLIENT MAC ADDR: 00 0E 0C 42 93 50  GUID: A8271196 F8FC 11D9 B138 000BAB01F3DF
CLIENT IP: 10.17.255.45  MASK: 255.255.255.0  DHCP IP: 10.17.255.28
GATEWAY IP: 10.17.255.254

PXELINUX 2.08 2003-12-12  Copyright (C) 1994-2003 H. Peter Anvin
UNDI data segment at:   00093990
UNDI data segment size: 4EF0
UNDI code segment at:   00098880
UNDI code segment size: 6A48
PXE entry point found (we hope) at 9888:00DA
My IP address seems to be 0A11FF2D 10.17.255.45
ip=10.17.255.45:10.17.192.11:10.17.255.254:255.255.255.0
TFTP prefix:
Trying to load: pxelinux.cfg/01-00-0e-0c-42-93-50
Trying to load: pxelinux.cfg/0A11FF2D
Loading rhel-3.0-u5-64/vmlinuz..........................
Loading rhel-3.0-u5-64/initrd.img......................................................
Ready.
.
Decompressing Linux...done.
Booting the kernel.
----end----

The system hangs at this point.

This exact same kickstart runs completely normally when hyperthreading is disabled, and the system runs fine after it is loaded. If hyperthreading is subsequently turned back on, the hang occurs again, as follows:

---begin---
  Booting 'Red Hat Enterprise Linux AS (2.4.21-37.EL)'
booting x86_64 kernel...
root (hd0,0)
 Filesystem type is ext2fs, partition type 0x83
kernel /vmlinuz-2.4.21-37.EL ro root=LABEL=/
   [Linux-bzImage, setup=0x1400, size=0x162755]
initrd /initrd-2.4.21-37.EL.img
   [Linux-initrd@ 0x37f34000, 0xbb538 bytes]

.
Decompressing Linux...done.
Booting the kernel.
----end----

The system hangs at this point.

Note that the first transcript (the PXE kickstart) is from an attempt with Update 5, and the second (kernel 2.4.21-37.EL booted from disk) is from Update 6. Both hang in the same manner and at the same point.

Note too that the 32-bit (x86) kernels work fine.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Attempt to boot RHEL 3.0 x86_64 on a system with more than eight logical processors.


Actual Results:  The system hangs.

Expected Results:  The system should boot.

Additional info:

Please note that I have opened a separate feature request bug (Bug 174759) requesting support for more than eight logical processors in x86_64. If that work is completed, it should also resolve this bug. If that work is not completed, however, the problem of hang on boot should still be addressed.

Comment 1 Jim Paradis 2005-12-07 21:32:44 UTC
Could you try this again as follows:  set up for serial console capture, then
boot with the option "earlyprintk=ttyS0,115200" (or use your favorite baud
rate).  This should hopefully produce more output between the "Booting" message
and the hang.
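
For reference, a rough sketch of how the option might be added to the GRUB
entry shown in the description. The root, kernel and initrd lines are copied
from the Update 6 transcript above; the title line is reconstructed from the
"Booting ..." message, and the serial port and baud rate are only the values
suggested in this comment, so treat all of it as an assumption to adjust for
the actual machine:

```
# Sketch only: port (ttyS0) and baud rate (115200) are the values suggested
# above and must match the serial console actually being captured.
title Red Hat Enterprise Linux AS (2.4.21-37.EL)
        root (hd0,0)
        kernel /vmlinuz-2.4.21-37.EL ro root=LABEL=/ earlyprintk=ttyS0,115200
        initrd /initrd-2.4.21-37.EL.img
```

For a one-off test, the same argument could instead be appended to the kernel
line from the GRUB menu at boot time without changing the file on disk.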


Comment 2 Paul Waterman 2005-12-12 17:32:02 UTC
Unfortunately, I can't. The only system we had with more than eight logical
processors (the one on which this problem was observed) was an S3E3143 model
Intel Software Development Platform on loan from Intel, and it has since been
returned.

I'll check with my Intel rep and see if they can follow up on this, though...

Comment 3 Red Hat Bugzilla 2007-03-18 22:36:24 UTC
User jparadis's account has been closed

