Bug 452986 - Only one logical CPU seen on beta Tylersburg-EP systems
Summary: Only one logical CPU seen on beta Tylersburg-EP systems
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.7
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Peter Martuccelli
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On: 452912
Blocks:
 
Reported: 2008-06-26 13:33 UTC by Peter Martuccelli
Modified: 2008-07-08 13:46 UTC (History)

Fixed In Version: 4.7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-07-08 13:46:09 UTC
Target Upstream Version:
Embargoed:



Description Peter Martuccelli 2008-06-26 13:33:57 UTC
+++ This bug was initially created as a clone of Bug #452912 +++

Description of problem:
When booting either the bare metal or Xen 2.6.18-92.1.6 kernel, only one CPU is
seen on the new beta Tylersburg-EP systems. There should be 16 CPUs with HT
enabled as they're dual quad-core boxes.

Version-Release number of selected component (if applicable):
Kernel and Kernel-xen 2.6.18-92.1.6

How reproducible:
Every time

Steps to Reproduce:
1. Install RHEL5.2
2. Boot system
3. Only CPU0 can be seen.
4. Update to latest kernels
5. Still only one CPU seen.  

Actual results:
1 logical CPU

Expected results:
16 logical CPUs

Additional info:
The system showed 1 logical CPU with the GA 5.2 xen kernel as well. I looked
through the BIOS, but couldn't see any setting that would account for the single
CPU state.
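
A quick way to confirm the symptom from a running system is to count the
"processor" entries in /proc/cpuinfo and compare that with the expected
topology. The following is only a minimal sketch: the function names are
hypothetical, and the topology (2 sockets x 4 cores x 2 HT threads = 16) is
taken from the description above.

```python
import re


def count_logical_cpus(cpuinfo_text):
    """Count 'processor : N' entries in /proc/cpuinfo-style text."""
    pattern = r"^processor[ \t]*:[ \t]*\d+[ \t]*$"
    return len(re.findall(pattern, cpuinfo_text, flags=re.MULTILINE))


def expected_logical_cpus(sockets, cores_per_socket, threads_per_core):
    """Logical CPUs the kernel should see for a given topology."""
    return sockets * cores_per_socket * threads_per_core


if __name__ == "__main__":
    # Assumed topology from the report: dual socket, quad-core, HT enabled.
    expected = expected_logical_cpus(2, 4, 2)
    try:
        with open("/proc/cpuinfo") as f:
            seen = count_logical_cpus(f.read())
        print("expected %d logical CPUs, kernel sees %d" % (expected, seen))
    except OSError as err:
        print("could not read /proc/cpuinfo: %s" % err)
```

On an affected system the second number would be 1 rather than 16.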

-- Additional comment from gcase on 2008-06-25 16:50 EST --
Turbo mode and HT BIOS settings don't seem to affect this. I tried turning them
off but still only 1 CPU is seen. 

There are interesting dmesg entries, including a soft lockup, an MCE error, and
the phrase "weird, boot CPU (#0) not listed by the BIOS". Will the MCE bank
error be addressed by "Bugzilla Bug 446673: FEAT: RHEL 5.3 extend MCE banks
support for Dunnington, Nehalem, and beyond"? I'm attaching dmesg and a
sosreport, but here's the relevant excerpt from dmesg that I'm talking about.


CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 256K
CPU: L3 cache: 8192K
using mwait in idle threads.
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
MCE: warning: using only 9 banks
CPU0: Thermal monitoring enabled (TM1)
SMP alternatives: switching to UP code
Freeing SMP alternatives: 32k freed
weird, boot CPU (#0) not listed by the BIOS.
SMP motherboard not detected.
Using local APIC timer interrupts.
result 8333944
Detected 8.333 MHz APIC timer.
testing NMI watchdog ... OK.
SMP disabled
Brought up 1 CPUs
testing NMI watchdog ... <3>BUG: soft lockup - CPU#0 stuck for 10s! [swapper:1]
CPU 0:
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.18-92.1.6.el5 #1
RIP: 0010:[<ffffffff8000c680>]  [<ffffffff8000c680>] __delay+0x6/0x10
RSP: 0000:ffff810110ad9e98  EFLAGS: 00000283
RAX: 000000000008d14c RBX: 0000000000002710 RCX: 00000000236ddd4e
RDX: 0000000000000030 RSI: ffff810110a562c0 RDI: 0000000000252b88
RBP: ffffffff803e5c5c R08: 0000000000000006 R09: ffff8100010203d4
R10: 0000000000000097 R11: ffffffff8015c24b R12: ffffffff803e5c5c
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffffffff8039f000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0

Call Trace:
 [<ffffffff803e5bc0>] check_nmi_watchdog+0x12e/0x1ca
 [<ffffffff803e4f26>] smp_cpus_done+0x25/0x2c
 [<ffffffff803db956>] init+0xf9/0x2f7
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff803db85d>] init+0x0/0x2f7
 [<ffffffff8005dfa7>] child_rip+0x0/0x11

OK.
time.c: Using 1.193182 MHz WALL PIT GTOD PIT/TSC timer.
time.c: Detected 2400.178 MHz processor.
sizeof(vma)=176 bytes
sizeof(page)=56 bytes
sizeof(inode)=560 bytes
sizeof(dentry)=216 bytes
sizeof(ext3inode)=760 bytes
sizeof(buffer_head)=96 bytes
sizeof(skbuff)=240 bytes
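
The lines in the excerpt above that point at the SMP bring-up failure can be
picked out of a saved dmesg automatically. This is a rough sketch, not a
supported tool: the marker strings are copied from the excerpt, while the
function and constant names are made up for illustration.

```python
import re

# Messages from the excerpt that indicate the kernel fell back to UP mode.
SMP_FAILURE_MARKERS = (
    "weird, boot CPU (#0) not listed by the BIOS",
    "SMP motherboard not detected",
    "SMP disabled",
)


def scan_dmesg(dmesg_text):
    """Return (markers found, number of CPUs brought up or None)."""
    found = [m for m in SMP_FAILURE_MARKERS if m in dmesg_text]
    match = re.search(r"Brought up (\d+) CPUs", dmesg_text)
    brought_up = int(match.group(1)) if match else None
    return found, brought_up


# Short sample taken from the excerpt; a real run would read the full
# dmesg output from a file instead.
SAMPLE = """\
weird, boot CPU (#0) not listed by the BIOS.
SMP motherboard not detected.
Using local APIC timer interrupts.
SMP disabled
Brought up 1 CPUs
"""

markers, cpus = scan_dmesg(SAMPLE)
for marker in markers:
    print("found:", marker)
print("CPUs brought up:", cpus)
```

A healthy boot on these systems should report none of the markers and a
"Brought up" count matching the expected 16 logical CPUs.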


-- Additional comment from gcase on 2008-06-25 16:52 EST --
Created an attachment (id=310295)
5.2 non-xen dmesg 

This is the entire dmesg output from 2.6.18-92.1.6 non-xen

-- Additional comment from gcase on 2008-06-25 16:54 EST --
Created an attachment (id=310296)
sosreport from affected machine

SOSreport from the Tylersburg-EP beta system

-- Additional comment from gcase on 2008-06-25 17:11 EST --
Toggling the "Limit CPUID Maximum" setting in BIOS also appears to have no
effect on this issue.


-- Additional comment from jvillalo on 2008-06-25 17:27 EST --
This is a known issue with the current SuperMicro BIOS ACPI Tables.  I have read
that SuperMicro is currently working on this issue.

Recommend filing an issue with Intel Premier support so that you will be
notified as soon as a new BIOS is available.

Comment 1 Peter Martuccelli 2008-07-03 15:04:05 UTC
We have received a BIOS update. Testing on one Intel SDV system with the
SuperMicro BIOS showed all 16 CPUs. Will be upgrading the remaining SDVs and
racking them for testing.

Comment 2 Peter Martuccelli 2008-07-08 13:46:09 UTC
Remaining systems have been upgraded.  Closing out the request as CURRENTRELEASE,
since the BIOS update is available from Intel.


