Bug 124012

Summary: Sometimes only 3 CPUs reported on 2xXeon w/HT - Dell precision 450
Description Pawel Salek 2004-05-22 22:08:29 UTC
Description of problem:
Double Xeon box (Dell Precision 450) with Hyperthreading enabled
reports only three processors available.

Version-Release number of selected component (if applicable):

How reproducible:

Actual results:
/proc/cpuinfo and dmesg output attached.

Expected results:
Four logical processors should be reported.
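
For reference, the reported count can be checked by counting the "processor" entries in /proc/cpuinfo. The following is a minimal Python sketch, assuming the usual x86 layout with one "processor : N" line per logical CPU. The function name and the sample text are made up for illustration and are not taken from the attachments.

```python
def count_logical_cpus(cpuinfo_text):
    """Count 'processor : N' entries in the text of /proc/cpuinfo."""
    count = 0
    for line in cpuinfo_text.splitlines():
        key, sep, _value = line.partition(":")
        # Only lines whose key is exactly "processor" start a new CPU entry.
        if sep and key.strip() == "processor":
            count += 1
    return count


# Hypothetical excerpt with three entries, as on a faulty boot.
sample = (
    "processor\t: 0\nvendor_id\t: GenuineIntel\n\n"
    "processor\t: 1\nvendor_id\t: GenuineIntel\n\n"
    "processor\t: 2\nvendor_id\t: GenuineIntel\n"
)
print(count_logical_cpus(sample))
```

On a live system the same function could be fed the contents of /proc/cpuinfo, for example count_logical_cpus(open("/proc/cpuinfo").read()).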

Comment 1 Pawel Salek 2004-05-22 22:09:41 UTC
Created attachment 100464 [details]
dmesg output

Comment 2 Pawel Salek 2004-05-22 22:10:35 UTC
Created attachment 100465 [details]

Comment 3 Arjan van de Ven 2004-05-23 06:44:55 UTC
Unfortunately the top of the dmesg is chopped off, and in this case
that part has the important information.
The top is also saved to the /var/log/messages file; could you cut
and paste the first part of the bootup output (starting with the E820
table) into this bug?
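
One way to pull that part out of /var/log/messages is to look for the line that introduces the E820 table and take the lines that follow it. A rough Python sketch, assuming the table is introduced by "BIOS-provided physical RAM map" (as in 2.4 kernels) and that the last such line belongs to the most recent boot. The function name and the 200-line limit are arbitrary choices for illustration.

```python
def last_boot_section(lines, marker="BIOS-provided physical RAM map", limit=200):
    """Return up to `limit` lines starting at the last E820 header found."""
    start = None
    for i, line in enumerate(lines):
        if marker in line:
            start = i  # keep the last match, i.e. the most recent boot
    if start is None:
        return []
    return lines[start:start + limit]


# Hypothetical log with two boots; only the second one is returned.
demo = [
    "kernel: BIOS-provided physical RAM map:",
    "kernel: older boot line",
    "kernel: BIOS-provided physical RAM map:",
    "kernel:  BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)",
]
for line in last_boot_section(demo):
    print(line)
```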

Comment 4 Pawel Salek 2004-05-23 07:24:55 UTC
Created attachment 100471 [details]

I attach the missing part of the boot up sequence. I guess the problem is that
one of the processors is "not responding".

Comment 5 Pawel Salek 2004-05-24 08:06:22 UTC
Messages generated on a twin machine by the 2.4.20-31.9smp kernel (I
seem unable to attach the file: the system keeps claiming that I am
logged out or that I did not specify the file).

Linux version 2.4.20-31.9smp (bhcompile@daffy.perf.redhat.com) (gcc
version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)) #1 SMP Tue Apr 13
17:40:10 EDT 2004
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000003ff75000 (usable)
 BIOS-e820: 000000003ff75000 - 000000003ff77000 (ACPI NVS)
 BIOS-e820: 000000003ff77000 - 000000003ff98000 (ACPI data)
 BIOS-e820: 000000003ff98000 - 0000000040000000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fec90000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee10000 (reserved)
 BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
klogd startup succeeded
127MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000fe710
hm, page 000fe000 reserved twice.
hm, page 000ff000 reserved twice.
hm, page 000f0000 reserved twice.
On node 0 totalpages: 262005
zone(0): 4096 pages.
zone(1): 225280 pages.
zone(2): 32629 pages.
ACPI: Searched entire block, no RSDP was found.
ACPI: RSDP located at physical address c00feb90
RSD PTR  v0 [DELL  ]
__va_range(0xfd4e2, 0x68): idx=10 mapped at ffff5000
ACPI table found: RSDT v1 [DELL   WS 450  0.6]
__va_range(0xfd51a, 0x24): idx=10 mapped at ffff5000
__va_range(0xfd51a, 0x74): idx=10 mapped at ffff5000
ACPI table found: FACP v1 [DELL   WS 450  0.6]
__va_range(0xfffd1f11, 0x24): idx=10 mapped at ffff5000
__va_range(0xfffd1f11, 0xa7): idx=10 mapped at ffff5000
ACPI table found: SSDT v1 [DELL st_ex 0.4096]
__va_range(0xfd58e, 0x24): idx=10 mapped at ffff5000
__va_range(0xfd58e, 0x84): idx=10 mapped at ffff5000
ACPI table found: APIC v1 [DELL   WS 450  0.6]
__va_range(0xfd58e, 0x84): idx=10 mapped at ffff5000
LAPIC (acpi_id[0x0001] id[0x0] enabled[1])
CPU 0 (0x0000) enabledProcessor #0 Pentium 4(tm) XEON(tm) APIC version 16

LAPIC (acpi_id[0x0002] id[0x2] enabled[1])
CPU 1 (0x0200) enabledProcessor #2 Pentium 4(tm) XEON(tm) APIC version 16

LAPIC (acpi_id[0x0003] id[0x1] enabled[1])
CPU 2 (0x0100) enabledProcessor #1 Pentium 4(tm) XEON(tm) APIC version 16

LAPIC (acpi_id[0x0004] id[0x3] enabled[1])
CPU 3 (0x0300) enabledProcessor #3 Pentium 4(tm) XEON(tm) APIC version 16

IOAPIC (id[0x4] address[0xfec00000] global_irq_base[0x0])
IOAPIC (id[0x5] address[0xfec80000] global_irq_base[0x18])
IOAPIC (id[0x6] address[0xfec80800] global_irq_base[0x30])
INT_SRC_OVR (bus[0] irq[0x0] global_irq[0x2] polarity[0x0] trigger[0x0])
INT_SRC_OVR (bus[0] irq[0x9] global_irq[0x9] polarity[0x1] trigger[0x3])
4 CPUs total
Local APIC address fee00000
__va_range(0xfd612, 0x24): idx=10 mapped at ffff5000
__va_range(0xfd612, 0x28): idx=10 mapped at ffff5000
ACPI table found: BOOT v1 [DELL   WS 450  0.6]
__va_range(0xfd63a, 0x24): idx=10 mapped at ffff5000
__va_range(0xfd63a, 0x67): idx=10 mapped at ffff5000
ACPI table found: ASF! v16 [DELL   WS 450  0.6]
ACPI: Unsupported table ASF!
Enabling the CPU's according to the ACPI table
Intel MultiProcessor Specification v1.4
    Virtual Wire compatibility mode.
OEM ID: DELL     Product ID: WS 450       APIC at: 0xFEE00000
I/O APIC #4 Version 32 at 0xFEC00000.
I/O APIC #5 Version 32 at 0xFEC80000.
I/O APIC #6 Version 32 at 0xFEC80800.
Enabling APIC mode: Flat.	Using 3 I/O APICs
Processors: 4
Kernel command line: ro root=LABEL=/ hdc=ide-scsi
ide_setup: hdc=ide-scsi
Initializing CPU#0
Detected 2790.800 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 5570.56 BogoMIPS
Memory: 1026260k/1048020k available (1493k kernel code, 18184k
reserved, 1099k data, 156k init, 130516k highmem)
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
Inode cache hash table entries: 65536 (order: 7, 524288 bytes)
Mount cache hash table entries: 512 (order: 0, 4096 bytes)
Buffer-cache hash table entries: 65536 (order: 6, 262144 bytes)
Page-cache hash table entries: 262144 (order: 8, 1048576 bytes)
CPU: Trace cache: 12K uops, L1 D cache: 8K
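
The LAPIC lines in this log can be checked mechanically to see how many logical processors the ACPI APIC table advertises. A minimal Python sketch, assuming the "LAPIC (acpi_id[...] id[...] enabled[...])" format shown above. The regular expression and function name are illustrative, not part of any kernel tool.

```python
import re

# Matches lines such as: LAPIC (acpi_id[0x0002] id[0x2] enabled[1])
LAPIC_RE = re.compile(
    r"LAPIC \(acpi_id\[(0x[0-9a-fA-F]+)\] id\[(0x[0-9a-fA-F]+)\] enabled\[(\d)\]\)"
)


def parse_lapics(log_text):
    """Return (acpi_id, apic_id, enabled) tuples, one per LAPIC line."""
    return [
        (int(acpi, 16), int(apic, 16), flag == "1")
        for acpi, apic, flag in LAPIC_RE.findall(log_text)
    ]


# The four LAPIC lines copied from the log above.
log = """LAPIC (acpi_id[0x0001] id[0x0] enabled[1])
LAPIC (acpi_id[0x0002] id[0x2] enabled[1])
LAPIC (acpi_id[0x0003] id[0x1] enabled[1])
LAPIC (acpi_id[0x0004] id[0x3] enabled[1])
"""
entries = parse_lapics(log)
enabled_ids = sorted(apic for _acpi, apic, on in entries if on)
print(len(entries), enabled_ids)
```

In this log all four local APICs are listed as enabled, which matches the "4 CPUs total" line.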

Comment 6 Pawel Salek 2004-05-28 12:12:45 UTC
It is interesting that after a reboot, four logical processors were
discovered. Looks like an intermittent problem. I could compare the
boot log with the faulty one to look for differences, if that would be
useful.

Comment 7 Len Brown 2004-05-28 15:21:54 UTC
Yes, it would be useful to compare the successful and failed boot logs.
Or you could simply attach the full dmesg for the failure and success cases.
But I expect that the only difference you'll see is that sometimes
a processor is not responding, and sometimes it is. The next things
to note are whether it is always the same processor that fails to
respond, and how often the failure occurs across repeated reboots.
Also, if you physically exchange package 0 and package 1,
does the issue follow the package, or stay with the socket?
Please verify that you're running the latest BIOS. Might as well
attach dmidecode output to identify it while you're at it.
Also, can you attach the output from lspci -vv to show all the system components?
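
The comparison of a good and a faulty boot log can be done with any diff tool. As an illustration, here is a small Python sketch using the standard difflib module. The file labels and the sample lines are hypothetical and do not come from the attached logs.

```python
import difflib


def diff_boot_logs(good_lines, bad_lines):
    """Return unified-diff lines between a good boot log and a faulty one."""
    return list(
        difflib.unified_diff(
            good_lines,
            bad_lines,
            fromfile="boot-ok",
            tofile="boot-faulty",
            lineterm="",
            n=0,  # no context lines, show only what differs
        )
    )


# Made-up two-line logs that differ only in the status of one CPU.
good = ["Processors: 4", "cpu 3: ok"]
bad = ["Processors: 4", "cpu 3: not responding"]
for line in diff_boot_logs(good, bad):
    print(line)
```

Lines that vary on every boot (for example timestamps, if the logs come from /var/log/messages) would need to be stripped first, otherwise they would drown out the real differences.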

Comment 8 Len Brown 2004-06-04 05:01:09 UTC
Is this system running the latest BIOS? 
Can you attach the output from lspci and dmidecode? 

Comment 9 Pawel Salek 2004-06-04 06:43:33 UTC
Created attachment 100858 [details]

Comment 10 Pawel Salek 2004-06-04 06:45:12 UTC
Created attachment 100859 [details]

The system is NOT running the latest BIOS. I guess I need to upgrade it but for
that I need to find a bootable DOS floppy and Windows to extract the BIOS. Stay
tuned.

Comment 11 Pawel Salek 2004-06-04 08:01:05 UTC
I have upgraded the BIOS to A04 - original was. System booted up with
4 logical processors active.

Comment 12 Len Brown 2004-06-04 14:18:41 UTC
Thanks for testing w/ updated BIOS. 
As the issue was intermittent with the old BIOS, it may take a number
of boot cycles before we have some confidence that the BIOS fixed it. 
If it comes back, please re-open this bug.