Bug 89928

Summary: kernel 2.4.9-e.18enterprise can use only one CPU on 4-way Dell 6650
Product: Red Hat Enterprise Linux 2.1
Reporter: Aleksandr Brezhnev <brezhnev>
Component: kernel
Assignee: Larry Woodman <lwoodman>
Status: CLOSED WONTFIX
QA Contact: Brian Brock <bbrock>
Severity: high
Docs Contact:
Priority: medium
Version: 2.1
CC: msw
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-06-20 16:00:45 UTC
Type: ---
Attachments (description, flags):
portion of /var/log/messages showing 2.4.9-e.18enterprise boot process (flags: none)
apcitable fixmap fix (flags: none)
dmidecode output from Dell 6650 with BIOS A09 (flags: none)
cpuinfo and meminfo (flags: none)
/var/log/messages from 4-way Dell 6650 BIOS A06 HT disabled (flags: none)
e.16enterprise ht-disabled (flags: none)
messages from e.16enterprise ht-enabled (flags: none)
lspci output from one of 4-way Dell 6650 systems (flags: none)

Description Aleksandr Brezhnev 2003-04-29 21:53:59 UTC
Description of problem:
Kernel 2.4.9-e.18enterprise can't use more than one CPU on
Dell 6650 machines with four 1.4GHz Xeon CPUs and 12GB RAM.

Kernel 2.4.9-e.16enterprise recognises 8 CPUs if hyperthreading
is enabled, or 4 CPUs if it is disabled.

Version-Release number of selected component (if applicable):
kernel-2.4.9-e.18

How reproducible:
Always

Steps to Reproduce:
1. Subscribe system to RHN AS21 QU2 beta channel
2. up2date kernel-enterprise to 2.4.9-e.18
3. boot the new kernel
    

Actual Results:  System recognises only one CPU

Expected Results:  It should use 4 (or 8) CPUs

Additional info:

Comment 1 Aleksandr Brezhnev 2003-04-29 21:56:00 UTC
Created attachment 91405 [details]
portion of /var/log/messages showing 2.4.9-e.18enterprise boot process

Comment 2 Arjan van de Ven 2003-04-29 21:57:26 UTC
/me eyes the summit updates

Comment 3 Jason Baron 2003-05-01 15:49:10 UTC
ok, i booted e.18 on two 6650s:

1) perf70, 4 cpus, 16GB
2) build-base, 8 cpus, 8 GB

so i'm really confused now. perhaps somehow the up kernel is running????

Comment 4 Michael K. Johnson 2003-05-01 15:55:54 UTC
(just pointing out that those cpu numbers are logical CPUs with HT,
we know that a 6650 can only have 4 physical CPUs...)

Comment 5 Michael K. Johnson 2003-05-01 15:57:56 UTC
dmidecode will give BIOS version numbers, we should probably compare.

Comment 6 Jason Baron 2003-05-01 16:53:06 UTC
we have BIOS version: A06

Comment 7 Aleksandr Brezhnev 2003-05-01 17:25:02 UTC
My system has A09.

Comment 8 Michael K. Johnson 2003-05-01 17:56:08 UTC
Do you see the same problem both with hot and cold reboots?

Comment 9 Matt Wilson 2003-05-01 18:14:58 UTC
Created attachment 91451 [details]
apcitable fixmap fix

Comment 10 Matt Wilson 2003-05-01 18:15:35 UTC
I have just attached a patch that *might* fix this problem.  If you can build a
kernel with the change and test it to verify, I would appreciate it.


Comment 11 Aleksandr Brezhnev 2003-05-01 18:46:20 UTC
Created attachment 91453 [details]
dmidecode output from Dell 6650 with BIOS A09

I made cold and hot reboots with e.18enterprise and Dell BIOS A09 and A06.
No difference. The kernel supports only one CPU.
I attached dmidecode output.

Comment 12 Aleksandr Brezhnev 2003-05-01 21:11:36 UTC
Created attachment 91458 [details]
cpuinfo and meminfo

I built a custom kernel with the apcitable patch but nothing changed.
See the output from uname -a, cat /proc/cpuinfo and cat /proc/meminfo in 
the attachment.
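
For anyone repeating this check, here is a minimal sketch of counting the logical CPUs a running kernel reports in /proc/cpuinfo. It assumes the usual x86 layout, where each logical CPU has its own "processor : N" line. The function name and the sample text are illustrative only and are not taken from the attachment.

```python
import re

# One "processor : N" line is expected per logical CPU (assumed x86 layout).
PROCESSOR_LINE = re.compile(r"^processor[ \t]*:[ \t]*\d+[ \t]*$", re.MULTILINE)


def count_logical_cpus(cpuinfo_text):
    """Return the number of 'processor : N' entries in cpuinfo-style text."""
    return len(PROCESSOR_LINE.findall(cpuinfo_text))


# Illustrative sample only; on a real system, pass the contents of
# /proc/cpuinfo instead, e.g. open("/proc/cpuinfo").read().
sample = (
    "processor\t: 0\n"
    "vendor_id\t: GenuineIntel\n"
    "\n"
    "processor\t: 1\n"
    "vendor_id\t: GenuineIntel\n"
)
print(count_logical_cpus(sample))
```

On the affected systems this count would be 1 under e.18, versus 4 or 8 under e.16 depending on the HT setting.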

Comment 13 Aleksandr Brezhnev 2003-05-01 21:17:06 UTC
Some excerpts from /var/log/messages on the system running e.18custom:

May  1 16:57:28 qaddb2 kernel: ACPI: Searched entire block, no RSDP was found.
May  1 16:57:28 qaddb2 kernel: ACPI: RSDP located at physical address c00fdc20
May  1 16:57:28 qaddb2 kernel: RSD PTR  v0 [DELL  ]
May  1 16:57:28 qaddb2 kernel: ACPI table found: RSDT v1 [DELL   PE6650   0.1]
May  1 16:57:28 qaddb2 kernel: init.c:148: bad pte c0004fb0(00000000000fd163).
May  1 16:57:28 qaddb2 kernel: init.c:148: bad pte c0004fb0(00000000000fd163).
May  1 16:57:28 qaddb2 kernel: ACPI table found: FACP v1 [DELL   PE6650   0.1]
May  1 16:57:28 qaddb2 kernel: init.c:148: bad pte c0004fb0(00000000000fd163).
May  1 16:57:28 qaddb2 kernel: init.c:148: bad pte c0004fb0(00000000000fd163).
May  1 16:57:28 qaddb2 kernel: ACPI table found: APIC v1 [DELL   PE6650   0.1]
May  1 16:57:28 qaddb2 kernel: init.c:148: bad pte c0004fb0(00000000000fd163).
May  1 16:57:28 qaddb2 kernel: LAPIC (acpi_id[0x0001] id[0x0] enabled[1])
May  1 16:57:28 qaddb2 kernel: CPU 0 (0x0000) enabledProcessor #0 Unknown CPU
[15:1] APIC version 16
May  1 16:57:28 qaddb2 kernel: 
May  1 16:57:28 qaddb2 rpc.statd[856]: Version 0.3.3 Starting
May  1 16:57:28 qaddb2 nfslock: rpc.statd startup succeeded
May  1 16:57:28 qaddb2 kernel: LAPIC (acpi_id[0x0002] id[0x2] enabled[1])
May  1 16:57:28 qaddb2 kernel: CPU 1 (0x0200) enabledProcessor #2 Unknown CPU
[15:1] APIC version 16
May  1 16:57:28 qaddb2 kernel: 
May  1 16:57:28 qaddb2 kernel: LAPIC (acpi_id[0x0003] id[0x4] enabled[1])
May  1 16:57:28 qaddb2 kernel: CPU 2 (0x0400) enabledProcessor #4 Unknown CPU
[15:1] APIC version 16
May  1 16:57:28 qaddb2 kernel: 
May  1 16:57:28 qaddb2 kernel: LAPIC (acpi_id[0x0004] id[0x6] enabled[1])
May  1 16:57:28 qaddb2 kernel: CPU 3 (0x0600) enabledProcessor #6 Unknown CPU
[15:1] APIC version 16
May  1 16:57:28 qaddb2 kernel: 
May  1 16:57:28 qaddb2 kernel: LAPIC (acpi_id[0x0005] id[0x1] enabled[1])
May  1 16:57:28 qaddb2 kernel: CPU 4 (0x0100) enabledProcessor #1 Unknown CPU
[15:1] APIC version 16
May  1 16:57:28 qaddb2 kernel: 
May  1 16:57:28 qaddb2 kernel: LAPIC (acpi_id[0x0006] id[0x3] enabled[1])
May  1 16:57:28 qaddb2 kernel: CPU 5 (0x0300) enabledProcessor #3 Unknown CPU
[15:1] APIC version 16
May  1 16:57:28 qaddb2 kernel: 
May  1 16:57:28 qaddb2 kernel: LAPIC (acpi_id[0x0007] id[0x5] enabled[1])
May  1 16:57:28 qaddb2 kernel: CPU 6 (0x0500) enabledProcessor #5 Unknown CPU
[15:1] APIC version 16
May  1 16:57:28 qaddb2 kernel: 
May  1 16:57:28 qaddb2 kernel: LAPIC (acpi_id[0x0008] id[0x7] enabled[1])
May  1 16:57:28 qaddb2 kernel: CPU 7 (0x0700) enabledProcessor #7 Unknown CPU
[15:1] APIC version 16
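
The excerpt above lists eight LAPIC entries with enabled[1] (APIC IDs 0 through 7), so the ACPI APIC table does enumerate every logical CPU even though the kernel ends up using one. A minimal sketch of extracting those entries from a saved log follows. The regular expression is written against the line format shown in this comment only, and the function name is hypothetical.

```python
import re

# Matches lines such as:
#   LAPIC (acpi_id[0x0001] id[0x0] enabled[1])
# The format is assumed from the log excerpt in this comment.
LAPIC_RE = re.compile(
    r"LAPIC \(acpi_id\[(0x[0-9a-fA-F]+)\] id\[(0x[0-9a-fA-F]+)\] enabled\[(\d)\]\)"
)


def enabled_apic_ids(log_text):
    """Return the APIC IDs (as ints) of LAPIC entries marked enabled[1]."""
    ids = []
    for _acpi_id, apic_id, enabled in LAPIC_RE.findall(log_text):
        if enabled == "1":
            ids.append(int(apic_id, 16))
    return ids


# Three lines copied from the excerpt above, used as sample input.
sample = """\
May  1 16:57:28 qaddb2 kernel: LAPIC (acpi_id[0x0001] id[0x0] enabled[1])
May  1 16:57:28 qaddb2 kernel: LAPIC (acpi_id[0x0002] id[0x2] enabled[1])
May  1 16:57:28 qaddb2 kernel: LAPIC (acpi_id[0x0005] id[0x1] enabled[1])
"""
ids = enabled_apic_ids(sample)
print(len(ids), sorted(ids))
```

Comparing this count with the number of CPUs the kernel actually brings up shows whether the loss happens at table parsing or later in SMP boot.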


Comment 14 Matt Wilson 2003-05-01 21:21:00 UTC
What are the results with HT disabled?


Comment 15 Aleksandr Brezhnev 2003-05-01 21:47:53 UTC
Created attachment 91460 [details]
/var/log/messages from 4-way Dell 6650 BIOS A06 HT disabled

Only one cpu is usable with kernel-2.4.9-e.18custom on 4-way Dell BIOS A06 
HT disabled.

The same kernel can see 4 (virtual) cpu on 2-way Dell 2600 BIOS A02 HT enabled.

Comment 16 Matt Wilson 2003-05-01 21:59:03 UTC
can you attach the boot log from a 2.4.9-e.16enterprise kernel with both HT
enabled and disabled?


Comment 17 Aleksandr Brezhnev 2003-05-01 22:25:10 UTC
Created attachment 91464 [details]
e.16enterprise ht-disabled

e.16enterprise can see 4 cpu

Comment 18 Aleksandr Brezhnev 2003-05-01 22:27:30 UTC
Created attachment 91465 [details]
messages from e.16enterprise ht-enabled

e.16enterprise can see 8 cpu

Comment 19 Aleksandr Brezhnev 2003-05-01 22:39:20 UTC
We reproduced the problem with kernel 2.4.9-e.18enterprise 
on 4-way Dell 6650 with 1.5GHz CPUs, BIOS A09.

Comment 20 Matt Wilson 2003-05-02 02:37:47 UTC
these are two different machines, right? Just to be paranoid, did you test
booting 2.4.9-e.18enterprise on qaddb2 (where 2.4.9-e.16enterprise is working)?


Comment 21 Aleksandr Brezhnev 2003-05-02 12:17:37 UTC
Yes, I tried e.18enterprise on qaddb2. It does not work.


Comment 22 Matt Wilson 2003-05-02 16:34:17 UTC
can you back out the linux-2.4.9e12_summit-2003-03-14.patch that is included in
the kernel-2.4.9-e.18.src.rpm?  Just 'patch -p1 -R <
linux-2.4.9e12_summit-2003-03-14.patch' in the /usr/src/linux-2.4.9 directory
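
A sketch of scripting that reverse-apply step with a dry run first, so a patch that does not back out cleanly is noticed before the tree is modified. It assumes GNU patch, whose -p1, -R and --dry-run options are used here. The function names are hypothetical, and the source directory and patch file are whatever the caller passes in.

```python
import subprocess


def reverse_patch_cmd(dry_run=True):
    """Build the patch(1) argument list for backing a patch out (-R)."""
    cmd = ["patch", "-p1", "-R"]
    if dry_run:
        # GNU patch: report what would happen without changing any files.
        cmd.append("--dry-run")
    return cmd


def reverse_patch(src_dir, patch_path, dry_run=True):
    """Run patch in src_dir with the patch file on stdin; return its exit code.

    patch_path is opened relative to the caller's working directory,
    not src_dir, so an absolute path is the safer choice.
    """
    with open(patch_path, "rb") as patch_file:
        result = subprocess.run(
            reverse_patch_cmd(dry_run),
            cwd=src_dir,
            stdin=patch_file,
            check=False,
        )
    return result.returncode
```

A caller would run reverse_patch(...) once with dry_run=True, check for a zero exit code, and only then repeat with dry_run=False.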

Comment 23 Aleksandr Brezhnev 2003-05-02 18:24:34 UTC
I rolled back linux-2.4.9_e16-summit-2003-03-14.patch and the kernel
recognized all CPUs.

Comment 24 Doug Ledford 2003-05-02 20:44:54 UTC
I need to know if the i686 2.4.9-e.18smp kernel sees all the CPUs (use the stock
kernel, not the one with the summit patch backed out).

Comment 25 Matt Wilson 2003-05-02 20:46:33 UTC
that is, the "smp" kernel instead of the "enterprise" kernel, right doug?

Comment 26 Doug Ledford 2003-05-02 21:20:10 UTC
Yep.

Comment 27 Aleksandr Brezhnev 2003-05-03 14:38:24 UTC
Stock e.18smp can see only one cpu.

Comment 28 Jason Baron 2003-05-05 17:11:42 UTC
we separated out the summit patches, can we try the latest kernel build on 

porkchop://mnt/redhat/beehive/comps/dist/2.1AS-errata-candidate/kernel/2.4.9-e.18.6/i686/kernel-enterprise-2.4.9-e.18.6.i686.rpm

Comment 30 Aleksandr Brezhnev 2003-05-05 18:12:50 UTC
2.4.9-e.18.6enterprise can see all CPUs on Dell 6650 with 1.5GHz Xeons.
I think it should be fine with 1.4GHz also, but my test systems are busy now.
I will check ASAP.

Comment 31 Eric Hagberg 2003-05-05 18:17:09 UTC
I wonder what the difference between your system and mine is - as I have a 6650
4-way that can see all the virtual cpus just fine. It has bios version A09,
dated 03/04/2003.

Comment 32 Matt Wilson 2003-05-05 18:19:37 UTC
it's very likely that the PCI configuration (number of cards in slots, types of
cards) could cause a change in behavior in this area.


Comment 33 Aleksandr Brezhnev 2003-05-05 18:26:43 UTC
I don't know what the difference is. I have four 4-way systems with 1.4GHz cpus
and two 4-way systems with 1.5GHz cpus here. All of them behave the same way.
I tried them with BIOS A09 3/4/2003 and A06. No difference.

Comment 34 Aleksandr Brezhnev 2003-05-05 18:29:22 UTC
Created attachment 91501 [details]
lspci output from one of 4-way Dell 6650 systems

You can also look at the attachment #91453 [details]. It is dmidecode output.

Comment 35 Aleksandr Brezhnev 2003-05-06 13:17:47 UTC
I checked 2.4.9-e.18.6enterprise on all my Dell 6650 systems.
Everything looks good. Thanks guys!  

Comment 36 Eric Hagberg 2003-05-06 19:05:34 UTC
Are you saying that the summit patches will again be ripped out and used to
generate a separate kernel?

Or is someone looking at getting this fixed before QU2 is rolled out? Many
disappointed customers would have to deal with a separate summit kernel still.

Is this the only issue holding up QU2's release, out of curiosity?

Comment 37 Matt Wilson 2003-05-06 19:12:02 UTC
We had never planned to have a unified summit kernel for the QU2 timeframe.  We
were hoping to clean things up a bit by being able to apply the summit patch to
the kernel source (and thus to the generic kernel-source package).

As you can see from the instability that the summit patch's generic changes
introduced into the non-summit codepaths (i.e., those not part of
#ifdef CONFIG_SUMMIT or similar blocks), it would be irresponsible for us to
integrate the summit changes into the main kernels.  Even if the non-summit
paths are fixed, we would still have to vet the summit codepaths to ensure
that they do not also destabilize the non-summit path when they are not
excluded via a compile-time option.

This is no trivial task.


Comment 38 Eric Hagberg 2003-05-06 20:27:17 UTC
I wasn't saying it was easy. The summit kernel has been separate for about a
year, and the plan (as told to me over a year ago) was to move toward summit
support in the enterprise kernel, so I was under the impression that the
mention of summit patches above was an indication that the merge was happening
for QU2.

Sorry.

Comment 39 Matt Wilson 2003-05-06 20:47:41 UTC
Right, currently we're investigating xapic support (and thus summit support) in
non-summit kernels for RHEL 3.


Comment 40 Jiri Pallich 2012-06-20 16:00:45 UTC
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release for which you requested us to review is now End of Life. 
Please See https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.