Bug 89928
Description
Aleksandr Brezhnev
2003-04-29 21:53:59 UTC
Created attachment 91405 [details]
portion of /var/log/messages showing 2.4.9-e.18enterprise boot process
/me eyes the summit updates

ok, i booted e.18 on two 6650s:
1) perf70, 4 cpus, 16GB
2) build-base, 8 cpus, 8 GB
so i'm really confused now. perhaps somehow the up kernel is running????

(just pointing out that those cpu numbers are logical CPUs with HT, we know that a 6650 can only have 4 physical CPUs...)

dmidecode will give BIOS version numbers, we should probably compare.

we have BIOS version: A06

My system has A09. Do you see the same problem both with hot and cold reboots?

Created attachment 91451 [details]
apcitable fixmap fix
I have just attached a patch that *might* fix this problem. If you can build a kernel with the change and test it to verify, I would appreciate it.

Created attachment 91453 [details]
dmidecode output from Dell 6650 with BIOS A09
I made cold and hot reboots with e.18enterprise and Dell BIOS A09 and A06.
No difference. The kernel supports only one CPU.
I attached dmidecode output.
Created attachment 91458 [details]
cpuinfo and meminfo
I built a custom kernel with apcitable patch but nothing changed.
See the output from uname -a, cat /proc/cpuinfo and cat /proc/meminfo in
the attachment.
An extract from /var/log/messages on the system running e.18custom:
May 1 16:57:28 qaddb2 kernel: ACPI: Searched entire block, no RSDP was found.
May 1 16:57:28 qaddb2 kernel: ACPI: RSDP located at physical address c00fdc20
May 1 16:57:28 qaddb2 kernel: RSD PTR v0 [DELL ]
May 1 16:57:28 qaddb2 kernel: ACPI table found: RSDT v1 [DELL PE6650 0.1]
May 1 16:57:28 qaddb2 kernel: init.c:148: bad pte c0004fb0(00000000000fd163).
May 1 16:57:28 qaddb2 kernel: init.c:148: bad pte c0004fb0(00000000000fd163).
May 1 16:57:28 qaddb2 kernel: ACPI table found: FACP v1 [DELL PE6650 0.1]
May 1 16:57:28 qaddb2 kernel: init.c:148: bad pte c0004fb0(00000000000fd163).
May 1 16:57:28 qaddb2 kernel: init.c:148: bad pte c0004fb0(00000000000fd163).
May 1 16:57:28 qaddb2 kernel: ACPI table found: APIC v1 [DELL PE6650 0.1]
May 1 16:57:28 qaddb2 kernel: init.c:148: bad pte c0004fb0(00000000000fd163).
May 1 16:57:28 qaddb2 kernel: LAPIC (acpi_id[0x0001] id[0x0] enabled[1])
May 1 16:57:28 qaddb2 kernel: CPU 0 (0x0000) enabledProcessor #0 Unknown CPU [15:1] APIC version 16
May 1 16:57:28 qaddb2 kernel:
May 1 16:57:28 qaddb2 rpc.statd[856]: Version 0.3.3 Starting
May 1 16:57:28 qaddb2 nfslock: rpc.statd startup succeeded
May 1 16:57:28 qaddb2 kernel: LAPIC (acpi_id[0x0002] id[0x2] enabled[1])
May 1 16:57:28 qaddb2 kernel: CPU 1 (0x0200) enabledProcessor #2 Unknown CPU [15:1] APIC version 16
May 1 16:57:28 qaddb2 kernel:
May 1 16:57:28 qaddb2 kernel: LAPIC (acpi_id[0x0003] id[0x4] enabled[1])
May 1 16:57:28 qaddb2 kernel: CPU 2 (0x0400) enabledProcessor #4 Unknown CPU [15:1] APIC version 16
May 1 16:57:28 qaddb2 kernel:
May 1 16:57:28 qaddb2 kernel: LAPIC (acpi_id[0x0004] id[0x6] enabled[1])
May 1 16:57:28 qaddb2 kernel: CPU 3 (0x0600) enabledProcessor #6 Unknown CPU [15:1] APIC version 16
May 1 16:57:28 qaddb2 kernel:
May 1 16:57:28 qaddb2 kernel: LAPIC (acpi_id[0x0005] id[0x1] enabled[1])
May 1 16:57:28 qaddb2 kernel: CPU 4 (0x0100) enabledProcessor #1 Unknown CPU [15:1] APIC version 16
May 1 16:57:28 qaddb2 kernel:
May 1 16:57:28 qaddb2 kernel: LAPIC (acpi_id[0x0006] id[0x3] enabled[1])
May 1 16:57:28 qaddb2 kernel: CPU 5 (0x0300) enabledProcessor #3 Unknown CPU [15:1] APIC version 16
May 1 16:57:28 qaddb2 kernel:
May 1 16:57:28 qaddb2 kernel: LAPIC (acpi_id[0x0007] id[0x5] enabled[1])
May 1 16:57:28 qaddb2 kernel: CPU 6 (0x0500) enabledProcessor #5 Unknown CPU [15:1] APIC version 16
May 1 16:57:28 qaddb2 kernel:
May 1 16:57:28 qaddb2 kernel: LAPIC (acpi_id[0x0008] id[0x7] enabled[1])
May 1 16:57:28 qaddb2 kernel: CPU 7 (0x0700) enabledProcessor #7 Unknown CPU [15:1] APIC version 16

What are the results with HT disabled?

Created attachment 91460 [details]
/var/log/messages from 4-way Dell 6650 BIOS A06 HT disabled
Only one CPU is usable with kernel-2.4.9-e.18custom on a 4-way Dell 6650 (BIOS A06, HT disabled).
The same kernel can see 4 (virtual) CPUs on a 2-way Dell 2600 (BIOS A02, HT enabled).
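
For comparing boot logs across kernels, the LAPIC lines in the /var/log/messages extract above can be tallied with a short script. This is only a sketch: the helper name `enabled_apic_ids` and the regular expression are illustrative assumptions based on the line format shown in that extract, not part of any kernel tooling.

```python
import re

# Pattern assumed from the log extract in this report, e.g.
#   kernel: LAPIC (acpi_id[0x0001] id[0x0] enabled[1])
LAPIC_RE = re.compile(
    r"LAPIC \(acpi_id\[(0x[0-9a-fA-F]+)\] id\[(0x[0-9a-fA-F]+)\] enabled\[(\d+)\]\)"
)

def enabled_apic_ids(log_text):
    """Return the APIC IDs of LAPIC entries marked enabled[1] (hypothetical helper)."""
    ids = []
    for _acpi_id, apic_id, enabled in LAPIC_RE.findall(log_text):
        if enabled == "1":
            ids.append(int(apic_id, 16))  # int() accepts the "0x" prefix with base 16
    return ids

# Two lines copied from the extract in this report.
sample = """\
May 1 16:57:28 qaddb2 kernel: LAPIC (acpi_id[0x0001] id[0x0] enabled[1])
May 1 16:57:28 qaddb2 kernel: LAPIC (acpi_id[0x0002] id[0x2] enabled[1])
"""
print(enabled_apic_ids(sample))
```

If the count of enabled entries is higher than the number of processors the running kernel reports, that would suggest the CPUs are listed in the ACPI table but are not being brought up afterwards.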
can you attach the boot log from a 2.4.9-e.16enterprise kernel with both HT enabled and disabled?

Created attachment 91464 [details]
e.16enterprise ht-disabled
e.16enterprise can see 4 CPUs
Created attachment 91465 [details]
messages from e.16enterprise ht-enabled
e.16enterprise can see 8 CPUs
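
One way to check how many logical CPUs a given kernel exposes is to count the `processor` entries in /proc/cpuinfo, the file attached earlier in this report. The sketch below assumes the usual `processor : N` layout; the function name `count_logical_cpus` and the sample text are made up for illustration and are not taken from the attachments.

```python
def count_logical_cpus(cpuinfo_text):
    """Count 'processor : N' entries in /proc/cpuinfo-style text (hypothetical helper)."""
    count = 0
    for line in cpuinfo_text.splitlines():
        key, sep, _value = line.partition(":")
        # Only lines whose key is exactly "processor" start a new CPU entry.
        if sep and key.strip() == "processor":
            count += 1
    return count

# Made-up sample in the assumed layout; on a real system, pass the
# contents of /proc/cpuinfo instead.
sample_cpuinfo = (
    "processor\t: 0\n"
    "vendor_id\t: GenuineIntel\n"
    "\n"
    "processor\t: 1\n"
    "vendor_id\t: GenuineIntel\n"
)
print(count_logical_cpus(sample_cpuinfo))
```

With HT enabled the count reflects logical CPUs, so a 4-way machine would be expected to report 8 entries, as noted above for e.16enterprise.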
We reproduced the problem with kernel 2.4.9-e.18enterprise on a 4-way Dell 6650 with 1.5GHz CPUs, BIOS A09.

these are two different machines, right -- just to be paranoid, did you test booting 2.4.9-e.18enterprise on qaddb2 (where 2.4.9-e.16enterprise is working)?

Yes, I tried e.18enterprise on qaddb2. It does not work.

can you back out the linux-2.4.9e12_summit-2003-03-14.patch that is included in the kernel-2.4.9-e.18.src.rpm? Just 'patch -p1 -R < linux-2.4.9e12_summit-2003-03-14.patch' in the /usr/src/linux-2.4.9 directory

I rolled back linux-2.4.9_e16-summit-2003-03-14.patch and the kernel recognized all CPUs.

I need to know if the i686 2.4.9-e.18smp kernel sees all the CPUs (use the stock kernel, not the one with the summit patch backed out).

that is, the "smp" kernel instead of the "enterprise" kernel, right doug?

Yep.

Stock e.18smp can see only one CPU.

we separated out the summit patches, can we try the latest kernel build on porkchop://mnt/redhat/beehive/comps/dist/2.1AS-errata-candidate/kernel/2.4.9-e.18.6/i686/kernel-enterprise-2.4.9-e.18.6.i686.rpm

2.4.9-e.18.6enterprise can see all CPUs on a Dell 6650 with 1.5GHz Xeons. I think it should be fine with 1.4GHz also, but my test systems are busy now. I will check ASAP.

I wonder what the difference between your system and mine is - as I have a 6650 4-way that can see all the virtual cpus just fine. It has bios version A09, dated 03/04/2003.

it's very likely that the PCI configuration (number of cards in slots, types of cards) could cause a change in behavior in this area.

I don't know what the difference is. I have four 4-way systems with 1.4GHz CPUs and two 4-way systems with 1.5GHz CPUs here. All of them behave the same way. I tried them with BIOS A09 3/4/2003 and A06. No difference.

Created attachment 91501 [details]
lspci output from one of 4-way Dell 6650 systems
You can also look at the attachment #91453 [details]. It is dmidecode output.

I checked 2.4.9-e.18.6enterprise on all my Dell 6650 systems.
Everything looks good.

Thanks guys!

Are you saying that the summit patches will again be ripped out and used to generate a separate kernel? Or is someone looking at getting this fixed before QU2 is rolled out? Many disappointed customers would have to deal with a separate summit kernel still. Is this the only issue holding up QU2's release, out of curiosity?

We had never planned to have a unified summit kernel for the QU2 timeframe. We were hoping to clean things up a bit by being able to apply the summit patch to the kernel source (thus to the generic kernel-source package). As you can see just from the instability that the summit patch's generic changes introduced to the non-summit codepaths (i.e., those not part of #ifdef CONFIG_SUMMIT or similar blocks), it would be irresponsible for us to integrate the summit changes into the main kernels. Even if the non-summit paths are fixed, we still would have to deal with vetting the summit codepaths to ensure that they will not also destabilize the non-summit path if they're not excluded via a compile time option. This is no trivial task.

I wasn't saying it was easy, but I was under the impression that, since the summit kernel has been separate for about a year and the plan (as told to me over a year ago) was to move toward summit support in the enterprise kernel, the mention of summit patches above was an indication that the merge was happening for QU2. Sorry.

Right, currently we're investigating xapic support (and thus summit support) in non-summit kernels for RHEL 3.

Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release for which you requested us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/errata/ If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.