Bug 204357

Summary: Intel 946GZIS with Intel E6300 can't boot
Product: Red Hat Enterprise Linux 4
Reporter: Zenon Panoussis <redhatbugs>
Component: kernel
Assignee: Peter Martuccelli <peterm>
Status: CLOSED CURRENTRELEASE
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 4.3
CC: downloads, grgustaf
Hardware: All   
OS: Linux   
Fixed In Version: 5.0.0
Doc Type: Bug Fix
Last Closed: 2006-09-29 14:00:40 UTC

Description Zenon Panoussis 2006-08-28 17:35:51 UTC
Description of problem:

Booting any SMP kernel hangs after the following output:
<pre>
CPU1: Intel(R) Core(TM)2 CPU   6300 @ 1.86GHz stepping 06
Total of 2 processors activated (7461.49 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=031 pin1=2 pin2=-1
checking TSC synchronization across 2 CPUs: passed.
Brought up 2 CPUs
zapping low mappings.
checking if image is initramfs... it is
Freeing initrd memory: 379k freed
NET: Registered protocol family 16
PCI: Using MMCONFIG
mtrr: v2.0 (20020519)
ACPI: Subsystem revision 20040816
ACPI: Interpreter enabled
ACPI: USING IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (00:00)
PCI: Probing PCI hardware (bus 00)
</pre>
I tried 2.6.9-22.ELsmp, 2.6.9-34.ELsmp and 2.6.9-42.0.2.ELsmp. 

With acpi=off passed to the kernel, 2.6.9-22 and 2.6.9-34 (both SMP and UP)
boot, but then they see only one CPU. 2.6.9-42.0.2 does not boot at all, not
even with acpi=off. Adding noapic makes no difference.
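
As a rough way to check how many CPUs a booted kernel actually sees, one could
count the "processor" entries in /proc/cpuinfo. The sketch below is only an
illustration: the helper name count_processors is hypothetical and is not part
of any Red Hat or kernel tool, and it assumes the usual x86 /proc/cpuinfo
layout of "key : value" lines.

```python
def count_processors(cpuinfo_text):
    """Count logical CPUs in /proc/cpuinfo-style text.

    Assumption: each logical CPU has one line whose key (the part
    before the first colon) is exactly "processor", as on x86.
    """
    count = 0
    for line in cpuinfo_text.splitlines():
        key, sep, _value = line.partition(":")
        if sep and key.strip() == "processor":
            count += 1
    return count

# Usage on a running Linux system (not executed here):
#   with open("/proc/cpuinfo") as f:
#       print(count_processors(f.read()))
```

On this machine, a kernel that brings up both cores should report 2, while the
acpi=off boots described above would be expected to report 1.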

I also tried the corresponding UP (uniprocessor) kernels, as well as the i686
versions of 2.6.9-34 and 2.6.9-42.0.2, both UP and SMP. They all behave exactly
the same way.

The machine is brand new and consists of motherboard, CPU, RAM, one ATA HDD, one
ATA DVD-RW, and no other devices in it. memtest86 says the RAM is good. 

Although a hardware fault is rather unlikely, I don't have a second identical
system, so I cannot positively rule it out. Additionally, I am using CentOS and
don't have access to real RHEL, so I can't rule out the possibility that the
error was introduced downstream.

Obviously, the machine is not in production. If you would like to use it for
testing, it's all yours. If you'd be willing to lend me a few RHEL kernels to
test, I'm game too.

How reproducible:
Every time.

Comment 1 Zenon Panoussis 2006-08-28 20:14:40 UTC
I compiled 2.6.9-34.0.1.ELsmp locally with -march=nocona -mcpu=nocona. It
behaved like 2.6.9-34.ELsmp, i.e. it hung without acpi=off and booted with
acpi=off, but then saw only one CPU.

I saved the .config that was left in BUILD after rpmbuilding 2.6.9-34.0.1.ELsmp
(the last version built was largesmp) and used it unchanged to compile a vanilla
2.6.17.11. Now, that works just fine without any boot arguments and it
recognises and uses both CPUs. I guess that this rules out faulty hardware and
it also rules out a bad kernel config in the rpm packages. 

What's left to blame for the distribution kernel problems is either a bad patch
or a bug that has been fixed in the latest vanilla. My offer to you to use my
machine for testing stands, but in that case don't waste time because now that I
got it working it will soon go in production.


Comment 2 Peter Martuccelli 2006-09-29 14:00:40 UTC
Zenon, if 2.6.17 worked for your system then you should use the latest FC6
kernel, which is based on 2.6.18. Systems are certified by vendors against
specific RHEL releases; systems without that certification may or may not boot
properly. I pinged the Intel developer to see if he had any input for you.

Glad to hear that vanilla 2.6.17 worked for you with CentOS; you should consider
using Fedora. I am closing out this request as resolved in CURRENTRELEASE, as
it seems that RHEL5 will work properly based on your input.

Comment 3 Zenon Panoussis 2006-10-02 09:21:42 UTC
Ehum, FC6 does not exist yet, but there's a kernel nicknamed "fc6" at
ftp://download.fedora.redhat.com/pub/fedora/linux/core/development/x86_64/os/Fedora/RPMS/kernel-2.6.18-1.2708.fc6.x86_64.rpm
It won't install on RHEL4 (clone) due to a pile of unsatisfied dependencies. I
installed it anyway with --nodeps and it booted OK and could see both cores. If
RHEL5 is based on this, the problem will be gone in RHEL5.

However, RHEL4 has many years of life ahead and many Intel 946GZIS boards will
be sold in those years, also to people who can't upgrade to RHEL5. Therefore,
IMO, this bug should stay open for RHEL4 until either a kernel upgrade or a
backport fixes it for this distro too.

As for considering Fedora, I think that for most people who use RHEL, cloned or
not, Fedora is not an option. Its fast release cycles, short life span and
bleeding-edge tech simply don't match a conservative and prudent admin's
requirements in a high-demand environment. I think you should consider this when
you decide whether or not to fix the RHEL4 kernel for the 946GZIS.