Bug 109550 - SMP kernel boot failure on Dell Precision 410
Status: CLOSED WONTFIX
Product: Fedora
Classification: Fedora
Component: kernel
Version: 1
Hardware: i686 Linux
Priority: medium Severity: high
Assigned To: Arjan van de Ven
Reported: 2003-11-09 08:09 EST by Tony Goelz
Modified: 2007-11-30 17:10 EST (History)
Last Closed: 2004-09-29 15:38:15 EDT

Attachments
Output from acpidmp tool from pmtools (25.81 KB, text/plain)
2004-03-24 10:23 EST, Roger Tragin
survive after detect a garbled MADT from BIOS (1.47 KB, text/plain)
2004-04-20 03:15 EDT, Chuyee
Intensive check for garbled MADT (6.08 KB, patch)
2004-04-21 05:00 EDT, Chuyee

Description Tony Goelz 2003-11-09 08:09:52 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5)
Gecko/20031007 Firebird/0.7

Description of problem:
After a fresh install of Fedora Core 1, the supplied SMP
kernel fails to even load. After the GRUB screen, I see
the very first kernel loading message and then nothing else. No
OOPS, no diagnostics, nothing. The machine requires a
reset with the hardware switch (i.e., CTRL-ALT-DEL has
no effect).

The non-smp kernel boots without problems. The latest 
2.6.0-test9 SMP kernel exhibits the same non booting
behavior but the 2.6.0-test9 non-SMP kernel boots 
without problems.

The GRUB menu item for the SMP kernel is unmodified from
what was installed:


title Fedora Core (2.4.22-1.2115.nptlsmp)
        root (hd0,1)
        kernel /vmlinuz-2.4.22-1.2115.nptlsmp ro root=LABEL=/ hda=ide-scsi rhgb
        initrd /initrd-2.4.22-1.2115.nptlsmp.img



Version-Release number of selected component (if applicable):
kernel-2.4.22-1.2115.nptl

How reproducible:
Always

Steps to Reproduce:
1. Turn on computer
2. Select SMP kernel from GRUB menu

Actual Results:  System does not boot; it hangs after the very first
kernel loading message

Expected Results:  System boots normally as it does with the non-smp
version of the same kernel

Additional info:

HARDWARE:
  Dell Precision Workstation 410
  Dual 500MHz Pentium III
  512 Megabytes Memory
  440BX/ZX/DX - 82443BX/ZX/DX Host bridge
  Adaptec AHA-2940U2/U2W/7890/7891 SCSI
  Radeon 7200 Graphics card
  Sound Blaster Live Audio card
  3C905 onboard ethernet
  Two SCSI LVD Hard Drives
  Two IDE CD-ROM Drives

Other OS:
  Windows 2000 installed and boots/functions normally

I have some experience building kernels so I can test
things with some guidance.
Comment 1 Mathieu Chouquet-Stringer 2003-12-03 15:02:48 EST
Just to let you know I have the same machine and the exact same
problem (the PC boots fine with the non-SMP kernel but refuses to boot
with the SMP flavor). I just upgraded to the latest 2129 and still see
the same thing.

The boot process stops at:
Uncompressing Linux... Ok, booting the kernel
Comment 2 lynn wheeler 2004-01-30 02:23:10 EST
My two-processor Dell Precision 410 has this problem too. It exists
with both the 2115 and 2149 SMP kernels.
Comment 3 Evan Cooper 2004-02-17 08:21:42 EST
I have a Dell Precision 610 and the exact same problem. The problem
exists with both the 2115 and 2149 SMP kernels.
Comment 4 Stephen Lawrence Jr. 2004-02-19 11:18:27 EST
I have a Dell 1650 with the 2166 kernel and had the same issue with
the SMP kernel not booting. It hangs while mounting file systems.
Comment 5 Tim Keitt 2004-02-20 12:53:57 EST
My Precision 650 also hangs on boot. With no added kernel parameters,
it hangs at the "initializing firewire controller" message. If I add
"nofirewire" it hangs at "setting up swap space". I've tried
"acpi=off" and "noapic" and still cannot boot the smp kernel (2166, 2174).
Comment 6 Michael Ballard 2004-03-04 19:28:43 EST
I have a Precision 410 with the same problem. I've tried ticket 109693
(http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=109693) to no avail. 
Comment 7 Michael Ballard 2004-03-05 19:50:32 EST
Ran `yum update' (with fresh FC install) and it installed kernel-smp
2.4.22-1.2174.nptl.i686 and had the same problem.
Comment 8 Need Real Name 2004-03-21 20:14:19 EST
Same system here (Precision 410), same problem.  However, disabling
ACPI in the BIOS will allow the SMP kernels to boot fine.

Strangely, though, the Knoppix 3.3 bootable CD distro can boot its SMP
kernels on this system *without* disabling ACPI in the BIOS....
Comment 9 Evan Cooper 2004-03-22 05:25:22 EST
I disabled ACPI in the BIOS, and it worked as well. The SMP kernel now
boots fine. Dell Precision 610.
Comment 10 Roger Tragin 2004-03-22 11:19:39 EST
I disabled "Power Management" in the BIOS, and it did not work.  Dell
Precision 210 dual 450MHz CPU.  I tried everything mentioned in this
defect, and it does not work for my machine.
Comment 11 Michael Ballard 2004-03-22 16:10:32 EST
Decided to try installing SuSE, it froze, so I went back to FC 1. Went
to install, decided to upgrade instead of a fresh install. 

Tried the BIOS tweak above, (I already had ACPI=off in grub.conf) and
it didn't do anything. Turned ACPI back on in BIOS, and it boots fine.
It didn't boot the SMP before I tried SuSE. Not sure if trying to
install SuSE matters, or if updating mattered. 

Coworker turned ACPI off in BIOS, and it works fine.
Comment 12 lynn wheeler 2004-03-22 17:51:52 EST
I disabled ACPI in the BIOS, and the SMP kernel now boots on my
two-processor Precision 410.
Comment 13 Len Brown 2004-03-23 06:32:26 EST
The Dell Precision 410 has garbled ACPI tables in the BIOS: 
http://bugzilla.kernel.org/show_bug.cgi?id=1434 
 
This box is older than the ACPI cutoff date, but there is 
a bug in Fedora Core 1, fixed in later kernels, such that 
an old BIOS sets acpi_disabled=1, but acpi_ht is not 
cleared to 0.  So the SMP kernel proceeds to parse 
the ACPI tables to see if it can enumerate the processors 
to enable HT.  The kernel crashes parsing the tables 
before any kernel output. 
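
A minimal sketch of the flag interaction described above, written as a small Python model rather than actual kernel code. The names acpi_disabled and acpi_ht come from this comment; the AcpiFlags class, both functions, and the assumption that acpi_ht starts out enabled are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class AcpiFlags:
    acpi_disabled: bool = False  # full ACPI support switched off
    acpi_ht: bool = True         # assumed default: tables still parsed to enumerate CPUs for HT


def apply_bios_cutoff(flags: AcpiFlags, fixed: bool) -> AcpiFlags:
    """Model the 'BIOS is older than the ACPI cutoff date' path."""
    flags.acpi_disabled = True
    if fixed:
        # Later kernels also clear acpi_ht here, per the comment above.
        flags.acpi_ht = False
    return flags


def parses_acpi_tables(flags: AcpiFlags) -> bool:
    """The SMP kernel reads the ACPI tables if either path is still active."""
    return (not flags.acpi_disabled) or flags.acpi_ht


fc1 = apply_bios_cutoff(AcpiFlags(), fixed=False)
later = apply_bios_cutoff(AcpiFlags(), fixed=True)
print(parses_acpi_tables(fc1))    # True: the garbled tables still get parsed
print(parses_acpi_tables(later))  # False: the tables are never touched
```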
 
manual "acpi=off" on the cmdline should prevent this, 
as will booting with the BIOS ACPI support disabled. 
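
As an illustration of the command-line workaround, the sketch below appends acpi=off to the kernel line of the GRUB entry from the original description. The helper add_kernel_param is hypothetical, and the stanza is copied from the report with its wrapped kernel line joined onto one line.

```python
def add_kernel_param(stanza: str, param: str) -> str:
    """Append `param` to every GRUB `kernel` line that does not already have it."""
    out = []
    for line in stanza.splitlines():
        words = line.split()
        if words and words[0] == "kernel" and param not in words[1:]:
            line = line.rstrip() + " " + param
        out.append(line)
    return "\n".join(out)


STANZA = """\
title Fedora Core (2.4.22-1.2115.nptlsmp)
        root (hd0,1)
        kernel /vmlinuz-2.4.22-1.2115.nptlsmp ro root=LABEL=/ hda=ide-scsi rhgb
        initrd /initrd-2.4.22-1.2115.nptlsmp.img
"""

patched = add_kernel_param(STANZA, "acpi=off")
print(patched)
```

Running the helper a second time leaves the stanza unchanged, so the parameter is never duplicated.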
 
That said, I think it is possible for ACPI to be hardened 
and detect this garbled table, and keep running.  In 
this case it is a bad MADT, so ACPI would run as if 
the user typed "pci=noacpi". 
 
I'll try to make Linux detect and survive the Dell 410 table 
and will post the fix to the bug report above if the 410 owners 
would like to try it out.  Unclear if the Dell 210 is the same issue, 
if you attach the output from acpidmp from pmtools, I can check it out: 
http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/utils/ 
 
I can't explain the run-time Dell 650 and 1650 failures above -- 
different bug.  Let me know if they work with "acpi=off" 
but fail otherwise. 
 
thanks, 
-Len 
 
 
 
Comment 14 Michael Ballard 2004-03-23 18:07:10 EST
This work-around is very inconsistent. It works, then it doesn't. 

Having ACPI off in BIOS and no ACPI command in GRUB will boot, then it
won't. Then maybe you'll have to turn off ACPI in BIOS, and put the
command in GRUB and it'll work, or it won't...

Doesn't seem to have any rhyme nor reason to it. I've been trying to
make sense of it all day.

Have vga=791 on one machine, coworker doesn't have that, both are just
as goofy. Same systems.
Comment 15 Roger Tragin 2004-03-24 10:23:56 EST
Created attachment 98828
Output from acpidmp tool from pmtools

If you need more information, I will do anything I can
Comment 16 Chuyee 2004-04-20 03:11:35 EDT
Hi Roger,

Your Dell 210 MADT is garbled the same way as the Dell 410 and 610 above.

ACPI: APIC (v001 DELL    WS 210  0x00000002 ASL  0x00000061) @ 0x(nil)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
ACPI: IOAPIC (id[0x02] address[0xfec00000] global_irq_base[0x0])
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x4])
Checksum OK
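
Note that "Checksum OK" only says the table's bytes add up correctly. An ACPI table passes its checksum when all of its bytes, including the checksum byte at offset 9 of the header, sum to 0 modulo 256. The sketch below illustrates that rule on a made-up 36-byte header (not the Dell table); the function names are hypothetical. It also shows that a table with altered contents can still pass if the checksum was computed over those contents.

```python
import struct

CHECKSUM_OFFSET = 9  # position of the checksum byte in the standard ACPI table header


def acpi_checksum_ok(table: bytes) -> bool:
    """A table is valid when all of its bytes sum to 0 modulo 256."""
    return sum(table) % 256 == 0


def with_checksum(table: bytes) -> bytes:
    """Return a copy whose checksum byte makes the whole table sum to 0."""
    buf = bytearray(table)
    buf[CHECKSUM_OFFSET] = 0
    buf[CHECKSUM_OFFSET] = (-sum(buf)) % 256
    return bytes(buf)


# Toy header: signature, length, revision, checksum placeholder, zero padding.
header = struct.pack("<4sIBB", b"APIC", 36, 1, 0) + bytes(26)
good = with_checksum(header)

corrupted = bytearray(good)
corrupted[20] ^= 0xFF  # damage a byte after the checksum was computed
garbled_at_source = with_checksum(bytes(corrupted))  # checksum recomputed over bad data

print(acpi_checksum_ok(good))               # True
print(acpi_checksum_ok(bytes(corrupted)))   # False
print(acpi_checksum_ok(garbled_at_source))  # True
```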

Since Len said there was a "disable acpi_ht" bug in the Fedora Core 1
kernel, would you please try a vanilla 2.6.5 kernel with the patch
below applied to see if it boots your system? Please provide dmesg if
it succeeds.

Thanks,
-yi
Comment 17 Chuyee 2004-04-20 03:15:11 EDT
Created attachment 99551
survive after detect a garbled MADT from BIOS

The patch will try to survive after detecting a garbled MADT.

If the garbled entry is a lapic entry
  both acpi_lapic and acpi_ioapic are set to zero (disabled)
else if the garbled entry is an ioapic entry
  only acpi_ioapic is set to zero (disabled)
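
The fallback rule above can be sketched as follows. This is a Python model of the described behaviour, not the attached patch; acpi_lapic and acpi_ioapic are the names used in this comment, while the function name and the "lapic"/"ioapic" string labels are hypothetical.

```python
def flags_after_garbled_entry(entry_kind: str) -> dict:
    """Return the ACPI APIC flags after a garbled MADT entry of the given kind."""
    flags = {"acpi_lapic": 1, "acpi_ioapic": 1}  # both enabled before parsing
    if entry_kind == "lapic":
        # A bad local APIC entry: give up on both local and I/O APIC data.
        flags["acpi_lapic"] = 0
        flags["acpi_ioapic"] = 0
    elif entry_kind == "ioapic":
        # A bad I/O APIC entry: keep the local APIC data, drop the I/O APIC data.
        flags["acpi_ioapic"] = 0
    return flags


print(flags_after_garbled_entry("lapic"))
print(flags_after_garbled_entry("ioapic"))
```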
Comment 18 Chuyee 2004-04-21 05:00:09 EDT
Created attachment 99588
Intensive check for garbled MADT

Hi,

For anyone who can reproduce the bug, would you please help test this patch?
This patch does an intensive check for a garbled MADT and is supposed to
let the system survive one.

This patch is against 2.6.5 kernel. Let me know if you need a 2.4 kernel patch.


Really need your help and thanks in advance!

-yi
Comment 19 David Lawrence 2004-09-29 15:38:15 EDT
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/
