Bug 175671 - System no longer boots without acpi=off
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: kernel
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Dave Jones
QA Contact: Brian Brock
Reported: 2005-12-13 15:23 EST by Orion Poplawski
Modified: 2015-01-04 17:23 EST (History)
Fixed In Version: 2.6.15-1.1830_FC4
Doc Type: Bug Fix
Last Closed: 2006-02-06 17:48:48 EST

Attachments
dmidecode output (17.01 KB, text/plain)
2006-01-05 17:13 EST, Orion Poplawski
dmesg output (19.24 KB, text/plain)
2006-01-05 17:14 EST, Orion Poplawski
dmesg output (19.96 KB, text/plain)
2006-01-18 15:33 EST, Orion Poplawski

Description Orion Poplawski 2005-12-13 15:23:01 EST
Description of problem:
The system had been running fine with 2.6.13-1.1532_FC4smp.  After rebooting into
2.6.14-1.1644_FC4smp, it restarts shortly after printing some ACPI messages.  Older
kernels now fail to boot as well (tried back to kernel-smp-2.6.12-1.1447_FC4).
2.6.14-1.1644_FC4smp boots fine with acpi=off.

How reproducible:
very

# lspci
00:00.0 Host bridge: Intel Corporation 82875P/E7210 Memory Controller Hub (rev 02)
00:01.0 PCI bridge: Intel Corporation 82875P Processor to AGP Controller (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
00:1f.5 Multimedia audio controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) AC'97 Audio Controller (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation NV34 [GeForce FX 5200] (rev a1)
02:08.0 Ethernet controller: Intel Corporation 82562EZ 10/100 Ethernet Controller (rev 02)
Comment 1 Orion Poplawski 2005-12-13 15:38:47 EST
System also boots okay with hyper-threading turned off in BIOS (and ACPI on).
Comment 2 Dave Jones 2005-12-23 23:00:14 EST
Can you attach the output of dmesg -s 128000 and of dmidecode (run as root)?

For bonus points, if you have the means to use a serial console, the exact
messages it prints before it crashes would be really useful.
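
One way to gather both outputs in a single pass is sketched below in Python. This is only an illustration: the collect_diagnostics helper, the output file names, and the directory name are assumptions made for the example, not anything specified in this bug. The two commands are the ones requested above, and dmidecode needs root.

```python
import subprocess
from pathlib import Path

# The two commands requested in this comment. The output file names are
# illustrative assumptions, not something the bug report specifies.
COMMANDS = {
    "dmesg.txt": ["dmesg", "-s", "128000"],
    "dmidecode.txt": ["dmidecode"],
}


def collect_diagnostics(out_dir, run=subprocess.run):
    """Run each command and save its stdout under out_dir.

    `run` is injectable so the function can be exercised without root.
    Returns the list of files written, in the order of COMMANDS.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, argv in COMMANDS.items():
        # check=True raises CalledProcessError if a command exits non-zero.
        result = run(argv, capture_output=True, text=True, check=True)
        target = out_path / filename
        target.write_text(result.stdout)
        written.append(target)
    return written

# Example (as root): collect_diagnostics("bug-175671-logs")
```

The resulting files can then be attached to the bug as plain text.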
Comment 3 Orion Poplawski 2006-01-05 17:10:57 EST
Looks like maybe the CPU or motherboard has gone flaky, as the kernel says
CPU #1 is not responding.  Here is the serial console output:

Linux version 2.6.14-1.1653_FC4smp (bhcompile@hs20-bc1-7.build.redhat.com) (gcc version 4.0.2 20051125 (Red Hat 4.0.2-8)) #1 SMP Tue Dec 13 21:46:01 EST 2005
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007ff74000 (usable)
 BIOS-e820: 000000007ff74000 - 000000007ff76000 (ACPI NVS)
 BIOS-e820: 000000007ff76000 - 000000007ff97000 (ACPI data)
 BIOS-e820: 000000007ff97000 - 0000000080000000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
 BIOS-e820: 00000000fecf0000 - 00000000fecf1000 (reserved)
 BIOS-e820: 00000000fed20000 - 00000000fed90000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee10000 (reserved)
 BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
1151MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000fe710
Using x86 segment limits to approximate NX protection
DMI 2.3 present.
Using APIC driver default
ACPI: PM-Timer IO Port: 0x808
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x01] disabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] disabled)
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
Enabling APIC mode:  Flat.  Using 1 I/O APICs
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 88000000 (gap: 80000000:7ec00000)
Built 1 zonelists
Kernel command line: ro root=/dev/rootvg/root console=ttyS0,115200
Initializing CPU#0
CPU 0 irqstacks, hard=c0447000 soft=c0427000
PID hash table entries: 4096 (order: 12, 65536 bytes)
Detected 3192.500 MHz processor.
Using pmtmr for high-res timesource
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 2071240k/2096592k available (2167k kernel code, 23964k reserved, 810k data, 224k init, 1179088k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 6390.94 BogoMIPS (lpj=12781895)
Security Framework v1.0.0 initialized
SELinux:  Initializing.
SELinux:  Starting in permissive mode
selinux_register_security:  Registering secondary module capability
Capability LSM initialized as secondary
Mount-cache hash table entries: 512
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 0
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
CPU0: Thermal monitoring enabled
mtrr: v2.0 (20020519)
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
CPU0: Intel(R) Pentium(R) 4 CPU 3.20GHz stepping 09
Booting processor 1/1 eip 2000
CPU 1 irqstacks, hard=c0448000 soft=c0428000
Not responding.
Inquiring remote APIC #1...
... APIC #1 ID: failed
... APIC #1 VERSION: failed
... APIC #1 SPIV: failed
CPU #1 not responding - cannot use it.
Total of 1 processors activated (6390.94 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 pin1=2 pin2=-1
Brought up 1 CPUs
checking if image is initramfs... it is
Freeing initrd memory: 1760k freed
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: PCI BIOS revision 2.10 entry at 0xfbb30, last bus=2
PCI: Using configuration type 1
ACPI: Subsystem revision 20050916

Then it reboots.  Next I'll attach the dmesg -s 128000 output from a boot with
HT disabled in the BIOS, along with the dmidecode output.
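
For anyone comparing several such captures, the secondary-CPU failure in the log can be picked out mechanically. The Python sketch below is a minimal illustration, assuming the message formats are exactly those printed above; the summarize_cpu_bringup helper is hypothetical and the sample lines are copied from the console output in this comment.

```python
import re

# Lines copied from the serial console output above.
CONSOLE_LOG = """\
Booting processor 1/1 eip 2000
CPU 1 irqstacks, hard=c0448000 soft=c0428000
Not responding.
Inquiring remote APIC #1...
... APIC #1 ID: failed
... APIC #1 VERSION: failed
... APIC #1 SPIV: failed
CPU #1 not responding - cannot use it.
Total of 1 processors activated (6390.94 BogoMIPS).
Brought up 1 CPUs
"""

NOT_RESPONDING = re.compile(r"^CPU #(\d+) not responding")
BROUGHT_UP = re.compile(r"^Brought up (\d+) CPUs")


def summarize_cpu_bringup(log_text):
    """Return (ids of CPUs that did not respond, CPUs brought up or None)."""
    failed = []
    brought_up = None
    for line in log_text.splitlines():
        match = NOT_RESPONDING.match(line)
        if match:
            failed.append(int(match.group(1)))
            continue
        match = BROUGHT_UP.match(line)
        if match:
            brought_up = int(match.group(1))
    return failed, brought_up


failed, brought_up = summarize_cpu_bringup(CONSOLE_LOG)
print(f"unresponsive CPUs: {failed}, brought up: {brought_up}")
```

On the excerpt above this reports CPU #1 as unresponsive and one CPU brought up, which matches the log: the second logical processor never answers and the kernel continues on a single CPU before the reboot.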
Comment 4 Orion Poplawski 2006-01-05 17:13:59 EST
Created attachment 122846
dmidecode output
Comment 5 Orion Poplawski 2006-01-05 17:14:34 EST
Created attachment 122848
dmesg output
Comment 6 Orion Poplawski 2006-01-05 17:33:14 EST
Well, system is not totally messed up.  I can boot into Windows with HT on and
run Intel's Hyper-Threading test utility and it checks out okay.  Device manager
shows two processors.
Comment 7 Dave Jones 2006-01-05 21:29:04 EST
Can you try the 2.6.15-based test kernel at
http://people.redhat.com/davej/kernels/Fedora/FC4/ and see if that makes it work
again with ACPI enabled?
Comment 8 Linux ACPI Developers 2006-01-18 00:05:47 EST
> ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
> Processor #0 15:2 APIC version 20
> ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
> Processor #1 15:2 APIC version 20
> ACPI: LAPIC (acpi_id[0x03] lapic_id[0x01] disabled)
> ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] disabled)

Hmmm, dunno if the duplicate lapic_id is an issue if
the phantom one is disabled.

Does the failing kernel work with "maxcpus=2"?

A dmesg from the latest kernel that boots properly in SMP + ACPI mode
with no cmdline params may help.
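
The duplicate lapic_id noted above can also be flagged automatically when scanning dmesg output. The Python sketch below is illustrative only: the duplicate_lapic_ids helper is hypothetical, and it assumes the "ACPI: LAPIC (...)" line format shown in the quoted lines. It says nothing about whether a disabled duplicate is actually harmful.

```python
import re
from collections import defaultdict

# MADT lines as quoted in this comment.
MADT_LINES = """\
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x01] disabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] disabled)
"""

LAPIC = re.compile(
    r"ACPI: LAPIC \(acpi_id\[(0x[0-9a-fA-F]+)\] "
    r"lapic_id\[(0x[0-9a-fA-F]+)\] (enabled|disabled)\)"
)


def duplicate_lapic_ids(text):
    """Map each lapic_id that appears more than once to its entries.

    Each entry is an (acpi_id, state) tuple, in log order.
    """
    by_lapic = defaultdict(list)
    for acpi_id, lapic_id, state in LAPIC.findall(text):
        by_lapic[int(lapic_id, 16)].append((int(acpi_id, 16), state))
    return {lid: entries for lid, entries in by_lapic.items() if len(entries) > 1}


for lapic_id, entries in duplicate_lapic_ids(MADT_LINES).items():
    print(f"lapic_id {lapic_id:#04x} is listed {len(entries)} times: {entries}")
```

For the four lines quoted here, only lapic_id 0x01 is reported, once as the enabled acpi_id 0x02 and once as the disabled acpi_id 0x03.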
Comment 9 Orion Poplawski 2006-01-18 15:33:37 EST
Created attachment 123400
dmesg output

The latest test kernel does not boot.

maxcpus=2 has no effect.

Note that I can no longer boot older kernels, but here is the dmesg output
from an older kernel, captured when it could still boot.
Comment 10 Dave Jones 2006-02-03 01:42:50 EST
This is a mass-update to all currently open kernel bugs.

A new kernel update has been released (Version: 2.6.15-1.1830_FC4)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

Thank you.
Comment 11 Orion Poplawski 2006-02-06 17:48:48 EST
The system boots fine with HT on using 2.6.15-1.1830_FC4.
