Bug 215154

Summary: local apic not enabled by default on systems that need it.
Product: [Fedora] Fedora
Reporter: Trevor Cordes <trevor>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED CURRENTRELEASE
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 5
CC: bugzilla, deknuydt, djh, drhart, jarod, kevin.b.crocker, lance.list.7, mij, mrb, pierre.juhen, tlinden, tomek, whiteg, wtogami
Hardware: All
OS: Linux
Fixed In Version: 2.6.18-1.2868.fc6
Doc Type: Bug Fix
Last Closed: 2006-12-25 04:29:56 UTC
Attachments:
Description                                                               Flags
screenshot of hang                                                        none
dmidecode output for Asus P2B-DS motherboard                              none
Console output from successful boot w/workaround                          none
compaq 1600 dmesg boot smp                                                none
dmidecode output from Asus P2B-DS motherboard with dual 500 MHz PIII:s    none
output from dmidecode on 2.6.18-1.2798.fc6                                none

Description Trevor Cordes 2006-11-11 16:59:51 UTC
Description of problem:

Boot hangs for 2.6.18-1.2239.fc5smp.  2.6.18-1.2200.fc5smp works fine.


Version-Release number of selected component (if applicable):
2.6.18-1.2239.fc5smp

How reproducible:
always

Steps to Reproduce:
1. upgrade to 2.6.18-1.2239.fc5smp
2. reboot
  
Actual results:
hangs, no panic info

Expected results:
boots fine

Additional info:
This only happens on this one specific box so far.  This box has been running FC
for years without problem.  It was running 2200 100% ok for weeks.  If I reboot
back into 2200 it's all ok.

Mobo is Asus P2B-S, dual proc P2-400

The crash doesn't give much to go on.  No panic output.  See attached JPG.
Last message on screen is "NET: Registered protocol family 2"

Comment 1 Trevor Cordes 2006-11-11 16:59:51 UTC
Created attachment 140966 [details]
screenshot of hang

Comment 2 Dave Jones 2006-11-12 07:24:22 UTC
hmm, it looks like it only found one CPU too.  It'd be great if you could
capture the earlier boot messages.  If you don't have access to a serial
console, you can boot with boot_delay=1000 which makes a pause happen after each
line of text, which should make it easier to grab multiple screenshots of the
whole boot up.


Comment 3 Pierre Juhen 2006-11-14 22:20:58 UTC
Different problem:

Kernel panic.

My box has a single CPU, but uses raid + lvm

root is /dev/raid1/boot

Looks like there is no device scan, and the ram image is not able to
find a valid root device

There is a message from mount saying it is unable to find /dev/root.

I did a fresh mkinitrd, but no cure...

Comment 4 Michael Jørgensen 2006-11-15 18:42:48 UTC
I have this problem on FC6.

Installing FC6 from the CD-ROMs works fine (kernel 2.6.18-1.2798), but then 
updating the kernel (and nothing else) to version 1.2849 I see the exact same 
behaviour: Last line printed is "NET: Registered protocol family 2". When 
booting the original kernel, the next printed line is "IP route cache hash 
table".

My machine is a Via EPIA EN12000 motherboard with a Via Eden processor and a 
Via VT8237R controller.

Comment 5 Steve 2006-11-16 00:56:48 UTC
Have the same description as comment #4, 
kernel /boot/vmlinuz-2.6.18-1.2849.fc6 ro root=LABEL=/  pci=biosirq acpi=force
also tried with pci=routeirq and without acpi=force.  All stop at the same line
and need the power switch to reboot.  works on 2.6.18-1.2798.fc6.

system is an upgrade from fc4, over fc3, over fc2, over fc1 over RH9
Clevo D410 laptop

Comment 6 Dan Carpenter 2006-11-17 04:32:14 UTC
Pierre, could you start a different bug for that?

Could someone post the dmesg from a successful boot?



Comment 7 Dennis Hart 2006-11-18 05:25:49 UTC
Similar problem to bug owner, but last line on frozen screen is "ACPI:Unable 
to locate RSDP".  Successful boot with 2.6.18-1.2200.fc5smp shows next line 
after the ACPI line is "MP-BIOS BUG: 8254 Timer not connected to IO-APIC".  
Looks like kernel 2200 finds a way around this bug, but 2239 hangs with no 
progress, no errors I see on screen or know how to find.  Single processor 
2239 kernel boots without any evident problems on this same box.
Hardware: dual P2-300 Compaq 1600.

Comment 8 George N. White III 2006-11-18 13:08:42 UTC
Dual PPro 200 box, has been running Fedora SMP kernels since FC2, currently 
using 2.6.18-1.2200.fc5smp.  I was pressed for time when I installed the 2239 
smp kernel, but only 1 CPU was found and the display was similar to the 
screenshot here.  

An odd message to the effect that CPU0 was not registered in BIOS scrolled off 
the screen.



Comment 9 Dan Carpenter 2006-11-20 06:24:02 UTC
It's possible that this bug and also bug 215249 are caused by race conditions
in the initcall process.  Andrew Morton fixed this upstream.

http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=735a7ffb739b6efeaeb1e720306ba308eaaeb20e



Comment 10 Jarod Wilson 2006-11-20 21:56:19 UTC
I have this same board (Asus P2B-DS) in a box in my cube, and it experiences the
same problem. I'll roll a test kernel with the patch in comment #9 and see if
that resolves the problem...

Comment 11 Dave Jones 2006-11-20 22:38:11 UTC
I think that patch is a red herring.  It fixes a problem with multithreaded
probing, which was added in 2.6.19rc.   FC kernels don't have that yet.


Comment 12 Jarod Wilson 2006-11-21 16:46:28 UTC
Indeed, no CONFIG_PCI_MULTITHREAD_PROBE in the FC5 or 6 kernels, so I won't
bother with it... I'll poke at the changes between 2.6.18-1.2798.fc6 and 2849...

Comment 13 Jarod Wilson 2006-11-21 17:30:04 UTC
Looks to me like the new x86 apic auto-detection heuristics code is the culprit
here.

This is a new addition to arch/i386/Kconfig:
----8<----
config X86_APIC_AUTO
        bool "Use heuristics to enable/disable local APIC"
        depends on X86_LOCAL_APIC
        help
          This option uses some proven heuristics to automatically enable or disable the local
          APIC. All decisions can be overriden by command line options.
          In a nutshell very old systems run better with APIC off and newer or multiprocessor
          systems prefer APIC on
          This is a useful default for distribution kernels.
----8<----

And we have this in the .config file:
CONFIG_X86_APIC_AUTO=y

And this added to Documentation/kernel-parameters.txt:
       apic            [APIC,i386] Override default heuristics to enable/disable the local
                       APIC by CONFIG_X86_APIC_AUTO. When this option is set the kernel
                       will try to use the local APIC.

Plus a chunk of new code in arch/i386/kernel/apic.c that is #ifdef'd by
CONFIG_X86_APIC_AUTO.

It looks like apic should be enabled if dmi reports multiple cpus, but
apparently, that isn't happening on this board/bios (dmidecode only shows info
about one of two procs on my P2B-DS). Failing that, unless it's an Intel BIOS or
a BIOS dated 2001 or later, no apic. (dmidecode says Award is the vendor,
date is in 2000).
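
The decision order described above can be modeled in user space roughly as follows. This is a minimal sketch, not the code from arch/i386/kernel/apic.c: the struct dmi_info type and the apic_auto_decide() name are hypothetical, and the fields simply stand in for whatever the kernel reads from DMI.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the DMI fields the heuristic consults.
 * A NULL vendor or a year of 0 models a BIOS that reports nothing. */
struct dmi_info {
	int cpu_count;            /* CPUs enumerated in the DMI tables */
	int bios_year;            /* year parsed from the BIOS date */
	const char *bios_vendor;  /* BIOS vendor string, may be NULL */
};

/* Returns 1 if the local APIC would be enabled, 0 if it would stay off. */
int apic_auto_decide(const struct dmi_info *dmi, int apic_on_cmdline)
{
	if (apic_on_cmdline)      /* "apic" boot option overrides the heuristic */
		return 1;
	if (dmi->cpu_count > 1)   /* DMI reports more than one CPU */
		return 1;
	if (dmi->bios_vendor && strncmp(dmi->bios_vendor, "Intel", 5) == 0)
		return 1;             /* Intel BIOS */
	if (dmi->bios_year >= 2001)
		return 1;             /* BIOS dated 2001 or later */
	return 0;                 /* everything else: APIC off */
}
```

With the values reported for the P2B-DS (one CPU listed, Award BIOS, dated 2000) this model returns 0 unless "apic" is passed, which matches the observed hang on an SMP box.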

Comment 14 Jarod Wilson 2006-11-21 18:14:42 UTC
Not sure what the best way around this is without turning off
CONFIG_X86_APIC_AUTO. It's easy enough to work around it on a per-board basis
(add code along the lines of: board = dmi_get_system_info(DMI_BOARD_NAME);, then
compare w/a whitelist), but I'm not sure how many boards there are that this
impacts, so that could be a real beast to maintain...
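
A per-board whitelist of that kind might look like the sketch below. It is illustrative only: the table name apic_force_boards and the helper board_needs_apic() are made up here, the single entry is the one board confirmed in this report, and it assumes the DMI board name matches the listed string exactly.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative whitelist; every affected board would need its own entry. */
static const char *const apic_force_boards[] = {
	"P2B-DS",
};

/* Returns 1 if the given DMI board name is on the whitelist, 0 otherwise.
 * A NULL name (no DMI data) never matches. */
int board_needs_apic(const char *board)
{
	size_t i;

	if (board == NULL)
		return 0;
	for (i = 0; i < sizeof(apic_force_boards) / sizeof(apic_force_boards[0]); i++) {
		if (strcmp(board, apic_force_boards[i]) == 0)
			return 1;
	}
	return 0;
}
```

The maintenance concern is visible in the table itself: exact matching means each BIOS-reported spelling of each affected board has to be collected and added by hand.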

Comment 15 Dave Jones 2006-11-21 18:45:52 UTC
I don't think it'll be that bad.  I mean, we're talking about a BIOS bug here,
given that it isn't enumerating the other CPU correctly.  Hopefully that won't
bite too many different cases.

Until we figure out an automatic workaround for the next update, booting with
just "apic" should allow this kernel to boot on the affected hardware. Can you
test that?

Trevor, can you attach your output of dmidecode ?
It's possible that the P2B-S will have sufficiently different strings to Jarod's
P2B-DS that we may need two separate entries (or a wildcard).


Comment 16 Jarod Wilson 2006-11-21 19:08:26 UTC
Just noticed on re-reading that Trevor had a slightly different board, plus we
have a dual Pentium Pro system in comment #8, a dual P2 Compaq system in comment
#7, the Via Eden system in comment #4, and the Clevo D410 laptop in comment #5... So
already, we've got 6 different systems that may need work-arounds, thus my
thinking this could be less-than-fun to maintain... (Assuming they're all
failing to boot because they all need apic enabled... :)

Can all folks verify that adding 'apic' to their kernel command line lets their
system boot, and if so, also attach their dmidecode output?

I'll attach the P2B-DS output in a sec...

Comment 17 Jarod Wilson 2006-11-21 19:09:42 UTC
Created attachment 141809 [details]
dmidecode output for Asus P2B-DS motherboard

Comment 18 Jarod Wilson 2006-11-21 19:14:49 UTC
*** Bug 215249 has been marked as a duplicate of this bug. ***

Comment 19 Brian Morrison 2006-11-21 22:33:42 UTC
Well, I am seeing a boot problem with FC6 and kernel 2.6.18-1.2849.fc6, things
hang after the PCI setup starts, the last initcall_debug output is acpi_init<hex
string> then just hangs.

Machine is a Fujitsu-Siemens Amilo L7300 laptop, and works quite happily with
the 2798 kernel.

I have taken a dmidecode output, but adding apic into the kernel parameters does
not get the boot process any further, lapic doesn't work, and so far no other
parameters make any difference.

Please advise on what I can provide to help. This seems different from other
people's experience; so far adding apic to the parameters works for everyone
else I've read about.



Comment 20 Dave Jones 2006-11-21 23:04:26 UTC
brian, yes, sounds like an unrelated problem. can you open up a separate bug for
that one ? (Also first try the test kernel at
http://people.redhat.com/davej/kernels/Fedora/ )

Comment 21 Dennis Hart 2006-11-22 00:16:22 UTC
Compaq dual P2-300 boots successfully with APIC added to 2.6.18-1.2239.fc5smp 
kernel command line.  dmidecode does not find a SMBIOS or DMI entry point, and 
the default file location is there but contains zero bytes.  Any hints on 
where else to point for dmidecode info or the proper usage?

Comment 22 Dave Jones 2006-11-22 00:31:46 UTC
Ugh, the lack of DMI tables is pretty poor show. Is there a BIOS update for that
compaq perhaps ? (Might be tricky to find these days I guess).

Without DMI, we'll have to think up some alternative heuristic to determine that
it's SMP.  Can you attach your dmesg from a working boot ?


Comment 23 Jarod Wilson 2006-11-22 04:45:36 UTC
Created attachment 141869 [details]
Console output from successful boot w/workaround

I'm happy to report that the little hack I put together (inlined at end) does
"do the right thing" on my P2B-DS system... However, there's another hack for
this same board in arch/i386/kernel/acpi/boot.c, complete with a table of
systems to force-enable acpi on (which includes the P2B-DS) and a comment about
acpi=ht boxes and continuing long enough to enumerate LAPICs... Something we
can possibly tie into here? Looks like the function acpi_boot_init() should be
setting acpi_lapic too. Should the auto-apic code be run *after* the acpi
checks instead?

Functional patch:
----8<----
diff -Naur 2849/arch/i386/kernel/apic.c 2849.1/arch/i386/kernel/apic.c
--- 2849/arch/i386/kernel/apic.c	2006-11-21 11:43:29.000000000 -0500
+++ 2849.1/arch/i386/kernel/apic.c	2006-11-21 14:35:14.000000000 -0500
@@ -1343,6 +1343,7 @@
 {
	int year;
	int apic;
+	char *board;
	char *vendor;
 
	/* If the machine has more than one CPU try to use APIC because it'll
@@ -1354,12 +1355,18 @@
 
	year = dmi_get_year(DMI_BIOS_DATE);
	vendor = dmi_get_system_info(DMI_BIOS_VENDOR);
+	board = dmi_get_system_info(DMI_BOARD_NAME);
	apic = 0;
 
	/* All Intel BIOS since 1998 assumed APIC on. Don't include 1998 itself
	   because we're not sure for that. */
	if (vendor && !strncmp(vendor, "Intel", 5))
		apic = 1;
+	/* Some boards have a buggy BIOS, need apic forced on */
+	else if (board && !strncmp(board, "P2B-", 4)) {
+		printk (KERN_INFO "Motherboard \"%s\" has a buggy BIOS, force-enabling APIC...\n", board);
+		apic = 1;
+	}
	/* Use APIC for anything since 2001 */
	else if (year >= 2001)
		apic = 1;
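
For reference, the prefix test in the patch would cover every P2B variant mentioned in this report (P2B-S, P2B-D, P2B-DS), assuming each board's DMI name really starts with the literal string "P2B-"; that has only been confirmed here for the P2B-DS. A minimal user-space sketch of the same comparison, with the hypothetical helper name is_p2b_family():

```c
#include <stddef.h>
#include <string.h>

/* Mirrors the patch's test: true when the board name begins with "P2B-".
 * A NULL name (no DMI board string) is treated as no match. */
int is_p2b_family(const char *board)
{
	return board != NULL && strncmp(board, "P2B-", 4) == 0;
}
```

A name of plain "P2B" with no suffix would not match, since the comparison includes the trailing dash.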

Comment 24 Dennis Hart 2006-11-22 07:09:37 UTC
Created attachment 141879 [details]
compaq 1600 dmesg boot smp

Comment 25 Tomas Linden 2006-11-22 08:09:24 UTC
Created attachment 141885 [details]
dmidecode output from Asus P2B-DS motherboard with dual 500 MHz PIII:s

Comment 26 Tomasz Kepczynski 2006-11-22 08:23:47 UTC
I have IntelliStation M Pro with bios dated 20th of Feb 2004.
apic kernel option allows kernel 2.6.18-1.2849.fc6 to boot.
Without acpi=force kernel sees only one CPU (I have Intel(R)
Pentium(R) 4 CPU 3.00GHz with HT), acpi=ht does not help.
dmidecode shows:
# dmidecode 2.7
# No SMBIOS nor DMI entry point found, sorry.
On the same PC kernel 2.6.18-1.2849.fc6xen and xen start
without any additional options and both CPUs are started.

Comment 27 Trevor Cordes 2006-11-22 10:26:35 UTC
Sorry for the long delay: the box is not mine and it's tough to get onsite to do
testing.  I'm trying to get access soon and will submit the necessary info.

Note, now that you guys mention it, this board may be the P2B-DS -- didn't the D
simply mean "dual"?  This is for sure a dual-proc board.  If that's the case,
the results will probably be the same.


Comment 28 Jarod Wilson 2006-11-22 18:42:00 UTC
(In reply to comment #26)
> I have IntelliStation M Pro with bios dated 20th of Feb 2004.
> apic kernel option allows kernel 2.6.18-1.2849.fc6 to boot.
> Without acpi=force kernel sees only one CPU (I have Intel(R)
> Pentium(R) 4 CPU 3.00GHz with HT), acpi=ht does not help.
> dmidecode shows:
> # dmidecode 2.7
> # No SMBIOS nor DMI entry point found, sorry.

Ick. Another busticated BIOS... Any chance there is a BIOS update available for
that system? A well-behaving BIOS should be giving us DMI info, which the
auto-apic code relies upon...

Comment 29 Tomasz Kepczynski 2006-11-23 13:48:29 UTC
(In reply to comment #28)
> Ick. Another busticated BIOS... Any chance there is a BIOS update available for
> that system? A well-behaving BIOS should be giving us DMI info, which the
> auto-apic code relies upon...
I updated BIOS and all my problems are gone. I guess you don't need
dmidecode anymore...

Comment 30 josip 2006-11-23 16:41:15 UTC
Kernel-2.6.18-1.2849.fc6 locks up hard just after:

NET: Registered protocol family 2

This is on a dual Pentium III 450MHz machine (Asus P2B-D) with BIOS 1012B (which
could be upgraded to 1013, but 1012B has been stable for years).

Kernels up to and including kernel-2.6.18-1.2798.fc6 run fine on the same machine.

CAVEAT: Kernel should not trust nor be totally dependent on DMI data.  In
practice, vendor BIOSes generate incorrect DMI/SMBIOS tables most of the time,
and moreover, LinuxBIOS has NO DMI/SMBIOS information -- so it's a bad idea for
auto-apic to rely on DMI info.  Don't trust, VERIFY.

Please don't break what used to work in 2.6.18-1.2798.fc6 on account of false
hope that BIOS will provide correct DMI tables.  See
http://lkml.org/lkml/2006/7/5/168 for more information...



Comment 31 Brian Morrison 2006-11-24 23:01:51 UTC
I have reopened bug #216553 to track the problem I have where my laptop hangs
and apic does not get it booting.

With luck someone will find a way of helping me out.


Comment 32 Steve 2006-11-26 11:44:05 UTC
Created attachment 142131 [details]
output from dmidecode on 2.6.18-1.2798.fc6

Comment 33 Jarod Wilson 2006-12-07 16:04:21 UTC
Fully virtualized xen guests are also currently broken by the auto-apic code
(see bug 217700). That may well be xen's fault though, since it apparently does
try to present valid dmi tables. :)

So we now have potentially 3 special cases to work around: 1) broken bioses (a
la Asus P2B-DS), 2) systems running LinuxBIOS and 3) fully-virt xen guests...

Comment 34 Dave Jones 2006-12-07 16:31:02 UTC
for the kernel currently in updates-testing, I've disabled the auto-apic patch.
It clearly needs some more thought before it's ready for prime time.


Comment 35 josip 2006-12-09 03:10:31 UTC
Thanks, Dave.

One more comment: The "dmidecode" man page says:

"More often than not, information contained in the DMI tables is inaccurate,
incomplete or simply wrong."

Therefore, Linux kernel should not rely on DMI tables.

Comment 36 Steve 2006-12-22 11:11:29 UTC
fc6 with 2.6.18-1.2868.fc6 boots fine

Comment 37 Jarod Wilson 2006-12-22 19:02:10 UTC
*** Bug 218130 has been marked as a duplicate of this bug. ***

Comment 38 Kevin Crocker 2006-12-23 17:03:35 UTC
Hm, so does this mean that I have to have a boot line that includes acpi=forced
or acpi=ht?  To get this machine to boot up till now I've had to have acpi=off or
is it noacpi? I can't remember - the machine won't boot so I need to know how
to get it to boot, please

This is a Gateway M675 P4 3Ghz HT laptop

Comment 39 Kevin Crocker 2006-12-23 17:06:00 UTC
Sorry - I didn't add myself to the CC list

Comment 40 Jarod Wilson 2006-12-25 04:29:56 UTC
The latest kernel from updates has dropped the patch that was causing the
particular problem in this bug. This kernel ought to boot just the same as the
original FC6 kernel.