Bug 215154
Summary: | local apic not enabled by default on systems that need it. | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Trevor Cordes <trevor> |
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Brian Brock <bbrock> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 5 | CC: | bugzilla, deknuydt, djh, drhart, jarod, kevin.b.crocker, lance.list.7, mij, mrb, pierre.juhen, tlinden, tomek, whiteg, wtogami |
Hardware: | All | ||
OS: | Linux | ||
Fixed In Version: | 2.6.18-1.2868.fc6 | Doc Type: | Bug Fix |
Last Closed: | 2006-12-25 04:29:56 UTC | Type: | --- |
Description
Trevor Cordes
2006-11-11 16:59:51 UTC
Created attachment 140966 [details]
screenshot of hang
Hmm, it looks like it only found one CPU too. It'd be great if you could capture the earlier boot messages. If you don't have access to a serial console, you can boot with boot_delay=1000, which pauses after each line of text and should make it easier to grab multiple screenshots of the whole boot.

Different problem: kernel panic. My box has a single CPU, but uses RAID + LVM; root is /dev/raid1/boot. It looks like there is no device scan, and the RAM image is not able to find a valid root device. There is a message from mount saying it is unable to find /dev/root. I did a fresh mkinitrd, but no cure...

I have this problem on FC6. Installing FC6 from the CD-ROMs works fine (kernel 2.6.18-1.2798), but after updating the kernel (and nothing else) to version 1.2849 I see the exact same behaviour: the last line printed is "NET: Registered protocol family 2". When booting the original kernel, the next printed line is "IP route cache hash table". My machine is a Via EPIA EN12000 motherboard with a Via Eden processor and a Via VT8237R controller.

I see the same behaviour as described in comment #4, booting with:
kernel /boot/vmlinuz-2.6.18-1.2849.fc6 ro root=LABEL=/ pci=biosirq acpi=force
I also tried with pci=routeirq and without acpi=force. All stop at the same line and need the power switch to reboot. It works on 2.6.18-1.2798.fc6. The system is an upgrade from FC4, over FC3, over FC2, over FC1, over RH9. Clevo D410 laptop.

Pierre, could you start a different bug for that? Could someone post the dmesg from a successful boot?

Similar problem to the bug owner's, but the last line on the frozen screen is "ACPI:Unable to locate RSDP". A successful boot with 2.6.18-1.2200.fc5smp shows that the next line after the ACPI line is "MP-BIOS BUG: 8254 Timer not connected to IO-APIC". It looks like kernel 2200 finds a way around this bug, but 2239 hangs with no progress and no errors that I can see on screen or know how to find. The single-processor 2239 kernel boots without any evident problems on this same box. Hardware: dual P2-300 Compaq 1600.
Dual PPro 200 box; it has been running Fedora SMP kernels since FC2 and currently uses 2.6.18-1.2200.fc5smp. I was pressed for time when I installed the 2239 SMP kernel, but only 1 CPU was found and the display was similar to the screenshot here. An odd message to the effect that CPU0 was not registered in BIOS scrolled off the screen.

It's possible that this bug and also bug 215249 are caused by race conditions in the initcall process. Andrew Morton fixed this upstream.
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=735a7ffb739b6efeaeb1e720306ba308eaaeb20e

I have this same board (Asus P2B-DS) in a box in my cube, and it experiences the same problem. I'll roll a test kernel with the patch in comment #9 and see if that resolves the problem...

I think that patch is a red herring. It fixes a problem with multithreaded probing, which was added in 2.6.19rc. FC kernels don't have that yet.

Indeed, no CONFIG_PCI_MULTITHREAD_PROBE in the FC5 or 6 kernels, so I won't bother with it... I'll poke at the changes between 2.6.18-1.2798.fc6 and 2849...

Looks to me like the new x86 apic auto-detection heuristics code is the culprit here. This is a new addition to arch/i386/Kconfig:
----8<----
config X86_APIC_AUTO
	bool "Use heuristics to enable/disable local APIC"
	depends on X86_LOCAL_APIC
	help
	  This option uses some proven heuristics to automatically enable or
	  disable the local APIC. All decisions can be overriden by command
	  line options. In a nutshell very old systems run better with APIC
	  off and newer or multiprocessor systems prefer APIC on
	  This is a useful default for distribution kernels.
----8<----
And we have this in the .config file:
CONFIG_X86_APIC_AUTO=y
And this added to Documentation/kernel-parameters.txt:
apic [APIC,i386] Override default heuristics to enable/disable the local APIC by CONFIG_X86_APIC_AUTO. When this option is set the kernel will try to use the local APIC.
Plus a chunk of new code in arch/i386/kernel/apic.c that is #ifdef'd by CONFIG_X86_APIC_AUTO. It looks like apic should be enabled if DMI reports multiple CPUs, but apparently that isn't happening on this board/BIOS (dmidecode only shows info about one of the two procs on my P2B-DS). Failing that, if it's not an Intel BIOS or a BIOS dated 2001 or later, no apic. (dmidecode says Award is the vendor, and the date is in 2000.) Not sure what the best way around this is without turning off CONFIG_X86_APIC_AUTO. It's easy enough to work around it on a per-board basis (add code along the lines of: board = dmi_get_system_info(DMI_BOARD_NAME);, then compare w/a whitelist), but I'm not sure how many boards there are that this impacts, so that could be a real beast to maintain...

I don't think it'll be that bad. I mean, we're talking about a BIOS bug here, given that it isn't enumerating the other CPU correctly. Hopefully that won't bite too many different cases. Until we figure out an automatic workaround for the next update, booting with just "apic" should allow this kernel to boot on the affected hardware. Can you test that?

Trevor, can you attach your output of dmidecode? It's possible that the P2B-S will have sufficiently different strings to Jarod's P2B-DS that we may need two separate entries (or a wildcard).

Just noticed on re-reading that Trevor had a slightly different board, plus we have a dual Pentium Pro system in comment #8, a dual P2 Compaq system in comment #7, the Via Eden system in comment #4 and the Clevo D410 laptop in comment #5... So already, we've got 6 different systems that may need work-arounds, thus my thinking this could be less-than-fun to maintain... (Assuming they're all failing to boot because they all need apic enabled... :) Can all folks verify that adding 'apic' to their kernel command line lets their system boot, and if so, also attach their dmidecode output? I'll attach the P2B-DS output in a sec...

Created attachment 141809 [details]
dmidecode output for Asus P2B-DS motherboard
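For readers following along, the decision order described in this thread (multiple CPUs reported by DMI, then Intel BIOS vendor, then BIOS year 2001 or later) can be sketched as a small userspace function. This is only an illustration, not the code from arch/i386/kernel/apic.c: `struct dmi_facts` and `apic_auto_decide()` are made-up names, and the vendor string "Award" is shortened from what dmidecode would print.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the DMI data the kernel reads; not a kernel type. */
struct dmi_facts {
	int cpu_count;           /* CPUs enumerated in the DMI tables, 0 if unknown */
	const char *bios_vendor; /* NULL when the BIOS exposes no DMI table */
	int bios_year;           /* 0 if unknown */
};

/* Returns 1 to enable the local APIC, 0 to leave it off. */
int apic_auto_decide(const struct dmi_facts *d)
{
	/* More than one CPU reported: try to use the APIC. */
	if (d->cpu_count > 1)
		return 1;
	/* Intel BIOSes are assumed to want the APIC on. */
	if (d->bios_vendor && !strncmp(d->bios_vendor, "Intel", 5))
		return 1;
	/* Use the APIC for any BIOS dated 2001 or later. */
	if (d->bios_year >= 2001)
		return 1;
	return 0;
}
```

With the values reported for the P2B-DS (one CPU visible in DMI, Award BIOS, dated 2000) every test fails and the function returns 0, which matches the hang seen on that dual-CPU board. A machine with no DMI tables at all takes the same path.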
*** Bug 215249 has been marked as a duplicate of this bug. ***

Well, I am seeing a boot problem with FC6 and kernel 2.6.18-1.2849.fc6: things hang after the PCI setup starts, the last initcall_debug output is acpi_init<hex string>, then it just hangs. The machine is a Fujitsu-Siemens Amilo L7300 laptop, and it works quite happily with the 2798 kernel. I have taken a dmidecode output, but adding apic to the kernel parameters does not get the boot process any further, lapic doesn't work, and so far no other parameters make any difference. Please advise on what I can provide to help. This seems different from other people's experience; so far adding apic to the parameters works for everyone else I've read about.

brian, yes, sounds like an unrelated problem. Can you open up a separate bug for that one? (Also first try the test kernel at http://people.redhat.com/davej/kernels/Fedora/ )

Compaq dual P2-300 boots successfully with apic added to the 2.6.18-1.2239.fc5smp kernel command line. dmidecode does not find an SMBIOS or DMI entry point, and the default file location is there but contains zero bytes. Any hints on where else to point for dmidecode info or the proper usage?

Ugh, the lack of DMI tables is a pretty poor show. Is there a BIOS update for that Compaq perhaps? (Might be tricky to find these days, I guess.) Without DMI, we'll have to think up some alternative heuristic to determine that it's SMP. Can you attach your dmesg from a working boot?

Created attachment 141869 [details]
Console output from successful boot w/workaround
I'm happy to report that the little hack I put together (inlined at end) does
"do the right thing" on my P2B-DS system... However, there's another hack for
this same board in arch/i386/kernel/acpi/boot.c, complete with a table of
systems to force-enable acpi on (which includes the P2B-DS) and a comment about
acpi=ht boxes and continuing long enough to enumerate LAPICs... Something we
can possibly tie into here? Looks like the function acpi_boot_init() should be
setting acpi_lapic too. Should the auto-apic code be run *after* the acpi
checks instead?
Functional patch:
----8<----
diff -Naur 2849/arch/i386/kernel/apic.c 2849.1/arch/i386/kernel/apic.c
--- 2849/arch/i386/kernel/apic.c 2006-11-21 11:43:29.000000000 -0500
+++ 2849.1/arch/i386/kernel/apic.c 2006-11-21 14:35:14.000000000 -0500
@@ -1343,6 +1343,7 @@
 {
 	int year;
 	int apic;
+	char *board;
 	char *vendor;
 	/* If the machine has more than one CPU try to use APIC because it'll
@@ -1354,12 +1355,18 @@
 	year = dmi_get_year(DMI_BIOS_DATE);
 	vendor = dmi_get_system_info(DMI_BIOS_VENDOR);
+	board = dmi_get_system_info(DMI_BOARD_NAME);
 	apic = 0;
 	/* All Intel BIOS since 1998 assumed APIC on. Don't include 1998 itself
 	   because we're not sure for that. */
 	if (vendor && !strncmp(vendor, "Intel", 5))
 		apic = 1;
+	/* Some boards have a buggy BIOS, need apic forced on */
+	else if (board && !strncmp(board, "P2B-", 4)) {
+		printk(KERN_INFO "Motherboard \"%s\" has a buggy BIOS, force-enabling APIC...\n", board);
+		apic = 1;
+	}
 	/* Use APIC for anything since 2001 */
 	else if (year >= 2001)
 		apic = 1;
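If more boards turn out to need the same treatment, the single strncmp() in the patch above could grow into a small prefix table, along the lines of the force-enable table mentioned for arch/i386/kernel/acpi/boot.c. The following is a userspace sketch under that assumption; `apic_force_boards` and `board_needs_apic()` are hypothetical names, and the only entry is the "P2B-" prefix taken from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical quirk table: board-name prefixes whose BIOS is known to
 * under-report CPUs. Only "P2B-" comes from this thread. */
static const char *const apic_force_boards[] = {
	"P2B-",
};

/* Returns 1 if the DMI board name starts with a listed prefix, else 0. */
int board_needs_apic(const char *board)
{
	size_t i;
	size_t n = sizeof(apic_force_boards) / sizeof(apic_force_boards[0]);

	if (!board)	/* no DMI board name available */
		return 0;
	for (i = 0; i < n; i++) {
		const char *prefix = apic_force_boards[i];

		if (!strncmp(board, prefix, strlen(prefix)))
			return 1;
	}
	return 0;
}
```

A prefix match covers P2B-D, P2B-DS and P2B-S with one entry, but it does nothing for machines that expose no DMI board name at all, so it can only ever be part of a fix.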
Created attachment 141879 [details]
compaq 1600 dmesg boot smp
Created attachment 141885 [details]
dmidecode output from Asus P2B-DS motherboard with dual 500 MHz PIII:s
I have an IntelliStation M Pro with a BIOS dated 20th of Feb 2004. The apic kernel option allows kernel 2.6.18-1.2849.fc6 to boot. Without acpi=force the kernel sees only one CPU (I have an Intel(R) Pentium(R) 4 CPU 3.00GHz with HT); acpi=ht does not help. dmidecode shows:
# dmidecode 2.7
# No SMBIOS nor DMI entry point found, sorry.
On the same PC, kernel 2.6.18-1.2849.fc6xen and xen start without any additional options and both CPUs are started.

Sorry for the long delay: the box is not mine and it's tough to get onsite to do testing. I'm trying to get access soon and will submit the necessary info. Note, now that you guys mention it, this board may be the P2B-DS -- didn't the D simply mean "dual"? This is for sure a dual-proc board. If that's the case, the results will probably be the same.

(In reply to comment #26)
> I have IntelliStation M Pro with bios dated 20th of Feb 2004.
> apic kernel option allows kernel 2.6.18-1.2849.fc6 to boot.
> Without acpi=force kernel sees only one CPU (I have Intel(R)
> Pentium(R) 4 CPU 3.00GHz with HT), acpi=ht does not help.
> dmidecode shows:
> # dmidecode 2.7
> # No SMBIOS nor DMI entry point found, sorry.
Ick. Another busticated BIOS... Any chance there is a BIOS update available for that system? A well-behaving BIOS should be giving us DMI info, which the auto-apic code relies upon...

(In reply to comment #28)
> Ick. Another busticated BIOS... Any chance there is a BIOS update available for
> that system? A well-behaving BIOS should be giving us DMI info, which the
> auto-apic code relies upon...
I updated the BIOS and all my problems are gone. I guess you don't need dmidecode anymore...

Kernel-2.6.18-1.2849.fc6 locks up hard just after:
NET: Registered protocol family 2
This is on a dual Pentium III 450MHz machine (Asus P2B-D) with BIOS 1012B (which could be upgraded to 1013, but 1012B has been stable for years). Kernels up to and including kernel-2.6.18-1.2798.fc6 run fine on the same machine.

CAVEAT: The kernel should neither trust nor be totally dependent on DMI data. In practice, vendor BIOSes generate incorrect DMI/SMBIOS tables most of the time, and moreover, LinuxBIOS has NO DMI/SMBIOS information -- so it's a bad idea for auto-apic to rely on DMI info. Don't trust, VERIFY. Please don't break what used to work in 2.6.18-1.2798.fc6 on account of false hope that the BIOS will provide correct DMI tables. See http://lkml.org/lkml/2006/7/5/168 for more information...

I have reopened bug #216553 to track the problem I have where my laptop hangs and apic does not get it booting. With luck someone will find a way of helping me out.

Created attachment 142131 [details]
output from dmidecode on 2.6.18-1.2798.fc6
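One way to read the "don't trust, verify" caveat is that missing DMI data should not be treated the same as an old BIOS. A minimal sketch of that idea, assuming a tri-state result and a caller-supplied fallback, is below. None of these names (`enum apic_hint`, `apic_hint_from_dmi()`, `apic_resolve()`) exist in the kernel, and nothing in this thread says the eventual fix works this way.

```c
/* Hypothetical tri-state result; none of these names exist in the kernel. */
enum apic_hint { APIC_HINT_OFF, APIC_HINT_ON, APIC_HINT_UNKNOWN };

/* Derive a hint from DMI, refusing to guess when no table was found. */
enum apic_hint apic_hint_from_dmi(int dmi_present, int cpu_count, int bios_year)
{
	if (!dmi_present)
		return APIC_HINT_UNKNOWN;	/* e.g. LinuxBIOS, or a BIOS with no DMI entry point */
	if (cpu_count > 1 || bios_year >= 2001)
		return APIC_HINT_ON;
	return APIC_HINT_OFF;
}

/* Fall back to whatever the kernel did before the heuristic when DMI is silent. */
int apic_resolve(enum apic_hint hint, int fallback_default)
{
	if (hint == APIC_HINT_UNKNOWN)
		return fallback_default;
	return hint == APIC_HINT_ON;
}
```

This only helps the machines with no DMI tables. A board like the P2B-DS, whose tables are present but under-report the CPU count, would still get APIC_HINT_OFF and need a separate workaround.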
Fully virtualized xen guests are also currently broken by the auto-apic code (see bug 217700). That may well be xen's fault though, since it apparently does try to present valid DMI tables. :) So we now have potentially 3 special cases to work around: 1) broken BIOSes (a la Asus P2B-DS), 2) systems running LinuxBIOS and 3) fully-virt xen guests...

For the kernel currently in updates-testing, I've disabled the auto-apic patch. It clearly needs some more thought before it's ready for prime time.

Thanks, Dave. One more comment: the "dmidecode" man page says: "More often than not, information contained in the DMI tables is inaccurate, incomplete or simply wrong." Therefore, the Linux kernel should not rely on DMI tables.

FC6 with 2.6.18-1.2868.fc6 boots fine.

*** Bug 218130 has been marked as a duplicate of this bug. ***

Hm, so does this mean that I have to have a boot line that includes acpi=force or acpi=ht? To get this machine to boot up till now I've had to have acpi=off, or is it noacpi? I can't remember - the machine won't boot, so I need to know how to get it to boot, please. This is a Gateway M675 P4 3GHz HT laptop.

Sorry - I didn't add myself to the CC list.

The latest kernel from updates has dropped the patch that was causing the particular problem in this bug. This kernel ought to boot just the same as the original FC6 kernel.