From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050825 Firefox/1.0.6 (Ubuntu package 1.0.6)

Description of problem:
Hi there,

I tried to install FC4 on a two-chassis IBM x445 system, and the installer fails when hardware detection starts and the system tries to load modules for any PCI devices in the second chassis. It appears that interrupts from devices in the second chassis are not routed to the boot CPU (in the first chassis) unless APIC support is enabled in the kernel.

This same issue affects RHEL4, and a patch has been written there that forces the APICs off by default. The patch allows CONFIG_X86_UP_APIC to be turned on in the UP kernel build while keeping the APICs forced off by default. If the user passes "lapic apic" on the kernel command line, the APICs are enabled. This gives us APIC support on the hardware that needs it, and leaving the APICs off by default sidesteps the problem of broken APICs on other UP systems.

I took the UP kernel (2.6.11-1.1369), applied the RHEL4 patch to it, and built a custom installer image with the patched kernel. With this kernel, the system boots correctly and I was able to install it without problems. The attached patch is against 2.6.11-1.1369. Would it be possible to have this patch included in FC5? I'm told that this APIC problem also affects multi-node x440 and x460 systems.

Note that the FC4 SMP kernels aren't affected, because APIC support is enabled in them, and neither is mainline, because there one can enable or disable APIC support at will.

Version-Release number of selected component (if applicable):
2.6.11-1.1369

How reproducible:
Always

Steps to Reproduce:
1. Stick the FC4 CD into the x445.
2. Boot the installer per the instructions.
3. Wait for the drivers to start loading...

Actual Results:
Drivers erupt in a blizzard of complaints about device timeouts, interrupts that should have happened, etc., but only if the devices are in the second chassis or in an RXE100 rack.

Expected Results:
Drivers find devices, register them, and the install continues.

Additional info:
A patch fixing this problem will be attached shortly.
Created attachment 118357
Patch to 2.6.11-1.1369 UP kernel that keeps APICs off by default.

Apply this patch and then turn on CONFIG_X86_UP_APIC and CONFIG_X86_UP_IOAPIC. The APIC code should remain dormant unless 'lapic apic' is specified on the command line.
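The default-off policy can be modeled as a small command-line check. This is a hypothetical userspace sketch, not code from the attached patch: `struct apic_policy` and `parse_apic_policy` are illustrative names, and we assume that the `lapic` token re-enables the local APIC while `apic` re-enables the I/O APIC.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the boot policy; not kernel code. */
struct apic_policy {
    bool local_apic; /* assumed to be turned on by the "lapic" token */
    bool io_apic;    /* assumed to be turned on by the "apic" token */
};

/* True if the token [tok, tok+len) is exactly the given word. */
static bool token_is(const char *tok, size_t len, const char *word)
{
    return len == strlen(word) && strncmp(tok, word, len) == 0;
}

/* Both APICs start out forced off; only explicit whole tokens turn them on,
 * so look-alikes such as "nolapic" or "apic=debug" are ignored. */
struct apic_policy parse_apic_policy(const char *cmdline)
{
    struct apic_policy p = { false, false };
    const char *s = cmdline;

    while (*s) {
        while (*s == ' ')            /* skip separators */
            s++;
        const char *start = s;
        while (*s && *s != ' ')      /* find end of this token */
            s++;
        size_t len = (size_t)(s - start);
        if (token_is(start, len, "lapic"))
            p.local_apic = true;
        else if (token_is(start, len, "apic"))
            p.io_apic = true;
    }
    return p;
}
```

For example, `parse_apic_policy("ro lapic apic")` sets both fields, while a command line without those tokens leaves both APICs off.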
I've applied it to rawhide CVS. Can you try and get this upstream, please? It'd also be good to have DMI entries force the relevant boot flags on for affected systems, if necessary.
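One way to act on the DMI suggestion is a match table that forces the boot flags on for known multi-node systems. The sketch below is hypothetical: the table, `needs_forced_apics`, and the vendor/product strings are placeholders rather than real DMI values for any IBM machine, and it does not use the kernel's own DMI-matching helpers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical quirk entry; both strings are placeholders, not real DMI data. */
struct apic_quirk {
    const char *vendor;
    const char *product;
};

static const struct apic_quirk apic_quirks[] = {
    { "EXAMPLE-VENDOR", "EXAMPLE-MULTINODE-1" },
    { "EXAMPLE-VENDOR", "EXAMPLE-MULTINODE-2" },
};

/* Returns true if this vendor/product pair should behave as if
 * "lapic apic" had been passed on the command line. */
bool needs_forced_apics(const char *vendor, const char *product)
{
    for (size_t i = 0; i < sizeof apic_quirks / sizeof apic_quirks[0]; i++) {
        if (strcmp(vendor, apic_quirks[i].vendor) == 0 &&
            strcmp(product, apic_quirks[i].product) == 0)
            return true;
    }
    return false;
}
```

A table keeps the affected systems in one place, so adding another multi-node model is a one-line change.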
Created attachment 120333
Patch to 2.6.13-1.1624 UP kernel that keeps APICs off by default.

Respin of the patch, this time without the scary BIOS bug message if CONFIG_X86_UP_APIC_DEFAULT_OFF=y.
I'll merge this patch into tomorrow's build, but I think there's also an additional case. Look at bug 171661: there we're panicking in the lapic init code when we don't pass in 'apic' (it works just fine with it).
Created attachment 120440
Patch to 2.6.13-1.1626 UP kernel that keeps APICs off by default.

Here's a second respin. Now we always jump out of APIC_init_uniprocessor if enable_local_apic == -2, regardless of the boot CPU feature flags. This _should_ keep the local APIC off _except_ when expressly asked for via 'lapic'. Before, we were toggling the boot CPU feature flags, which didn't reliably keep the local APIC off.
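The ordering change in this respin can be illustrated with a standalone model: the sentinel check runs before any feature-flag test, so nothing that flips the flags can bring the local APIC back. Only the variable name `enable_local_apic` and the `-2` sentinel come from the comment; `model_apic_init_uniprocessor`, `model_parse_lapic`, and the `cpu_has_apic` parameter are hypothetical stand-ins, and we assume 'lapic' sets the variable to 1.

```c
#include <stdbool.h>

/* Sentinel meaning "off by default", as described in the bug comment. */
#define LAPIC_DEFAULT_OFF (-2)

/* Standalone model variable; mirrors the kernel name but is not kernel code. */
static int enable_local_apic = LAPIC_DEFAULT_OFF;

/* Returns true if the local APIC would be initialized.
 * cpu_has_apic stands in for the boot CPU feature flag. */
bool model_apic_init_uniprocessor(bool cpu_has_apic)
{
    /* Bail out on the sentinel first, before consulting feature flags,
     * so toggling those flags elsewhere cannot re-enable the APIC. */
    if (enable_local_apic == LAPIC_DEFAULT_OFF)
        return false;
    if (!cpu_has_apic)
        return false;
    return true;
}

/* Assumed effect of the 'lapic' boot option: override the default. */
void model_parse_lapic(void)
{
    enable_local_apic = 1;
}
```

In this model the feature flag only matters once 'lapic' has cleared the sentinel.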
Created attachment 121381
disable i386 uniproc APICs by default

Ok, here's the latest apic-off-by-default patch, which should apply against 2.6.14-1.1707_FC5. This patch (v6) adds two things over v4:

1. The v6 patch prevents the ACPI MADT from being parsed unless 'lapic apic' is passed. Disabling APIC_init_uniprocessor is insufficient, because the _PIC method in ACPI needs to be told which interrupt model (PIC, APIC, etc.) we're using. The acpi_process_madt function has a side effect of setting "acpi_irq_model = ACPI_IRQ_MODEL_IOAPIC"; this acpi_irq_model variable is eventually passed to _PIC, which means that the BIOS thinks we're using APICs when we're not. This is probably why Mr. Tweedie's machine gets confused. And yes, Dave, you were correct to suggest poking through the ACPI code to make sure that there weren't any side effects. :)

2. It _also_ turns out that the get_smp_config function plays a role in locating the local and I/O APICs: if ACPI doesn't supply an MADT (see #1 above), this function will poke through the MP table as a backup and try to set things up, which is precisely what we don't want. Since we're assuming a uniprocessor, APIC-less machine in this mode, we don't need the MP configuration and can skip that step.

I've tested this on an x226, a single-chassis x445 and a two-chassis x445 without problems, and I'm hoping that it resolves at least a few of the reported problems. Unfortunately, I haven't been seeing any of these problems on our hardware, so debugging is a bit ... difficult. This patch is intended as a drop-in replacement for the one that's in the rawhide kernel right now.
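The two gates added in v6 can be sketched as a single boot-time decision. This is a hypothetical model, not the patch itself: `struct boot_state`, `model_process_madt`, `model_get_smp_config`, and `model_boot` are illustrative stand-ins for acpi_process_madt and get_smp_config, and they reproduce only the side effects described above (the IRQ model later reported to _PIC, and the MP-table fallback when no MADT exists).

```c
#include <stdbool.h>

enum irq_model { IRQ_MODEL_PIC, IRQ_MODEL_IOAPIC };

/* Hypothetical boot state; field names are illustrative only. */
struct boot_state {
    bool apics_requested;     /* "lapic apic" given on the command line */
    bool madt_present;        /* firmware supplies an ACPI MADT */
    bool mp_table_present;    /* firmware supplies an MP table */
    enum irq_model irq_model; /* value later reported to the _PIC method */
    bool apics_configured;    /* APIC setup was attempted */
};

/* Stand-in for acpi_process_madt(): note the side effect on irq_model. */
static void model_process_madt(struct boot_state *b)
{
    b->irq_model = IRQ_MODEL_IOAPIC;
    b->apics_configured = true;
}

/* Stand-in for get_smp_config(): the MP-table fallback. */
static void model_get_smp_config(struct boot_state *b)
{
    if (b->mp_table_present)
        b->apics_configured = true;
}

void model_boot(struct boot_state *b)
{
    b->irq_model = IRQ_MODEL_PIC;
    b->apics_configured = false;
    if (!b->apics_requested)
        return; /* skip both MADT parsing and the MP-table fallback */
    if (b->madt_present)
        model_process_madt(b);
    else
        model_get_smp_config(b);
}
```

Without 'lapic apic', the model reports the PIC model to firmware even when an MADT is present, which is the behavior item 1 is after.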
Created attachment 121388
disable i386 uniproc APIC by default

Rev. 8 of the patch, wherein enable_local_apic is now behind an #ifdef CONFIG_X86_LOCAL_APIC guard, which fixes the x86_64 SMP build.
Created attachment 121415
the same, but with actual x86-64 build fixes

v9 = v8 + actually fix x86-64 build.
v9 now merged in current kernels available in rawhide today.