Bug 125841
Summary: | (ACPI) kernel-2.6.6-1.427smp crashes when attempting to boot. Previous release of SMP version of kernel was fine. | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Jerry <jvdelisle> |
Component: | kernel | Assignee: | Dave Jones <davej> |
Status: | CLOSED NEXTRELEASE | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 2 | CC: | acpi-bugzilla, amadeus84, anil.s.keshavamurthy, chref, drongo007, gasi, hongjiu.lu, james, joe.christy, jonathan.underwood, kb8, kpfleming, loka, michael, mishu, pfrields, raines, suckfish, yaneti |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i686 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2005-04-16 04:04:44 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Attachments: |
Description
Jerry
2004-06-12 08:17:51 UTC
Created attachment 101081 [details]
kernel boot output
I observed the same problem on my HT-enabled Pentium 4 2.8GHz machine with
Intel D875PBZ board. I captured the full output (see attachment). When
passing the parameter "acpi=off" to the kernel, it does not crash.
I have HT-enabled P4 2.8GHz Intel D865GLC board. When I set acpi=ht the kernel boots ok, but of course acpi is disabled farther down the boot path. So something is broke between this version and teh previous. When rebuilding kernel-2.6.6-1.427smp from the kernel-sourcecode RPM with identical configuration but disabled SCHED_SMT option, it doesn't crash either. All these crashes can also be reproduced with a 2.6.7-rc3 vanilla kernel (with the only difference that it doesn't produce the oops message but just hangs after "ACPI: Subsystem revision 20040326"). Seeing this also on my box. Also an Intel D875PBZ (latest BIOS) with Prescott 3.0. Same problem here. Intel D865GBF motherboard, P4 2.4 processor, hyperthreading. Haven't checked if the oops still happens with acpi disabled. No change with 435 smp kernel, still oopses at 'unable to handle kernel NULL pointer dereference at virtual address 00000206' at boot. I'll throw in a "me too" here. I'm running an Intel Desktop board 865GLC. Worked fine with the smp kernel from the FC2 install, but after up2date, it gets the NULL pointer deref error. This is with a 3ghz 800mhz bus, 1gb dual channel 400mhz ram system if that is any help. Did not try without acpi, but I will confirm the latest (435) kernel doesn't fix it. "me too" too: and the up Kernel ...435 runs fine! (Just like fc1: no more smp support for fedora?) there should be additional debug info after the NULL dereference message. We'll need that to make anything useful out of this bug. Here's the stack trace. I wrote it on paper, then here, so it might contain errors, but it's unlikely. Uncompressing Linux... Ok, booting the kernel. ACPI: S3 and PAE do not like each other for now, S3 disabled. Unable to handle kernel NULL pointer dereference at virtual address 00000008 printing eip: *pde = 00003001 Oops: 0000 [#1] SMP Modules linked in: CPU: 1 EIP: 0060:[<00000008>] Not tainted EFLAGS: 00010202 (2.6.6-1.427smp) EIP is at 0x8 eax: 00000000 ebx: 00000000 ecx: 00000000 edx: 00000001 esi: 000003e8 edi: 021a9279 ebp: 021b2fab esp: 20e9df9c ds: 007b es: 007b ss: 0068 Process swapper (pid: 1, threadinfo=20e9d000 task=20e3f6b0) Stack: 00000001 021bdabf 00000000 00000000 00000000 021b6d5d 00000000 021c7486 02384108 0236fd2f 000000f4 0240dd60 0236fe0c 0235d7d4 023d6210 0000001e 02100507 021003fe 00000000 00000000 00000000 021051f1 00000000 00000000 Call Trace: [<021bdabf>] <1>Unable to handle kernel NULL pointer dereference at virtual add That's all there is to it. Well, actually you get a little more (before the oops) if you remove the "quiet" from the kernel command parameters. I already posted the complete output a an attachment, please look here: https://bugzilla.redhat.com/bugzilla/attachment.cgi? id=101081&action=view I've got a D875PBZ, though still running 2.6.5-1.327smp on BIOS 0041.P09 I'll update my kernel and BIOS to see if I can reproduce this. My D875PBZ was actually running 2.6.5-1.358smp (FC2) already. Upgraded to BIOS 0093.P22 w/ no issues; next a kernel 'upgrade'... # yum install kernel-smp gave me 2.6.6-1.435smp which fails to boot in the same area, but the crash didn't make it out onto my serial console: ... ACPI: Subsystem revision 20040326 Unable to <and that's all it wrote> "acpi=ht" booted successfully. now to find the src for this kernel and enable debugging... maxcpus=1 is sufficient to make 2.6.6-1.435smp boot on my D875PBZ consistent with mention of !SCHED_SMT in comment #3 confirmed 2.6.7 (released) with SCHED_SMT does the same and that "acpi=ht" or "maxcpus=1" still work-around the failure. *** Bug 126195 has been marked as a duplicate of this bug. *** I have the same "kernel NULL pointer dereference" on my P4 box with an Intel D865PERLK mother board going to the 425 or 435 kernels. I just tried the acpi=ht thing and it has gotten past that error but is now stuck in kudzu doing a checking for new hardware. 358 runs fine (so far). Okay, disabling kudzu and rebooting with 435 and then running kudzu after login, I see it hangs when doing a modprobe on the firewire driver ohci1394 Deleted CONFIG_4KSTACKS from the 2.6.7 build and now the CONFIG_SCHED_SMT kernel boots w/o cmdline workaround. Hmm, letting it sit there a minute and this time a tack track popped out: ACPI: Subsystem revision 20040326 U<1>Unable to handle kernel NULL pointer dereference at virtual address 00000000 printing eip: 00000000 *pde = 00000000 Oops: 0000 [#1] SMP Modules linked in: CPU: 1 EIP: 0060:[<00000000>] Not tainted EFLAGS: 00010006 (2.6.7) EIP is at 0x0 eax: f7faedb0 ebx: f7fae000 ecx: 00000000 edx: 00001000 esi: c057b9c0 edi: c0118ce0 ebp: 00000000 esp: f7faecc8 ds: 007b es: 007b ss: 0068 Process swapper (pid: 1, threadinfo=f7fae000 task=f7f9f670) Stack: c0115427 00001000 00001000 00000000 c0106e0a 00001000 0000007b f7faedb4 00000000 c0118ce0 00000000 fffff000 c171007b c171007b fffffffb c0118d07 00000060 00000206 f7faed1c 00000000 c1713000 00000000 c1713000 00000010 Call Trace: [<c0115427>] smp_call_function_interrupt+0x37/0x80 [<c0106e0a>] call_function_interrupt+0x1a/0x20 [<c0118ce0>] do_page_fault+0x0/0x526 [<c011c5d8>] scheduler_tick+0x108/0x4e0 [<c0127f1c>] update_process_times+0x2c/0x40 [<c0115bee>] smp_apic_timer_interrupt+0x13e/0x160 [<c0106e2a>] apic_timer_interrupt+0x1a/0x20 [<c0115310>] smp_call_function+0x150/0x1b0 [<c011946e>] remap_area_pages+0x1de/0x1f0 [<c0119552>] __ioremap+0xd2/0xf0 [<c01f9961>] acpi_os_map_memory+0x2a/0x3c [<c020cf93>] acpi_tb_find_rsdp+0x29/0xfa [<c020ceee>] acpi_find_root_pointer+0x16/0x3e [<c0118f47>] do_page_fault+0x267/0x526 [<c0206a19>] acpi_ns_delete_namespace_by_owner+0x26/0x99 [<c01f991c>] acpi_os_get_root_pointer+0xd/0x28 [<c020ccd5>] acpi_get_firmware_table+0x5c/0x25f [<c02051fe>] acpi_hw_low_level_read+0x5e/0xa3 [<c0204fdf>] acpi_hw_register_read+0xa6/0x150 [<c020e05c>] acpi_ut_release_mutex+0x62/0x67 [<c0204db8>] acpi_get_register+0x57/0x79 [<c01fda9e>] acpi_enable_event+0x3f/0x56 [<c020e05c>] acpi_ut_release_mutex+0x62/0x67 [<c01fe552>] acpi_install_fixed_event_handler+0x6e/0x74 [<c052aa58>] acpi_ec_ecdt_probe+0x14/0x188 [<c052a921>] acpi_bus_init+0xb2/0x121 [<c052a9f2>] acpi_init+0x62/0xb4 [<c05158b2>] do_initcalls+0x22/0xb0 [<c0100534>] init+0xf4/0x240 [<c0100440>] init+0x0/0x240 [<c0104285>] kernel_thread_helper+0x5/0x10 Created attachment 101264 [details]
reduce stack usage in ACPI table code
Bob Moore cooked up this patch to reduce stack consumption
in the ACPI table code (acpi_get_firmware_table())
Alas, on my system it doesn't make this failure go away.
I can confirm that disabling 4K stacks does not solve this problem on my system, which is a D865GBFL with latest BIOS and 2.6GHz P4-HT. The few times I have managed to get a call trace (that won't go to a serial console, it's too early), the trace has been _very_ long. If the D865GBF dies before serial console works, then it may have a different failure than the D875PBZ. Does deleting CONFIG_SCHED_SMT help the D865GBFL? Yes, disabling CONFIG_SCHED_SMT (and leaving 4K stacks enabled) solves the problem, I can boot with ACPI turned on. I have found where to disable CONFIG_SCHED_SMT in kernel .config but I could not see how to leave 4K stacks enabled. Is this 4K stacks enabled a default? I am trying these things to se what they do on D865GLC The 4K stacks item is in the "Kernel Hacking" menu, near the bottom of the main menu. On D865GLC I have 2.6.6-1.435 running with CONFIG_SCHED_SMT disabled. It is currently compiling 2.6.7 with CONFIG_SCHED_SMT disabled and 4K stacks enabled. No unusual behavior observed yet and I have been doing some other background i/o tasks just to exercise the system. I also have CONFIG_MPENTIUM4 enabled. SMP is working. Will report on 2.6.7 when its done baking. OK, on D865GLC, 2.6.7 is running apparently fine with CONFIG_SCHED_SMT disabled, 4K Stacks enabled, CONFIG_MPENTIUM4 enabled. SMP is functioning and ACPI is functioning. Is the problem really in SMT? Just as a point of information, I am running a dual processor system, and I do not have this problem with either 2.6.6-1.427smp or 2.6.6-1.435smp. Perhaps it is unique to HT support. > If the D865GBF dies before serial console works,
> then it may have a different failure than the
> D875PBZ. Does deleting CONFIG_SCHED_SMT help the
> D865GBFL?
Perhaps I should add here that the actual oops output in my
attachment did not come through the serial console, only the lines
preceding it. I appended the oops lines manually (should have
mentioned that earlier).
Created attachment 101570 [details]
A patch to move sched_init_smp after do_basic_setup
This patch seems to work for me. The problem may be that
arch_init_sched_domains is called too early.
I rebuilt kernel 2.6.7-1.459 with the patch above. It runs OK on my P4 HT machine. Using this patch my D875PBZ boots cleanly (patch applied to 2.6.6-1. 435.2.1). Patch also works applied to 2.6.7-bk (using a pull from about 12 hours ago). This patch works with the Intel board D865PERL with Hyperthreading P4! I've been getting "Unable to handle kernel NULL pointer dereference at virtual address 00000008" for all smp kernels after 2.6.5-1.358smp. I should have mentioned that was on 2.6.6-1.435.2.3smp Yep, I too can't boot .435 smp kernels with a D875PBZ and HT P4 for exactly the same reasons. The above patch looks mostly right though, _BUT_ it's worth checking it against this : http://lkml.org/lkml/2004/5/8/34 Here Andrew Morton moves do basic_setup() until after populate_rootfs(). It would be good to get this right :) I also applied the patch given in comment #32 and have kernel 2.6.7 working with SMP and SMT enabled with ACPI also working fine. This is on a D865GLC mb. kernel 2.6.7, ACPI=yes (acpi=on is default w/no parameters given to kernel) SMP=yes, SCHED_SMT=yes, HT=yes, 4KSTACKS is not set, Comment#32_patch APPLIED intel GB865GBF single 2.8Ghz_HT --> System boots correctly detecting 2 cpus with full acpi support --> Patch also works when 4KSTACKS=yes NOTE: passing acpi=ht is not full acpi - "run only enough ACPI to enable Hyper Threading" I agree w/Jonathan (msg#38) to double check exact location to sched_init_smp() although certainly after basic_setup(!) pls CC me further developments, thx Created attachment 101788 [details]
stack trace
While the patch above is a good clue, I don't see how it
can be the correct fix. I think we must have a race
or locking bug in the use of kacpid. Here's an additional
trace taken w/o any patches.
As I apply the patch? Created attachment 102072 [details]
calling acpi_hw_set_mode() from cpu0
After much debug it was found that on these failing systems acpi_hw_set_mode()
if executed on CPU1 we see this problem. The attached patch move the task onto
cpu0 before executing acpi_hw_set_mode() which resolves the bug.
Please try out this patch. To recreate the bug pass acpi_cpu1_bug as boot
parameter to reproduce the bug.
New patch, applied against 2.6 bk pull from this morning (2.6.8-rc1 or something), works fine. Passing "acpi_cpu1_bug" on kernel command line brings back problem as expected. Testing was done on an Intel D865GBFL board with a 2.6GHz P4-HT CPU. Same here on my D875PBZ with kernel 2.6.6-1.435.2.3smp What did you mean by "Same here on my D875PBZ with kernel 2.6.6- 1.435.2.3smp"? I assume that the new patch also solved your problem. Yes, that's what I meant. Created attachment 102078 [details]
calling acpi_hw_set_mode() from cpu0
Fixed the function where in after calling acpi_hw_set_mode() from cpu0, setting
the cpus_allowed back to original value.
Thanks Anil, This patch works for me with current 2.6.8-rc2 BK head. New patch from comment #49 applied to kernel-2.6.6-1.435.2.3 with SMP, SMT is working fine on D865GLC mb. Thankyou for your efforts. If this turns out to be the accepted fix for this bug, how can we know when this is included in a kernel release and which one? What is the usual process or cycle time for a fix like this to make it in? To comment 42: I'm confused by the call traces: [<c0118e90>] do_page_fault+0x0/0x526 [<c0106ecd>] error_code+0x2d/0x40 [<c03dc85c>] __down_failed+0x8/0xc [<c020542a>] acpi_get_register+0x46/0x79 Because the following call chain wouldn't reach __down_failed. acpi_hw_set_mode --> acpi_hw_get_mode -->acpi_get_register If the call trace is correct, I'm eager to know what's the exact semaphore , on which it down failed. Thanks, Luming Linux version 2.6.8-rc1 (sean@linux1) (gcc version 3.4.1 20040702 (Red Hat Linux 3.4.1-1)) #5 SMP Wed Jul 14 03:41:29 EDT 2004 BIOS-provided physical RAM map: BIOS-e820: 0000000000000000 - 000000000009fc00 (usable) BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved) BIOS-e820: 00000000000e6000 - 0000000000100000 (reserved) BIOS-e820: 0000000000100000 - 000000003ef2fc00 (usable) BIOS-e820: 000000003ef2fc00 - 000000003ef30000 (ACPI NVS) BIOS-e820: 000000003ef30000 - 000000003ef40000 (ACPI data) BIOS-e820: 000000003ef40000 - 000000003eff0000 (ACPI NVS) BIOS-e820: 000000003eff0000 - 000000003f000000 (reserved) BIOS-e820: 00000000fecf0000 - 00000000fecf1000 (reserved) BIOS-e820: 00000000fed20000 - 00000000feda0000 (reserved) Warning only 896MB will be used. Use a HIGHMEM enabled kernel. 896MB LOWMEM available. found SMP MP-table at 000ff780 On node 0 totalpages: 229376 DMA zone: 4096 pages, LIFO batch:1 Normal zone: 225280 pages, LIFO batch:16 HighMem zone: 0 pages, LIFO batch:1 DMI 2.3 present. ACPI: RSDP (v000 ACPIAM ) @ 0x000f61b0 ACPI: RSDT (v001 INTEL D865GBF 0x20040405 MSFT 0x00000097) @ 0x3ef30000 ACPI: FADT (v002 INTEL D865GBF 0x20040405 MSFT 0x00000097) @ 0x3ef30200 ACPI: MADT (v001 INTEL D865GBF 0x20040405 MSFT 0x00000097) @ 0x3ef30300 ACPI: ASF! (v016 LEGEND I865PASF 0x00000001 MSFT 0x0100000d) @ 0x3ef345b0 ACPI: TCPA (v001 INTEL TBLOEMID 0x00000001 MSFT 0x00000097) @ 0x3ef34649 ACPI: WDDT (v001 INTEL OEMWDDT 0x00000001 MSFT 0x0100000d) @ 0x3ef3467d ACPI: DSDT (v001 INTEL D865GBF 0x00000001 MSFT 0x0100000d) @ 0x00000000 ACPI: PM-Timer IO Port: 0x408 ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled) Processor #0 15:2 APIC version 20 ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled) Processor #1 15:2 APIC version 20 ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1]) ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0]) IOAPIC[0]: Assigned apic_id 2 IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) Enabling APIC mode: Flat. Using 1 I/O APICs Using ACPI (MADT) for SMP configuration information Built 1 zonelists Kernel command line: ro root=/dev/md0 kgdb console=kgdb Initializing CPU#0 PID hash table entries: 4096 (order 12: 32768 bytes) Detected 2793.223 MHz processor. Using pmtmr for high-res timesource Console: colour VGA+ 80x25 Dentry cache hash table entries: 131072 (order: 7, 524288 bytes) Inode-cache hash table entries: 65536 (order: 6, 262144 bytes) Memory: 904696k/917504k available (1735k kernel code, 12020k reserved, 787k data, 168k init, 0k highmem) Checking if this processor honours the WP bit even in supervisor mode... Ok. Calibrating delay loop... 5537.79 BogoMIPS Security Scaffold v1.0.0 initialized SELinux: Initializing. SELinux: Starting in permissive mode There is already a security framework initialized, register_security failed. selinux_register_security: Registering secondary module capability Capability LSM initialized as secondary Mount-cache hash table entries: 512 (order: 0, 4096 bytes) CPU: Trace cache: 12K uops, L1 D cache: 8K CPU: L2 cache: 512K CPU: Physical Processor ID: 0 Intel machine check architecture supported. Intel machine check reporting enabled on CPU#0. CPU0: Intel P4/Xeon Extended MCE MSRs (12) available CPU0: Thermal monitoring enabled Enabling fast FPU save and restore... done. Enabling unmasked SIMD FPU exception support... done. Checking 'hlt' instruction... OK. CPU0: Intel(R) Pentium(R) 4 CPU 2.80GHz stepping 09 per-CPU timeslice cutoff: 1462.56 usecs. task migration cache decay timeout: 2 msecs. enabled ExtINT on CPU#0 ESR value before enabling vector: 00000000 ESR value after enabling vector: 00000000 Booting processor 1/1 eip 3000 Initializing CPU#1 masked ExtINT on CPU#1 ESR value before enabling vector: 00000000 ESR value after enabling vector: 00000000 Calibrating delay loop... 5570.56 BogoMIPS CPU: Trace cache: 12K uops, L1 D cache: 8K CPU: L2 cache: 512K CPU: Physical Processor ID: 0 Intel machine check architecture supported. Intel machine check reporting enabled on CPU#1. CPU1: Intel P4/Xeon Extended MCE MSRs (12) available CPU1: Thermal monitoring enabled CPU1: Intel(R) Pentium(R) 4 CPU 2.80GHz stepping 09 Total of 2 processors activated (11108.35 BogoMIPS). ENABLING IO-APIC IRQs ..TIMER: vector=0x31 pin1=2 pin2=-1 activating NMI Watchdog ... done. testing NMI watchdog ... OK. Using local APIC timer interrupts. calibrating APIC timer ... ..... CPU clock speed is 2792.0736 MHz. ..... host bus clock speed is 199.0481 MHz. checking TSC synchronization across 2 CPUs: passed. Brought up 2 CPUs checking if image is initramfs...it isn't (no cpio magic); looks like an initrd Freeing initrd memory: 785k freed Inside worker_thread:0F -Exit- AE_OK 1 00 93 65 0A 01 70 0A .e...{m....e..p. scope [----] c03e16fc first child f7ee6fa8 wait_semaphore : ----Entry d) osl-0843 [0001] [14] os_wait_semaphore : Waiting for semaphore [f7ffe2e0|1|65535] rgs-0180 [0001] [13] ps_get_next_namestring: ----Entry --Exit- AE_OK tate : ----Entry 7ed7228 Owner F004] added to CHN0 (Device) [Node f7ed7b28] cquire_mutex : Thread 1 acquired Mutex [ACPI_MTX_Caches] e0|1] alk-0079 [0001] [13] ps_get_next_walk_op : ----Entry f7edeaa8 4> osl-0835 [0001] [11] os_wait_semaphore : ----Entry elease_mutex : Thread 1 released Mutex [ACPI_MTX_Caches] eleased Mutex [ACPI_MTX_Caches] ut_acquire_mutex : Thread 1 attempting to acquire Mutex [ACPI_MTX_Memory] osl-0901 [0001] [15] os_wait_semaphore : ----Exit- AE_OK naling semaphore[f7ffe2e0|1] osl-0901 [0001] [14] os_wait_semaphore : ----Exit- AE_OK 12] ut_delete_generic_stat: ----Entry 2 [0001] [10] ut_release_mutex : Thread 1 releasing Mutex [ACPI_MTX_Caches] 0923 [0001] [13] os_signal_semaphore : Signaling semaphore [f7ffe2e0|1] d Mutex [ACPI_MTX_Caches] x : Thread 1 acquired Mutex [ACPI_MTX_Caches] 8 [0001] [13] os_wait_semaphore : Acquired semaphore [f7ffe2e0|1|65535] 35] l-0927 [0001] [16] os_signal_semaphore : ----Exit- AE_OK 12] ut_remove_reference : ----Entry f7c410a8 ait_semaphore : Acquired semaphore[f7ffe2e0|1|65535] s_wait_semaphore : Acquired semaphore[f7ffe2e0|1|65535] 0110 [0001] [14] ds_method_data_init : ----Exit- osl-0901 [0001] [14] os_wait_semaphore : ----Exit- AE_OK osl-0915 [0001] [13] os_signal_semaphore : ----Entry semaphore : ----Exit- AE_OK E_OK osl-0898 [0001] [15] os_wait_semaphore : Acquired semaphore [f7ffe2e0|1|65535] sl-0843 [0001] [18] os_wait_semaphore : Waiting for semaphore [f7ffe2e0|1|65535] 13] os_wait_semaphore : ----Entry semaphore[f7ffed60|1|65535] [0001] [17] os_wait_semaphore : ----Exit- AE_OK : ----Exit- AE_OK sl-0835 [0001] [12] os_wait_semaphore : ----Entry t- 00000000 locate_object_des: f7c4a828 Size 28 01] [12] os_signal_semaphore : ----Entry psparse-0405 [0001] [11] ps_next_parse_state : ----Exit- AE_OK 0901 [0001] [11] os_wait_semaphore : ----Exit- AE_OK 001] [13] ns_attach_object : Installing f7c4c6a8 into Node f7f704a8 [ATA0] osl-0835 [0001] [12] os_wait_semaphore : ----Entry phore : Acquired semaphore[f7ffe2e0|1|65535] 65535] 0001] [13] os_signal_semaphore : Signaling semaphore[f7ffe2e0|1] semaphore : ----Exit- AE_OK ire_mutex : Thread 1 acquired Mutex [ACPI_MTX_Caches] re_mutex : Thread 1 attempting to acquire Mutex [ACPI_MTX_Caches] ----Exit- AE_OK release_mutex : Thread 1 releasing Mutex [ACPI_MTX_Caches] ry f7edea28 s_attach_object : ----Exit- AE_OK emaphore : Acquired semaphore[f7ffe2e0|1|65535] l-0923 [0001] [14] os_signal_semaphore : Signaling semaphore [f7ffe2e0|1] 0734 [0001] [17] ut_acquire_mutex : Thread 1 attempting to acquire Mutex [ACPI_MTX_Caches] misc-0782 [0001] [13] ut_release_mutex : Thread 1 releasing Mutex [ACPI_MTX_Caches] re : Signaling semaphore[f7ffe2e0|1] and_enter : ----Entry t-0086 [0001] [14] ut_create_internal_obj: ----Entry Integer [14] os_signal_semaphore : Signaling semaphore[f7ffe2e0|1] re_from_cache : Object f7ed6b28 from State Object Cache [0001] [14] os_wait_semaphore : ----Entry 0001] [14] ds_push_walk_state : ----Exit- 3] ns_get_attached_object: ----Exit- 00000000 3 [0001] [17] os_wait_semaphore : Waiting for semaphore [f7ffe2e0|1|65535] aling semaphore[f7ffe2e0|1] che : Object f7edd9a8 from Parse Node Cache 0 utmisc-0734 [0001] [16] ut_acquire_mutex : Thread 1 attempting to acquire Mutex [ACPI_MTX_Memory] hore[f7ffe2e0|1] 835 [0001] [20] os_wait_semaphore : ----Entry red semaphore[f7ffe2e0|1|65535] try f7f74828 nal_pathna: ----Entry f7f6fca8 fed60|1|65535] tate : ----Exit- f7ed6228 Cache utmisc-0782 [0001] [12] ut_release_mutex : Thread 1 releasing Mutex [ACPI_MTX_Caches] : ----Exit- AE_OK 00040 0001] [15] ut_acquire_mutex : Thread 1 attempting to acquire Mutex [ACPI_MTX_Caches] misc-0782 [0001] [13] ut_release_mutex : Thread 1 releasing Mutex [ACPI_MTX_Caches] osl-0927 [0001] [16] os_signal_semaphore : ----Exit- AE_OK try osl-0923 [0001] [16] os_signal_semaphore : Signaling semaphore [f7ffe2e0|1] : ----Entry te: ----Exit- f7ed5c28 elease_mutex : Thread 1 released Mutex [ACPI_MTX_Caches] ] [15] ut_release_mutex : Thread 1 releasing Mutex [ACPI_MTX_Caches] e0|1] ps_delete_parse_tree : ----Entry f7ede528 semaphore[f7ffe2e0|1|65535] [16] os_wait_semaphore : ----Exit- AE_OK emaphore : ----Entry ---Entry read 1 released Mutex [ACPI_MTX_Caches] 23 [0001] [13] os_signal_semaphore : Signaling semaphore[f7ffe2e0|1] utex : Thread 1 attempting to acquire Mutex [ACPI_MTX_Caches] scope : ----Entry ed Mutex [ACPI_MTX_Caches] osl-0843 [0001] [16] os_wait_semaphore : Waiting for semaphore [f7ffe2e0|1|65535] pting to acquire Mutex [ACPI_MTX_Caches] Entry osl-0923 [0001] [16] os_signal_semaphore : Signaling semaphore [f7ffe2e0|1] t_free : ----Exit- X_Caches] misc-0740 [0001] [16] ut_acquire_mutex : Thread 1 acquired Mutex [ACPI_MTX_Caches] init : ----Exit- Thread 1 released Mutex [ACPI_MTX_Caches] 7 [0001] [18] ns_get_external_pathna: ----Entry f7f7aba8 or semaphore[f7ffed60|1|65535] acquire Mutex [ACPI_MTX_Caches] osl-0835 [0001] [13] os_wait_semaphore : ----Entry ntry _offset 00082 osl-0898 [0001] [21] os_wait_semaphore : Acquired semaphore [f7ffed60|1|65535] 34 [0001] [15] ut_acquire_mutex : Thread 1 attempting to acquire Mutex [ACPI_MTX_Caches] ] ut_push_generic_state : ----Entry 4> osl-0927 [0001] [20] os_signal_semaphore : ----Exit- AE_OK [16] os_wait_semaphore : ----Entry re[f7ffed60|1] Thread 1 acquired Mutex [ACPI_MTX_Caches] 0001] [13] ps_get_next_arg : ----Entry f7f74e10 e_generic_stat: ----Exit- tate : ----Entry s_signal_semaphore : Signaling semaphore[f7ffed60|1] osl-0915 [0001] [16] os_signal_semaphore : ----Entry : Thread 1 releasing Mutex [ACPI_MTX_Caches] _wait_semaphore : Waiting for semaphore[f7ffe2e0|1|65535] : ----Exit- AE_OK ] os_signal_semaphore : Signaling semaphore[f7ffe2e0|1] s_signal_semaphore : ----Entry _walk_op : ----Exit- AE_OK 16] os_wait_semaphore : Acquired semaphore[f7ffe2e0|1|65535] [0001] [16] os_signal_semaphore : Signaling semaphore[f7ffe2e0|1] 1] [15] ut_acquire_mutex : Thread 1 attempting to acquire Mutex [ACPI_MTX_Caches] quire Mutex [ACPI_MTX_Caches] _next_arg : ----Entry f7f74e10 osl-0898 [0001] [15] os_wait_semaphore : Acquired semaphore [f7ffe2e0|1|65535] 2 [0001] [15] ut_release_mutex : Thread 1 released Mutex [ACPI_MTX_Caches] osl-0898 [0001] [16] os_wait_semaphore : Acquired semaphore [f7ffe2e0|1|65535] -Entry f7edda28 tex : Thread 1 releasing Mutex [ACPI_MTX_Caches] 01] [18] ut_free : f7ff47a8 freed 0001] [15] ut_acquire_from_cache : Object f7f6fb28 from State Object Cache [0001] [19] os_signal_semaphore : ----Entry 0250 [0001] [13] ps_pop_scope : ----Exit- rom_cache : Object f7f6f828 from Parse Node Cache lative to prefix scope [_CRS] (f7c44328) 3 [0001] [16] os_wait_semaphore : Waiting for semaphore [f7ffe2e0|1|65535] [0001] [13] os_signal_semaphore : ----Entry : ----Exit- AE_OK rch-0196 [0001] [16] ns_search_parent_tree : ----Entry read 1 released Mutex [ACPI_MTX_Caches] os_signal_semaphore : Signaling semaphore[f7ffe2e0|1] _release_mutex : Thread 1 releasing Mutex [ACPI_MTX_Memory] dsmethod-0172 [0001] [10] ds_parse_method : **** [_STA] Parsed **** named_obj=f7c43ca8 Op=c17fd1a8 ----Entry f7f74828 maphore[f7ffe2e0|1|65535] -Entry > utmisc-0782 [0001] [13] ut_release_mutex : Thread 1 releasing Mutex [ACPI_MTX_Caches] : Waiting for semaphore[f7ffe2e0|1|65535] Breakpoint 1, acpi_enable () at drivers/acpi/events/evxfevnt.c:67 67 { (gdb) next 71 ACPI_FUNCTION_TRACE ("acpi_enable"); (gdb) next 76 if (!acpi_gbl_FADT) { (gdb) next 71 ACPI_FUNCTION_TRACE ("acpi_enable"); (gdb) next 76 if (!acpi_gbl_FADT) { (gdb) next 81 if (acpi_hw_get_mode() == ACPI_SYS_MODE_ACPI) { (gdb) next 87 status = acpi_hw_set_mode (ACPI_SYS_MODE_ACPI); (gdb) step acpi_hw_set_mode (mode=0) at drivers/acpi/hardware/hwacpi.c:109 109 { (gdb) step 115 ACPI_FUNCTION_TRACE ("hw_set_mode"); (gdb) step acpi_ut_trace (line_number=4159455036, dbg_info=0x0) at drivers/acpi/utilities/utdebug.c:245 245 { (gdb) step 247 acpi_gbl_nesting_level++; (gdb) step 245 { (gdb) step 248 acpi_ut_track_stack_ptr (); (gdb) step acpi_ut_track_stack_ptr () at drivers/acpi/utilities/utdebug.c:96 96 { (gdb) step 102 if (current_sp < acpi_gbl_lowest_stack_pointer) { (gdb) step 106 if (acpi_gbl_nesting_level > acpi_gbl_deepest_nesting) { (gdb) step 103 acpi_gbl_lowest_stack_pointer = current_sp; (gdb) step 106 if (acpi_gbl_nesting_level > acpi_gbl_deepest_nesting) { (gdb) step 107 acpi_gbl_deepest_nesting = acpi_gbl_nesting_level; (gdb) step 109 } (gdb) step acpi_ut_trace (line_number=115, dbg_info=0xf7ec3f1c) at drivers/acpi/utilities/utdebug.c:250 250 acpi_ut_debug_print (ACPI_LV_FUNCTIONS, line_number, dbg_info, (gdb) step acpi_ut_debug_print (requested_debug_level=4159455004, line_number=3224136991, dbg_info=0x73, format=0x0) at drivers/acpi/utilities/utdebug.c:139 139 { (gdb) step 147 if (!(requested_debug_level & acpi_dbg_level) || (gdb) step 139 { (gdb) step 147 if (!(requested_debug_level & acpi_dbg_level) || (gdb) step 139 { (gdb) step 147 if (!(requested_debug_level & acpi_dbg_level) || (gdb) step 155 thread_id = acpi_os_get_thread_id (); (gdb) step acpi_os_get_thread_id () at thread_info.h:90 90 __asm__("andl %%esp,%0; ":"=r" (ti) : "0" (~ (THREAD_SIZE - 1))); (gdb) step 974 return 0; (gdb) step 90 __asm__("andl %%esp,%0; ":"=r" (ti) : "0" (~ (THREAD_SIZE - 1))); (gdb) step 88 { (gdb) step 9 { (gdb) step 975 } (gdb) step acpi_ut_debug_print (requested_debug_level=2097152, line_number=115, dbg_info=0xf7ec3f1c, format=0xc02c6d1f "%s\n") at drivers/acpi/utilities/utdebug.c:157 157 if (thread_id != acpi_gbl_prev_thread_id) { (gdb) step 170 acpi_os_printf ("%8s-%04ld ", dbg_info->module_name, line_number); (gdb) step acpi_os_printf (fmt=0x1 <Address 0x1 out of bounds>) at drivers/acpi/osl.c:105 105 { (gdb) step 107 va_start(args, fmt); (gdb) step 108 acpi_os_vprintf(fmt, args); (gdb) step acpi_os_vprintf (fmt=0xc02d7698 "%8s-%04ld ", args=0x73 <Address 0x73 out of bounds>) at drivers/acpi/osl.c:114 114 { (gdb) step 117 vsprintf(buffer, fmt, args); (gdb) step vsprintf (buf=0xc03e11c0 "----Exit- ", '0' <repeats 15 times>, "2\n", fmt=0xc02d7698 "%8s-%04ld ", args=0xf7ec3ee8 "Ã1-Ãs") at lib/vsprintf.c:582 582 { (gdb) step 583 return vsnprintf(buf, (~0U)>>1, fmt, args); (gdb) step vsnprintf (buf=0x200000 <Address 0x200000 out of bounds>, size=3223201211, fmt=0x73 <Address 0x73 out of bounds>, args=0xc01d2a91 "¡\230\024>Ã\211ã\203x0") at lib/vsprintf.c:257 257 { (gdb) step 274 if (unlikely((int) size < 0)) { (gdb) step 257 { (gdb) step 274 if (unlikely((int) size < 0)) { (gdb) step 282 str = buf; (gdb) step 283 end = buf + size - 1; (gdb) step 285 if (end < buf - 1) { (gdb) step 287 size = end - buf + 1; (gdb) step 286 end = ((void *) -1); (gdb) step 287 size = end - buf + 1; (gdb) step 290 for (; *fmt ; ++fmt) { (gdb) step 291 if (*fmt != '%') { (gdb) step 299 flags = 0; (gdb) step 301 ++fmt; /* this also skips first '%' */ (gdb) step 302 switch (*fmt) { (gdb) step 311 field_width = -1; (gdb) step 312 if (isdigit(*fmt)) (gdb) step 313 field_width = skip_atoi(&fmt); (gdb) step skip_atoi (s=0xc02d7699) at lib/vsprintf.c:129 129 { (gdb) step 130 int i=0; (gdb) step 133 i = i*10 + *((*s)++) - '0'; (gdb) step 135 } (gdb) step vsnprintf (buf=0xc03e11c0 "----Exit- ", '0' <repeats 15 times>, "2 \n", size=1069674048, fmt=0xc02d769a "s-%04ld ", args=0xf7ec3ee8 "Ã1-Ãs") at lib/vsprintf.c:325 325 precision = -1; (gdb) step 313 field_width = skip_atoi(&fmt); (gdb) step 326 if (*fmt == '.') { (gdb) step 340 qualifier = -1; (gdb) step 341 if (*fmt == 'h' || *fmt == 'l' || *fmt == 'L' || (gdb) step 352 base = 10; (gdb) step 354 switch (*fmt) { (gdb) step 375 s = va_arg(args, char *); (gdb) step 362 __asm__ __volatile__( (gdb) step 375 s = va_arg(args, char *); (gdb) step 377 s = "<NULL>"; (gdb) step 362 __asm__ __volatile__( (gdb) step 381 if (!(flags & LEFT)) { (gdb) step 362 __asm__ __volatile__( (gdb) step 381 if (!(flags & LEFT)) { (gdb) step 385 ++str; (gdb) step 383 if (str <= end) (gdb) step 384 *str = ' '; (gdb) step 385 ++str; (gdb) step 383 if (str <= end) (gdb) step 384 *str = ' '; (gdb) step 385 ++str; (gdb) step 388 for (i = 0; i < len; ++i) { (gdb) step 389 if (str <= end) (gdb) step 390 *str = *s; (gdb) step 391 ++str; ++s; (gdb) step 389 if (str <= end) (gdb) step 390 *str = *s; (gdb) step 391 ++str; ++s; (gdb) step 389 if (str <= end) (gdb) step 390 *str = *s; (gdb) step 391 ++str; ++s; (gdb) step 389 if (str <= end) (gdb) step 390 *str = *s; (gdb) step 391 ++str; ++s; (gdb) step 389 if (str <= end) (gdb) step 390 *str = *s; (gdb) step 391 ++str; ++s; (gdb) step 389 if (str <= end) (gdb) next 390 *str = *s; (gdb) next 391 ++str; ++s; (gdb) next 396 ++str; (gdb) next 290 for (; *fmt ; ++fmt) { (gdb) next 291 if (*fmt != '%') { (gdb) next 454 if (str <= end) (gdb) next 455 *str = *fmt; (gdb) next 456 ++str; (gdb) next 290 for (; *fmt ; ++fmt) { (gdb) next 291 if (*fmt != '%') { (gdb) next 299 flags = 0; (gdb) next 301 ++fmt; /* this also skips first '%' */ (gdb) next 302 switch (*fmt) { (gdb) next 307 case '0': flags |= ZEROPAD; goto repeat; (gdb) next 301 ++fmt; /* this also skips first '%' */ (gdb) next 302 switch (*fmt) { (gdb) next 311 field_width = -1; (gdb) next 312 if (isdigit(*fmt)) (gdb) next 313 field_width = skip_atoi(&fmt); (gdb) next 325 precision = -1; (gdb) next 313 field_width = skip_atoi(&fmt); (gdb) next 326 if (*fmt == '.') { (gdb) next 340 qualifier = -1; (gdb) next 341 if (*fmt == 'h' || *fmt == 'l' || *fmt == 'L' || (gdb) next 343 qualifier = *fmt; (gdb) next 344 ++fmt; (gdb) next 345 if (qualifier == 'l' && *fmt == 'l') { (gdb) next 343 qualifier = *fmt; (gdb) next 344 ++fmt; (gdb) next 345 if (qualifier == 'l' && *fmt == 'l') { (gdb) next 352 base = 10; (gdb) next 354 switch (*fmt) { (gdb) next 445 flags |= SIGN; (gdb) next 462 if (qualifier == 'L') (gdb) next 464 else if (qualifier == 'l') { (gdb) next 475 num = va_arg(args, unsigned int); (gdb) next 476 if (flags & SIGN) (gdb) next 475 num = va_arg(args, unsigned int); (gdb) next 476 if (flags & SIGN) (gdb) next 477 num = (signed int) num; (gdb) next 479 str = number(str, end, num, base, (gdb) next 290 for (; *fmt ; ++fmt) { (gdb) next 291 if (*fmt != '%') { (gdb) next 454 if (str <= end) (gdb) next 455 *str = *fmt; (gdb) next 456 ++str; (gdb) next 290 for (; *fmt ; ++fmt) { (gdb) next 482 if (str <= end) (gdb) next 483 *str = '\0'; (gdb) next 490 return str-buf; (gdb) next 491 } (gdb) next vsprintf (buf=0xe <Address 0xe out of bounds>, fmt=0xc03e11cd " ", args=0xc02d76a2 "") at lib/vsprintf.c:584 584 } (gdb) next acpi_os_vprintf (fmt=0xe <Address 0xe out of bounds>, args=0xc02d76a2 "") at drivers/acpi/osl.c:126 126 printk("%s", buffer); (gdb) next acpi_os_printf (fmt=0xc03e11c0 " hwacpi-0115 ") at drivers/acpi/osl.c:109 109 va_end(args); (gdb) next acpi_ut_debug_print (requested_debug_level=4159455004, line_number=115, dbg_info=0xf7ec3f1c, format=0x200000 <Address 0x200000 out of bounds>) at drivers/acpi/utilities/utdebug.c:172 172 if (ACPI_LV_THREADS & acpi_dbg_level) { (gdb) next 173 acpi_os_printf ("[%04lX] ", thread_id); (gdb) next 176 acpi_os_printf ("[%02ld] %-22.22s: ", acpi_gbl_nesting_level, dbg_info->proc_name); (gdb) next 178 va_start (args, format); (gdb) next 179 acpi_os_vprintf (format, args); (gdb) next 180 } (gdb) next 0xc01e25bb in acpi_ut_trace (line_number=115, dbg_info=0xf7ec3f1c) at drivers/acpi/utilities/utdebug.c:250 250 acpi_ut_debug_print (ACPI_LV_FUNCTIONS, line_number, dbg_info, (gdb) next acpi_hw_set_mode (mode=1) at drivers/acpi/hardware/hwacpi.c:121 121 if (!acpi_gbl_FADT->smi_cmd) { (gdb) next 115 ACPI_FUNCTION_TRACE ("hw_set_mode"); (gdb) next 121 if (!acpi_gbl_FADT->smi_cmd) { (gdb) next 133 if (!acpi_gbl_FADT->acpi_enable && !acpi_gbl_FADT- >acpi_disable) { (gdb) next 138 switch (mode) { (gdb) next 143 status = acpi_os_write_port (acpi_gbl_FADT- >smi_cmd, (gdb) next See this one, here it never returns from acpi_os_write_port(). I highly suspect this is a locking bug , introduced by the following
code:
u32 acpi_os_get_thread_id (void)
{
if (!in_atomic())
return current->pid;
return 0;
}
In acpi_hw_get_mode(), acpi_get_register will be called with
ACPI_MTX_LOCK argument, in turn ,mutex ACPI_MTX_HARDWARE could be
acquired by current->pid or 0, depending on expression :in_atomic().
We expected the corresponding acpi_ut_release_mutex should get same
mutex owner from the subsequent call of acpi_os_get_thread_id().
But it CANNOT be guaranteed, because there is a slow function call
acpi_hw_register_read between the acpi_ut_acquire_mutex and the
acpi_ut_release_mutex. It is slow because it's IO operation.
It is possible that ACPI_MTX_HARDWARE get acquired using current-
>pid, and get released using 0. As result, ACPI_MTX_HARDWARE could be
in acquired state after one call of acpi_get_register.
And it will affect the following functions:
0 evxfevnt.c acpi_enable_event 189
1 evxfevnt.c acpi_disable_event 406
2 evxfevnt.c acpi_get_event_status 552
3 hwacpi.c acpi_hw_get_mode 219
I don't know whether the following patch can solve the problem: --- 26/drivers/acpi/osl.c 2004-07-21 10:35:42.000000000 +0800 +++ 26/drivers/acpi/osl.c.0 2004-07-24 04:12:17.000000000 +0800 @@ -968,10 +968,7 @@ u32 acpi_os_get_thread_id (void) { - if (!in_atomic()) - return current->pid; - - return 0; + return current->pid; } acpi_status The patch in comment 56 fixes the SMP boot problem on my grantsdale machine. I have a kernel crash with patch in comment #56 on D865GLC motherboard. This is on kernel-2.6.8rc2 which I have running ok with patch from comment #49. I started with clean untarred source code (to not mix the patches). The patch would not take without reject so I applied by hand. Anyone else getting it to work? To comment #57, I'm glad to know that patch can have some positive result. To comment #58, could you please post the dmesg of kernel crash with patch in comment #56. Thanks, Luming The boot gets as far as the first [ACPI] message stating the ACPI version number and then just hangs. I need to turn on debugging and try again. The end result of the patch looked like: u32 acpi_os_get_thread_id (void) { return current->pid; } I have compiled with patch from comment 56 and debugging enabled. The boot goes a few lines and gets the Unable to handle kernel NULL pointer dereference message and then hangs. Thats as much as I can do here. Wait a sec! I tried several hard resets after the lockups to see if anything else will pop out. I got a few partial Call Traces and oops. In one case I got this: bad:scheduling while atomic Also this trace on a different reset: acpi_ev_fixed_event_initialize schedule acpi_ev_initialize load_balance acpi_enable_subsystem poke_blanked_console acpi_bus_init preempt_schedule sync_inodes acpi_init vgacon_cursor clear_selection do_initcalls init_workqueues init init kerenel_thread_helper kernel panic: Fatal exception in interrupt In interupt handler - not syncing Luming, I put this in acpi_os_get_thread_id() if (in_atomic()) dump_stack(); It never fired and I hit the usual boot-time panic. Anil, I printed out current_thread_info()->cpu from acpi_hw_set_mode() and it is always 1 on my D875PBZ, which never returns from that routine. What I noticed is that if acpi_hw_set_mode() is called from CPU other than the BSP, acpi_hw_set_mode() never returns and we will endup in some stack corruption. However if called from BSP, things work fine. This behaviour is same irrespective of kacpid or SCHED_SMT. Looks like this bug exist even on 2.4 kernel http://bugme.osdl.org/show_bug.cgi?id=1908 as the description of the problem appears to be the same. The patch provided in comment#49 provides a way to trigger this bug by passing kernel parameter and which can be removed if this is going to be the final solution. Ah, good hunting Anil. I've used patch in comment #49 to reproduce the failure w/o CONFIG_SCHED_SMT. I guess it is just luck of the scheduler changes that we happened to execute this register access on non-CPU0. I think this is a platform-dependent BIOS bug -- there is no rule that says we must write the SMI_CMD register from the primary sibling. And when I use "acpi_cpu1_bug" from comment #49 on my E7505VB2 it boots just fine. That said, I think we have to work around this in Linux because the platforms are already shipped in volume and those users certainly would see this as a Linux regression. This reminds me of the SMP powerdown bug: http://bugzilla.kernel.org/show_bug.cgi?id=1141 Where it seems that some SMP machines would not power off if we wrote the registers from someplace other than CPU0. verified that latest BIOS for D875PBZ (P27) still fails w/o the patch. The reference 7505 bios is known to have a bug where ANY smm calls from the non-boot cpu cause problems; we hit those too with the SMM ps/2 legacy emulation when we redirect the ps/2 and USB interrupts to a non-boot cpu... Some questions (forgive my ignorance): 1) Will this patch be submitted to the kernel development tree? 2) Will this patch be included in a future FC2 kernel update, and if so, is this going to happen soon? 3) Is there any method for petitioning the bios writers to fix this bios bug? (Like Len Brown, I'm also running the latest bios release (P27) on an Intel D875PBZ board, and the problem persists) I noticed that todays release of kernel-2.6.7-1.494.2.2smp runs fine on my machine without the patch. I am sure it is coincidental that it just happens to initialize on cpu0. I am going to apply the patch anyway and confirm that it still boots OK. From my study of it, I see that it is a very simple fix. I suspect the bug only hits under the right startup conditions? kernel-2.6.7-1.494.2.2smp booting and running fine on intel D875PBZ board. Not clear from the release notes whether this kernel has been patched to fix this problem, or whether we just got lucky? kernel-smp-2.6.7-1.494.2.2 works for me too, while the former one - kernel-smp-2.6.6-1.435.2.3 - crashes (HT P4). The current 2.6.7 Red Hat kernel includes the patch in Comment 32. I think it should be backed out and use the one in Comment 49. Created attachment 102665 [details] test patch to enable ACPI mode before smp_init() We have a problem on some systems when ACPI mode is enabled after the LAPIC timer is running. http://bugzilla.kernel.org/show_bug.cgi?id=1269 The proposed (test) fix for that problem (attached) moves the ACPI mode enable before smp_init(); so it should address this failure as well. Please give it a test. Tested patch from comment #73 on kernel 2.6.8-rc2 with successful boot and no other issues noted at this time on D865GLC motherboard. I just tested the patch from comment #73 applied to kernel 2.6.8 on my D875PBZ: it boots successfully. Created attachment 102747 [details] Reject from applying patch in comment 73 to 2.6.8.1 I get this reject when I apply the patch in comment #73 to kernel 2.6.8.1 Created attachment 102755 [details]
Ajustement to last patch for 2.6.8.1
I've corrected the rejects for 2.6.8.1
This bug is also reported here: http://bugzilla.kernel.org/show_bug.cgi?id=2941 and the fix in (comment #73) is in 2.6.9 tree. Could someone provide instructions as to how to apply the patches mentioned? Please note that I am C-ignorant. A step-by-step would be much appreciated. This is fixed in kernel-2.6.8-1.521. What kernel are you running? I am running kernel-2.6.8-1.521 on a dual Xeon, using a SuperMicro X5DP8-G2 motherboard. The problem does not seem to be fixed for me. kernel-2.6.8-1.521 (single cpu) boots fine. kernel-2.6.8-1.521smp does not boot. By the way, I have turned off ACPI both in the BIOS and in grub.conf. And, I have wiped and reinstalled the machine a few time, and I have made sure that I have the latest BIOS revision (1.5c). None of this changes the behavior. Is anyone else still having this problem? mass update for old bugs: Is this still a problem with the 2.6.9 based update kernel ? 2.6.9 out of the box is fine - problem resolved on: Intel D865GBF w/HT and also: Serverworks CSB5 board w/ dual xeon HT Booting without problems now No problems with 2.6.9 (FC2) and D875PBZ board - smp boots fine. 2.6.9 Working fine now on D865GLC The patch supplied in Comment #77 fixes the boot-time oops I was seeing with 2.6.8-10 compiled with ACPI, SMP and hyperthreading. CPU is a DL865GLC. Fedora Core 2 has now reached end of life, and no further updates will be provided by Red Hat. The Fedora legacy project will be producing further kernel updates for security problems only. If this bug has not been fixed in the latest Fedora Core 2 update kernel, please try to reproduce it under Fedora Core 3, and reopen if necessary, changing the product version accordingly. Thank you. |