After the update to kernel-2.6.40-4.fc15.i686, the system fails to boot unless I pass acpi=off as a boot parameter. This wasn't necessary with the previous kernel (kernel-2.6.38.8-35.fc15.i686). The last displayed messages resemble this:
input: Power Button as /devices/LNXSYSTM:00/device:00/PNP0C0C:00/input/input0
ACPI: Power Button [PWRB]
input: Sleep Button as /devices/LNXSYSTM:00/device:00/PNP0C0E:00/input/input1
ACPI: Sleep Button [SLPB]
input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input2
ACPI: Power Button [PWRF]
The system just hangs (Ctrl+Alt+Del, Caps Lock, and Num Lock don't respond, and no other messages appear after those). It also fails if I use noapic as a boot parameter; it only works with acpi=off. Is this a regression or just a case of a crappy motherboard? The mainboard is an Asus P4V8X-X with a VIA P4X533 chipset. I will attach the Smolt profile, dmesg for 2.6.38.8, and dmesg for 2.6.40-4 with acpi=off.
Created attachment 516508: Smolt profile
Created attachment 516509: dmesg when booting 2.6.38.8-35 without acpi=off
Created attachment 516510: dmesg when booting 2.6.40-4 with acpi=off
I also have the same problem. This is my Smolt UUID: pub_9fbe0e20-dabc-4df0-8c82-886fad980710
Update: nothing changed with kernel-2.6.40.3-0.fc15.i686
I'm having the same problem with 2.6.40.3-0.fc15.i686. It only boots with acpi=off: http://www.smolts.org/client/show/pub_a2e25252-5bf9-46b7-8597-695a377b4aa7 I'm attaching the serial log from a boot without acpi=off.
Created attachment 519205: Kernel output dumped to serial console.
BUG: unable to handle kernel paging request at 00c0b141
IP: [<f4402300>] 0xf44022ff
*pde = 00000000
Oops: 0000 [#1] SMP
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.40.3-0.fc15.i686 #1 /D865GBF
EIP: 0060:[<f4402300>] EFLAGS: 00010246 CPU: 0
EIP is at 0xf4402300
EAX: 00000000 EBX: f4620800 ECX: 00000000 EDX: 00000000
ESI: f4402300 EDI: c0634bf5 EBP: f4620800 ESP: f4493d08
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process swapper (pid: 0, ti=f4492000 task=c0a29fe0 task.ti=c09dc000)
Stack:
 f4493d40 c04deb9b f45b4a20 00001001 f45b4a30 f4493d48 c06354b2 f45b4a30
 0000001c 000080d0 f45b4a20 f4620800 00000000 f45b4a20 00000000 f4620da8
 f4493d6c c0634ccd f4493d58 f442e618 c096fd0d 00000000 f4620800 f4409c00
Call Trace:
 [<c04deb9b>] ? __kmalloc+0x103/0x110
 [<c06354b2>] ? acpi_ns_evaluate+0x3a/0x18d
 [<c0634ccd>] ? acpi_evaluate_object+0xd6/0x1c5
 [<c064240d>] ? acpi_processor_get_power_info+0x5a/0x53d
 [<c07e9a6b>] ? _raw_spin_unlock_irqrestore+0x13/0x15
 [<c042a839>] ? task_rq_unlock+0x17/0x19
 [<c043877c>] ? set_cpus_allowed_ptr+0xc7/0xd1
 [<c0641470>] ? acpi_processor_get_throttling_fadt+0x72/0x7a
 [<c06416b0>] ? acpi_processor_get_throttling+0x65/0x6e
 [<c0642339>] ? acpi_processor_get_throttling_info+0x4d1/0x500
 [<c0634daf>] ? acpi_evaluate_object+0x1b8/0x1c5
 [<c07dfeaf>] ? acpi_processor_power_init+0xdc/0x10c
 [<c07dfcf7>] ? acpi_processor_add+0x40e/0x4ea
 [<c0535793>] ? sysfs_do_create_link+0x120/0x157
 [<c0620107>] ? acpi_device_probe+0x41/0xf5
 [<c067ff74>] ? driver_probe_device+0x123/0x1ff
 [<c04297af>] ? should_resched+0xd/0x27
 [<c07e8801>] ? _cond_resched+0xd/0x21
 [<c0680098>] ? __driver_attach+0x48/0x64
 [<c067f1d6>] ? bus_for_each_dev+0x42/0x6b
 [<c067fbd1>] ? driver_attach+0x1f/0x23
 [<c0680050>] ? driver_probe_device+0x1ff/0x1ff
 [<c067f870>] ? bus_add_driver+0xca/0x210
 [<c06804c2>] ? driver_register+0x84/0xe3
 [<c0620888>] ? acpi_bus_register_driver+0x3f/0x41
 [<c0aafe81>] ? acpi_processor_init+0x65/0xd0
 [<c040118a>] ? do_one_initcall+0x8c/0x142
 [<c0aafe1c>] ? acpi_pci_slot_init+0x1b/0x1b
 [<c0a84827>] ? kernel_init+0xaa/0x136
 [<c0a8477d>] ? start_kernel+0x353/0x353
That oops address is somewhere in the ACPI BIOS, I think.
People reporting this bug have either:
Intel(R) Pentium(R) D CPU 2.66GHz
or:
Intel(R) Pentium(R) 4 CPU 3.00GHz
Disassembly of the oopsing code shows that it doesn't even decode to valid instructions, so the ACPI code just jumped to some invalid address.
It would be great to know, via serial console capture, whether all the failures look like Adam's in comment 7/8, or whether there are multiple failures here. If we really are crashing under acpi_processor_get_power_info(), then something in C-states is broken. Do any of these (individual) cmdline params allow the boot to succeed?
idle=poll
idle=halt
processor.nocst=1
processor.max_cstate=1
Adam, please attach the output from acpidump.
re: comment #10: actually, comment #1 shows this:
CPU Model: Intel(R) Celeron(R) CPU 2.40GHz
CPU Family: 15
CPU Model Num: 4
but that is still a version of the P4. Apparently these are all 32-bit processors, so we don't have the option to try the x86_64 kernel.
Can this be reproduced with an upstream kernel.org kernel? Presumably upstream 2.6.38.stable works, because FC15's kernel-2.6.38.8-35.fc15.i686 worked. What about:
2.6.39
3.0.0 (I assume that FC is using 2.6.40 as a synonym for this?)
3.1-rc?
BTW, unrelated to the cause of this bug report, but present in comment #3:
Linux version 2.6.40-4.fc15.i686
WARNING: at arch/x86/kernel/apm_32.c:908 apm_cpu_idle+0x42/0x251()
...
deprecated apm_cpu_idle will be deleted in 2012
I added that warning to 3.0 to let folks know that CONFIG_APM_CPU_IDLE=y may not be what you want. If you think you really do need it, I need to hear from you...
Created attachment 520150 [details] Output from acpidump
It is my understanding that Fedora's 2.6.40 is some version of 3.0.*. I can try 3.0.3 over the weekend, assuming NJ isn't completely washed away. I will try those other kernel parameters as well when I get a chance.
idle=poll
idle=halt
processor.nocst=1
Each one let the machine boot. It still crashed with:
processor.max_cstate=1
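To double-check which of these parameters a given test boot actually used, something like the following Python sketch could be run on the booted system. It assumes a Linux /proc/cmdline; the helper names `active_workarounds` and `read_cmdline` are hypothetical, and the sketch only matches the exact spellings used in this report (no quoting or alternative values).

```python
# Boot parameters tried as workarounds in this report.
WORKAROUND_PARAMS = (
    "acpi=off",
    "idle=poll",
    "idle=halt",
    "processor.nocst=1",
    "processor.max_cstate=1",
)


def active_workarounds(cmdline: str) -> list[str]:
    """Return the workaround parameters present in a kernel command line.

    Tokens are split on whitespace and compared exactly, so
    "processor.max_cstate=10" does not count as "processor.max_cstate=1".
    """
    tokens = set(cmdline.split())
    return [p for p in WORKAROUND_PARAMS if p in tokens]


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    with open(path, encoding="utf-8") as f:
        return f.read().strip()
```

On an affected machine, `active_workarounds(read_cmdline())` would print something like `['processor.nocst=1']` when that workaround is in effect.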
processor.nocst=1 worked. Yay, that's a big clue. Using that param, please show the output from:
grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*
Also, if you can show that same output for the working 2.6.38 kernel, that would help to compare what the FADT does vs. _CST. It seems that acpi_processor_get_power_info_cst() is bombing out, presumably in the evaluation of _CST itself. That routine has not changed recently, so it must be something funky in the actual AML/interpreter. Unfortunately, the version of acpidump you used didn't grab the dynamic tables where your _CST lives, or they were not exported. Can you attach the files from here?
/sys/firmware/acpi/tables/dynamic/*
If you don't see anything there with the latest kernel, go back to the working 2.6.38 and they should be present there.
On 2.6.40, /sys/devices/system/cpu/cpu0/ doesn't contain cpuidle on this machine... There is also nothing under /sys/firmware/acpi/tables/dynamic/ on 2.6.40. I will have to build a 2.6.38 kernel first as the previous version I have installed is a 2.6.35 F14 kernel. Adam
On 2.6.38.8 /sys/firmware/acpi/tables/dynamic/ contains nothing and /sys/devices/system/cpu/cpu0/cpuidle does not exist either.
Has anyone tried a 2.6.39 kernel as Len asked? You might be able to use http://koji.fedoraproject.org/koji/buildinfo?buildID=244663 to test with. Bug 730007 is showing similar issues and thus far we only see it on particular Pentium 4 models. If someone is willing to git bisect this on an afflicted machine, that would be very helpful.
Matthew Garrett pointed me at a patch for a regression in ACPI yesterday. I've started a scratch build with this patch applied. Could those with an impacted machine please try this kernel when it finishes building and let us know the results? http://koji.fedoraproject.org/koji/taskinfo?taskID=3624930
My apologies, I pasted the wrong link to the scratch build. This is the one that should be tested: http://koji.fedoraproject.org/koji/taskinfo?taskID=3625177
Unfortunately I can't test it anymore, I'm running F16 now, with 3.1.6-1.fc16.i686, and I still have to use processor.nocst=1...
(In reply to comment #22)
> Unfortunately I can't test it anymore, I'm running F16 now, with
> 3.1.6-1.fc16.i686, and I still have to use processor.nocst=1...
The 2.6.41.x kernels are almost identical to the F16 3.1.x kernels. They are both based on the 3.1.x stable series. You should be able to install the kernel from the scratch build without issue.
(In reply to comment #21)
> My apologies, I pasted the wrong link to the scratch build. This is the one
> that should be tested:
>
> http://koji.fedoraproject.org/koji/taskinfo?taskID=3625177
While I too am now running F16, installing the kernel and trying a boot shows that there is still a panic, with the backtrace indicating ACPI issues, so I do not think this is (yet) identified/fixed.
Same here, nothing changed.
One more test case: a D865PERL gets the same result at boot.
(DMI: /D865PERL , BIOS RL86510A.86A.0061.P09.0308281850 08/28/2003)
P.S.: BIOS versions P09, P15, and P21 all give the same result.
After enabling ACPI debugging and an early serial console, I got a bad CST. Here are the combinations that boot:
enable HT in BIOS, acpi=off (no HT, no ACPI)
disable HT, no kernel parameter (no HT, have ACPI)
enable HT, processor.nocst=1 (have HT, have ACPI)
This is the same family as Bug#730007. I have:
processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Pentium(R) 4 CPU 2.60GHz
stepping : 9
cpu MHz : 2600.000
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
apicid : 1
initial apicid : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pebs bts cid xtpr
bogomips : 5185.92
clflush size : 64
cache_alignment : 128
address sizes : 36 bits physical, 32 bits virtual
power management:
(In reply to comment #26) 3.1.6-1.fc16.i686.PAE
Assuming that the 2.6.42 (3.2) builds don't make any difference either?
(In reply to comment #28)
> assuming that the 2.6.42 (3.2) builds don't make any difference either ?
Running the F16 kernel 3.2.7-1 still requires processor.nocst=1 (it crashes otherwise). Is there something specific in the F15 2.6.42 kernel builds that was intended to fix this (i.e., is it worth getting an F15 2.6.42 kernel to test)?
No, that's pretty much equivalent (as far as ACPI is concerned). Len, is there any hope for resolution on this, or shall we just start DMI-blacklisting the affected systems? There don't seem to be too many of them, at least.
From 730007 we know that whatever caused this issue showed up in 2.6.39-rc1. What we haven't been able to do is find someone with an impacted machine that is willing to do a git bisect to figure out which commit changed things.
(In reply to comment #31)
> From 730007 we know that whatever caused this issue showed up in 2.6.39-rc1.
> What we haven't been able to do is find someone with an impacted machine that
> is willing to do a git bisect to figure out which commit changed things.
OK, I'll bite(*). I had been hoping someone else would do the work (compiling a kernel on that old system can take many hours, so I can usually only get one or two tests per day), but I'll see if I can get any useful results from a git bisect.
Gary
(*) I think the term is you shamed me into it.... :-)
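Gary's time estimate can be made concrete: a bisection halves the remaining range with each test build, so the number of builds grows with the logarithm of the number of candidate commits. The Python sketch below is a simplified model, assuming a linear history with a single regression (the real `git bisect` works on the commit graph and handles merges); `bisect_first_bad` and `builds_needed` are hypothetical names, and `is_bad` stands in for building, booting, and testing one commit.

```python
import math
from typing import Callable, Sequence


def bisect_first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in a linear history.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad (a single regression).
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        # In practice this is one kernel build plus a test boot.
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad]


def builds_needed(n_candidates: int) -> int:
    """Upper bound on test builds needed to isolate one of n candidate commits."""
    return math.ceil(math.log2(n_candidates)) if n_candidates > 1 else 0
```

For example, `builds_needed(10000)` returns 14, so even a range of 10,000 candidate commits would take at most 14 test builds under this model.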
Matthew pointed me to: http://marc.info/?l=linux-acpi&m=133002974918284&w=2 That seems like a rather plausible fix for this.
My git bisect has completed, and seems to confirm that the commit referenced in comment 33 is the commit that caused the problems.
---
$ git bisect good
64b3db22c04586997ab4be46dd5a5b99f8a2d390 is the first bad commit
commit 64b3db22c04586997ab4be46dd5a5b99f8a2d390
Author: Bob Moore <robert.moore>
Date:   Mon Feb 14 15:50:42 2011 +0800
    ACPICA: Remove use of unreliable FADT revision field
    The revision number in the FADT has been found to be completely
    unreliable and cannot be trusted. Only the table length can be used
    to infer the actual version.
    Signed-off-by: Bob Moore <robert.moore>
    Signed-off-by: Lin Ming <ming.m.lin>
    Signed-off-by: Len Brown <len.brown>
:040000 040000 e40ed2fa28b82990cc8fb147f61841fd6400e711 544b3a6eb35875e502695e35366f23c6e5c80d2c M drivers
:040000 040000 165441d52fb3ece49801c33a91f1b5e266d53abf 4b5394b30c2c89d29372d999098ebeba16fbe23d M include
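The bisected commit's message says only the table length can be trusted to infer the FADT version. As an illustration of that idea (not the ACPICA code, and not the later fix), the Python sketch below takes the raw bytes of a FADT, for example read from /sys/firmware/acpi/tables/FACP (usually readable only by root), and picks a layout from the header length alone. The sizes 116 (ACPI 1.0) and 244 (ACPI 2.0) and the byte offsets of the preferred PM profile, PSTATE_CNT, and CST_CNT are assumptions taken from the ACPI specification's FADT layout, not from this report; the function names are hypothetical. On the older layouts those bytes were reserved, so the sketch reports them as 0 instead of trusting whatever the BIOS left there.

```python
import struct
from typing import NamedTuple

# Standard 36-byte ACPI table header, little-endian:
# signature, length, revision, checksum, OEM ID, OEM table ID,
# OEM revision, creator ID, creator revision.
_HEADER = struct.Struct("<4sIBB6s8sI4sI")

# Assumed FADT sizes from the ACPI spec layouts.
FADT_V1_LENGTH = 116  # ACPI 1.0 FADT
FADT_V3_LENGTH = 244  # ACPI 2.0 FADT

# Assumed offsets of bytes that ACPI 1.0 reserved and ACPI 2.0 later defined.
ACPI2_ONLY_BYTES = {
    "preferred_pm_profile": 45,
    "pstate_cnt": 55,
    "cst_cnt": 95,
}


class TableHeader(NamedTuple):
    signature: bytes
    length: int
    revision: int


def parse_header(raw: bytes) -> TableHeader:
    """Decode the common ACPI table header from the start of raw."""
    sig, length, revision, *_ = _HEADER.unpack_from(raw)
    return TableHeader(sig, length, revision)


def fadt_generation(raw: bytes) -> int:
    """Infer the FADT layout from its length only; the revision byte is ignored."""
    hdr = parse_header(raw)
    if hdr.signature != b"FACP":
        raise ValueError("not a FADT")
    if hdr.length >= FADT_V3_LENGTH:
        return 3
    return 2 if hdr.length > FADT_V1_LENGTH else 1


def acpi2_fields(raw: bytes) -> dict[str, int]:
    """Read ACPI 2.0-only byte fields, reporting 0 for older layouts."""
    newer = fadt_generation(raw) >= 3
    return {name: (raw[off] if newer else 0) for name, off in ACPI2_ONLY_BYTES.items()}
```

Passing the bytes from `open("/sys/firmware/acpi/tables/FACP", "rb").read()` to `acpi2_fields` shows which CST_CNT value a purely length-based check would act on for that machine.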
Excellent. Thank you very much Gary. I should have a scratch-build with the patch I referenced in just a bit for people to test out.
This scratch build should have the patch mentioned above: http://koji.fedoraproject.org/koji/taskinfo?taskID=3822374 Testing when it completes would be much appreciated.
(In reply to comment #36)
> Testing when it completes would be much appreciated.
This new kernel works in my environment without needing the previous workaround of processor.nocst=1. Thanks!
Yup, works for me too. Thanks!
Excellent. Thank you both for testing. I will get this committed to the Fedora branches today and it should be in the next update.
kernel-3.2.8-3.fc16 has been submitted as an update for Fedora 16. https://admin.fedoraproject.org/updates/kernel-3.2.8-3.fc16
Package kernel-3.2.8-3.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.2.8-3.fc16'
as soon as you are able to. Please go to the following URL:
https://admin.fedoraproject.org/updates/FEDORA-2012-2745/kernel-3.2.8-3.fc16
then log in and leave karma (feedback).
kernel-3.2.9-1.fc16 has been submitted as an update for Fedora 16. https://admin.fedoraproject.org/updates/kernel-3.2.9-1.fc16
kernel-3.2.9-1.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report.
commit 3e80acd1af40fcd91a200b0416a7616b20c5d647
Author: Julian Anastasov <ja>
Date:   Thu Feb 23 22:40:43 2012 +0200
    ACPICA: Fix regression in FADT revision checks
Shipped in upstream Linux 3.4-rc1.