Red Hat Bugzilla – Bug 727865
boot hang w/ 2.6.40 unless processor.nocst=1 - Asus P4V8X-X, Intel D865GBF
Last modified: 2013-01-10 03:21:25 EST
After the update to kernel-2.6.40-4.fc15.i686 the system fails to boot unless I pass acpi=off as a boot parameter. This wasn't necessary with the previous kernel (kernel-2.6.38.8-35.fc15.i686).
The last displayed messages resemble this:
input: Power Button as /devices/LNXSYSTM:00/device:00/PNP0C0C:00/input/input0
ACPI: Power Button [PWRB]
input: Sleep Button as /devices/LNXSYSTM:00/device:00/PNP0C0E:00/input/input1
ACPI: Sleep Button [SLPB]
input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input2
ACPI: Power Button [PWRF]
The system just hangs (ctrl+alt+del, caps lock, num lock not working, no other messages after those).
It also fails if I use noapic as boot param; it only works if I use acpi=off.
Is this a regression or just a case of a crappy motherboard? The mainboard is an Asus P4V8X-X with a VIA P4X533 chipset.
I will attach smolt profile, dmesg for 2.6.38-8 and dmesg for 2.6.40-4 with acpi=off.
Created attachment 516508 [details]
Created attachment 516509 [details]
Dmesg when booting 2.6.38.8-35 without acpi=off
Created attachment 516510 [details]
Dmesg when booting 2.6.40-4 with acpi=off
I also have the same problem.
this is my smolt uuid: pub_9fbe0e20-dabc-4df0-8c82-886fad980710
Update: nothing changed with kernel-184.108.40.206-0.fc15.i686
I'm having the same problem with 220.127.116.11-0.fc15-i686. It only boots with acpi=off:
Attaching the serial log from when I boot without acpi=off.
Created attachment 519205 [details]
kernel output dumped to serial console.
BUG: unable to handle kernel paging request at 00c0b141
IP: [<f4402300>] 0xf44022ff
*pde = 00000000
Oops: 0000 [#1] SMP
Modules linked in:
Pid: 0, comm: swapper Not tainted 18.104.22.168-0.fc15.i686 #1 /D865GBF
EIP: 0060:[<f4402300>] EFLAGS: 00010246 CPU: 0
EIP is at 0xf4402300
EAX: 00000000 EBX: f4620800 ECX: 00000000 EDX: 00000000
ESI: f4402300 EDI: c0634bf5 EBP: f4620800 ESP: f4493d08
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process swapper (pid: 0, ti=f4492000 task=c0a29fe0 task.ti=c09dc000)
f4493d40 c04deb9b f45b4a20 00001001 f45b4a30 f4493d48 c06354b2 f45b4a30
0000001c 000080d0 f45b4a20 f4620800 00000000 f45b4a20 00000000 f4620da8
f4493d6c c0634ccd f4493d58 f442e618 c096fd0d 00000000 f4620800 f4409c00
[<c04deb9b>] ? __kmalloc+0x103/0x110
[<c06354b2>] ? acpi_ns_evaluate+0x3a/0x18d
[<c0634ccd>] ? acpi_evaluate_object+0xd6/0x1c5
[<c064240d>] ? acpi_processor_get_power_info+0x5a/0x53d
[<c07e9a6b>] ? _raw_spin_unlock_irqrestore+0x13/0x15
[<c042a839>] ? task_rq_unlock+0x17/0x19
[<c043877c>] ? set_cpus_allowed_ptr+0xc7/0xd1
[<c0641470>] ? acpi_processor_get_throttling_fadt+0x72/0x7a
[<c06416b0>] ? acpi_processor_get_throttling+0x65/0x6e
[<c0642339>] ? acpi_processor_get_throttling_info+0x4d1/0x500
[<c0634daf>] ? acpi_evaluate_object+0x1b8/0x1c5
[<c07dfeaf>] ? acpi_processor_power_init+0xdc/0x10c
[<c07dfcf7>] ? acpi_processor_add+0x40e/0x4ea
[<c0535793>] ? sysfs_do_create_link+0x120/0x157
[<c0620107>] ? acpi_device_probe+0x41/0xf5
[<c067ff74>] ? driver_probe_device+0x123/0x1ff
[<c04297af>] ? should_resched+0xd/0x27
[<c07e8801>] ? _cond_resched+0xd/0x21
[<c0680098>] ? __driver_attach+0x48/0x64
[<c067f1d6>] ? bus_for_each_dev+0x42/0x6b
[<c067fbd1>] ? driver_attach+0x1f/0x23
[<c0680050>] ? driver_probe_device+0x1ff/0x1ff
[<c067f870>] ? bus_add_driver+0xca/0x210
[<c06804c2>] ? driver_register+0x84/0xe3
[<c0620888>] ? acpi_bus_register_driver+0x3f/0x41
[<c0aafe81>] ? acpi_processor_init+0x65/0xd0
[<c040118a>] ? do_one_initcall+0x8c/0x142
[<c0aafe1c>] ? acpi_pci_slot_init+0x1b/0x1b
[<c0a84827>] ? kernel_init+0xaa/0x136
[<c0a8477d>] ? start_kernel+0x353/0x353
That oops address is somewhere in the ACPI BIOS, I think.
People reporting this bug have either:
Intel(R) Pentium(R) D CPU 2.66GHz
Intel(R) Pentium(R) 4 CPU 3.00GHz
Disassembly of the oopsing code shows that it's not even really valid instructions. So the ACPI code just jumped to some invalid address.
Would be great to know via serial console capture if all the failures
look like Adam's in comment 7/8, or if there are multiple failures here.
If we really are crashing under acpi_processor_get_power_info(),
then something in C-states is broken.
Do any of these (individual) cmdline params allow boot to succeed?
Adam, please attach the output from acpidump.
re: comment #10
actually comment #1 shows this:
CPU Model: Intel(R) Celeron(R) CPU 2.40GHz
CPU Family: 15
CPU Model Num: 4
but that is still a version of the P4.
Apparently these are all 32-bit processors, so we don't
have the option to try the x86_64 kernel.
Can this be reproduced with an upstream kernel.org kernel?
Presumably upstream 2.6.38.stable works, b/c FC15's
3.0.0 (I assume that FC is using 2.6.40 as a synonym for this?)
BTW, unrelated to the cause of this bug report, but present in comment #3
Linux version 2.6.40-4.fc15.i686
WARNING: at arch/x86/kernel/apm_32.c:908 apm_cpu_idle+0x42/0x251()
deprecated apm_cpu_idle will be deleted in 2012
I added that warning to 3.0 to let folks
know that CONFIG_APM_CPU_IDLE=y may not be what you want.
If you think you really do need it, I need to hear from you...
Created attachment 520150 [details]
Output from acpidump
It is my understanding that Fedora's 2.6.40 is some version of 3.0.*. I can try 3.0.3 over the weekend, assuming NJ isn't completely washed away.
I will try those other kernel parameters as well when I get a chance.
Each one let the machine boot. It still crashed with:
processor.nocst=1 worked -- yay, that's a big clue.
Using that param, please show the output from
grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*
Also, if you can show that same output for the working
2.6.38 kernel, that would be helpful to compare
what the FADT does vs _CST.
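To make the comparison between the two kernels easier, the grep above can be wrapped so its output goes to one file per kernel. This is only a sketch: the function name dump_cpuidle and its optional directory argument are made up for illustration and are not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: print every cpuidle attribute as "path:value".
# The optional argument (an assumption, mainly for testing) overrides
# the sysfs directory that is searched.
dump_cpuidle() {
    dir="${1:-/sys/devices/system/cpu/cpu0/cpuidle}"
    if [ ! -d "$dir" ]; then
        echo "no cpuidle directory at $dir" >&2
        return 1
    fi
    # Same grep as above; -H always prints the file name, -s hides
    # unreadable attributes, and sort keeps the output stable for diff.
    grep -H -s . "$dir"/*/* | sort
}
```

Running `dump_cpuidle > cpuidle-$(uname -r).txt` under each kernel and then diffing the two files should show which C-states each kernel ended up with.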
It seems that acpi_processor_get_power_info_cst()
is bombing out, presumably in the evaluation of _CST itself.
That routine has not changed recently, so it must be something
funky in the actual AML/interpreter.
Unfortunately, the version of acpidump you used didn't
grab the dynamic tables where your _CST lives,
or they were not exported.
Can you attach the files from here?
If you don't see anything there w/ the latest kernel,
then go back to working 2.6.38 and they should be present there.
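A rough sketch for collecting those files, assuming the location meant is /sys/firmware/acpi/tables/dynamic/ (the directory named in the reply that follows). The helper name save_dynamic_tables and the .dat suffix are made up for illustration, and reading these files probably needs root.

```shell
#!/bin/sh
# Hypothetical helper: copy each dynamic ACPI table into a directory
# so the files can be attached to the bug.
# $1 = output directory, $2 = source directory (assumed default below).
save_dynamic_tables() {
    out="$1"
    src="${2:-/sys/firmware/acpi/tables/dynamic}"
    mkdir -p "$out" || return 1
    count=0
    for f in "$src"/*; do
        # Skip the unmatched glob when the directory is empty or missing.
        [ -f "$f" ] || continue
        cat "$f" > "$out/$(basename "$f").dat" && count=$((count + 1))
    done
    echo "$count table(s) saved to $out"
}
```

If it reports 0 tables, the kernel did not export any dynamic tables, which is itself worth noting in the bug.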
On 2.6.40, /sys/devices/system/cpu/cpu0/ doesn't contain cpuidle on this machine... There is also nothing under /sys/firmware/acpi/tables/dynamic/ on 2.6.40.
I will have to build a 2.6.38 kernel first as the previous version I have installed is a 2.6.35 F14 kernel.
On 22.214.171.124 /sys/firmware/acpi/tables/dynamic/ contains nothing and /sys/devices/system/cpu/cpu0/cpuidle does not exist either.
Has anyone tried a 2.6.39 kernel as Len asked? You might be able to use
to test with. Bug 730007 is showing similar issues and thus far we only see it on particular Pentium 4 models. If someone is willing to git bisect this on an afflicted machine, that would be very helpful.
Matthew Garrett pointed me at a patch for a regression in ACPI yesterday. I've started a scratch build with this patch applied. Could those with an impacted machine please try this kernel when it finishes building and let us know the results?
My apologies, I pasted the wrong link to the scratch build. This is the one that should be tested:
Unfortunately I can't test it anymore, I'm running F16 now, with 3.1.6-1.fc16.i686, and I still have to use processor.nocst=1...
(In reply to comment #22)
> Unfortunately I can't test it anymore, I'm running F16 now, with
> 3.1.6-1.fc16.i686, and I still have to use processor.nocst=1...
The 2.6.41.x kernels are almost identical to the F16 3.1.x kernels. They are both based on the 3.1.x stable series. You should be able to install the kernel from the scratch build without issue.
(In reply to comment #21)
> My apologies, I pasted the wrong link to the scratch build. This is the one
> that should be tested:
While I too am now running F16, installing the kernel
and trying a boot indicates that there is still a panic
with the backtrace indicating acpi issues, so I do not
think this is (yet) identified/fixed.
Same here, nothing changed.
One more test case: a D865PERL gets the same result at boot.
(DMI: /D865PERL , BIOS RL86510A.86A.0061.P09.0308281850 08/28/2003)
P.S.: the P09, P15 and P21 BIOS versions all give the same result.
After enabling ACPI debug and an early-stage serial console, I got a bad CST.
Here are the combinations that can boot:
enable HT in BIOS, acpi=off (no HT, no ACPI)
disable HT, no kernel command (no HT, have ACPI)
enable HT, processor.nocst=1 (have HT, have ACPI)
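When comparing combinations like these, it helps to confirm which parameters the running kernel was actually booted with. A minimal sketch, where the function name has_boot_param is made up for illustration and the second argument exists only so the check can be tried without rebooting:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given parameter appears as a
# whole word on the kernel command line.
# $1 = parameter, $2 = command line (defaults to /proc/cmdline).
has_boot_param() {
    param="$1"
    cmdline="${2:-$(cat /proc/cmdline)}"
    for word in $cmdline; do
        if [ "$word" = "$param" ]; then
            return 0
        fi
    done
    return 1
}
```

For example, `has_boot_param processor.nocst=1 && echo "workaround active"`. On Fedora, something like `grubby --update-kernel=ALL --args="processor.nocst=1"` may make the workaround persistent, but that is an assumption about the installed tooling and not something taken from this report.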
This is the same family as in Bug#730007. I have:
processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Pentium(R) 4 CPU 2.60GHz
stepping : 9
cpu MHz : 2600.000
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
apicid : 1
initial apicid : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pebs bts cid xtpr
bogomips : 5185.92
clflush size : 64
cache_alignment : 128
address sizes : 36 bits physical, 32 bits virtual
(In reply to comment #26)
assuming that the 2.6.42 (3.2) builds don't make any difference either ?
(In reply to comment #28)
> assuming that the 2.6.42 (3.2) builds don't make any difference either ?
Running the F16 kernel 3.2.7-1 still requires processor.nocst=1
Is there something specific in the F15 2.6.42 kernel builds
that was intended to fix this (i.e. is it worth getting a
F15 2.6.42 kernel to test?)
No, that's pretty much equivalent (as far as ACPI is concerned).
Len, is there any hope for resolution on this, or shall we just start DMI blacklisting the affected systems? There don't seem to be too many of them, at least.
From 730007 we know that whatever caused this issue showed up in 2.6.39-rc1. What we haven't been able to do is find someone with an impacted machine that is willing to do a git bisect to figure out which commit changed things.
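For anyone willing to try, the bisect could be started roughly as sketched below. The wrapper name start_bisect and the ./linux clone path are made up for illustration. The tags v2.6.38 (good) and v2.6.39-rc1 (bad) are assumptions based on the versions mentioned in this bug.

```shell
#!/bin/sh
# Hypothetical wrapper: begin a bisect between a known-good and a
# known-bad ref. $1 = repository path, $2 = good ref, $3 = bad ref.
start_bisect() {
    repo="$1"
    good="$2"
    bad="$3"
    # git bisect start takes the bad ref first, then the good ref(s).
    git -C "$repo" bisect start "$bad" "$good"
}

# Example for this bug (assumes a kernel clone in ./linux with upstream tags):
#   start_bisect linux v2.6.38 v2.6.39-rc1
# Build and boot each kernel that git checks out, then mark the result:
#   git -C linux bisect good   # booted without processor.nocst=1
#   git -C linux bisect bad    # hung or oopsed during ACPI processor init
# When git names the first bad commit, clean up with:
#   git -C linux bisect reset
```

Each step roughly halves the remaining range, so the number of kernels to build grows only with the logarithm of the number of commits between the two tags.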
(In reply to comment #31)
> From 730007 we know that whatever caused this issue showed up in 2.6.39-rc1.
> What we haven't been able to do is find someone with an impacted machine that
> is willing to do a git bisect to figure out which commit changed things.
Ok, I'll bite(*). I had been hoping someone else would do the
work (compiling a kernel on that old system can take many hours,
so I can usually only get one/two tests per day), but I'll see
if I can get any useful results from a git bisect.
(*) I think the term is you shamed me into it.... :-)
Matthew pointed me to:
That seems like a rather plausible fix for this.
My git bisect has completed, and seems to confirm that
the commit referenced in comment 33 is the commit that
caused the problems.
$ git bisect good
64b3db22c04586997ab4be46dd5a5b99f8a2d390 is the first bad commit
Author: Bob Moore <email@example.com>
Date: Mon Feb 14 15:50:42 2011 +0800
ACPICA: Remove use of unreliable FADT revision field
The revision number in the FADT has been found to be completely
unreliable and cannot be trusted. Only the table length can be
used to infer the actual version.
Signed-off-by: Bob Moore <firstname.lastname@example.org>
Signed-off-by: Lin Ming <email@example.com>
Signed-off-by: Len Brown <firstname.lastname@example.org>
:040000 040000 e40ed2fa28b82990cc8fb147f61841fd6400e711 544b3a6eb35875e502695e35366f23c6e5c80d2c M drivers
:040000 040000 165441d52fb3ece49801c33a91f1b5e266d53abf 4b5394b30c2c89d29372d999098ebeba16fbe23d M include
Excellent. Thank you very much Gary. I should have a scratch-build with the patch I referenced in just a bit for people to test out.
This scratch build should have the patch mentioned above:
Testing when it completes would be much appreciated.
(In reply to comment #36)
> Testing when it completes would be much appreciated.
This new kernel works in my environment without
needing the previous workaround of processor.nocst=1
Yup, works for me too.
Excellent. Thank you both for testing. I will get this committed to the Fedora branches today and it should be in the next update.
kernel-3.2.8-3.fc16 has been submitted as an update for Fedora 16.
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.2.8-3.fc16'
as soon as you are able to.
Please go to the following url:
then log in and leave karma (feedback).
kernel-3.2.9-1.fc16 has been submitted as an update for Fedora 16.
kernel-3.2.9-1.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report.
Author: Julian Anastasov <email@example.com>
Date: Thu Feb 23 22:40:43 2012 +0200
ACPICA: Fix regression in FADT revision checks
shipped in upstream Linux 3.4-rc1