Red Hat Bugzilla – Bug 470551
powernow-k8 causing SIGSEGV during the boot
Last modified: 2013-01-10 02:58:05 EST
Description of problem:
Sometimes (about 1 in 3 restarts) the boot process stops with this message (translated from the Czech localization):
Beginning non-interactive setup
/etc/rc5.d/S06cpuspeed: line 112: 1838 Unauthorized memory access (SIGSEGV) /sbin/modprobe powernow-k8 2>/dev/null
Version-Release number of selected component (if applicable):
I'm using kernel 126.96.36.199-79.fc10.x86_64
How reproducible:
I found that it almost certainly causes the SIGSEGV if I shut the laptop off manually with the power button and then power it on again after a while. It also sometimes happens after a normal system reboot.
Steps to Reproduce:
1. Shut down from a fully loaded system (even with kdm running)
2. Power on after a while
3. Here goes the SIGSEGV
Actual results:
The boot stops
Expected results:
The SIGSEGV does not happen at all
Additional info:
I'm running Fedora 10 Preview on an AMD Turion X2 64-bit 2 GHz (Puma platform, RM-70)
the next time it happens, can you capture the output of dmesg afterwards, and attach that please?
Okay, I have the output of dmesg, but there's nothing in it about the SIGSEGV. I captured it after a reboot, though, because the system hangs when the crash happens, so I cannot capture it right away. The interesting thing is that there's nothing in /var/log/messages either (also attached); there isn't a single line about that boot. Is there any other log file where this could be logged? /var/log/messages contains entries from 00:34, and the next entries are from 10:02, but the boot that crashed was before 10:00, so you can see there's no line about it.
Any suggestions how to capture that output or where to look?
Created attachment 322995
Captured dmesg output after reboot after the crash
Created attachment 322996
Captured /var/log/messages output after reboot after the crash
ah, I was hoping it would survive long enough to capture a backtrace. The post-reboot ones aren't really helpful. Though it does show the driver loads and inits successfully.
It's odd because that driver hasn't changed in a long time.
Can you try this..
echo ondemand > /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
and see if it locks up in the same way?
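A note on the command above: in bash, a redirection target that expands to more than one file is rejected as an ambiguous redirect, so on a dual-core machine the glob form may fail before anything is written. A minimal sketch of a per-CPU loop follows. The `set_governor` function name and its optional second argument (a directory to use instead of /sys/devices/system/cpu) are assumptions made for illustration, not part of the original suggestion.

```shell
# Hypothetical helper: write a governor name into each CPU's
# scaling_governor file, one file at a time.
# $1 = governor name, $2 = optional sysfs CPU root (assumed default below)
set_governor() {
    governor="$1"
    root="${2:-/sys/devices/system/cpu}"
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip when the glob matched nothing (e.g. no cpufreq driver loaded)
        [ -e "$f" ] || continue
        echo "$governor" > "$f" || return 1
    done
}

# Intended use on the affected machine (as root):
#   set_governor ondemand
```

Writing to each file separately also makes it possible to see which CPU, if any, triggers the lockup.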
Nope, it does not. I tried echoing it to both processors and the laptop continues to run with no problem. I also found that cpufreq_ondemand is loaded by default and that /sys/.../scaling_governor already contains 'ondemand' by default. I'm posting my lsmod output here; maybe it can help...
Created attachment 323023
I'm thinking... isn't there a switch to boot the kernel with that would make it log immediately? Maybe I'll try booting without quiet and then scroll back up the screen, if that is possible. And what about bootstrap? Does it write its output immediately, or does it keep it in memory and dump it later? Could bootstrap help us here?
Oops sorry, I meant Bootchart, not bootstrap :)
Actually the bootchart did help. I booted the kernel with it and it showed the whole bug backtrace. I'm including a photo of the backtrace screen. If something's not readable, I have more detailed and clear photos, so just ask for them :)
Created attachment 323182
Backtrace screen photo
ah, excellent. thanks.
Any update on this? I'm running 188.8.131.52-109.fc10.x86_64 and that SIGSEGV is still present. It's getting pretty annoying that about every second/third boot I need to turn off the laptop and then turn it back on (it's even more annoying as laptops do not have a reset button :)
No fix yet. We've actually seen reports of this dating back some time, even as far back as RHEL5 (2.6.18).
This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle.
Changing version to '10'.
*** Bug 462648 has been marked as a duplicate of this bug. ***
I wish I could find a machine to reproduce this on. In the meantime, something to try..
there's a kernel boot parameter called printk.bootdelay=1000
setting that will cause a delay of 1 second to occur every time the kernel prints a message. So your boot will go _really slowly_.
The plus side of this however is that we can see messages before they scroll off the screen. I'm curious what cpufreq is doing just before that oops happens, so boot with cpufreq.debug=7 printk.bootdelay=1000
(you can increase the 1000 for longer delays, 2000=2s, 3000=3s etc up to 10s).
If you could capture the screen before that first [cut here] message appears, that might yield some more clues.
Oh, and I'm pretty sure it's unrelated but both this report and the one in 462648 are tainted with binary modules. It's unlikely that two different modules are causing the same problem here, but just to rule it out if you could try and reproduce the trace without it loaded, that would be one less thing to worry about.
By binary modules, do you mean proprietary drivers like AMD's fglrx? Well, the only
"non-standard" modules that I'm using are Broadcom's wl driver, AMD's fglrx
(the first report was without fglrx, though) and a custom-built wacom
(touchscreen) driver. Wacom and fglrx have no impact on this, as the original
report was made without them, so only wl is left. It may actually be
related, because I noticed that the crash usually happens right before the wifi LED
should light up blue (I mean, if it normally lights up at, let's say, the 15th second,
the crash happens at the 14th second), but this may be pure coincidence.
I'll try the thing you suggested, I'll also try to investigate wl
further, and I'll post back as soon as I find something useful.
Also, I typoed the above. Looks like it should be printk.boot_delay=1000
I missed the _ character.
Some are having problems capturing the backtrace. By
moving cpuspeed to be almost last in the startup sequence,
the error can often be captured and is apparently sent automatically
to kernel.org via kerneloops.
service cpuspeed stop
chkconfig cpuspeed off
change the chkconfig line of /etc/init.d/cpuspeed to have 99...
# chkconfig: 12345 99 99
then re enable it..
chkconfig cpuspeed on
service cpuspeed start
I do not know of a simple way to update the rpm toward this end
but if someone does then the kerneloops data might make this
easy to track and measure. There is no reason I know to
have this chkconfig'ed at 06...and it might gather more info from
'untainted' kernels if it was started much later.
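The manual steps above could be scripted roughly as follows. This is only a sketch: the `set_chkconfig_priority` helper is hypothetical, it assumes GNU sed (for `-i`), and it assumes the init script carries a single "# chkconfig:" header line, which it replaces wholesale with the values given in the comment above.

```shell
# Hypothetical helper: replace the "# chkconfig:" header of an init script.
# $1 = init script path, $2 = new header values, e.g. "12345 99 99"
set_chkconfig_priority() {
    script="$1"
    values="$2"
    # Rewrite the whole header line; every other line is left untouched
    sed -i "s/^# chkconfig:.*/# chkconfig: $values/" "$script"
}

# Intended sequence on the affected machine (run as root), left commented
# out so that sourcing this file changes nothing:
#   service cpuspeed stop
#   chkconfig cpuspeed off
#   set_chkconfig_priority /etc/init.d/cpuspeed "12345 99 99"
#   chkconfig cpuspeed on
#   service cpuspeed start
```

A package update that replaces /etc/init.d/cpuspeed would presumably undo the edit, so the header is worth rechecking after updates.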
To reproduce, I was previously able to trigger it with a loop of
"service cpuspeed stop; sleep 1; service cpuspeed start" and
capture the oops backtrace.
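The reproduction loop described above, written out as a small function. The function name, the iteration count argument, and the optional third argument naming a command to call instead of `service` are assumptions added for illustration; only the stop, sleep 1, start sequence comes from the comment.

```shell
# Hypothetical helper: restart a service repeatedly to provoke the oops.
# $1 = service name, $2 = number of cycles,
# $3 = optional command to use in place of "service" (for dry runs)
cycle_service() {
    name="$1"
    count="$2"
    svc="${3:-service}"
    i=0
    while [ "$i" -lt "$count" ]; do
        "$svc" "$name" stop
        sleep 1
        "$svc" "$name" start
        i=$((i + 1))
    done
}

# Intended use on the affected machine (as root):
#   cycle_service cpuspeed 20
```

If the machine hangs during the loop, the cycle count reached is lost unless the output is going to a console or log that survives the hang.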
I still see this. Currently I have fc9 kernel:
Is it possible to test a kernel patch on the affected hardware? If so, I can spin a test patch. Thanks.
I'm able and willing to test such kernel, let me know about that spin.
I just want to let you know that with the second-to-last kernel update (I think it is 184.108.40.206-137 or something like that, not the newest update though) this crash appears much less often than with the kernel I reported it against. Back then it was every 3rd boot; now it is more like every 15th boot. I've also got suspend to RAM working, so I'm mostly just suspending, but even so, sometimes the laptop won't wake up and everything including the keyboard (Caps Lock LED) is dead. Could it be that the module is loaded again after waking and that this hangs the kernel?
I spun a FC9 kernel with debugging turned on in powernow-k8. Give this a try and attach all debug output.
BTW, from the console output in Comment #11, it looks like the policy (target CPU freq) the governor is attempting to set is null. Let's see if there is any obvious failure that gets logged before we start dissecting code.
(In reply to comment #24)
> I spun a FC9 kernel with debugging turned on in powernow-k8. Give this a try
> and attach all debug output.
F9 development for the 2.6.27 kernel is continuing in a branch:
We may or may not move F9 to 2.6.29 on the trunk later, but for now it's dead.
I tried to install it, but it left me with this:
sudo rpm -Uvf kernel-220.127.116.11-6.fc9.x86_64.rpm
error: Failed dependencies
kernel-firmware >= 18.104.22.168-6.fc9 is needed for kernel-22.214.171.124-6.fc9.x86_64
kernel-uname-r = 126.96.36.199-134.fc10.x86_64 is needed for (installed) kmod-wl-188.8.131.52-134.fc10.x86_64-184.108.40.206-5.fc10.7.x86_64
kernel-uname-r = 220.127.116.11-159.fc10.x86_64 is needed for (installed) kmod-wl-18.104.22.168-159.fc10.x86_64-22.214.171.124-1.fc10.x86_64
kernel-uname-r = 126.96.36.199-170.2.5.fc10.x86_64 is needed for (installed) kmod-wl-188.8.131.52-170.2.5.fc10.x86_64-184.108.40.206-1.fc10.1.x86_64
The strings may differ slightly, as I translated them into English from my native language. I'm not sure what to do; would removing kmod-wl be sufficient? I'm asking first because I don't want to break my working distro.
I should also note that I haven't seen this bug occur for quite some time now, using 220.127.116.11-170.2.5.fc10.x86_64
This message is a reminder that Fedora 10 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 10. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '10'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 10's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 10 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora please change the 'version' of this
bug to the applicable version. If you are unable to change the version,
please add a comment here and someone will do it for you.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
Fedora 10 changed to end-of-life (EOL) status on 2009-12-17. Fedora 10 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.
Thank you for reporting this bug and we are sorry it could not be fixed.