Bug 470551
Summary: | powernow-k8 causing SIGSEGV during the boot | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Martin Klapetek <martin.klapetek>
Component: | kernel | Assignee: | Bhavna Sarathy <bnagendr>
Status: | CLOSED WONTFIX | QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | high | Priority: | medium
Version: | 10 | CC: | jfeeney, kernel-maint, peterm, rhbugzilla
Hardware: | All | Keywords: | Triaged
OS: | Linux | Doc Type: | Bug Fix
Last Closed: | 2009-12-18 06:46:59 UTC | |
Description
Martin Klapetek
2008-11-07 16:43:03 UTC
The next time it happens, can you capture the output of dmesg afterwards and attach it, please?

Okay, I have the output of dmesg, but there's nothing about that SIGSEGV. I captured it after a reboot, though, because when the crash happens the system hangs, so I cannot capture it right away. The interesting thing is that there's nothing in /var/log/messages either (also attached); there is not a single line about that boot. Is there any other log file where this could be logged? /var/log/messages contains entries from 00:34 and then the next entries are from 10:02, but the boot that crashed was before 10:00, so you can see there's no line about it. Any suggestions on how to capture that output or where to look?

Created attachment 322995 [details]
Captured dmesg output after reboot after the crash
Created attachment 322996 [details]
Captured /var/log/messages output after reboot after the crash
Ah, I was hoping it would survive long enough to capture a backtrace. The post-reboot ones aren't really helpful, though they do show that the driver loads and initializes successfully. It's odd, because that driver hasn't changed in a long time. Can you try this:

modprobe cpufreq_ondemand
echo ondemand > /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

and see if it locks up in the same way?

Nope, it does not. I tried echoing it to both processors and the laptop continues to run with no problem. I also found out that cpufreq_ondemand is loaded by default and that /sys/.../scaling_governor is 'ondemand' by default. I'm posting my lsmod here; maybe it can help...

Created attachment 323023 [details]
lsmod output
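
The governor test suggested above writes through a `cpu*` glob in a single redirect. Under bash that redirect is rejected as ambiguous when the pattern matches more than one CPU, which is presumably why the reporter echoed to each processor separately. A minimal sketch of a per-CPU loop follows; the `set_governor` function name and the `SYSFS_CPU_ROOT` override are assumptions added for illustration and are not part of the original report.

```shell
#!/bin/sh
# Set a cpufreq governor on every CPU, one sysfs file at a time.
# SYSFS_CPU_ROOT is a hypothetical override (not from the bug report) so the
# loop can be tried against a scratch directory instead of the real sysfs.
set_governor() {
    governor=$1
    root=${SYSFS_CPU_ROOT:-/sys/devices/system/cpu}
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip the unexpanded pattern when no CPU exposes cpufreq.
        [ -e "$f" ] || continue
        printf '%s\n' "$governor" > "$f" || return 1
        # Read the value back so the result is visible per CPU.
        printf '%s: %s\n' "$f" "$(cat "$f")"
    done
}

# Typical use, as root on the affected machine:
#   modprobe cpufreq_ondemand && set_governor ondemand
```

Reading each file back after the write makes it easy to spot a CPU whose governor did not change.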
I'm thinking... isn't there a switch to boot the kernel with that would cause it to log immediately? Maybe I'll try without quiet and then scroll up the screen, if that is possible. And what about bootstrap? Does it write the output immediately, or does it keep it in memory and dump it later? Could bootstrap help us here?

Oops, sorry, I meant Bootchart, not bootstrap :)

Actually, Bootchart did help. I booted the kernel with it and it showed the whole bug backtrace. I'm including a photo of the backtrace screen. If something's not readable, I have more detailed and clearer photos, so just ask for them :)

Created attachment 323182 [details]
Backtrace screen photo
Ah, excellent. Thanks.

Any update on this? I'm running 2.6.27.5-109.fc10.x86_64 and that SIGSEGV is still present. It's getting pretty annoying that on about every second or third boot I need to turn the laptop off and then back on (it's even more annoying as laptops do not have a reset button :)

No fix yet. We've actually seen reports of this dating back some time, even as far back as RHEL5 (2.6.18).

This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle. Changing version to '10'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

*** Bug 462648 has been marked as a duplicate of this bug. ***

I wish I could find a machine to reproduce this on. In the meantime, something to try: there's a kernel boot parameter, printk.bootdelay=1000, that will cause a delay of 1 second every time the kernel prints a message. So your boot will go _really slowly_. The plus side, however, is that we can see messages before they scroll off the screen. I'm curious what cpufreq is doing just before that oops happens, so boot with

cpufreq.debug=7 printk.bootdelay=1000

(you can increase the 1000 for longer delays: 2000=2s, 3000=3s, etc., up to 10s). If you could capture the screen before that first [cut here] message appears, that might yield some more clues. Thanks.

Oh, and I'm pretty sure it's unrelated, but both this report and the one in 462648 are tainted with binary modules. It's unlikely that two different modules are causing the same problem here, but just to rule it out, if you could try to reproduce the trace without them loaded, that would be one less thing to worry about.

By binary modules, do you mean proprietary drivers like AMD's fglrx? Well, the only "non-standard" modules that I'm using are Broadcom's wl driver, AMD's fglrx (the first report was without fglrx, though) and a custom-built wacom (touchscreen) driver.
wacom and fglrx have no impact on this, as the original report was made without them, so only wl is left. It may actually be related, because I noticed that the crash usually happens right before the wifi LED should light up blue (I mean, if it normally lights up at, let's say, the 15th second, the crash happens at the 14th second), but this may be pure coincidence. I'll try the thing you suggested, I'll also investigate wl further, and I'll post back as soon as I find something useful. Thanks.

Also, I typoed the above. Looks like it should be printk.boot_delay=1000; I missed the _ character.

Some are having problems capturing the backtrace. By moving cpuspeed to be almost last in the startup sequence, the error can often be captured, and it is apparently sent automatically to kernel.org via kerneloops, i.e.:

service cpuspeed stop
chkconfig cpuspeed off

Change the chkconfig line of /etc/init.d/cpuspeed to have 99:

# chkconfig: 12345 99 99

Then re-enable it:

chkconfig cpuspeed on
service cpuspeed start

I do not know of a simple way to update the rpm toward this end, but if someone does, then the kerneloops data might make this easy to track and measure. There is no reason I know of to have this chkconfig'ed at 06, and it might gather more info from 'untainted' kernels if it was started much later. To reproduce, I was previously able to trigger it with a loop of "service cpuspeed stop; sleep 1; service cpuspeed start" and capture the oops backtrace.

I still see this. Currently I have the fc9 kernel: 2.6.27.5-41.fc9.x86_64

Is it possible to test a kernel patch on the affected hardware? If so, I can spin a test patch. Thanks.

I'm able and willing to test such a kernel; let me know about that spin. I just want to let you know that with the second-to-last kernel update (I think it is 2.6.27.7-137 or something like that, not the newest update, though) this crash appears much less often than with the kernel I reported with. Where it used to be every 3rd boot, now it is more like every 15th boot.
Also, I've got suspend to RAM working, so I'm mostly just suspending, but even so, sometimes it won't wake up and everything, including the keyboard (Caps Lock LED), is dead. Could it be that it loads the module again after waking and therefore hangs the kernel?

I spun a FC9 kernel with debugging turned on in powernow-k8. Give this a try and attach all debug output. http://people.redhat.com/bmaly/kernel-2.6.28.2-6.fc9.x86_64.rpm

BTW, from the console output in Comment #11, it looks like the policy (target CPU freq) the governor is attempting to set is null. Let's see if there is any obvious failure that gets logged before we start dissecting code.

(In reply to comment #24)
> I spun a FC9 kernel with debugging turned on in powernow-k8. Give this a try
> and attach all debug output.
>
> http://people.redhat.com/bmaly/kernel-2.6.28.2-6.fc9.x86_64.rpm

F9 development for the 2.6.27 kernel is continuing in a branch: private-fedora-9-2_6_27-branch. We may or may not move F9 to 2.6.29 on the trunk later, but for now it's dead.

I tried to install it, but it left me with this:

sudo rpm -Uvf kernel-2.6.28.2-6.fc9.x86_64.rpm
error: Failed dependencies
kernel-firmware >= 2.6.28.2-6.fc9 is needed for kernel-2.6.28.2-6.fc9.x86_64
kernel-uname-r = 2.6.27.7-134.fc10.x86_64 is needed for (installed) kmod-wl-2.6.27.7-134.fc10.x86_64-5.10.27.6-5.fc10.7.x86_64
kernel-uname-r = 2.6.27.9-159.fc10.x86_64 is needed for (installed) kmod-wl-2.6.27.9-159.fc10.x86_64-5.10.27.12-1.fc10.x86_64
kernel-uname-r = 2.6.27.12-170.2.5.fc10.x86_64 is needed for (installed) kmod-wl-2.6.27.12-170.2.5.fc10.x86_64-5.10.27.12-1.fc10.1.x86_64

The strings may differ slightly, as I translated them into English from my native language. I'm not sure what to do; would removing kmod-wl be sufficient? I'm asking first because I don't want to break my working distro.
I should also note that I haven't seen this bug occur for quite some time now, using 2.6.27.12-170.2.5.fc10.x86_64.

This message is a reminder that Fedora 10 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 10. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '10'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 10's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 10 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Fedora 10 changed to end-of-life (EOL) status on 2009-12-17. Fedora 10 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora, please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.
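
For reference, the cpuspeed reordering steps described earlier in this report could be scripted roughly as below. This is only a sketch: the helper name `set_chkconfig_priority_99` is made up for illustration, it assumes GNU sed (for `-i`), and it assumes the init script carries a `# chkconfig: <runlevels> <start> <stop>` header line like the one quoted in the thread.

```shell
#!/bin/sh
# Rewrite the "# chkconfig:" header of an init script so that both the start
# and stop priorities become 99, as proposed in the thread.
# The function name and its script-path argument are illustrative only.
set_chkconfig_priority_99() {
    script=$1
    # Keep the runlevel field (e.g. 12345) and replace both priorities.
    # Assumes GNU sed for in-place editing with -i.
    sed -i 's/^# chkconfig:[[:space:]]*\([0-9-][0-9-]*\)[[:space:]].*$/# chkconfig: \1 99 99/' "$script"
}

# Sequence from the thread, run as root:
#   service cpuspeed stop
#   chkconfig cpuspeed off
#   set_chkconfig_priority_99 /etc/init.d/cpuspeed
#   chkconfig cpuspeed on
#   service cpuspeed start
#
# Reproduction loop mentioned in the thread (interrupt with Ctrl-C):
#   while :; do service cpuspeed stop; sleep 1; service cpuspeed start; done
```

Only the header comment is changed; chkconfig reads it again when the service is re-enabled, which is why the thread disables the service before editing the file and enables it afterwards.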