Bug 241700
| Field | Value |
|---|---|
| Summary | Regression: kernel panic on resume |
| Product | [Fedora] Fedora |
| Component | kernel |
| Version | rawhide |
| Hardware | All |
| OS | Linux |
| Status | CLOSED NEXTRELEASE |
| Severity | medium |
| Priority | medium |
| Reporter | Roland Wolters <roland.wolters> |
| Assignee | Kernel Maintainer List <kernel-maint> |
| QA Contact | Brian Brock <bbrock> |
| CC | alex, bloch, hez |
| Doc Type | Bug Fix |
| Last Closed | 2007-07-27 10:53:19 UTC |
Description
Roland Wolters
2007-05-29 17:04:59 UTC
There have been some changes in the way suspend/resume is handled in Fedora 7. Please check out the Sleep Quirk Debugger pages, available here:
http://people.freedesktop.org/~hughsient/quirk/index.html
The suggestions there should help you fix resume on your laptop.

As I've mentioned, I've already tried to use pm-suspend. It simply didn't work. Anyway, the debug page mentioned I should try to check dmesg for this:

    # cat dmesg.txt |grep "hash matches"
    hash matches drivers/base/power/resume.c:28
    hash matches device 0000:00:1d.0

Also, dmesg featured that kernel-dump(?):

    BUG: warning at kernel/softirq.c:138/local_bh_enable() (Not tainted)
    [<c042b0cf>] local_bh_enable+0x45/0x92
    [<c06002bd>] cond_resched_softirq+0x2c/0x42
    [<c059adf3>] release_sock+0x4f/0x9d
    [<c05c7755>] tcp_recvmsg+0x8d2/0x9de
    [<c059a7a9>] sock_common_recvmsg+0x3e/0x54
    [<c0599095>] sock_recvmsg+0xec/0x107
    [<c04058ff>] common_interrupt+0x23/0x28
    [<c0436e71>] autoremove_wake_function+0x0/0x35
    [<c0466b1a>] find_extend_vma+0x12/0x49
    [<c043dd81>] get_futex_key+0x3a/0xe3
    [<c0599ef8>] sys_recvfrom+0xd7/0x12b
    [<c043ee20>] do_futex+0x205/0xb5e
    [<c04754ee>] do_sync_write+0xc7/0x10a
    [<c043d245>] tick_sched_timer+0x0/0xbb
    [<c0436e71>] autoremove_wake_function+0x0/0x35
    [<c0599f83>] sys_recv+0x37/0x3b
    [<c059a458>] sys_socketcall+0x19c/0x261
    [<c0404f70>] syscall_call+0x7/0xb

Forgot to mention the device:

    # lspci|grep 00:1d.0
    00:1d.0 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #1 (rev 03)

I've reported a similar bug - https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=235240

What are the specs of your system, if I may ask? I haven't had any luck with the methods listed on the suspend quirks while attempting to debug my similar problem.

What kind of output/specs do you need?

Laptop vs Desktop, CPU, wireless device (if any), video card + driver. A make and model number would be interesting (my system is a Sony Vaio VGN-s260 laptop).
Maybe others - this is just for me to compare and see if there's anything that jumps out as an obvious similarity between the systems, to see if our problems come from the same root.

    Type: Laptop
    CPU: Intel(R) Pentium(R) M processor 1.80GHz
    RAM: 1GB
    WLAN device: ipw2200
    video card: ATI Technologies Inc RV350 [Mobility Radeon 9600 M10]
    driver: "radeon", module version = 4.2.0 (standard X.Org)

The system is a "Cebop Top 900". Cebop itself is just a company which took Fujitsu/Siemens machines and put a better graphics card inside - even the case is exactly the same as some Fujitsu/Siemens machines of that time. Cebop has been out of business for some months now, so you won't get any information from them. For more information, check this link:
http://smolt.fedoraproject.org/show?UUID=41018670-1633-4b95-a4b1-95c4fc89b725

I don't know if this makes any difference in figuring out the problem, but under Debian Etch and Ubuntu Dapper my laptop would often freeze and fail to resume (though not kernel panic) when I changed something related to USB shortly before or after suspending the machine. Plugging in or unplugging a USB mouse would almost always cause this problem. I only mention this because your dmesg output (apparently) points a finger at the USB controller as a potential culprit.

I have a similar system:

    Type: Laptop
    CPU: Intel(R) Pentium(R) M processor 1.70GHz
    RAM: 1GB
    WLAN device: ipw2200
    video card: ATI Technologies Inc RV250 [Mobility Radeon 9200 M9]
    driver: "radeon", module version = 4.2.0 (standard X.Org)

And again, suspend to RAM/disk works without issue on FC6 and CentOS/RHEL/etc 5 for this laptop.
Here is a summary of my config. Suspend/resume worked fine with FC6, and I have the problem with Fedora 7:

Sony Vaio TX3XP; on resume both the caps-lock and scroll-lock lights blink, and the only way to stop the laptop is removing the battery :-(

    CPU0: Intel Genuine Intel(R) CPU U1400 @ 1.20GHz stepping 08
    WLAN: ipw3945
    VID: Mobile 945GM
    Xorg driver: i810

Got this in dmesg:

    BUG: warning at kernel/softirq.c:138/local_bh_enable() (Not tainted)
    [<c042b0cf>] local_bh_enable+0x45/0x92
    [<c06002bd>] cond_resched_softirq+0x2c/0x42
    [<c059adf3>] release_sock+0x4f/0x9d
    [<c05c670d>] tcp_sendmsg+0x90b/0x9f9
    [<c05dec95>] inet_sendmsg+0x3b/0x45
    [<c0598731>] sock_aio_write+0xf6/0x102
    [<c04754ee>] do_sync_write+0xc7/0x10a
    [<c0436e71>] autoremove_wake_function+0x0/0x35
    [<c0475d47>] vfs_write+0xbc/0x154
    [<c0476342>] sys_write+0x41/0x67
    [<c0404f70>] syscall_call+0x7/0xb
    [<c0600000>] __sched_text_start+0x6e8/0x89e
    =======================

I will try the various solutions mentioned here already and post again should one work for me.

I was following this bug as I have a similar issue, where suspend to RAM worked OK on my Ferrari 4005 in FC6, but panics on resume in Fedora 7. I too tried the pm_trace trick, although I had to apply a patch for it to work on x86_64. While I got "hash matches drivers/base/power/resume.c:61", I didn't get anything else suggesting which driver might be causing the panic.

I've since tried suspending after first removing a whole bunch of unused modules, "lsmod | cut -c-20 | xargs rmmod", and suspend to RAM has now worked a couple of times. This was after seeing a separate bug report here suggesting that there's some issue with a firewire driver. I'll try to narrow down which module is causing the problems for me, and thought you might have success with a similar approach.

Created attachment 156894
Output of lsmod from a Sony Vaio VGN-S260 - *'d modules removed before suspending
Alex - thanks for the inspiration!
I was just able to suspend to RAM and resume without issue from the F7 Gnome
LiveCD by removing several modules. I just went through the list and removed
modules which looked like they might not be as important (though this was
largely guess-work on my part).
I have attached the output of lsmod on my laptop as it was immediately after
booting the LiveCD - the modules with an * before their name were removed
before attempting resume.
I, too, will try to narrow down the list and find out which specific module(s)
is (are) causing the problem.
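The narrowing-down step described above starts from the lsmod output. As a rough sketch (not something posted in this report), the helper below lists the modules that nothing else currently uses, which are the ones rmmod can remove directly. The function name unused_modules is hypothetical, and it assumes the usual lsmod layout of one header line followed by name, size, and use count columns.

```shell
#!/bin/sh
# Hypothetical helper: read lsmod output on stdin and print the names of
# modules whose use count (third column) is 0. The first line is the
# "Module  Size  Used by" header and is skipped.
unused_modules() {
    awk 'NR > 1 && $3 == 0 { print $1 }'
}
```

Running "lsmod | unused_modules" prints the candidates; removing them one at a time with rmmod before each suspend attempt should show which module triggers the panic.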
Thanks for that list, resume worked for me as well after I removed all these modules. Good work! I will try to find my problematic module as well.

For me it's the "fw_ohci" module which causes the kernel panic on resume. This module is the "Firewire Open Host Controller Interface", am I right? Can someone confirm that the problem is caused by this module? And if yes, should I rename the bug or should it be reassigned?

Same for me, the offending modules are the firewire ones: fw_ohci and fw_core. I configured pm-suspend so that it does the Right Thing by way of:

    echo 'SUSPEND_MODULES="fw_ohci fw_core"' > /etc/pm/config.d/unload_modules

Bug 237634 mentions potentially the same issue, although I've not tried the patch, and suggests that the firewire resume panic bug will be fixed in 2.6.22. If unloading these modules works for everyone, then we can investigate whether the patch fixes the problem, close off this issue and mark it a duplicate of 237634. I'm unable at the moment to compile a new kernel rpm to test the patch; can anyone else do this?

The SUSPEND_MODULES line fixes the problem for me as well. I have not tried applying the kernel patch to see if that helps.

A brief follow-up: Suspend to RAM works with simply adding the firewire modules to the unload_modules file. However, it was not working 100% of the time - the system would occasionally fail to resume. I have added the modules to the module blacklist, and suspend to RAM seems to be more stable now. I have yet to have a failure to resume over the last 3 days using the blacklist method, whereas it only took a few suspend/resume cycles for the unload_modules method to fail. This may not apply to others and I imagine it could be another issue, but I wanted to report it here.

It seems that I spoke too soon. I recently had a failure to resume from a suspend to RAM.
I have no idea how to debug this, as I haven't found anything that consistently leads to a failed resume once the firewire modules are blacklisted.

Hezekiah, have you tried the pm_trace trick mentioned in http://people.freedesktop.org/~hughsient/quirk/quirk-suspend-advanced.html ? I'm not sure of the overhead of leaving pm_trace on all the time, so you could always add a hook to the suspend scripts to run it there.

Another way to debug a kernel panic on resume is with kdump/kexec, which I had working a while ago. Essentially, you install the kdump rpms, append some magic to the kernel parameters in grub.conf (crashkernel=blahdeblah) and then when you get a panic, a supervisor kernel is booted with scripts to dump the state of the machine somewhere in /var/crash for later debugging. At the very least, you can then get a stack trace of the panic to see what might have caused the problem.

The newest kernel fixed the problem for me, therefore I am closing the bug. Please re-open this bug if there are still other issues with other modules, or file a new bug report.
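For anyone who hits a similar resume failure later, the pm_trace check used in this report can be scripted roughly as follows. This is only a sketch: the function name pm_trace_devices is hypothetical, and it assumes the "hash matches device" line format shown in the dmesg output quoted in this report. Note that pm_trace stores its marker in the hardware clock, so the system time is expected to be wrong after the test.

```shell
#!/bin/sh
# Hypothetical helper: read dmesg text on stdin and print the device IDs
# from pm_trace "hash matches device <id>" lines (the ID is the last field).
pm_trace_devices() {
    awk '/hash matches device/ { print $NF }'
}

# Typical manual sequence (run as root; not executed by this snippet):
#   echo 1 > /sys/power/pm_trace    # arm the trace before suspending
#   pm-suspend                      # suspend; reboot if the resume hangs
#   dmesg | pm_trace_devices        # after the reboot, list matching devices
```

The printed ID can then be looked up with lspci, as was done for 0000:00:1d.0 in this report.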