Bug 241700

Summary: Regression: kernel panic on resume
Product: [Fedora] Fedora
Reporter: Roland Wolters <roland.wolters>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED NEXTRELEASE
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: rawhide
CC: alex, bloch, hez
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2007-07-27 10:53:19 UTC
Attachments:
Output of lsmod from a Sony Vaio VGN-S260 - *'d modules removed before suspending (flags: none)

Description Roland Wolters 2007-05-29 17:04:59 UTC
Description of problem:

Whenever I try to suspend to RAM/resume, the resume does not work: I get a 
kernel panic (num lock and caps lock flashing, reset only possible when I take 
off the battery and unplug the power connector).

Unfortunately, there is no kernel dump in dmesg or messages, and I have no
idea where else to look.

I also tried to suspend using pm-suspend with some flags for Fujitsu Amilo 
computers (my computer is a similar build but from a re-branding company), but 
I got the same result.

There are no other additional modules loaded!

Before the update from Fedora 6 to Fedora 7 everything worked, hence the
word "regression" in the title.

Version-Release number of selected component (if applicable):
kernel: 2.6.21-1.3194.fc7

Comment 1 Will Woods 2007-05-29 17:58:52 UTC
There have been some changes in the way suspend/resume is handled in Fedora 7.
Please check out the Sleep Quirk Debugger pages, available here:

http://people.freedesktop.org/~hughsient/quirk/index.html

The suggestions there should help you fix resume on your laptop.

Comment 2 Roland Wolters 2007-05-29 19:41:06 UTC
As I've mentioned, I've already tried to use pm-suspend. It simply didn't 
work.
Anyway, the debug page mentioned I should try to check dmesg for this:
# cat dmesg.txt |grep "hash matches"
  hash matches drivers/base/power/resume.c:28
  hash matches device 0000:00:1d.0

Also, dmesg contained this kernel dump(?):
BUG: warning at kernel/softirq.c:138/local_bh_enable() (Not tainted)
 [<c042b0cf>] local_bh_enable+0x45/0x92
 [<c06002bd>] cond_resched_softirq+0x2c/0x42
 [<c059adf3>] release_sock+0x4f/0x9d
 [<c05c7755>] tcp_recvmsg+0x8d2/0x9de
 [<c059a7a9>] sock_common_recvmsg+0x3e/0x54
 [<c0599095>] sock_recvmsg+0xec/0x107
 [<c04058ff>] common_interrupt+0x23/0x28
 [<c0436e71>] autoremove_wake_function+0x0/0x35
 [<c0466b1a>] find_extend_vma+0x12/0x49
 [<c043dd81>] get_futex_key+0x3a/0xe3
 [<c0599ef8>] sys_recvfrom+0xd7/0x12b
 [<c043ee20>] do_futex+0x205/0xb5e
 [<c04754ee>] do_sync_write+0xc7/0x10a
 [<c043d245>] tick_sched_timer+0x0/0xbb
 [<c0436e71>] autoremove_wake_function+0x0/0x35
 [<c0599f83>] sys_recv+0x37/0x3b
 [<c059a458>] sys_socketcall+0x19c/0x261
 [<c0404f70>] syscall_call+0x7/0xb

Comment 3 Roland Wolters 2007-05-29 19:44:22 UTC
Forgot to mention the device:
# lspci|grep 00:1d.0
00:1d.0 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) 
USB UHCI Controller #1 (rev 03)
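
Note: the two steps above (finding the "hash matches device" line in dmesg, then looking the address up with lspci) can be combined. The following is only a sketch; hash_match_devices is a hypothetical helper name, and it assumes the dmesg output has been saved to a file such as dmesg.txt, as in Comment 2.

```shell
# Print the PCI addresses named in "hash matches device" lines.
# Reads dmesg-style text on stdin. hash_match_devices is a hypothetical name.
hash_match_devices() {
    # Only lines of the form "hash matches device <address>" are printed;
    # "hash matches drivers/..." source-location lines are skipped.
    sed -n 's/^.*hash matches device \([0-9a-fA-F:.]*\).*$/\1/p'
}

# Usage (not run here): look up each reported address with lspci.
# hash_match_devices < dmesg.txt | while read -r addr; do lspci -s "$addr"; done
```

With the two lines quoted in Comment 2 as input, this prints 0000:00:1d.0, which lspci -s accepts directly.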

Comment 4 Hezekiah M. Carty 2007-05-31 17:27:31 UTC
I've reported a similar bug -
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=235240

What are the specs of your system, if I may ask?  I haven't had any luck with
the methods listed on the suspend quirks page while attempting to debug my
similar problem.

Comment 5 Roland Wolters 2007-05-31 17:59:56 UTC
What kind of output/specs do you need?

Comment 6 Hezekiah M. Carty 2007-05-31 18:12:28 UTC
Laptop vs Desktop, CPU, wireless device (if any), video card + driver.  A make
and model number would be interesting (my system is a Sony Vaio VGN-S260
laptop). Maybe others - this is just for me to compare and see if there's
anything that jumps out as an obvious similarity between the systems to see if
our problems come from the same root.

Comment 7 Roland Wolters 2007-05-31 18:45:38 UTC
Type: Laptop
CPU: Intel(R) Pentium(R) M processor 1.80GHz
RAM: 1GB
WLAN device: ipw2200
video card: ATI Technologies Inc RV350 [Mobility Radeon 9600 M10]
driver: "radeon", module version = 4.2.0 (standard X.Org)

The system is a "Cebop Top 900". Cebop itself is just a company which took 
Fujitsu/Siemens machines and put a better graphics card inside - even the case 
is exactly the same as some Fujitsu/Siemens machines of that time.
Cebop has been out of business for quite some months now, so you won't get any
information from them.

For more information, check this link:
http://smolt.fedoraproject.org/show?UUID=41018670-1633-4b95-a4b1-95c4fc89b725

Comment 8 Hezekiah M. Carty 2007-05-31 19:26:06 UTC
I don't know if this makes any difference in figuring out the problem, but under
Debian Etch and Ubuntu Dapper my laptop would freeze and fail to resume (though
not kernel panic) often when I would change something related to USB close to
the time I would suspend the machine or after suspending.  Plugging in or
unplugging a USB mouse would almost always cause this problem.

I only mention this because of your dmesg output (apparently) pointing a finger
at the USB controller as a potential culprit.

I have a similar system:
Type: Laptop
CPU: Intel(R) Pentium(R) M processor 1.70GHz
RAM: 1GB
WLAN device: ipw2200
video card: ATI Technologies Inc RV250 [Mobility Radeon 9200 M9]
driver: "radeon", module version = 4.2.0 (standard X.Org)

And again, suspend to ram/disk works without issue on FC6 and CentOS/RHEL/etc 5
for this laptop.

Comment 9 Patrice Lazareff 2007-06-05 17:52:37 UTC
Here is a summary of my config. Suspend/resume worked fine with FC6, but I have
the problem with Fedora 7:

Sony Vaio TX3XP, on resume both caps-lock and scroll-lock lights blink and the 
only way to stop the laptop is removing the battery :-(

CPU0: Intel Genuine Intel(R) CPU U1400  @ 1.20GHz stepping 08
WLAN: ipw3945
VID: Mobile 945GM
Xorg driver: i810

Got this in dmesg:
BUG: warning at kernel/softirq.c:138/local_bh_enable() (Not tainted)
 [<c042b0cf>] local_bh_enable+0x45/0x92
 [<c06002bd>] cond_resched_softirq+0x2c/0x42
 [<c059adf3>] release_sock+0x4f/0x9d
 [<c05c670d>] tcp_sendmsg+0x90b/0x9f9
 [<c05dec95>] inet_sendmsg+0x3b/0x45
 [<c0598731>] sock_aio_write+0xf6/0x102
 [<c04754ee>] do_sync_write+0xc7/0x10a
 [<c0436e71>] autoremove_wake_function+0x0/0x35
 [<c0475d47>] vfs_write+0xbc/0x154
 [<c0476342>] sys_write+0x41/0x67
 [<c0404f70>] syscall_call+0x7/0xb
 [<c0600000>] __sched_text_start+0x6e8/0x89e
 =======================

I will try the various solutions already mentioned here and post again should
one work for me.

Comment 10 Alex Tucker 2007-06-12 09:11:28 UTC
I was following this bug as I have a similar issue, where suspend to RAM worked
ok on my Ferrari 4005 in FC6, but panics on resume in Fedora 7.  I too tried the
pm_trace trick, although had to apply a patch for it to work on x86_64, but
while I got "hash matches drivers/base/power/resume.c:61", I didn't get anything
else suggesting which driver might be causing the panic.

I've since tried suspending after first removing a whole bunch of unused
modules ("lsmod | cut -c-20 | xargs rmmod"), and suspend to RAM has now worked
a couple of times.

This was after seeing a separate bug report here suggesting that there's some
issue with a firewire driver.  I'll try to narrow down which module is causing
the problems for me, and thought you might have success with a similar approach.
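
Note: as far as I can tell, the lsmod one-liner above also hands the "Module" header line to rmmod and cuts off any module name longer than 20 characters. A slightly more careful sketch takes the name from the first field and keeps only modules whose use count (third field of lsmod output) is 0. unused_modules is a hypothetical helper name.

```shell
# List modules that nothing else is using, one per line.
# Reads lsmod output on stdin. unused_modules is a hypothetical name.
unused_modules() {
    # NR > 1 skips the "Module  Size  Used by" header;
    # $3 is the use count, $1 the module name.
    awk 'NR > 1 && $3 == 0 { print $1 }'
}

# Usage (as root, not run here): try to unload every unused module.
# lsmod | unused_modules | xargs rmmod
```

Unloading a module can drop the use count of the modules it depended on, so a second pass may find more candidates.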

Comment 11 Hezekiah M. Carty 2007-06-13 17:37:23 UTC
Created attachment 156894 [details]
Output of lsmod from a Sony Vaio VGN-S260 - *'d modules removed before suspending

Alex - thanks for the inspiration!

I was just able to suspend to RAM and resume without issue from the F7 Gnome
LiveCD by removing several modules.  I just went through the list and removed
modules which looked like they might not be as important (though this was
largely guess-work on my part).

I have attached the output of lsmod on my laptop as it was immediately after
booting the LiveCD - the modules with an * before their name were removed
before attempting resume.

I, too, will try to narrow down the list and find out which specific module(s)
is (are) causing the problem.

Comment 12 Roland Wolters 2007-06-13 18:22:55 UTC
Thanks for that list, resume worked for me as well after I removed all these 
modules. Good work!
I will try to find my problematic module as well.

Comment 13 Roland Wolters 2007-06-13 18:36:48 UTC
For me it's the "fw_ohci" module which causes the kernel panic on resume.
This module is the "Firewire Open Host Controller Interface", am I right?

Can someone confirm that the problem is caused by this module? And if yes, 
should I rename the bug or should it be reassigned?

Comment 14 Alex Tucker 2007-06-13 19:16:10 UTC
Same for me, the offending modules are the firewire ones: fw_ohci and fw_core. 
I configured pm-suspend so that it does the Right Thing by way of:

echo 'SUSPEND_MODULES="fw_ohci fw_core"' > /etc/pm/config.d/unload_modules

Bug 237634 mentions potentially the same issue, although I've not tried the
patch, and suggests that the firewire resume panic bug will be fixed in 2.6.22.

If unloading these modules works for everyone, then we can investigate whether
the patch fixes the problem, close off this issue and mark it a duplicate of 237634.

Comment 15 Roland Wolters 2007-06-13 21:00:30 UTC
I'm unable at the moment to compile a new kernel rpm for me to test the patch, 
can anyone else do this?

Comment 16 Hezekiah M. Carty 2007-06-13 21:04:02 UTC
The SUSPEND_MODULES line fixes the problem for me as well.  I have not tried
applying the kernel patch to see if that helps.

Comment 17 Hezekiah M. Carty 2007-07-02 01:36:56 UTC
A brief follow-up:

Suspend to RAM works after simply adding the firewire modules to the
unload_modules file.  However, it was not working 100% of the time - the system
would occasionally fail to resume.

I have added the modules to the module blacklist, and suspend to RAM seems to be
more stable now.  I have yet to have a failure to resume over the last 3 days
using the blacklist method, and it only took a few suspend/resume cycles for the
unload_modules method to fail.

This may not apply to others and I imagine it could be another issue, but I
wanted to report it here.
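
Note: the comment does not say which file was edited. A minimal sketch of the blacklist method, assuming the modprobe blacklist lives in /etc/modprobe.d/blacklist (the path is an assumption and may differ between releases), could look like this. blacklist_modules is a hypothetical helper name.

```shell
# Append "blacklist <module>" lines to a modprobe configuration file,
# skipping modules that are already listed there.
# blacklist_modules is a hypothetical name; the first argument is the file.
blacklist_modules() {
    conf=$1
    shift
    for mod in "$@"; do
        # grep -x matches whole lines only, so the same entry is not added twice.
        grep -qx "blacklist $mod" "$conf" 2>/dev/null ||
            printf 'blacklist %s\n' "$mod" >> "$conf"
    done
}

# Usage (as root, not run here; the path is an assumption):
# blacklist_modules /etc/modprobe.d/blacklist fw_ohci fw_core
```

A blacklist entry only stops the module from being loaded automatically for its device; it can still be loaded by hand with modprobe.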

Comment 18 Hezekiah M. Carty 2007-07-05 20:15:10 UTC
It seems that I spoke too soon.  I recently had a failure to resume from a
suspend to RAM.  I have no idea how to debug this as I haven't found anything
that consistently leads to a failed resume once the firewire modules are
blacklisted.

Comment 19 Alex Tucker 2007-07-05 21:06:00 UTC
Hezekiah, have you tried the pm_trace trick mentioned in
http://people.freedesktop.org/~hughsient/quirk/quirk-suspend-advanced.html ?

I'm not sure of the overhead of leaving pm_trace on all the time, so you could
always add a hook to the suspend scripts to run it there.

Another way to debug a kernel panic on resume is with kdump/kexec, which I had
working a while ago.  Essentially, you install the kdump rpms, append some magic
to the kernel parameters in grub.conf (crashkernel=blahdeblah) and then when you
get a panic, a supervisor kernel is booted with scripts to dump the state of the
machine somewhere in /var/crash for later debugging.  At the very least, you can
then get a stack trace of the panic to see what might have caused the problem.
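
Note: pm_trace is switched on by writing 1 to /sys/power/pm_trace. Its main cost is that it stores its hash in the real-time clock, so the system clock is wrong after a resume and has to be reset. A sketch of a function that a suspend hook could call is below; enable_pm_trace is a hypothetical name, and how the hook is wired into the suspend scripts depends on the pm-utils version, so that part is left out.

```shell
# Enable pm_trace just before suspending.
# The optional argument exists only so the function can be tried against an
# ordinary file; by default it writes to the kernel's sysfs interface.
# enable_pm_trace is a hypothetical name.
enable_pm_trace() {
    trace_file=${1:-/sys/power/pm_trace}
    # Fail quietly if the kernel lacks pm_trace or we are not root.
    [ -w "$trace_file" ] || return 1
    echo 1 > "$trace_file"
}

# Usage (as root, not run here), immediately before suspending:
# enable_pm_trace && pm-suspend
```

After a failed resume and reboot, the "hash matches" lines in dmesg are the ones to look for.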

Comment 20 Roland Wolters 2007-07-27 10:53:19 UTC
The newest kernel fixed the problem for me, so I am closing the bug. Please
re-open this bug if there are still issues with other modules, or file a
new bug report.