Bug 248251

Summary: 2.6.22.1-20 breaks suspend
Product: [Fedora] Fedora
Component: kernel
Version: 7
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: low
Hardware: All
OS: Linux
Reporter: drago01
Assignee: Jarod Wilson <jarod>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: chris.brown, fenlason, kjb, krh, stefan-r-rhbz
Target Milestone: ---
Target Release: ---
Fixed In Version: 2.6.24.2-7.fc8
Doc Type: Bug Fix
Last Closed: 2008-02-25 20:26:21 UTC

Description drago01 2007-07-14 09:08:13 UTC
Description of problem:

After updating to 2.6.22.1-20, my laptop no longer suspends correctly.
It actually suspends fine, but it hangs on resume with white lines on the
screen, so I have to power it off.

Suspend to disk does not work either. It just shuts down X and then does nothing
instead of suspending. It does, however, power down cleanly when I press the
power button.

Version-Release number of selected component (if applicable):

kernel-2.6.22.1-20.fc7

How reproducible:

Always

Steps to Reproduce:
Suspend to RAM:
1. Suspend.
2. Try to resume.

Suspend to disk:
1. Try to hibernate.
  
Actual results:

Suspend to RAM: hangs on resume.
Suspend to disk: hangs while suspending.

Expected results:

The system should suspend and resume normally, as it did with all 2.6.21-based kernels.

Additional info:
lspci output:
00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML and 945GT Express Memory Controller Hub (rev 03)
00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML and 945GT Express PCI Express Root Port (rev 03)
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 02)
00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e2)
00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801GBM/GHM (ICH7 Family) SATA AHCI Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation G70 [GeForce Go 7600] (rev a1)
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01)
03:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG Network Connection (rev 02)
04:06.0 CardBus bridge: Texas Instruments PCIxx12 Cardbus Controller
04:06.1 FireWire (IEEE 1394): Texas Instruments PCIxx12 OHCI Compliant IEEE 1394 Host Controller
04:06.2 Mass storage controller: Texas Instruments 5-in-1 Multimedia Card Reader (SD/MMC/MS/MS PRO/xD)
04:06.3 Generic system peripheral [0805]: Texas Instruments PCIxx12 SDA Standard Compliant SD Host Controller

Laptop is a Zepto 6615WD.

Comment 1 drago01 2007-07-14 09:22:03 UTC
I also tried pci=nommconf and pci=nomsi; the results were the same.

Comment 2 Christopher Brown 2007-09-19 13:09:31 UTC
Hello,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself to this bug and will try and assist you in resolving it if I can.

There hasn't been much activity on this bug for a while. Could you tell me if
you are still having problems with the latest kernel?

If the problem no longer exists then please close this bug or I'll do so in a
few days if there is no additional information lodged.

Cheers
Chris

Comment 3 drago01 2007-09-19 16:20:10 UTC
Nope; if something had changed I would have posted it here. I tested suspend
with every new F7 kernel released after this one, and all had the same issue.
Even the 2.6.23-based Rawhide kernels fail in the same way.
Thanks for the offer of help, by the way.

Comment 4 Christopher Brown 2007-09-20 08:19:11 UTC
If you can try some of the following it would be helpful (taken from triage page):

# Find out if the system is locked up completely by hitting the caps lock key.

    * If the capslock light doesn't toggle, the system is completely dead. Try
again, but this time before suspending, activate the pm_trace functionality with
echo 1 > /sys/power/pm_trace. This reprograms the real time clock to contain a
few bytes of information which we can use to diagnose which driver failed to
resume. After the hang, reboot, boot up again, and save the output of dmesg.

    * If the capslock light does toggle, then the system did come back up, and
it's possible that we just failed to reinitialise the video.
http://people.freedesktop.org/~hughsient/quirk may contain further useful
information to diagnose this problem. It may also be useful to initiate the
suspend from a tty (ctrl-alt-f1) and run pm-suspend ; dmesg > dmesg.out ; sync
by hand. Upon resuming you'll now have some more debug info to sift through.
Additionally, this way when it resumes, you already have a console logged in
from which you can type commands 'blind'. Trying vbetool post for example may
bring things back to life. 

# Try rmmod'ing various modules before doing the suspend. If this makes things
work again, retry with a smaller set of modules unloaded. Keep retrying until
you narrow down which module is to blame.

# Another trick that sometimes works to force video to come back up is to enable
the BIOS password. This makes the system resume in a VGA text mode that the
kernel recovers from a lot easier. Not a real solution, but it can help to
diagnose other problems.
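The "unload modules, then retry with a smaller set" step above is essentially a bisection. A minimal Python sketch of that search follows, assuming exactly one culprit module whose removal alone makes suspend work. `find_culprit` and the `suspend_ok` callback are hypothetical: in practice `suspend_ok` stands for a manual test round (rmmod the given modules, suspend, resume, and note whether the machine came back), and the module names in the usage line are illustrative only.

```python
from __future__ import annotations

from typing import Callable, Sequence


def find_culprit(modules: Sequence[str],
                 suspend_ok: Callable[[frozenset[str]], bool]) -> str | None:
    """Halve the set of unloaded modules until one culprit remains.

    suspend_ok(unloaded) is a hypothetical test: True if suspend/resume
    works with exactly the modules in `unloaded` removed beforehand.
    Assumes a single culprit module.
    """
    candidates = list(modules)
    # Sanity check: unloading everything must fix it, or no module is to blame.
    if not candidates or not suspend_ok(frozenset(candidates)):
        return None
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if suspend_ok(frozenset(half)):
            candidates = half  # culprit is among the modules just unloaded
        else:
            candidates = candidates[len(candidates) // 2:]  # it is in the rest
    return candidates[0]


# Simulated run: pretend firewire_ohci is the module that breaks resume.
mods = ["snd_hda_intel", "ipw3945", "r8169", "firewire_ohci", "sdhci", "uhci_hcd"]
print(find_culprit(mods, lambda unloaded: "firewire_ohci" in unloaded))  # prints firewire_ohci
```

Each test round roughly halves the candidate list, so a handful of suspend/resume cycles is usually enough even with many modules loaded.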

You may also be interested in the comments found in the following thread
regarding suspend/resume with the nv driver (I take it you are using this rather
than the nvidia driver?):

https://www.redhat.com/archives/fedora-test-list/2007-September/msg00365.html

Cheers
Chris

Comment 5 drago01 2007-10-24 20:10:07 UTC
I did some debugging with kernel-2.6.23.1-10.fc7 (which has pm_trace support for
x86_64). The culprit was firewire: after unloading the firewire modules, suspend
worked fine.
The device causing the failure was:
05:06.1 FireWire (IEEE 1394): Texas Instruments PCIxx12 OHCI Compliant IEEE 1394 Host Controller


Comment 6 drago01 2007-10-27 09:57:24 UTC
I compared the firewire code with that from 2.6.21 (which worked). Ironically,
2.6.21 had no suspend code at all, so the newly added pci_suspend/pci_resume
functions seem to be what causes the hang (locking up the box) during resume.
Since I have no FireWire device, I cannot confirm whether FireWire itself worked
after resume with 2.6.21, but at least it did not hang the system.


Comment 7 Christopher Brown 2007-12-13 00:23:58 UTC
I'm guessing the problem still exists so am adding Stefan to this as he might
have something to add that may help.

Cheers
Chris

Comment 8 Stefan Richter 2007-12-13 13:37:39 UTC
I interpret comment #5 as:  The culprit is the firewire-ohci module.  Right?

I have sporadically tested it only with suspend(toRAM)/resume on a C2D + 945GM
based x86-64 PC and with APM suspend(toRAM)/resume on an old Pentium MMX
notebook.  Works for me[TM].  I only tested mainline kernels, and I haven't
tested hibernate/restore yet.

The history of suspend/resume|hibernate/restore support in the new firewire
drivers in mainline Linux kernel:
  - implemented .suspend and .resume methods in 2.6.22(-rc*)
  - fixed an issue with iBook G3 and older Powerbooks in 2.6.23(-rc*) and 2.6.22.9
  - fixed loss of SBP-2 and other protocol functionality after resume in 2.6.24(-rc*)



Comment 9 drago01 2007-12-13 22:00:46 UTC
(In reply to comment #8)
> I interpret comment #5 as:  The culprit is the firewire-ohci module.  Right?

Correct

> [..]
> The history of suspend/resume|hibernate/restore support in the new firewire
> drivers in mainline Linux kernel:
>   - implemented .suspend and .resume methods in 2.6.22(-rc*)

Which seems to have broken suspend for me.
I don't know whether it worked after a suspend before then, but at least the
system did not hang on resume.

>   - fixed an issue with iBook G3 and older Powerbooks in 2.6.23(-rc*) and 2.6.22.9

Seems to be unrelated to my hardware; tests with F8 (2.6.23.x) seem to confirm
this.

>   - fixed loss of SBP-2 and other protocol functionality after resume in
> 2.6.24(-rc*)

I have not tested any 2.6.24 kernel with suspend yet.
I might give it a shot if you think it will change something.

If you need any more information, feel free to ask.

Comment 10 Stefan Richter 2007-12-13 23:05:48 UTC
>>   - implemented .suspend and .resume methods in 2.6.22(-rc*)
> 
> Which seems to have broke suspend for me.
> I don't know if it worked after suspend before but atleast
> the system did not hang on resume.

Without the methods, the firewire stack ceased to function after a
suspend/resume cycle.

>>   - fixed loss of SBP-2 and other protocol functionality after resume in
>> 2.6.24(-rc*)
> 
> have not tested any 2.6.24 kernel with suspend. 
> might give it a shoot if you think that it will change something.

I don't track Fedora kernel sources; some of the firewire updates that went
into mainline after 2.6.23 was released may already have appeared in Fedora
kernels with a 2.6.23.* package name.  In the particular case of this
update, I believe it will do neither good nor harm on your system.  It only
fixes how the Linux PC represents itself to external nodes on the FireWire bus.

Things that need to be done by someone are
  - check whether the bug exists in the latest mainline kernel too
  - if yes,
          - debug and fix it there,
          - backport fix into Fedora kernels if appropriate.
  - if not,
          - determine what to backport from mainline to Fedora in order to fix Fedora as well.

I will eventually try to set up a test PC for hibernation, but I don't know when
I can do this or whether that PC will crash too.  What I, or anybody else who
attempts to debug this, will need is diagnostic output (a kernel panic message)
and probably direct access to the crashing machine for hands-on testing of
candidate fixes.

Comment 11 Stefan Richter 2007-12-13 23:25:30 UTC
(PS:  "-if yes, - debug and fix it there":  If mainline doesn't have a fix for
it yet, then debugging can of course also proceed on a Fedora kernel rather
than mainline.)

Comment 12 Jarod Wilson 2008-02-14 20:40:33 UTC
I'm going to hijack this bug... :)

Comment 13 drago01 2008-02-23 20:20:23 UTC
This seems to be fixed in 2.6.24.2-7.fc8.
I removed my hack in /etc/pm that unloads firewire-ohci, and I can still suspend
and resume without problems.

Comment 14 Stefan Richter 2008-02-23 21:12:10 UTC
Cool.  Now I wonder how this was fixed.  :-)

Comment 15 Jarod Wilson 2008-02-25 20:26:21 UTC
While it'd be nice to know exactly how this got fixed, due to lack of bandwidth,
I'm going to just close this one CURRENTRELEASE. :\