Bug 204987

Summary: kernel 2.6.17-1.2608 - regress in suspend/hibernate
Product: [Fedora] Fedora
Reporter: Michal Jaegermann <michal>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED RAWHIDE
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: rawhide
CC: ncunning, richard, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2007-05-24 17:02:46 UTC Type: ---
Attachments:
Description Flags
pci devices listing none

Description Michal Jaegermann 2006-09-01 21:04:40 UTC
Description of problem:

Recent kernels (like 2.6.17-1.2600, 2.6.17-1.2597) were able to
suspend, hibernate and _correctly resume_ on my machine.  With
2.6.17-1.2608.fc6 I now see this in the logs after such an attempt:

Extended CMOS year: 20
Class driver suspend failed for cpu0
Extended CMOS year: 20
Could not power down device &sem->wait_lock: error -22
Some devices failed to power down, aborting suspend

and the machine attempts to resume.  This does not go very well.
After "Suspend" it comes back with a black screen and a dead keyboard.
It is possible to log in remotely and do 'pkill -f gdm'.  Usually
after a few tries this successfully restarts X and the machine is usable
again.  Resume from "Hibernate" is less dramatic.  A totally blank
alert shows up and it has to be "Force Quit"; after that we are back
in business.
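
The "error -22" in the log above is a negated errno value. As a reading aid, here is a minimal Python sketch that maps such a code to its symbolic name and message. It assumes Linux errno numbering, and the helper name decode_kernel_error is hypothetical; it says nothing about which driver returned the error.

```python
import errno
import os
from typing import Tuple


def decode_kernel_error(code: int) -> Tuple[str, str]:
    """Map a negative kernel-style return code (e.g. -22) to (symbol, message).

    Assumption: the code is a negated errno value using the numbering of
    the platform this script runs on.
    """
    number = abs(code)
    symbol = errno.errorcode.get(number, "UNKNOWN")
    return symbol, os.strerror(number)


if __name__ == "__main__":
    # The value reported by "Could not power down device ...: error -22".
    print(decode_kernel_error(-22))
```

On Linux, 22 is EINVAL ("Invalid argument"), so the message says that a suspend callback rejected its arguments rather than, say, timing out.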

With the kernels that worked I was seeing in the logs
....
ACPI: PCI interrupt for device 0000:00:0a.0 disabled
pci_set_power_state(): 0000:00:00.0: state=3, current state=5
Extended CMOS year: 20
Extended CMOS year: 20
....
with these messages showing up only when the machine was resuming.

The output of 'lspci -tv' for the machine in question is attached.

Version-Release number of selected component (if applicable):
kernel-2.6.17-1.2608.fc6

How reproducible:
always

Comment 1 Michal Jaegermann 2006-09-01 21:04:41 UTC
Created attachment 135411 [details]
pci devices listing

Comment 2 Michal Jaegermann 2006-09-03 19:22:07 UTC
kernel-2.6.17-1.2611.fc6 (2.6.18rc5-git6) unfortunately produces
the same error:

Could not power down device &sem->wait_lock: error -22

and the suspend fails.

Comment 3 David Lawrence 2006-09-05 15:27:14 UTC
Reassigning to correct owner, kernel-maint.

Comment 4 Michal Jaegermann 2006-10-02 19:58:22 UTC
With 2.6.18-1.2724.fc6 my test machine does "hibernate" and "suspend"
again.  None of the kernels between 2.6.17-1.2600 and the current one was
able to do that.

Interestingly enough, a restore from "hibernate" is much faster than
one from "suspend".  Actually, on the first try I was already pretty
convinced that the machine had crashed, with a dark screen and no response
from the keyboard or the network, when it came back to life.  On the second
try the screen picture was one of the first things restored, but other
tasks, like the shell and dmesg, took their long sweet time before becoming
usable.

Quite possibly this fragment of dmesg output shows the reason:
....
Restarting tasks...<4>ATA: abnormal status 0x80 on port 0xE807
ATA: abnormal status 0x80 on port 0xE807
ATA: abnormal status 0x80 on port 0xE807
 done
Enabling non-boot CPUs ...
ACPI: Power Button (FF) [PWRF]
ACPI: Power Button (CM) [PWRB]
ACPI: Sleep Button (CM) [SLPB]
ieee1394: Initialized config rom entry `ip1394'
ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=[201]  MMIO=[fd800000-fd8007ff]  Max Packet=[2048]  IR/IT contexts=[4/8]
agpgart: Found an AGP 3.5 compliant device at 0000:00:00.0.
agpgart: Putting AGP V3 device at 0000:00:00.0 into 4x mode
agpgart: Putting AGP V3 device at 0000:01:00.0 into 4x mode
[drm] Loading R300 Microcode
ieee1394: Host added: ID:BUS[0-00:1023]  GUID[00e0180000575a14]
e100: eth1: e100_watchdog: link up, 10Mbps, half-duplex
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: (BMDMA stat 0x1)
ata1.00: tag 0 cmd 0xca Emask 0x4 stat 0x40 err 0x0 (timeout)
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1: soft resetting port
ata1.00: configured for UDMA/133
ata1: EH complete
....

Looking at /var/log/messages does not clarify that, as all the corresponding
(and more) entries have the same timestamp, likely from the moment when
syslogd started to operate again.

"Enabling non-boot CPUs ..." above is not doing very much. There is
only one CPU around and no hyperthreading.
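
The status bytes in the dmesg fragment above (0x80 in the "abnormal status" lines, 0x40 in the timeout line) can be read against the classic ATA status register layout. The following is a minimal Python sketch of that decoding. The table and the helper name decode_ata_status are illustrative assumptions; it only names the bits that are set and does not claim to reproduce how the kernel driver interprets them.

```python
# Classic ATA status register bits, from bit 7 down to bit 0.
# Assumption: the traditional bit names; newer ATA revisions rename some
# of the low bits.
ATA_STATUS_BITS = [
    (0x80, "BSY"),   # device busy
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault
    (0x10, "DSC"),   # device seek complete
    (0x08, "DRQ"),   # data request
    (0x04, "CORR"),  # corrected data
    (0x02, "IDX"),   # index
    (0x01, "ERR"),   # error
]


def decode_ata_status(status):
    """Return the names of the status bits set in an 8-bit ATA status value."""
    return [name for mask, name in ATA_STATUS_BITS if status & mask]


if __name__ == "__main__":
    # Values taken from the dmesg fragment above.
    for value in (0x80, 0x40):
        print("0x%02X: %s" % (value, " ".join(decode_ata_status(value))))
```

Read this way, 0x80 is BSY alone, i.e. the port still reported busy when polled during resume, while 0x40 in the timeout line is DRDY alone.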

Comment 5 Richard Hughes 2007-05-24 10:34:43 UTC
Does this work with the latest kernel? Can this bug be closed? Thanks.

Comment 6 Michal Jaegermann 2007-05-24 17:02:46 UTC
> Does this work with the latest kernel?

With 2.6.21-1.3189.fc7 I see, both with suspend and hibernate, these:

ATA: abnormal status 0x7F on port 0x000000000001e807
ATA: abnormal status 0x7F on port 0x000000000001e807

Yes, always twice in a row.  But this does not seem to adversely affect
anything and it looks like a part of normal operation.  My particular
desktop box seems to be better at suspend and hibernate than many laptops.

One curious side effect is that currently, after a suspend, the first
text console is "painted" all white.  If you manage to force
a screen refresh on it, it reverts to normal colours.