Bug 204987 - kernel 2.6.17-1.2608 - regression in suspend/hibernate
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: x86_64 Linux
Priority: medium  Severity: medium
Assigned To: Kernel Maintainer List
QA Contact: Brian Brock
Reported: 2006-09-01 17:04 EDT by Michal Jaegermann
Modified: 2007-11-30 17:11 EST

Doc Type: Bug Fix
Last Closed: 2007-05-24 13:02:46 EDT
Attachments
pci devices listing (1.63 KB, text/plain)
2006-09-01 17:04 EDT, Michal Jaegermann
Description Michal Jaegermann 2006-09-01 17:04:40 EDT
Description of problem:

Recent kernels (like 2.6.17-1.2600, 2.6.17-1.2597) were able to
suspend, hibernate and _correctly resume_ on my machine.  With
2.6.17-1.2608.fc6 I now see this in the logs after such an attempt:

Extended CMOS year: 20
Class driver suspend failed for cpu0
Extended CMOS year: 20
Could not power down device &sem->wait_lock: error -22
Some devices failed to power down, aborting suspend

and the machine attempts to resume.  This does not go very well.
After "Suspend" it comes back with a black screen and a dead keyboard.
It is possible to log in remotely and do 'pkill -f gdm'.  Usually
after a few tries this successfully restarts X and the machine is usable
again.  Resume from "Hibernate" is less dramatic.  A totally blank
alert shows up and has to be dismissed with "Force Quit"; after that we
are back in business.

With kernels where it worked I was seeing in the logs
....
ACPI: PCI interrupt for device 0000:00:0a.0 disabled
pci_set_power_state(): 0000:00:00.0: state=3, current state=5
Extended CMOS year: 20
Extended CMOS year: 20
....
with these messages showing up only when the machine was resuming.

The output of 'lspci -tv' for the machine in question is attached.

Version-Release number of selected component (if applicable):
kernel-2.6.17-1.2608.fc6

How reproducible:
always
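
To check a log for the failure signature quoted above, a small filter
along these lines may help.  This is only a sketch: the three message
strings are taken from the excerpt in this report, while the function
name and the idea of scanning a list of lines are illustrative
assumptions.

```python
import re

# Failure messages copied from the log excerpt in this report.
FAILURE_PATTERNS = [
    re.compile(r"Class driver suspend failed for \S+"),
    re.compile(r"Could not power down device .+: error -?\d+"),
    re.compile(r"Some devices failed to power down, aborting suspend"),
]


def find_suspend_failures(lines):
    """Return the lines that match any of the known failure messages."""
    hits = []
    for line in lines:
        if any(pattern.search(line) for pattern in FAILURE_PATTERNS):
            hits.append(line.strip())
    return hits


if __name__ == "__main__":
    sample = [
        "Extended CMOS year: 20",
        "Class driver suspend failed for cpu0",
        "Extended CMOS year: 20",
        "Could not power down device &sem->wait_lock: error -22",
        "Some devices failed to power down, aborting suspend",
    ]
    for hit in find_suspend_failures(sample):
        print(hit)
```

The same function could be fed the lines of a saved dmesg output or of
/var/log/messages instead of the sample list.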
Comment 1 Michal Jaegermann 2006-09-01 17:04:41 EDT
Created attachment 135411
pci devices listing
Comment 2 Michal Jaegermann 2006-09-03 15:22:07 EDT
kernel-2.6.17-1.2611.fc6 (2.6.18rc5-git6) unfortunately produces
the same error:

Could not power down device &sem->wait_lock: error -22

and suspend fails.
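
For reference, the "error -22" in that message is a negated errno
value; on Linux errno 22 is EINVAL (invalid argument).  A minimal
sketch of translating such a code, where the helper name is made up
for illustration:

```python
import errno
import os
import re


def decode_kernel_error(line):
    """Return (code, symbolic name, description) for an 'error -N' fragment.

    Returns None when the line carries no such fragment.
    """
    match = re.search(r"error (-?\d+)", line)
    if match is None:
        return None
    # Kernel messages report errno values negated, so drop the sign.
    code = abs(int(match.group(1)))
    name = errno.errorcode.get(code, "UNKNOWN")
    return code, name, os.strerror(code)


if __name__ == "__main__":
    print(decode_kernel_error(
        "Could not power down device &sem->wait_lock: error -22"))
```

This only names the error; it does not tell which driver returned it.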
Comment 3 David Lawrence 2006-09-05 11:27:14 EDT
Reassigning to correct owner, kernel-maint.
Comment 4 Michal Jaegermann 2006-10-02 15:58:22 EDT
With 2.6.18-1.2724.fc6 my test machine does "hibernate" and "suspend"
again.  None of the kernels between 2.6.17-1.2600 and the current one
was able to do that.

Interestingly enough, a restore from "hibernate" is much faster than
the one from "suspend".  Actually, on the first try I was already pretty
convinced that the machine had crashed, with a dark screen and no
response on the keyboard or the network, when it came back to life.  On
the second try the screen picture was one of the first things restored,
but other tasks, like the shell and dmesg, took their long sweet time
before becoming usable.

Quite possibly this fragment of the dmesg output shows the reason:
....
Restarting tasks...<4>ATA: abnormal status 0x80 on port 0xE807
ATA: abnormal status 0x80 on port 0xE807
ATA: abnormal status 0x80 on port 0xE807
 done
Enabling non-boot CPUs ...
ACPI: Power Button (FF) [PWRF]
ACPI: Power Button (CM) [PWRB]
ACPI: Sleep Button (CM) [SLPB]
ieee1394: Initialized config rom entry `ip1394'
ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=[201]  MMIO=[fd800000-fd8007ff]  Max Packet=[2048]  IR/IT contexts=[4/8]
agpgart: Found an AGP 3.5 compliant device at 0000:00:00.0.
agpgart: Putting AGP V3 device at 0000:00:00.0 into 4x mode
agpgart: Putting AGP V3 device at 0000:01:00.0 into 4x mode
[drm] Loading R300 Microcode
ieee1394: Host added: ID:BUS[0-00:1023]  GUID[00e0180000575a14]
e100: eth1: e100_watchdog: link up, 10Mbps, half-duplex
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: (BMDMA stat 0x1)
ata1.00: tag 0 cmd 0xca Emask 0x4 stat 0x40 err 0x0 (timeout)
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1: soft resetting port
ata1.00: configured for UDMA/133
ata1: EH complete
....

Looking at /var/log/messages does not clarify that, as all corresponding
(and more) entries have the same timestamp, likely from the moment when
syslogd started to operate again.

"Enabling non-boot CPUs ..." above is not doing very much. There is
only one CPU around and no hyperthreading.
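
The status bytes in the dmesg fragment above (0x80 in the "abnormal
status" lines, 0x40 in the ata1.00 timeout line) can be read as bit
flags of the ATA status register.  A minimal sketch follows, using the
conventional ATA bit names; the table and function names are
illustrative, and the meaning of the lower bits varies between ATA
revisions, so treat those labels as the legacy ones.

```python
# Conventional bit names of the ATA status register, highest bit first.
ATA_STATUS_BITS = {
    0x80: "BSY",   # device busy
    0x40: "DRDY",  # device ready
    0x20: "DF",    # device fault
    0x10: "DSC",   # seek complete (legacy meaning of this bit)
    0x08: "DRQ",   # data request
    0x04: "CORR",  # corrected data (obsolete)
    0x02: "IDX",   # index (obsolete)
    0x01: "ERR",   # error
}


def decode_ata_status(status):
    """Return the names of the bits set in an ATA status byte."""
    return [name for bit, name in ATA_STATUS_BITS.items() if status & bit]


if __name__ == "__main__":
    for value in (0x80, 0x40):
        print(hex(value), decode_ata_status(value))
```

Read this way, 0x80 means only the busy bit was set when the port was
polled, and 0x40 means the drive reported ready when the command timed
out.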
Comment 5 Richard Hughes 2007-05-24 06:34:43 EDT
Does this work with the latest kernel? Can this bug be closed? Thanks.
Comment 6 Michal Jaegermann 2007-05-24 13:02:46 EDT
> Does this work with the latest kernel?

With 2.6.21-1.3189.fc7 I see, both with suspend and hibernate, these:

ATA: abnormal status 0x7F on port 0x000000000001e807
ATA: abnormal status 0x7F on port 0x000000000001e807

Yes, always twice in a row.  But this does not seem to adversely affect
anything and it looks like a part of normal operation.  My particular
desktop box seems to be better at suspend and hibernate than many laptops.

One curious side effect is that currently, after a suspend, the first
text console is "painted" all white.  If you manage to force
a screen refresh on it, it reverts to normal colours.
