697150 – pm-utils struggle with suspend/hibernate on Asus K52Jc laptop and fail without extensive help

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	pm-utils
Sub Component:
Version:	16
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Jaroslav Škarvada
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:	http://lkml.org/lkml/2011/11/30/456
Whiteboard:
Depends On:
Blocks:

Reported:	2011-04-16 07:47 UTC by Michal Jaegermann
Modified:	2012-10-24 00:46 UTC
CC List:	6 users
Fixed In Version:
Clone Of:
Clones:	798628
Environment:
Last Closed:	2012-10-24 00:46:42 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
an extra "hook" file which allows to suspend (466 bytes, text/plain) 2011-04-16 07:49 UTC, Michal Jaegermann
a screen image after a crash on thaw (no SUSPEND_MODULES defined yet) (728.10 KB, image/png) 2011-04-16 07:52 UTC, Michal Jaegermann
dmesg after a "boot, hibernate, thaw" sequence with "fixes" already in place (91.58 KB, text/plain) 2011-04-16 07:56 UTC, Michal Jaegermann
a layout of PCI devices as shown by 'lspci -tv' (1.88 KB, text/plain) 2011-04-16 07:57 UTC, Michal Jaegermann
an output of pm-utils-bugreport-info.sh (with a working suspend/hibernate) (9.31 KB, text/plain) 2011-04-16 08:01 UTC, Michal Jaegermann

Description Michal Jaegermann 2011-04-16 07:47:52 UTC

Description of problem:

This is a new Asus K52Jc laptop with a fresh installation of Fedora 14 so I have no idea if this is a regression or this never worked.

Right out of the box, both suspend and hibernate lock up the machine even before the screen is turned off: suspend right away, hibernate after a while.  In the hibernate case some information actually is written to disk, because after a power cycle the machine "thaws"; it just never finishes hibernating unless the power is turned off manually.

Booting to a text console with 'no_console_suspend' is not much help.
I can find on a console:

sdhci-pci 0000:04:00.0: PCI INT B disabled

and /var/log/pm-suspend.log ends up with

Fri Apr 15 20:31:03 MDT 2011: performing suspend

or "... performing hibernate" and that is as far as things go.

A web search turned up, among others, https://bbs.archlinux.org/viewtopic.php?id=99238
With /etc/pm/sleep.d/10ehci_hcd.hook.sh in place (attached, modified from a suggestion in that thread) the machine started to suspend and resume.
The hook forces unbinding and rebinding of the devices found in
/sys/bus/pci/drivers/ehci_hcd/
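
For readers without access to the attachment, the unbind/rebind idea can be sketched roughly as below. This is not the attached file, only a minimal illustration of the approach; the function names, the STATE_FILE location, and the overridable DRIVER_DIR variable are assumptions made for this sketch.

```shell
#!/bin/sh
# Sketch of a pm-utils sleep hook (NOT the file attached to this bug).
# DRIVER_DIR and STATE_FILE are hypothetical knobs that allow a dry run
# against a scratch directory instead of the real sysfs tree.
DRIVER_DIR="${DRIVER_DIR:-/sys/bus/pci/drivers/ehci_hcd}"
STATE_FILE="${STATE_FILE:-/var/run/ehci_hcd.unbound}"

# Detach every PCI device currently bound to the driver and remember it.
unbind_all() {
    : > "$STATE_FILE"
    for dev in "$DRIVER_DIR"/????:??:??.?; do
        [ -e "$dev" ] || continue          # glob matched nothing
        id="${dev##*/}"                    # e.g. 0000:00:1a.0
        echo "$id" >> "$STATE_FILE"
        printf '%s' "$id" > "$DRIVER_DIR/unbind"
    done
}

# Re-attach the devices recorded by unbind_all.
bind_all() {
    [ -f "$STATE_FILE" ] || return 0
    while read -r id; do
        printf '%s' "$id" > "$DRIVER_DIR/bind"
    done < "$STATE_FILE"
    rm -f "$STATE_FILE"
}

# pm-utils passes the transition name as the first argument.
case "${1:-}" in
    hibernate|suspend) unbind_all ;;
    thaw|resume)       bind_all ;;
esac
```

A file like this would have to be executable and live in /etc/pm/sleep.d/ to be picked up.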

OTOH, with the above in place the laptop started reliably crashing on attempts to thaw after hibernate.  Attached is an image of the screen after such a crash; no other information is available.  In particular, any lines above this trace had already scrolled off before the display was available.

What is visible suggested that a file /etc/pm/config.d/hci.cfg with the following content:

SUSPEND_MODULES="xhci_hcd sdhci_pci sdhci"

could help and this turned out to be the case. (I know that I need to enable xhci before 'xhci_hcd' will make a difference but this is doing no harm in any case).  With that I am seeing regularly:

[  235.999141] sdhci-pci 0000:04:00.2: PCI INT B disabled
[  236.082042] irq 17: nobody cared (try booting with the "irqpoll" option)
[  236.082832] Pid: 1738, comm: atd Tainted: G        W   2.6.35.12-88.fc14.x86_64 #1
[  236.083651] Call Trace:
[  236.084483]  <IRQ>  [<ffffffff810a74d7>] __report_bad_irq.clone.1+0x3d/0x8b
[  236.085380]  [<ffffffff810a763f>] note_interrupt+0x11a/0x17f
[  236.086292]  [<ffffffff810a811f>] handle_fasteoi_irq+0xa8/0xce
[  236.087228]  [<ffffffff8100c2ea>] handle_irq+0x88/0x90
[  236.088181]  [<ffffffff81470b44>] do_IRQ+0x5c/0xb4
[  236.089150]  [<ffffffff8146b093>] ret_from_intr+0x0/0x11
[  236.090143]  <EOI> 
[  236.090158] handlers:
[  236.092139] [<ffffffffa0227da5>] (ath_isr+0x0/0x17e [ath9k])
[  236.093211] Disabling IRQ #17

but both suspend and hibernate are at last becoming useful.

Version-Release number of selected component (if applicable):
pm-utils-1.3.1-4.fc14
kernel-2.6.35.11-83.fc14
kernel-2.6.35.12-88.fc14
hal-0.5.14-5.fc14.1
hal-info-20090716-3.fc12
gnome-power-manager-2.32.0-3.fc14
hdparm-9.27-1.fc13

How reproducible:
always

Steps to Reproduce:
1. Try to suspend a "regular" installation; the machine locks up right away.

Additional info:
The laptop is using Intel(R) Pentium(R) P6100 processor.  Video is Intel.
It also has nVidia GT218 [GeForce 310M] graphics, but the nouveau driver does not recognize it, so it is not in use.  The layout of the PCI bus is attached.

Comment 1 Michal Jaegermann 2011-04-16 07:49:41 UTC

Created attachment 492542 [details]
an extra "hook" file which allows to suspend

Comment 2 Michal Jaegermann 2011-04-16 07:52:04 UTC

Created attachment 492544 [details]
a screen image after a crash on thaw (no SUSPEND_MODULES defined yet)

Comment 3 Michal Jaegermann 2011-04-16 07:56:18 UTC

Created attachment 492545 [details]
dmesg after a "boot, hibernate, thaw" sequence with "fixes" already in place

Comment 4 Michal Jaegermann 2011-04-16 07:57:34 UTC

Created attachment 492546 [details]
a layout of PCI devices as shown by 'lspci -tv'

Comment 5 Michal Jaegermann 2011-04-16 08:01:51 UTC

Created attachment 492547 [details]
an output of pm-utils-bugreport-info.sh (with a working suspend/hibernate)

Comment 6 Michal Jaegermann 2011-04-16 08:08:41 UTC

SELinux note:
I forgot to add that SELinux in enforcing mode (selinux-policy-targeted-3.9.7-37.fc14) appears to prevent suspend/resume altogether. If set to "permissive", suspend/resume triggers multiple complaints. audit2allow produced the following from these:

module local 1.0;

require {
        type var_log_t;
        type consoletype_t;
        type user_devpts_t;
        type syslogd_t;
        type NetworkManager_t;
        class capability sys_module;
        class file { read write ioctl };
        class chr_file open;
}

#============= NetworkManager_t ==============
allow NetworkManager_t self:capability sys_module;

#============= consoletype_t ==============
allow consoletype_t var_log_t:file { read write ioctl };

#============= syslogd_t ==============
#!!!! This avc can be allowed using the boolean 'allow_daemons_use_tty'

allow syslogd_t user_devpts_t:chr_file open;

Comment 7 Michal Jaegermann 2011-04-16 08:10:30 UTC

This report could be closely related to, or even a duplicate of, bug 675564 for the Asus K52F laptop.

Comment 8 Jaroslav Škarvada 2011-04-18 10:14:46 UTC

Thanks, closing as dupe of 694191. I prefer this to be fixed in kernel. I will give it a deeper look later.

*** This bug has been marked as a duplicate of bug 694191 ***

Comment 9 Jaroslav Škarvada 2011-04-18 16:03:06 UTC

Reopening, the thaw issue seems different to 694191 and I guess it is related to sdhci. I will track the sdhci thaw issue here and the ehci unbind issue in 694191.

I think that the backtrace from comment 0 is there because the sdhci-pci is unable to handle the IRQ 17, because the sdhci-pci module was unloaded earlier.

Comment 10 Michal Jaegermann 2011-04-18 16:51:22 UTC

(In reply to comment #9)

> I think that the backtrace from comment 0 is there because the sdhci-pci is
> unable to handle the IRQ 17, because the sdhci-pci module was unloaded earlier.

Maybe.  Although IRQ 17 looks like it is serving the wireless interface, and you can find in the backtrace from comment 0:

[  236.090158] handlers:
[  236.092139] [<ffffffffa0227da5>] (ath_isr+0x0/0x17e [ath9k])

sdhci-pci, OTOH, seems to be associated with

+-1a.0  Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller

and with

+-1d.0  Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller

'/proc/interrupts' looks like this:

           CPU0       CPU1       
  0:        178         18   IO-APIC-edge      timer
  1:        258         15   IO-APIC-edge      i8042
  8:          0          1   IO-APIC-edge      rtc0
  9:       8732       1107   IO-APIC-fasteoi   acpi
 12:        340         70   IO-APIC-edge      i8042
 16:        336         44   IO-APIC-fasteoi   ehci_hcd:usb1
 17:     299990         12   IO-APIC-fasteoi   ath9k
 18:          0          0   IO-APIC-fasteoi   jmb38x_ms:slot0, mmc0
 23:       3194      25240   IO-APIC-fasteoi   ehci_hcd:usb2
 41:          0          0   PCI-MSI-edge      pciehp
 42:          0          0   PCI-MSI-edge      pciehp
 43:          0          0   PCI-MSI-edge      pciehp
 44:     113235       3053   PCI-MSI-edge      ahci
 45:      27030       4057   PCI-MSI-edge      i915@pci:0000:00:02.0
 46:          0          0   PCI-MSI-edge      hda_intel
 47:      12218      13146   PCI-MSI-edge      eth0
NMI:          0          0   Non-maskable interrupts
LOC:    2540931    3509573   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
PND:          0          0   Performance pending work
RES:     294022     282270   Rescheduling interrupts
CAL:       3818       2372   Function call interrupts
TLB:      25872      15243   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:         52         50   Machine check polls
ERR:          9
MIS:          0
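
To check which handlers are registered on a given IRQ without reading the whole table, something along these lines may help. It is a sketch only: irq_handlers is a made-up helper name, and it assumes the line layout shown above (one count column per CPU, then the controller type, then the handler names); other kernels may lay the line out differently.

```shell
# Print the handler names registered for one IRQ number.
# Usage: irq_handlers <irq> [file]   (file defaults to /proc/interrupts)
irq_handlers() {
    irq="$1"
    file="${2:-/proc/interrupts}"
    awk -v irq="$irq" '
        # The header line lists one CPUn column per CPU.
        NR == 1 { ncpu = NF; next }
        # Fields: "<irq>:", ncpu counters, controller type, then handlers.
        $1 == (irq ":") {
            out = ""
            for (i = ncpu + 3; i <= NF; i++)
                out = out (out == "" ? "" : " ") $i
            print out
        }
    ' "$file"
}
```

Run against the table above, `irq_handlers 17` would report only ath9k, which is consistent with the handler named in the "nobody cared" trace.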

Just to keep references in one place: complaints about IRQ 17 may look like the one in comment 0 or like the following:

[ 7224.510578] irq 17: nobody cared (try booting with the "irqpoll" option)
[ 7224.510586] Pid: 0, comm: swapper Tainted: G        W   2.6.35.12-88.fc14.x86_64 #1
[ 7224.510590] Call Trace:
[ 7224.510594]  <IRQ>  [<ffffffff81010207>] ? paravirt_read_tsc+0x9/0xd
[ 7224.510614]  [<ffffffff810a74d7>] __report_bad_irq.clone.1+0x3d/0x8b
[ 7224.510617]  [<ffffffff810a763f>] note_interrupt+0x11a/0x17f
[ 7224.510620]  [<ffffffff810a811f>] handle_fasteoi_irq+0xa8/0xce
[ 7224.510624]  [<ffffffff8100c2ea>] handle_irq+0x88/0x90
[ 7224.510629]  [<ffffffff81470b44>] do_IRQ+0x5c/0xb4
[ 7224.510634]  [<ffffffff8146b093>] ret_from_intr+0x0/0x11
[ 7224.510635]  <EOI>  [<ffffffff81265570>] ? intel_idle+0x111/0x139
[ 7224.510643]  [<ffffffff8126554f>] ? intel_idle+0xf0/0x139
[ 7224.510648]  [<ffffffff81394cf1>] cpuidle_idle_call+0x8b/0xe9
[ 7224.510653]  [<ffffffff81008325>] cpu_idle+0xaa/0xcc
[ 7224.510660]  [<ffffffff814527e6>] rest_init+0x8a/0x8c
[ 7224.510665]  [<ffffffff81ba1c49>] start_kernel+0x40b/0x416
[ 7224.510669]  [<ffffffff81ba12c6>] x86_64_start_reservations+0xb1/0xb5
[ 7224.510672]  [<ffffffff81ba13c2>] x86_64_start_kernel+0xf8/0x107
[ 7224.510674] handlers:
[ 7224.510675] [<ffffffffa0227da5>] (ath_isr+0x0/0x17e [ath9k])
[ 7224.510689] Disabling IRQ #17

That seems to depend on the phase of the moon (or maybe on winning or losing some race).

Be that as it may, with the additions in /etc/pm/ the K52J at least behaves.

Comment 11 Michal Jaegermann 2011-04-18 17:34:57 UTC

Apologies for trickling information like that, but the devices tied to sdhci here seem to be these:

 +-1c.5-[04]--+-00.0  JMicron Technology Corp. SD/MMC Host Controller
 |            +-00.2  JMicron Technology Corp. Standard SD Host Controller
 |            +-00.3  JMicron Technology Corp. MS Host Controller
 |            +-00.4  JMicron Technology Corp. xD Host Controller
 |            \-00.5  JMicron Technology Corp. JMC250 PCI Express Gigabit Ethernet Controller

Although the Ethernet controller is really using 'jme' as its driver.  At least the SD card reader works after suspend/resume (hibernate/thaw) cycles.  Presumably the others do too, but I do not have suitable media on hand to check.

Comment 12 Michal Jaegermann 2011-12-02 04:17:33 UTC

See also http://lkml.org/lkml/2011/11/30/456 and other messages in that thread.
Basically the same issue with 3.x kernels so this will not go away with an upgrade.

Comment 13 Michal Jaegermann 2011-12-03 02:25:14 UTC

With an upgrade to Fedora 16 and with 3.1.2-1.fc16.x86_64 the situation actually got worse.  I had to add "ath ath9k" to my SUSPEND_MODULES.  Otherwise my wireless connection would be lost after every resume from suspend and thaw after hibernate would silently fail - somewhere.
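
Combined with the setting from comment 0, the resulting configuration would presumably look like the fragment below. Only the module names come from this report; their order, and keeping them in the same hci.cfg file, are assumptions.

```shell
# /etc/pm/config.d/hci.cfg (sketch; module order is an assumption)
# pm-utils unloads these modules before sleeping and reloads them afterwards.
SUSPEND_MODULES="xhci_hcd sdhci_pci sdhci ath ath9k"
```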

Also I would consistently see when suspending/hibernating this machine:

 irq 17: nobody cared (try booting with the "irqpoll" option)
 Pid: 0, comm: swapper Not tainted 3.1.2-1.fc16.x86_64 #1
 Call Trace:
  <IRQ>  [<ffffffff810b2222>] __report_bad_irq+0x38/0xc3
  [<ffffffff810b24bc>] note_interrupt+0x176/0x1fa
  [<ffffffff810b0a0f>] handle_irq_event_percpu+0x15d/0x1a5
  [<ffffffff810b0a92>] handle_irq_event+0x3b/0x59
  [<ffffffff81078268>] ? sched_clock_cpu+0x42/0xc6
  [<ffffffff810b2c7c>] handle_fasteoi_irq+0x80/0xa4
  [<ffffffff81010af9>] handle_irq+0x88/0x8e
  [<ffffffff814c040d>] do_IRQ+0x4d/0xa5
  [<ffffffff814b756e>] common_interrupt+0x6e/0x6e
  <EOI>  [<ffffffff814b71ac>] ? _raw_spin_unlock_irqrestore+0x17/0x19
  [<ffffffff813a5cc3>] ? poll_idle+0x28/0x65
  [<ffffffff813a5cb6>] ? poll_idle+0x1b/0x65
  [<ffffffff813a5fe6>] cpuidle_idle_call+0xe8/0x182
  [<ffffffff8100e2e3>] cpu_idle+0xa4/0xe8
  [<ffffffff81494a8e>] rest_init+0x72/0x74
  [<ffffffff81b76b7d>] start_kernel+0x3ab/0x3b6
  [<ffffffff81b762c4>] x86_64_start_reservations+0xaf/0xb3
  [<ffffffff81b76140>] ? early_idt_handlers+0x140/0x140
  [<ffffffff81b763ca>] x86_64_start_kernel+0x102/0x111
 handlers:
 [<ffffffffa02c0d80>] ath_isr
 Disabling IRQ #17

Comment 14 Michal Jaegermann 2012-01-10 17:22:45 UTC

BTW - I tried what would happen with a "barebones" 

   echo mem > /sys/power/state

while running 3.1.7-1.fc16.x86_64 on my Asus K52Jc.  Nothing too exciting. My machine froze immediately with a blank screen and without even seriously trying to suspend. The first reboot after a powerdown failed with ATA errors.  After the second powerdown the laptop booted normally so a battery removal was not required.
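
Anyone repeating this experiment may first want to confirm that the kernel advertises the state at all. The sketch below does only that; supports_sleep_state is a hypothetical helper name, and a positive answer says nothing about whether the write will succeed on this hardware.

```shell
# Return 0 if the given sleep state is listed by the kernel, 1 otherwise.
# Usage: supports_sleep_state <state> [file]  (file defaults to /sys/power/state)
supports_sleep_state() {
    state="$1"
    file="${2:-/sys/power/state}"
    # The file holds a space-separated list such as "standby mem disk".
    for s in $(cat "$file" 2>/dev/null); do
        [ "$s" = "$state" ] && return 0
    done
    return 1
}

# Example (commented out on purpose, since it would suspend the machine):
# if supports_sleep_state mem; then echo mem > /sys/power/state; fi
```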

With pm-utils "doctored" as per earlier comments both suspend and hibernation work just fine, "IRQ #17" complaints notwithstanding.

Comment 15 Chris Moeller 2012-01-28 07:00:14 UTC

I can verify this also applies to the ASUS U52F-BBL5, although the SUSPEND_MODULES setting does not appear to be necessary to thaw from hibernate, at least not without any SD memory card inserted. The 802.11n, which uses the iwlwifi module, also appears to resume from suspend and hibernate without any changes, at least most of the time.

Comment 16 Chris Moeller 2012-01-28 07:40:55 UTC

I spoke too soon, my system decided to oops in the middle of installing gcc and gcc-g++, and I wasn't able to capture a picture of the messages, as they scrolled by too quickly. It looked like some file system corruption issue, possibly related to the failed hibernate/suspend cycles I tried before installing that workaround script.

Comment 17 Michal Jaegermann 2012-10-24 00:46:42 UTC

Kernel 3.6.2-1.fc16 allows me to suspend and hibernate K52Jc in an "out-of-the-box" configuration.  Please see bug 798628#c12 for details.

If ASUS U52F-BBL5 still locks up please follow up in bug 798628.
