1034457 – (Changed) Delay in accepting SDHC card (mmc0: Timeouts) interferes with hibernation.

Bug 1034457 - (Changed) Delay in accepting SDHC card (mmc0: Timeouts) interferes with hibernation.

Summary: (Changed) Delay in accepting SDHC card (mmc0: Timeouts) interferes with hiber...

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	18
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2013-11-25 21:34 UTC by aaronsloman
Modified:	2014-02-05 22:33 UTC (History)
CC List:	9 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2014-02-05 22:33:53 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments	(Terms of Use)

Description aaronsloman 2013-11-25 21:34:05 UTC

Description of problem:

I have been using pm-hibernate successfully for some time both on my Dell Latitude E6410 running Fedora18 and on desktop PC running Fedora19

When the kernel on the laptop was upgraded from 3.10.14-100.fc18.x86_64 to 3.11.7-100.fc18.x86_64 I found that after pm-hibernate, instead of resume/thaw the laptop would sometimes reboot, losing the saved state.

So I have restored 3.10.14-100 as my default kernel and so far resume seems to be working as before.

Version-Release number of selected component (if applicable):

Problem kernel: kernel-3.11.7-100.fc18.x86_64
Hibernate from: pm-utils-1.4.1-22.fc18.x86_64
(used with both kernels.)

How reproducible: Not totally: sometimes resume works as expected.


Steps to Reproduce:
1. Boot into kernel 3.11.7-100
2. Start X windows, and run firefox and other things
3. run pm-hibernate
4. Reboot machine.

Actual results:

Instead of restoring the the saved state it rebooted several times, wasting work (e.g. when the machine had been set up for a presentation).

Expected results:
Should resume/thaw instead of re-booting.

Additional info:
I have been too busy since this problem turned up to try multiple hibernates to collect information about frequency of failure.

I am hoping this is just a simple glitch in 3.11.7-100 that will soon be fixed.
On my desktop PC pm-hibernate and resume works fine with kernel 3.11.4-201.fc19.x86_64

Comment 1 aaronsloman 2013-11-25 21:36:56 UTC

Point of clarification:

When I wrote "Instead of restoring the saved state it rebooted several times, wasting work" I did not mean that one attempt to resume caused several reboots, but that on several different occasions after pm-hibernate it rebooted instead of resuming.

Comment 2 aaronsloman 2013-11-26 03:16:31 UTC

I tried going back to kernel-3.11.7-100.fc18.x86_64 to see if I could replicate the failed resume. I could not, but I found another problem. After the pm-hibernate command the machine nothing happened, and I could not kill the command. This went on for a couple of minutes, and then I remembered that I had previously seen something similar with the SD card installed (which I leave in unmounted, and occasionally mount it for backup while travelling).

So I tried removing the card and immediately the hibernate proceeded normally.
Resume also worked. I was able to replicate this behaviour a few times using 3.11.7-100 and also found multiple entries in the /var/log/messages file of this form: 
mmc0: Timeout waiting for hardware interrupt.

I then went back to kernel 3.10.14-100.fc18.x86_64  and everything seemed to work perfectly again. The pm-hibernate command worked immediately whether the sd card was in or out, and resume always worked. So I have left that kernel running.

The upshot is that I know there is something wrong with 3.11.7-100 but don't have a consistent set of symptoms.

Since 3.10.14-100 seems to work perfectly I'll stick with that till there's a new kernel to try.

Comment 3 aaronsloman 2013-11-29 03:32:31 UTC

Update: I did some more systematic experiments with kernel 3.10.14-100 and found that the difference between the two kernels (3.10 and 3.11) was not as great as I had thought.

In both cases there is a problem dealing with the SD card if left in its drive on the laptop. It takes about two minutes after booting for the card to be recognized, as shown by entries like this in /var/log/messages:

Nov 28 17:53:11 lape kernel: [49030.076387] mmc0: Timeout waiting for hardware interrupt.
Nov 28 17:53:21 lape kernel: [49039.701264] mmc0: Timeout waiting for hardware interrupt.
Nov 28 17:53:31 lape kernel: [49049.656023] mmc0: Timeout waiting for hardware interrupt.
Nov 28 17:53:41 lape kernel: [49059.663317] mmc0: Timeout waiting for hardware interrupt.
Nov 28 17:53:51 lape kernel: [49069.673535] mmc0: Timeout waiting for hardware interrupt.
Nov 28 17:54:01 lape kernel: [49079.683903] mmc0: Timeout waiting for hardware interrupt.
Nov 28 17:54:11 lape kernel: [49089.694290] mmc0: Timeout waiting for hardware interrupt.
Nov 28 17:54:21 lape kernel: [49099.703445] mmc0: Timeout waiting for hardware interrupt.
Nov 28 17:54:31 lape kernel: [49109.709635] mmc0: Timeout waiting for hardware interrupt.
Nov 28 17:54:41 lape kernel: [49119.717020] mmc0: Timeout waiting for hardware interrupt.
Nov 28 17:54:51 lape kernel: [49129.725232] mmc0: Timeout waiting for hardware interrupt.
Nov 28 17:55:01 lape kernel: [49139.734021] mmc0: Timeout waiting for hardware interrupt.
Nov 28 17:55:11 lape kernel: [49149.743215] mmc0: Timeout waiting for hardware interrupt.
Nov 28 17:55:11 lape kernel: [49149.821810] mmc0: new high speed SDHC card at address 0001
Nov 28 17:55:11 lape kernel: [49149.822058] mmcblk0: mmc0:0001 00000 7.46 GiB
Nov 28 17:55:11 lape kernel: [49149.823283] mmcblk0: p1

If I run pm-hibernate during that interval after booting, the system freezes, the hibernate command cannot be interrupted then eventually (e.g. up to two minutes later) hibernation starts and completes successfuly. After resume from hibernate, the above pattern repeats.

I had not noticed this in kernel 3.10.14-100 because I had not been trying to hibernate within two minutes of booting or resuming. But when I installed the later kernel I started testing hibernates in quick succession and had problems with freezing and in one case not resuming properly (though I am not sure why as I don't recall details of the context: it may have been caused by an attempt to connect to a projector too soon).

I've also confirmed that pm-suspend also freezes while waiting for the SDHC card to be connected. Then, after a delay it succeeds.

Fortunately, not everything freezes while waiting for the SDHC card to be recognised. E.g. wireless connection usually proceeds as normal immediately after resume from hibernate, which is one reason why it took me so long to identify the problem. But it does not always succeed. I've had some mysterious failures to connect after hibernate or suspend. Most recently I had to disable wireless and re-enable it for the connection to complete properly. I can't be sure that was a side effect of the SDHC connection problem.

So I need to transfer this bug to another category, since it seems to be concerned with the driver for SDHC drives rather than pm-utils. But I have not been able to locate the category.

However, I've changed the bug heading from:
"pm-hibernate had been working well then when I installed kernel 3.11.7-100.fc18.x86_64 it rebooted instead of resuming"

to:
"Delay in accepting SDHC card (mmc0: Timeouts) interferes with hibernation."

Perhaps someone can tell me how to specify the relevant component in the bug description, to replace pm-utils.

I have the impression others have been confused by this bug, e.g. this may be the same bug:

Bug #1022453 : nothing happened after run "pm-suspend"

Maybe severity should be set to "high" rather than medium, especially if there are other processes impeded by the timeouts.

Comment 4 Jaroslav Škarvada 2013-11-29 08:34:28 UTC

This seems to be kernel bug, reassigning to kernel for further investigation.

Comment 5 aaronsloman 2013-11-29 11:45:02 UTC

Following comments in the other bug report

    Bug #1022453 : nothing happened after run "pm-suspend"

I tried using 'systemctl hibernate' instead of 'pm-hibernate', during the period when mmc0 timeouts were being reported after resuming from a previous hibernate.

That produced the same behaviour as pm-hibernate, i.e. system froze for a while, then hibernated. Later it resumed normally. In case it's useful information here is some of the information in /var/log/messages from just before I gave the 'systemctl hibernate' command.

Nov 29 11:04:53 lape kernel: [84270.937936] ACPI: Waking up from system sleep state S4
......
Nov 29 11:04:53 lape kernel: [84272.099645] PM: restore of devices complete after 1016.364 msecs
Nov 29 11:04:53 lape kernel: [84272.099872] Restarting tasks ... done.
......
Nov 29 11:04:59 lape kernel: [84277.916833] wlan0: associated
......
Nov 29 11:05:04 lape NetworkManager[745]: <info> Activation (wlan0) successful, device activated.
......
[This seems to be where I gave the hibernate command]
......
Nov 29 11:05:13 lape kernel: [84292.233790] mmc0: Timeout waiting for hardware interrupt.
Nov 29 11:05:18 lape systemd[1]: Starting Sleep.
Nov 29 11:05:18 lape systemd[1]: Reached target Sleep.
Nov 29 11:05:18 lape systemd[1]: Starting Hibernate...
Nov 29 11:05:18 lape systemd-sleep[18754]: Hibernating system...
Nov 29 11:05:19 lape NetworkManager[745]: <info> (wlan0): IP6 addrconf timed out or failed.
Nov 29 11:05:19 lape NetworkManager[745]: <info> Activation (wlan0) Stage 4 of 5 (IPv6 Configure Timeout) scheduled...
Nov 29 11:05:19 lape NetworkManager[745]: <info> Activation (wlan0) Stage 4 of 5 (IPv6 Configure Timeout) started...
Nov 29 11:05:19 lape NetworkManager[745]: <info> Activation (wlan0) Stage 4 of 5 (IPv6 Configure Timeout) complete.
Nov 29 11:05:23 lape kernel: [84302.239407] mmc0: Timeout waiting for hardware interrupt.
Nov 29 11:05:33 lape kernel: [84312.245358] mmc0: Timeout waiting for hardware interrupt.
Nov 29 11:05:43 lape kernel: [84322.251648] mmc0: Timeout waiting for hardware interrupt.
Nov 29 11:05:53 lape kernel: [84332.258258] mmc0: Timeout waiting for hardware interrupt.
Nov 29 11:06:03 lape kernel: [84342.265150] mmc0: Timeout waiting for hardware interrupt.
Nov 29 11:06:13 lape kernel: [84352.253318] mmc0: Timeout waiting for hardware interrupt.
Nov 29 11:06:23 lape kernel: [84362.244860] mmc0: Timeout waiting for hardware interrupt.
Nov 29 11:06:33 lape kernel: [84372.254147] mmc0: Timeout waiting for hardware interrupt.
Nov 29 11:06:43 lape kernel: [84382.264430] mmc0: Timeout waiting for hardware interrupt.
Nov 29 11:06:53 lape kernel: [84392.274778] mmc0: Timeout waiting for hardware interrupt.
Nov 29 11:07:03 lape kernel: [84402.285120] mmc0: Timeout waiting for hardware interrupt.
Nov 29 11:07:03 lape kernel: [84402.364642] mmc0: new high speed SDHC card at address 0001
Nov 29 11:07:03 lape kernel: [84402.364911] mmcblk0: mmc0:0001 00000 7.46 GiB
Nov 29 11:07:03 lape kernel: [84402.366165]  mmcblk0: p1
Nov 29 11:07:03 lape kernel: [84402.366462] mmc0: card 0001 removed
Nov 29 11:07:03 lape systemd-udevd[18757]: inotify_add_watch(7, /dev/mmcblk0, 10) failed: No such file or directory
Nov 29 11:07:03 lape systemd-udevd[18757]: inotify_add_watch(7, /dev/mmcblk0p1, 10) failed: No such file or directory
Nov 29 11:10:02 lape kernel: [84402.369227] PM: Syncing filesystems ... done.
.....
Followed by usual information about freezing and then resuming.

I hope that helps to locate the problem. Presumably this will not affect machines that either do not have SD card readers, or do not have cards left in them.

System: Dell Latitude E6410 with Intel core i5 CPU running kernel
    3.11.7-100.fc18.x86_64 #1 SMP Mon Nov 4 15:09:31 UTC 2013

Comment 6 aaronsloman 2013-11-30 23:17:14 UTC

I have now found a workaround following a comment in  Bug #1015800:
"SD card not detected and automounted after resume"

As described there, I created a script /etc/pm/sleep.d/99-sdcard
(There's nothing else in the directory. I am not sure whether the prefix should be 99 or 00, but it works as 99.)

This is a modified version of the script, to cope with suspend as well as hibernate:

#!/bin/bash

if [ "$1" == "hibernate" ] || [ "$1" == "suspend" ] ; then

    echo "umount /dev/mmcblk0p1"
    umount /dev/mmcblk0p1

    echo "modprobe -r sdhci_pci"

    modprobe -r sdhci_pci

elif [ "$1" == "thaw" ] || [ "$1" == "resume" ] ; then

   echo "modprobe sdhci_pci"

   modprobe sdhci_pci

   sleep 1

   echo "mount /dev/mmcblk0p1"
   mount /dev/mmcblk0p1

fi

(I presume a 'case' format would be more compact.)

I have now tested this with pm-hibernate and pm-suspend.

The 'echo' commands produce trace printing in /var/log/pm-suspend.log

I am not sure whether this is a kernel bug or a pm-utils bug.

Either way, if the bug is not fixed at a more central location, it seems that the pm-utils package should provide a generic script to unmount /dev/mmcblk0p1 for suspend and hibernate, and then re-mount if the device was previously mounted.

NOTE:
The first time I tried this without the 'sleep 1' command during thaw, and the mount failed.

Comment 7 Fedora End Of Life 2013-12-21 14:47:04 UTC

This message is a reminder that Fedora 18 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 18. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '18'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 18's end of life.

Thank you for reporting this issue and we are sorry that we may not be 
able to fix it before Fedora 18 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior to Fedora 18's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 8 Fedora End Of Life 2014-02-05 22:33:53 UTC

Fedora 18 changed to end-of-life (EOL) status on 2014-01-14. Fedora 18 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Note You need to log in before you can comment on or make changes to this bug.