Bug 806315 - Resume from pm-hibernate requires acpi=off on Dell latitude E6410 using kernel 3.3.0-4.fc16
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: pm-utils
Version: 16
Hardware: i686
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Jaroslav Škarvada
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 807615
Depends On:
Blocks:
 
Reported: 2012-03-23 12:24 UTC by aaronsloman
Modified: 2013-02-14 01:23 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Problem persists with kernel 3.3.8-1.fc16.i686 and with kernel 3.4.2-1.fc16.i686 also fc17
Clone Of:
Environment:
Last Closed: 2013-02-14 01:23:48 UTC
Type: ---
Embargoed:



Description aaronsloman 2012-03-23 12:24:12 UTC
Description of problem: 
After pm-hibernate, resume sometimes works and sometimes fails, producing a full reboot after almost resuming.
This is intermittent: I can resume successfully three or four times, then fail.
If I add "acpi=off" to boot command it is always successful.

Version-Release number of selected component (if applicable):
pm-utils-1.4.1-12.fc16.i68

(I think the problem has existed for some time, using earlier kernels and earlier versions of pm-utils.)

How reproducible: Very, but not with absolute predictability. It sometimes occurs after a few successful resumes. Using acpi=off when resuming seems to work totally reliably, but should not be necessary.

Steps to Reproduce:
1. pm-hibernate
2. resume
3. repeat 
  
Actual results:
Eventually resume fails just before completion.

Expected results:
Resume should not fail.

Additional info:
I normally boot to non-graphic mode (run level 3) then use startx. I use ctwm as window manager, but have had the same result using Openbox.

Normally when I resume it goes back to graphic mode, with window manager.

Comment 1 aaronsloman 2012-03-28 21:40:44 UTC
I have now tried this with the latest patch kernel 3.3.0-7.1.fc16.i686 #1 SMP Wed Mar 28 19:04:51 UTC 2012. I have twice had resume fail without acpi=off although it also succeeded three times.

I shall now try regularly resuming with acpi=off -- to see if the new patch alters the effect of that.

Comment 2 aaronsloman 2012-03-28 21:55:42 UTC
*** Bug 807615 has been marked as a duplicate of this bug. ***

Comment 3 Jaroslav Škarvada 2012-03-29 12:14:12 UTC
Currently I am not aware of such regression. Did it work correctly before (i.e. with older Fedoras)?

acpi=off seems to be overkill here, please try various hibernation modes:

# echo platform > /sys/power/disk
# echo disk > /sys/power/state

# echo shutdown > /sys/power/disk
# echo disk > /sys/power/state

# echo reboot > /sys/power/disk
# echo disk > /sys/power/state

Also, if you have time, try debugging according to the instructions from the kernel-doc package (/usr/share/doc/kernel-doc-*/Documentation/power/basic-pm-debugging.txt).

Some logs/backtraces from console would be really helpful here. Sometimes it is possible to get some more console output by adding: "no_console_suspend debug ignore_loglevel" to kernel boot parameters (in grub).
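
Before writing to /sys/power/disk it may help to check which hibernation modes the running kernel offers. Reading the file lists the available modes, with the currently selected one in square brackets. The following is only a sketch; the helper names active_mode and all_modes are made up for this example.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of pm-utils.

# Print the selected mode from a /sys/power/disk style string,
# e.g. "[platform] shutdown reboot" gives "platform".
active_mode() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^\[\(.*\)\]$/\1/p'
}

# Print every listed mode, one per line, without the brackets.
all_modes() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -e 's/[][]//g' -e '/^$/d'
}

# Usage on a real system:
#   active_mode "$(cat /sys/power/disk)"
#   all_modes "$(cat /sys/power/disk)"
```

Only the modes printed by all_modes are worth echoing into /sys/power/disk; writing an unsupported mode should simply be rejected by the kernel.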

Comment 4 aaronsloman 2012-03-29 13:54:29 UTC
(In reply to comment #3)
Thanks for responding to this.

> Currently I am not aware of such regression. Did it work correctly before (i.e.
> with older Fedoras)?

When I first got the Dell Laptop in June 2010 (E6410 with intel graphics) I had great problems getting graphics to work at all. Eventually I found a workable version of fedora 13, and was able to use tuxonice for hibernate/resume. That worked mostly, but there were various restrictions -- e.g. it crashed if I had an external monitor or projector plugged in while going in or out of graphic mode, and I could  not use the built in SD card reader or use the Alps touch-pad to scroll. I think the built in microphone also did not work on linux.

I used 2.6.33.5-112_1.cubbi_tuxonice.fc13.i686 in this way for several months.

Because of the restrictions, I tried various later versions of fedora, and also Arch and Ubuntu, and all gave problems. In January 2011 I tried F14 with 2.6.36 kernel, but had trouble with graphics. 

I have a note on 26th Jan 2011 saying "If a screensaver is allowed to blank the screen, nothing can get it out of the blank state, except a reboot.
Moreover, sometimes resuming after hibernating does not restore the screen properly." So I went back to F13.

In March 2011 I switched to F15 Alpha, partly in order to be able to use the SD card. For a while it worked very well (in a note I wrote that it was "rock solid").

In May 2011 I switched to the 'final release' of F15, using the Xfce spin. I had various problems with hibernate/resume and at some point (I forget when) found that hibernating with tuxonice did not work at all for me. Sometimes it would freeze during hibernate, and mostly it would not resume without rebooting.

In Sep 2011 I recorded that with kernel 2.6.40-4 if I hibernate and resume more than once, resume intermittently stops working and I am forced to reboot. I found a suggestion online to try adding "acpi=off" as a boot flag. That fixed the problem of resume not completing. (I don't understand how).

Likewise with later kernels, including 2.6.41.4-1.fc15.i686.

Later, in Jan 2012, I switched to F16. Intermittent freezing during hibernate continued until it seemed to be fixed in 3.3.0-4.fc16.i686, but the problem of resume failing (leading to reboot) was not fixed, unless I use acpi=off for resume.

So I have had this problem with older kernels, including fedora 15 since the middle of 2011. I did not then know how to report bugs here.

I'll try to understand the rest of your suggestions and see if I can provide more information later. (I am relatively ignorant of kernel matters, and merely use the tools that do what I want, often blindly following suggestions I find on the internet.)

If there is a particularly relevant documentation web site spelling out options for hibernate for a novice, please let me know.

In case it's relevant, this is my grub.cfg boot line:

linux   /vmlinuz-3.3.0-7.1.fc16.i686 root=UUID=29cae70a-ba5d-4178-8d93-30208331e510 ro rd.md=0 rd.lvm=0 rd.dm=0 SYSFONT=latarcyrheb-sun16  KEYTABLE=uk rd.luks=0 LANG=en_US.UTF-8

Also my /etc/systemd/system/default.target is a link to
/lib/systemd/system/runlevel3.target

So I start in non-graphical mode and can also run pm-hibernate before running startx

(I don't use gnome or kde so I can't use their menu options.)

Comment 5 aaronsloman 2012-03-29 14:26:35 UTC
(In reply to comment #3)
I managed to install kernel-doc and found the basic-pm-debugging.txt file

So I tried the first option there:
# echo platform > /sys/power/disk
# echo disk > /sys/power/state

which was also in your list. It resumed once OK but on the second time failed to complete the resume and instead rebooted.

I'll try the others and also try your suggested boot parameters.

Comment 6 aaronsloman 2012-03-29 15:05:43 UTC
> So I tried the first option there:
> # echo platform > /sys/power/disk

Sorry typo: that was "echo reboot"

> # echo disk > /sys/power/state
> 
> which was also in your list. It resumed once OK but on the second time failed
> to complete the resume and instead rebooted.

Before going on to other tests, since I did that test after entering graphic mode (startx) I thought I should try in text only mode.

I've now done that three times, using your suggested extra boot parameters in grub.cfg

linux   /vmlinuz-3.3.0-7.1.fc16.i686 root=UUID=29cae70a-ba5d-4178-8d93-30208331e510 ro rd.md=0 rd.lvm=0 rd.dm=0 SYSFONT=latarcyrheb-sun16  KEYTABLE=uk rd.luks=0 LANG=en_US.UTF-8 no_console_suspend debug ignore_loglevel

The first two times hibernate+resume worked, but on the third time the resume did not complete. So the failure does not require going into graphic mode.
It does have the screen in high resolution mode.

The file /var/log/pm-suspend.log was not altered by any of this. I can't find anything in /var/log/messages that seems to be related to the hibernate testing.

Comment #3 states:
Some logs/backtraces from console would be really helpful here. Sometimes it is
possible to get some more console output by adding: "no_console_suspend debug
ignore_loglevel" to kernel boot parameters (in grub).

Unfortunately, nothing I've tried so far seems to save any console output. The output during resume scrolls up very fast and then the screen just goes blank before the resume fails and reboot starts. I'll try the other commands in your message.

I'll now see if 'echo platform' makes a difference.

Comment 7 aaronsloman 2012-03-29 15:49:43 UTC
Reporting on this test from comment #3

> # echo platform > /sys/power/disk
> # echo disk > /sys/power/state

Doesn't automatically reboot after hibernate. Sometimes resume succeeds, and sometimes it gets very near the end but then screen goes blank and reboots.
I.e. the usual symptom for this bug. I can't find anything relevant in /var/log/messages

I'll see if there's anything I can read that may suggest a way to collect more information.

I am not sure whether the further information in basic-pm-debugging.txt is relevant. It seems mainly to do with hibernation failing, whereas in my case hibernation works, but resume intermittently fails.

Comment 8 aaronsloman 2012-03-29 17:27:31 UTC
While searching for more information I found this: 
https://wiki.archlinux.org/index.php/Pm-utils#Hibernation_.28suspend2disk.29

  "Reboot instead of resume from suspend

  This problem started when saving NVS area during suspend was introduced (in  2.6.35-rc4) (mailing list post). However, it is known that this mechanism does not work on all machines, so the kernel developers allow the user to disable it with the help of the acpi_sleep=nonvs kernel command line option. This option could be pass to the kernel through GRUB options by editing the file /boot/grub/menu.lst (GRUB 0.97) on the kernel line." 

I tried adding "acpi_sleep=nonvs" in my grub.cfg, but in spite of that resume after hibernate failed.

The failure seems to be always very near the end of the resume process.
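
When testing boot flags such as acpi_sleep=nonvs, it may be worth confirming that the flag really reached the running kernel, since an edit to the wrong menu entry is easy to miss. /proc/cmdline holds the command line the kernel was booted with. A minimal sketch follows; the function name has_param is invented for this example.

```shell
#!/bin/sh
# Sketch only: has_param is a hypothetical helper.
# has_param CMDLINE PARAM succeeds if PARAM appears as a whole word in CMDLINE.
has_param() {
    for word in $1; do
        [ "$word" = "$2" ] && return 0
    done
    return 1
}

# Usage on a real system:
#   has_param "$(cat /proc/cmdline)" acpi_sleep=nonvs && echo "flag is active"
```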

Comment 9 aaronsloman 2012-03-29 23:03:14 UTC
# echo shutdown > /sys/power/disk
# echo disk > /sys/power/state

Also doesn't always allow resume to work.

Using acpi_sleep=nonvs in grub.cfg, I have just had a longer than usual sequence of pm-hibernate+resume cycles without resume failing -- about six or seven.

So I thought that the extra boot flag acpi_sleep=nonvs might have helped after all, but then resume failed again. So that's not a fix now.

I then went through trying all the hibernate test options in 
/usr/share/doc/kernel-doc-3.3.0/Documentation/power/basic-pm-debugging.txt

including the ones involving  /sys/power/pm_test

I tried each one twice, and did not get any useful information printed out.
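
For reference, the pm_test procedure from basic-pm-debugging.txt can be scripted so that each level is run the same way and the kernel log is kept afterwards. This is only a sketch under assumptions: the kernel must be built with CONFIG_PM_DEBUG, the function name run_pm_test and the SYS_POWER variable are invented here, and the log file names in the usage comment are arbitrary.

```shell
#!/bin/sh
# Sketch only: run_pm_test and SYS_POWER are hypothetical names.
# SYS_POWER can be pointed at a scratch directory for a dry run.
SYS_POWER=${SYS_POWER:-/sys/power}

# run_pm_test LEVEL: select a test level, trigger a test hibernation,
# then switch test mode off again so a later hibernate is a real one.
run_pm_test() {
    level=$1
    echo "$level" > "$SYS_POWER/pm_test" || return 1
    echo disk > "$SYS_POWER/state"
    status=$?
    echo none > "$SYS_POWER/pm_test"
    echo "pm_test=$level exit=$status"
    return "$status"
}

# Usage (as root), in the order the kernel document suggests:
#   for level in freezer devices platform processors core; do
#       run_pm_test "$level"
#       dmesg | tail -n 50 > "/root/pm_test-$level.log"
#   done
```

In test mode the kernel goes only as far as the selected level, waits a few seconds and then undoes the steps, so the saved dmesg output is the place to look for errors.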

Because I understand so little and all these tests seem to take up so much time without yielding any useful information, I think I might as well settle for just using acpi=off when resuming from hibernate, as that works very reliably.

However, I have found some web sites recommending use of acpi=noirq so I'll experiment with that. 

I can't use acpi=off for booting: too many things don't work then. So I use it only for resuming. I'll see if acpi=noirq prevents resume failing and whether it causes problems when used for booting. I don't know if I am wasting my time on irrelevant clues!

Comment 10 aaronsloman 2012-03-30 22:10:35 UTC
(In reply to comment #9)
> However, I have found some web sites recommending use of acpi=noirq so I'll
> experiment with that. 
> 
> I can't use acpi=off for booting: too many things don't work then. So I use it
> only for resuming. I'll see if acpi=noirq prevents resume failing and whether
> it causes problems when used for booting. I don't know if I am wasting my time
> on irrelevant clues!

Several trouble-free hibernate+resume cycles later, it really looks as if on this machine (Dell Latitude E6410 with Intel graphics), using boot flag acpi=noirq when booting or resuming patch kernel 
3.3.0-7.1.fc16.i686 #1 SMP Wed Mar 28 19:04:51 UTC 2012

everything works. There's no freeze during hibernate, and resume after hibernate has always completed normally instead of rebooting since I inserted the acpi=noirq flag in grub.cfg

This also means I no longer need two grub entries for the same kernel, one for a fresh  boot and one for resume.

Comment 11 aaronsloman 2012-03-31 02:45:38 UTC
I tried using the 'acpi=noirq' solution on my Desktop PC running Fedora 15. It seems to work fine in Kernel 2.6.41.4-1.fc15.i686 -- i.e. hibernate and resume both work, though it seems I also need the 'nomodeset' boot flag. 

But if I try newer F15 kernels, I just get either hibernate or resume crashing, e.g. most recently 2.6.42.12-1.fc15.i686

I shall shortly upgrade the PC to F16 to see if the latest F16 kernels work as well on that machine as they do on my laptop.

Comment 12 aaronsloman 2012-04-01 10:14:21 UTC
(In reply to comment #10)
> Several trouble free hibernate+resume cycles later it really looks as if on
> this machine (Dell Latitude E6410 with Intel graphics), using boot flag
> acpi=noirq when booting or resuming patch kernel 
> 3.3.0-7.1.fc16.i686 #1 SMP Wed Mar 28 19:04:51 UTC 2012
> 
> everything works. There's no freeze during hibernate, and resume after
> hibernate has always completed normally instead of rebooting since I inserted
> the acpi=noirq flag in grub.cfg

That proved a false hope. This morning 'yum update' installed a new kernel 3.3.0-8.fc16.i686

I tried  booting and resuming with the acpi=noirq option but resume regularly failed, producing a reboot instead.

So I tried going back to the patch kernel 3.3.0-7.1.fc16.i686 and found that the solution that appeared to work for it no longer does: resume regularly failed to complete, switching to reboot near the end.

So I am now using 3.3.0-8 with my old solution: have two grub.cfg entries, one for booting and one for resume (using acpi=off)

This is on Dell Latitude E6410 laptop, using intel graphics.

Comment 13 aaronsloman 2012-04-04 00:45:55 UTC
Some new information. I was having problems with recent kernels, using Fedora 15 on my Desktop PC (intel core i5 with intel graphics) so upgraded to F16.

That machine is now running the same kernel as my laptop (Dell latitude E6410), namely 3.3.0-8.fc16.i686

However, on the desktop PC I don't seem to have any problems with hibernate and resume, using standard boot parameters and booting into runlevel3, before starting X (with Ctwm window manager).

On the laptop, however, pm-hibernate now works fine, but resume frequently fails when it has nearly finished resuming, and reboots. I can cure that by using acpi=off when resuming from hibernate.

Unfortunately, I don't get any information in /var/log indicating what went wrong.

The resume aborts after the display has switched to high resolution, showing progress of the resume. Then the screen goes blank, and should restore the previous windows, but instead reboots.

Since both machines use i915 I presume this means that there is some hardware difference that is causing the problem. Or could it be a firmware problem on the laptop? (I have upgraded the firmware twice since the machine was new, and it is now up to date I think.)

Comment 14 aaronsloman 2012-04-12 21:38:26 UTC
(In reply to comment #13)
 
> That machine is now running the same kernel as my laptop (Dell latitude E6410),
> namely 3.3.0-8.fc16.i686
> 
> However, on the desktop PC I don't seem to have any problems with hibernate and
> resume, using standard boot parameters and booting into runlevel3, before
> starting X (with Ctwm window manager).
> 
> On the laptop, however, pm-hibernate now works fine, but resume frequently
> fails when it has nearly finished resuming, and reboots. I can cure that by
> using acpi=off when resuming from hibernate.

Update: I am now using kernel 3.3.1-3.fc16.i686 and on both machines I have had multiple successful hibernate/resume cycles without use of 'acpi=off' in boot command for resume.

I sometimes find that pm-hibernate seems to shut the PC down but without turning off power. I.e. lights are still on. If I then hold the power button for several seconds it shuts down fully. It then resumes from hibernation when next booted.

Comment 15 aaronsloman 2012-04-12 21:47:21 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Not sure it is fixed, but it appears to be. 3.3.1-3 has been working ok through several hibernate resume cycles on both laptop and desktop PCs running 32 bit fedora 16

Comment 16 Jaroslav Škarvada 2012-04-16 13:13:13 UTC
Thanks for info, closing according to comment 14.

If the machine doesn't properly shut down during the hibernate cycle even if hibernated with:
# echo disk > /sys/power/state

I would recommend filing a new bug against the kernel.

Comment 17 aaronsloman 2012-04-22 15:17:48 UTC
(In reply to comment #16)
> If the machine doesn't properly shut down during the hibernate cycle even if
> hibernated with:
> # echo disk > /sys/power/state
> 
> I would recommend filing a new bug against the kernel.

Not turning off power after hibernate has happened only once since kernel 3.3.1-3. So there's no pattern I can report. If it happens again I'll start a new bug. So far it has not happened with kernel 3.3.2-1.

Comment 18 aaronsloman 2012-04-22 15:29:43 UTC
(In reply to comment #15)
I wrote on 2012-04-12
> Not sure it is fixed, but it appears to be. 3.3.1-3 has been working ok through
> several hibernate resume cycles on both laptop and desktop PCs running 32 bit
> fedora 16

Alas that was premature. Last night resume failed to complete. It seemed to get very close to finishing, then the screen went blank and full reboot started shortly after.

I have now installed kernel 3.3.2-1.fc16.i686 and will see if that makes any difference.

Unfortunately, when resume fails to complete and switches to reboot there seems to be no way to get information about how far the resume process gets before failing.

Comment 19 aaronsloman 2012-04-22 15:31:48 UTC
Forgot to say resume failing occurred on the laptop: Dell Latitude E6410 with intel graphics.

Comment 20 aaronsloman 2012-04-25 17:16:08 UTC
(In reply to comment #18)
> (In reply to comment #15)
> I wrote on 2012-04-12
> > Not sure it is fixed, but it appears to be. 3.3.1-3 has been working ok through
> > several hibernate resume cycles on both laptop and desktop PCs running 32 bit
> > fedora 16
> 
> Alas that was premature. Last night resume failed to complete. It seemed to get
> very close to finishing, then the screen went blank and full reboot started
> shortly after.
> 
> I have now installed kernel 3.3.2-1.fc16.i686 and will see if that makes any
> difference.

No, 3.3.2-1 also did not always resume fully and the same thing has now happened with kernel 3.3.2-6.fc16.i686

The resume goes through several different phases, including changing screen resolution, then displaying the percentages as it decompresses. My impression is that it gets all the way to 100%, or very close, and then the final step of restoring hibernated process fails, and it reboots.

Nothing relevant to the failure seems to be in /var/log. pm-suspend.log includes only details of the latest previous successful resume.

So I shall now return to using the flag acpi=off for resume, as it seems to prevent this failure.

Can this bug be re-opened?

Comment 21 aaronsloman 2012-04-25 17:16:08 UTC
Deleted Technical Notes Contents.

Old Contents:
Not sure it is fixed, but it appears to be. 3.3.1-3 has been working ok through several hibernate resume cycles on both laptop and desktop PCs running 32 bit fedora 16

Comment 22 aaronsloman 2012-04-25 17:18:49 UTC
Sorry. Forgot about comment 3. I'll try the suggestion there:

> Some logs/backtraces from console would be really helpful here. Sometimes it is
> possible to get some more console output by adding: "no_console_suspend debug
> ignore_loglevel" to kernel boot parameters (in grub).

Comment 23 Jaroslav Škarvada 2012-04-25 18:34:23 UTC
Reopening per comment 20.

Comment 24 aaronsloman 2012-04-25 20:17:07 UTC
(In reply to comment #22)
> Sorry. Forgot about comment 3. I'll try the suggestion there:
> 
> > Some logs/backtraces from console would be really helpful here. Sometimes it is
> > possible to get some more console output by adding: "no_console_suspend debug
> > ignore_loglevel" to kernel boot parameters (in grub).

I added that to the boot parameters and the next time I tried to resume after pm-hibernate it again failed to complete the resume, and rebooted. But I neither noticed more console output nor found anything new in /var/log

I wonder if the fact that I always boot into runlevel 3 and then run startx in order to enter graphical mode, which I suspect is fairly unusual, has something to do with the symptoms. If I hibernate with X running then when I resume it should go straight back to graphical mode -- and normally does, except when resume fails. So far "acpi=off" seems to be the only thing that ensures that resume succeeds, but that may be just luck.

Unfortunately I don't understand enough about kernel programming to have more ideas. I'll try to understand the files in /usr/share/doc/kernel-doc-3.3.2/Documentation/power and see if I learn anything relevant.

Comment 25 aaronsloman 2012-04-26 13:30:33 UTC
(In reply to comment #24)
> I wonder if the fact that I always boot into runlevel 3 and then run startx in
> order to enter graphical mode, which I suspect is fairly unusual, has something
> to do with the symptoms. If I hibernate with X running then when I resume it
> should go straight back to graphical mode -- and normally does, except when
> resume fails. So far "acpi=off" seems to be the only thing that ensures that
> resume succeeds, but that may be just luck.

Some new information. I've tried hibernating after booting, without going into graphical mode (i.e. not running startx), using the latest kernel:
3.3.2-6.fc16.i686 #1 SMP Sat Apr 21 13:23:12 UTC 2012

What's new now is that in 3.3.2-6 resume after hibernate fails EVERY time, even if X has not been started, whereas previously it was intermittent (though I have not kept detailed records -- too busy with other things).

However, if I add acpi=off (for resume only, not for a fresh boot), then the resume succeeds, reliably, so far.

I don't know enough about acpi to know if there's anything else I can try. I saw a mention of acpid in
/usr/share/doc/kernel-doc-3.3.2/Documentation/power/apm-acpi.txt

and then discovered that I did not have acpid installed so I installed it, and tried hibernate with it running, but that made no difference: resume still failed to complete.

Summary: Dell Latitude E6410 with 1440x900 display, Intel graphics and 4 core intel i5 cpu. Running fedora 16, booting to runlevel 3, using kernel 3.3.2-6.fc16.i686. pm-hibernate always works. Resume works only with acpi=off added to grub line for resume (not for boot). Without that addition, resume seems to be working perfectly until screen is blanked prior to restoring display: but then it crashes and reboots.

It looks as if something is being done (or not done) because of acpi=off which ought to be handled by kernel resume code, near the end of resume process.
 
The effect used to be intermittent, but with 3.3.2-6 it has become totally predictable.
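
Since the failure is intermittent, a small pm-utils hook could keep the detailed records automatically. The idea is to log one line before each hibernate and one after each successful resume, so that a "hibernate" line with no matching "thaw" line marks a failed resume. This is a sketch only: the log path, the HIBERNATE_LOG variable and the function names are assumptions, and the hook would go in /etc/pm/sleep.d with executable permissions.

```shell
#!/bin/sh
# Sketch only: log location and helper names are hypothetical.
LOG=${HIBERNATE_LOG:-/var/log/hibernate-cycles.log}

# Append "timestamp event kernel-version" to the log.
log_event() {
    printf '%s %s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$(uname -r)" >> "$LOG"
}

# Count "hibernate" entries that were followed by another "hibernate"
# instead of a "thaw", i.e. cycles where resume did not complete.
count_failed_resumes() {
    awk '$2 == "hibernate" { if (pending) failed++; pending = 1 }
         $2 == "thaw"      { pending = 0 }
         END { print failed + 0 }' "$LOG"
}

# pm-utils calls the hook with "hibernate" before and "thaw" after.
case "$1" in
    hibernate|thaw) log_event "$1" ;;
esac
```

A failed resume is only counted once the next hibernate is logged, so the most recent cycle is never included in the count.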

Comment 26 aaronsloman 2012-04-26 15:32:36 UTC
(In reply to comment #25)
> The effect used to be intermittent, but with 3.3.2-6 it has become totally
> predictable.

I did a bit more experimenting and found that the failure to resume without acpi=off is not as predictable as I thought. It sometimes resumes successfully.

But so far it has always resumed successfully with acpi=off added.

Comment 27 aaronsloman 2012-04-26 15:32:36 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Problem persists in 3.3.2-6.fc16.i686

Comment 28 aaronsloman 2012-04-29 09:58:51 UTC
(In reply to comment #27)
> Problem persists in 3.3.2-6.fc16.i686

That's on my laptop: Dell E6410. For resume to work reliably it requires acpi=off (not when booting, only when resuming from hibernate).

I've now upgraded my desktop PC to 3.3.2-6.fc16.i686 and so far resume works fine without acpi=off

The desktop also uses integrated intel graphics, and the i915 module, but the hardware is different, including use of an external monitor, unlike laptop.

I don't know if this is relevant, but for several months in 2010 after I bought the laptop I used kernel 2.6.33.5-112_1.cubbi_tuxonice.fc13.i686 and hibernate and resume worked fine. However there were other problems, e.g. SD card did not work, Alps touch pad could not be used for scrolling, and switching in or out of graphics mode with an external monitor connected crashed the system.

The i915 module was not available at that time. I think the laptop also worked well with Fedora 15 Alpha, using kernel 2.6.38.

The problem of resume failing on the Desktop machine started in 2.6.40-4  according to some old notes I've found. I think it persisted until I switched to Fedora 16 at the beginning of April 2012.

Google shows lots of users reporting failure of resume after hibernate, so I suspect it's not just my hardware. Example: https://bbs.archlinux.org/viewtopic.php?id=123117

Comment 29 aaronsloman 2012-04-29 13:03:29 UTC
(In reply to comment #28)
This proved too optimistic:
> I've now upgraded my desktop PC to 3.3.2-6.fc16.i686 and so far resume works
> fine without acpi=off

Alas, after a couple more hibernate resume cycles I was proved wrong: 3.3.2-6 repeatedly failed to resume, even with acpi=off added

So, on the desktop PC I have now gone back to 3.3.2-1.fc16.i686, which had previously worked well for about 9 days, hibernating and resuming, at least once a day, and sometimes several times in a day, without using acpi=off for resume. Something that had changed in 3.3.2-6 seems to have reintroduced intermittent failure to resume completely on my desktop PC -- even though that kernel works on my laptop, using acpi=off when resuming.

Comment 30 aaronsloman 2012-05-06 00:35:34 UTC
I now have kernel 3.3.4-1.fc16.i686 on both desktop PC and Dell E6410 Laptop.
In contrast with comment #29 3.3.4-1 works fine on the PC. I can hibernate and resume successfully, without requiring acpi=off in the boot command.

However acpi=off is still required for resume to work on the laptop. I can't use that when booting as it stops too many things working (e.g. adjustment of screen brightness). But I have to use it when resuming, otherwise resume fails at the last moment and the machine reboots instead. So this is the menu entry for RESUME, in grub.cfg on the laptop:

menuentry 'Fedora (3.3.4-1.fc16.i686)RESUME' --class fedora --class gnu-linux --class gnu --class os {
        load_video
        set gfxpayload=keep
        insmod gzio
        insmod part_msdos
        insmod ext2
        set root='(hd0,msdos6)'
        search --no-floppy --fs-uuid --set=root edcc4XXXXXXXX
        echo 'Loading Fedora (3.3.4-1.fc16.i686)'
        linux   /vmlinuz-3.3.4-1.fc16.i686 root=UUID=XXXXXXXXXX ro rd.md=0 rd.lvm=0 rd.dm=0 SYSFONT=latarcyrheb-sun16 KEYTABLE=uk rd.luks=0 LANG=en_US.UTF-8 acpi=off
        echo 'Loading initial ramdisk ...'
        initrd /initramfs-3.3.4-1.fc16.i686.img
}

The entry for BOOT is the same minus "acpi=off".

I can live with this though it is a nuisance having to duplicate the grub entry whenever I install a new kernel, and having to remember to use the BOOT option, not the RESUME option when booting. The default is RESUME, as I rarely need to BOOT.

Comment 31 aaronsloman 2012-05-06 00:35:35 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-Problem persists in 3.3.2-6.fc16.i686
+Problem persists in 3.3.2-6.fc16.i686 and in 3.3.4-1.fc16.i686

Comment 32 aaronsloman 2012-05-10 16:16:10 UTC
(In reply to comment #30)
I wrote on 2012-05-05:
> I now have kernel 3.3.4-1.fc16.i686 on both desktop PC and Dell E6410 Laptop.
> In contrast with comment #29 3.3.4-1 works fine on the PC. I can hibernate and
> resume successfully, without requiring acpi=off in the boot command.

That proved premature. After several days working fine on the PC, resume failed and triggered a reboot -- as happened more often on the laptop.

Have now installed  3.3.4-3.fc16.i686 on both. Laptop, as before, fails to resume properly without acpi=off, so have added a boot menu item including it. The desktop PC has resumed twice without it.

Is there a way to tell grub/grub2 "if resuming, add the flag acpi=off"?
That would save the hassle of creating two entries in grub.cfg and the need to remember to choose the one without the acpi setting for a full reboot.

Comment 33 Arne Woerner 2012-05-10 18:15:33 UTC
u could change /boot/grub2/grubenv "saved_entry" before doing pm-hibernate (so that u use the thaw-grub-menu-entry on thaw) and after pm-hibernate u could change it back... :-)

maybe with a /usr/lib64/pm-utils/sleep.d/ script?
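
A possible variant of this idea, sketched under assumptions: GRUB 2 ships a grub2-reboot tool that selects a menu entry for the next boot only, which would avoid having to change saved_entry back afterwards. It assumes GRUB_DEFAULT=saved in /etc/default/grub, that GRUB can write its environment block at boot, and that a menu entry with the title below exists. The hook name, RESUME_ENTRY and GRUB_REBOOT are invented for this example.

```shell
#!/bin/sh
# Sketch only: hypothetical hook, e.g. /etc/pm/sleep.d/02resume-entry.
# The entry title must match a menuentry title in grub.cfg exactly.
RESUME_ENTRY=${RESUME_ENTRY:-'Fedora (3.3.4-1.fc16.i686)RESUME'}
GRUB_REBOOT=${GRUB_REBOOT:-grub2-reboot}

# Ask GRUB to use the resume entry for the next boot only.
select_resume_entry() {
    "$GRUB_REBOOT" "$RESUME_ENTRY"
}

case "$1" in
    hibernate) select_resume_entry ;;
esac
```

If this works as assumed, a failed resume followed by a reboot would fall back to the normal default entry without any cleanup.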

Comment 34 aaronsloman 2012-05-11 08:37:41 UTC
(In reply to comment #33)
> u could change /boot/grub2/grubenv "saved_entry" before doing pm-hibernate (so
> that u use the thaw-grub-menu-entry on thaw) and after pm-hibernate u could
> change it back... :-)
> 
> maybe with a /usr/lib64/pm-utils/sleep.d/ script?

Thanks. I'll have to find time to do a lot of reading and then try these. I am not a system programmer, and I find grub2 one of the most messy systems I have ever had to interact with, with no decent user documentation (that I could find) and very obscure, badly commented, scripts for a user to try to edit in order to control its behaviour. There isn't even a 'man' file with the obvious name. (I am currently using Fedora 16). It may do very sophisticated and useful things, but that doesn't help someone who needs to understand how to change the defaults. I guess I should try to find a place to make constructive suggestions from a relatively ignorant  user's point of view.

Comment 35 Jaroslav Škarvada 2012-05-11 09:33:32 UTC
(In reply to comment #34)
grub2 is messy but powerful :) The above mentioned hack should be do-able with it:

1) edit /etc/default/grub and add/modify the GRUB_CMDLINE_LINUX to read:
GRUB_CMDLINE_LINUX="quiet rhgb \${resume_hack}"

2) regenerate your grub2 config by:
# grub2-mkconfig -o /boot/grub2/grub.cfg

3) add pm-utils user hack, in /etc/pm/sleep.d create file 01grub-hack (it must have executable permissions, i.e. chmod 0755) with following content:
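#!/bin/sh
# pm-utils hook: pass acpi=off to the kernel only for the resume boot.

# Load the pm-utils helpers; this defines $NA ("not applicable").
. "${PM_FUNCTIONS}"

case "$1" in
	hibernate)
		# before hibernating: make GRUB add acpi=off on the next boot
		grub2-editenv - set resume_hack='acpi=off'
		;;
	thaw)
		# after a successful resume: drop the extra parameter again
		grub2-editenv - unset resume_hack
		;;
	*) exit $NA
		;;
esac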


Now it should work (at least it worked on my testbox :) One problem with this hack is, that in case the resume fails, there will be still acpi=off for next boots. In such case it will need one successful hibernate/resume cycle to sync the settings, or you will need to issue 'grub2-editenv - unset resume_hack' from console and reboot. Maybe the above approach could be improved to handle this, but it would be probably more complex code.
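
One way the leftover-setting problem might be handled, as a sketch only: run a cleanup once on every fresh boot (for example from rc.local). A successful resume never runs boot scripts and the thaw hook already unsets the variable, so if resume_hack is still set during a fresh boot, the previous resume must have failed. The function names and the GRUB_EDITENV variable are invented for this example; the boot that follows the failed resume would still run with acpi=off, only later boots are fixed.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.
GRUB_EDITENV=${GRUB_EDITENV:-grub2-editenv}

# Succeeds if the grubenv listing on stdin still defines resume_hack.
has_resume_hack() {
    grep -q '^resume_hack='
}

# Remove resume_hack if a failed resume left it behind.
clear_stale_resume_hack() {
    if "$GRUB_EDITENV" - list | has_resume_hack; then
        "$GRUB_EDITENV" - unset resume_hack
        echo "cleared stale resume_hack"
    fi
}

# Usage at boot (as root):
#   clear_stale_resume_hack
```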

Comment 36 aaronsloman 2012-05-12 15:48:01 UTC
(In reply to comment #35)
> (In reply to comment #34)
> grub2 is messy but powerful :) The above mentioned hack should be do-able with
> .....
> 
Thanks very much for the suggestions. When I find time I'll try to read relevant documentation on hibernate and grub2, and will try to understand your suggestions but for now I've decided on a simpler strategy that I can easily transfer from one machine to another by copying two scripts.

1. After installing new kernel, run this script to copy grub.cfg to grub.cfg-resume and insert acpi=off in the new file, using 'ed' ('sed' would be cleaner, but I am not so familiar with it):

#!/bin/bash
cd /boot/grub2
cp grub.cfg grub.cfg-resume
ed grub.cfg-resume <<  \\\\
1
?vmlinuz?
s/$/ acpi=off/p
w
q
\\

2. Invoke pm-hibernate via this script

#!/bin/bash

cd /boot/grub2

# save grub.cfg, then install a copy of grub.cfg-resume in its place
echo "saving grub.cfg"
mv grub.cfg grub.cfg-saved

echo "set up grub.cfg-resume as grub.cfg"
cp -p grub.cfg-resume grub.cfg

# hibernate then restore grub.cfg-saved
pm-hibernate
echo "restore grub.cfg-saved"
mv grub.cfg-saved grub.cfg

This has the same problem as the solution in comment #35 that if resume fails then grub.cfg will have to be restored. But a simple 'mv' command will do that.

This is still too messy. But it works and no obscure grub2 mechanisms are required to make it work.
Thanks.

Comment 37 aaronsloman 2012-05-13 12:56:37 UTC
Problem of resume without 'acpi=off' persists in new kernel 3.3.5-2.fc16.i686 (so far only tried on laptop).

My scripts in Comment #36 had typos and design flaws. New script to edit grub.cfg to produce grub.cfg-resume, including inserting 'RESUME' in menu item header, as reminder when grub shows boot menu:

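#!/bin/bash
# Run after kernel update produces new grub.cfg

# The cp and ed commands below use relative paths
cd /boot/grub2 || exit 1

cp grub.cfg grub.cfg-resume

# Tag the first menu entry with RESUME, then append acpi=off to its kernel line
ed grub.cfg-resume << 'EOF'
/menuentry/p
s/i686)/i686)RESUME/p
/vmlinuz/p
s/$/ acpi=off/p
w
q
EOF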
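
For anyone not comfortable with 'ed', roughly the same edit can be sketched with awk. This is an illustration, not the script actually in use: make_resume_cfg is a hypothetical name, and it assumes the menu entry title contains "i686)" as in the ed version.

```shell
#!/bin/sh
# Sketch only: awk equivalent of the ed edits above.
# make_resume_cfg is a hypothetical name; usage:
#   make_resume_cfg SOURCE DEST [FLAG]     (FLAG defaults to acpi=off)
make_resume_cfg() {
	src=$1
	dst=$2
	flag=${3:-acpi=off}
	awk -v flag="$flag" '
		# Tag the first menu entry title so the boot menu shows RESUME.
		!seen && /menuentry/ { sub(/i686\)/, "i686)RESUME"); seen = 1 }
		# Append the flag to the first kernel line after that entry only.
		seen && !patched && /vmlinuz/ { $0 = $0 " " flag; patched = 1 }
		{ print }
	' "$src" > "$dst"
}
```

Run from /boot/grub2, 'make_resume_cfg grub.cfg grub.cfg-resume' would write the edited copy and leave grub.cfg itself untouched.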

Regarding the script to run pm-hibernate, I discovered the hard way the need to test for existence of grub.cfg-resume before continuing, so I now have a test:

#!/bin/bash

cd /boot/grub2

if  [ -f grub.cfg-resume ]; then

    echo "Ready to hibernate"
    echo "Saving grub.cfg"
    mv grub.cfg grub.cfg-saved
    echo "Set up grub.cfg-resume as grub.cfg"
    cp -p grub.cfg-resume grub.cfg
    /usr/sbin/pm-hibernate
    echo "Restore grub.cfg-saved"
    mv grub.cfg-saved grub.cfg

else

    echo "Cannot find grub.cfg-resume"

    exit

fi

I could also use existence of grub.cfg-saved as a test for previous resume not completing, as an extra precaution. But that no longer happens with acpi=off, so far.
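
The extra precaution mentioned above might look something like the following. It is a sketch under assumptions: the function name and its two optional arguments (directory and hibernate command) are hypothetical and are only there so that it can be tried in a scratch directory first.

```shell
#!/bin/sh
# Sketch only: wrapper that also recovers from an earlier failed resume.
# hibernate_with_resume_cfg and its arguments are hypothetical.
# The body runs in a subshell so the caller's directory is not changed.
hibernate_with_resume_cfg() (
	dir=${1:-/boot/grub2}
	hibernate_cmd=${2:-/usr/sbin/pm-hibernate}
	cd "$dir" || exit 1

	# A leftover grub.cfg-saved means the last resume never reached the
	# restore step, so put the normal config back first.
	if [ -f grub.cfg-saved ]; then
		echo "Previous resume did not complete; restoring grub.cfg"
		mv grub.cfg-saved grub.cfg || exit 1
	fi

	if [ ! -f grub.cfg-resume ]; then
		echo "Cannot find grub.cfg-resume" >&2
		exit 1
	fi

	echo "Saving grub.cfg and installing grub.cfg-resume"
	mv grub.cfg grub.cfg-saved &&
		cp -p grub.cfg-resume grub.cfg &&
		"$hibernate_cmd"

	# Execution continues here after a successful resume.
	echo "Restore grub.cfg-saved"
	mv grub.cfg-saved grub.cfg
)
```

With no arguments it would act on /boot/grub2 and call /usr/sbin/pm-hibernate, as in the script above.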

Comment 38 aaronsloman 2012-05-27 21:05:30 UTC
(In reply to comment #16)
> If the machine doesn't properly shut down during the hibernate cycle even if
> hibernated with:
> # echo disk > /sys/power/state
> 
> I would recommend filling new bug against kernel.

Just for the record, I have now discovered the cause of the machine not shutting down at the end of hibernate. This happens whenever I forget to unplug my Pinnacle DVB-T USB TV stick. This may be a motherboard feature rather than a hibernate bug. However, it does not bother me because I can force a shutdown by holding the power button for a few seconds. Then resume works fine on power up (subject to the acpi=off requirement described above).

Comment 39 aaronsloman 2012-05-27 21:25:08 UTC
UPDATE: I have just installed kernel 3.3.7-1.fc16.i686 on my laptop and the bug still persists. After installation I had two successful resumes from pm-hibernate followed by a failed resume causing a reboot.

As before the resume almost completes (the counter seems to get to 100% or very close) and then fails just when the screen should change back to its pre-hibernate state.

So I have reverted to my scheme described in comment #37, using a script to replace grub.cfg with an edited version containing acpi=off before invoking pm-hibernate. This mechanism has been working perfectly for some weeks: pm-hibernate is very fast and resume always succeeds using the extra flag.

I have found the same on my desktop PC still running 3.3.4-3.fc16.i686. I don't always install new kernels on the PC since installing a new kernel is disruptive, requiring a full reboot and losing all my saved virtual desktops.

Because this slightly messy solution seems to work so reliably on both my laptop and desktop PC, both of which use the i915 module, I wonder if something could be built into the hibernate-resume code to make it unnecessary for users to mess around with grub.cfg for resume to work.

Could the hibernate/resume code check for presence of i915, and if it is in use do the equivalent of acpi=off during resume, before the final step of restoring machine state?

Then presumably when the state is restored, the original acpi setting will be restored anyway, which is what happens to me now if I use acpi=off for resume but not for boot.

Unfortunately I am not a kernel programmer so I would not be able to try implementing this suggestion. However, if someone else can do this I would be willing to test it if given full instructions (preferably with a 32-bit f16 kernel rpm to install).

Thanks.

Comment 40 aaronsloman 2012-06-15 22:35:00 UTC
Today I installed kernel 3.3.8-1.fc16.i686 on my Dell E6410 laptop (not yet on my desktop machine). After pm-hibernate it resumed successfully three times, but on the fourth occasion, as in the past, it crashed and rebooted just before restoring the screen (i.e. the resume counter seemed to reach 100%).

So I've again resorted to the regime described in Comment #37, namely altering grub.cfg temporarily before calling pm-hibernate, so that on resume the kernel line includes 'acpi=off'

However I have now noticed something that I previously missed. When I resume with the acpi=off switch set in grub.cfg (which always resumes perfectly) it reports using only 1 thread for decompression (though it still goes very fast). If I resume without that setting then it reports using 3 threads for decompression.

In both cases pm-hibernate uses 3 threads for compression.

Could the crash+reboot on resume be connected with use of 3 threads for decompression?

I don't know why resume uses only 1 thread with acpi=off. Is there some way to tell pm-hibernate that resume should use only one thread to decompress, to see if that is sufficient to prevent the intermittent crashing on resume?

I had previously noticed that it sometimes used only one thread and sometimes three, but had not connected the difference with resuming successfully or failing and had not linked the number of threads to acpi=off.

When it fails to complete resuming, I had been looking for some kind of trace in /var/log/ but found nothing obviously relevant. Is that because the crash during resume occurs before file systems are mounted, so records cannot be written?

I have not had a crash during resume with acpi=off for a long time, on either my laptop or my desktop machine, both running Fedora 16. So, in case the tip is useful for others I have created a file describing what I do:
http://www.cs.bham.ac.uk/~axs/laptop/hibernate-on-linux.html (I don't know if it could be relevant to non-fedora users.)

Comment 41 aaronsloman 2012-06-15 22:35:00 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-Problem persists in 3.3.2-6.fc16.i686 and in 3.3.4-1.fc16.i686
+Problem persists with kernel 3.3.8-1.fc16.i686

Comment 42 aaronsloman 2012-06-20 19:40:54 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1,2 @@
-Problem persists with kernel 3.3.8-1.fc16.i686
+Problem persists with kernel 3.3.8-1.fc16.i686
+and with kernel 3.4.2-1.fc16.i686

Comment 43 aaronsloman 2012-07-27 22:15:42 UTC
Just installed kernel 3.4.6-1.fc16.i686 on laptop (Dell E6410); the problem persists. I still have to use acpi=off for resume from hibernate.

Note also that as reported in Bug #842291 the sequence
 pm-hibernate
 boot into Windows 7 
 in windows do restart
 resume linux from grub menu

prevents the next pm-hibernate from working. It just fails and returns to the state before the hibernate command. Then the only option is to shutdown completely and reboot.

The failure to hibernate can be avoided by using shutdown instead of restart in windows.

Comment 44 aaronsloman 2012-08-14 14:20:59 UTC
Installed Fedora 17 with kernel 3.5.1-1.fc17.i686 on Dell E6410

The problem persists: i.e. resume from hibernate normally fails to complete, unless resume uses acpi=off

There seems to be an interaction with Bug #842291 (On dual-boot machine if I shutdown from MS Windows 7 and then start F16 the next pm-hibernate works, but if I restart/reboot from Win7 then hibernate fails.) In F17 (not sure about F16) the failure to hibernate occurs only after resume with acpi=off.

Comment 45 Jaroslav Škarvada 2012-08-16 16:06:56 UTC
According to attachment 604651 [details] in bug 842291 it seems your BIOS doesn't match ACPI spec:
> [    0.000000] ACPI Warning: 32/64 FACS address mismatch in FADT - two FACS tables! (20120320/tbfadt-378)
> [    0.000000] ACPI Warning: 32/64X FACS address mismatch in FADT - 0xDB76BF40/0x00000000DB76ED40, using 32 (20120320/tbfadt-502)

AFAIK according to the spec there shouldn't be two tables, and the FACS is critical for resume to work correctly. But I am not sure whether this is the source of your problem. You are running a 32-bit kernel and it selected the 32-bit table, so it could work. Also it seems you are running BIOS version A11; the latest is A12, but from the Dell changelog it doesn't seem that A12 contains any relevant fixes. I tried to reproduce this on various machines with e1000e and a Core i5 CPU, but without success, so it may be a BIOS problem.

You may try the following alternative methods of hibernate/resume:

1) Check what options are supported on your platform by:
# cat /sys/power/disk
[platform] shutdown reboot

In the above example it indicates the "platform" mode is the default and that it also supports "shutdown" and "reboot" modes.

2) Create an arbitrary file in /etc/pm/config.d, e.g. /etc/pm/config.d/conf
that contains the following line:
HIBERNATE_MODE="shutdown"

Then hibernate/resume as usual. Also try the "reboot" mode. Both modes bypass the firmware, which may help if the BIOS is broken.
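
The two steps above could be combined in a small helper that refuses to write a mode the platform does not list. This is a sketch only: the function names are hypothetical, and the third argument (defaulting to /sys/power/disk) exists only so the helper can be tried against a sample file.

```shell
#!/bin/sh
# Sketch only: inspect a /sys/power/disk style mode list and write a
# pm-utils config line. All function names here are hypothetical.

# Print the mode shown in brackets, e.g. "platform" for
# "[platform] shutdown reboot" read from stdin.
active_mode() {
	sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Succeed if mode $1 appears in the mode list read from stdin.
mode_supported() {
	sed 's/[][]//g' | tr ' ' '\n' | grep -qxF "$1"
}

# Write HIBERNATE_MODE="<mode>" to config file $2 if mode $1 is listed
# in file $3 (default /sys/power/disk).
write_hibernate_mode() {
	mode=$1
	conf=$2
	list=${3:-/sys/power/disk}
	if mode_supported "$mode" < "$list"; then
		printf 'HIBERNATE_MODE="%s"\n' "$mode" > "$conf"
	else
		echo "mode '$mode' is not listed in $list" >&2
		return 1
	fi
}
```

For example, 'write_hibernate_mode shutdown /etc/pm/config.d/conf' (as root) would create the file from step 2, and 'active_mode < /sys/power/disk' would show which mode the kernel currently defaults to.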

Comment 46 aaronsloman 2012-08-16 21:14:39 UTC
(In reply to comment #45)
> According to attachment 604651 [details] in bug 842291 it seems your BIOS
> doesn't match ACPI spec:
> > [    0.000000] ACPI Warning: 32/64 FACS address mismatch in FADT - two FACS tables! (20120320/tbfadt-378)
> > [    0.000000] ACPI Warning: 32/64X FACS address mismatch in FADT - 0xDB76BF40/0x00000000DB76ED40, using 32 (20120320/tbfadt-502)

Thanks for taking the trouble. This is beyond my technical competence,  but ..

> ... But I am not sure whether this is the
> with e1000e and Core i5 CPU, but I wasn't successful to reproduce, so it may
> be BIOS problem.
> ...Also it seems you are running BIOS version A11, the latest is A12, 

I thought I had the latest bios because I searched for updates using the machine's 'service tag' and it told me there were no updates. But following your report I've searched directly and found the A12 bios update (dated 6/5/2012) and installed it.

I am pleased to report that since then I have managed (using the F17 3.5.1-1 kernel) to resume from hibernate without acpi=off five times in succession. So the new BIOS definitely seems to have changed something. I'll keep trying hibernate and resume without acpi=off and see if the success persists for several days. If it does I'll report back here.

I shall now see whether it also affects the problem in Bug #842291 (On dual-boot machine if I shutdown from MS Windows 7 and then start F16 the next pm-hibernate works, but if I restart/reboot from Win7 then hibernate fails.)

I'll study your other suggestions later: they may no longer be needed.

Thanks very much.

Comment 47 aaronsloman 2012-08-17 09:40:16 UTC
(Further reply to comment #45)
> According to attachment 604651 [details] in bug 842291 it seems your BIOS
> doesn't match ACPI spec:
> 
> AFAIK according to spec. there shouldn't be two tables and the FACS is
> critical for resume to work correctly. But I am not sure whether this is the
> source of your problem. You are running 32 bit kernel and it selected 32 bit
> table thus it could work. Also it seems you are running BIOS version A11,
> the latest is A12, but from the Dell changelog it doesn't seem that the A12
> contains any relevant fixes. I tried to reproduce this on various machines
> with e1000e and Core i5 CPU, but I wasn't successful to reproduce, so it may
> be BIOS problem.

After updating the BIOS to A12 and finding that resume worked without acpi=off, I looked more closely at the other available Dell driver upgrades for E6410 + 32-bit Win7 (which I had somehow missed previously) and found that some of them included BIOS upgrades, so I installed them also, including some that affected video, audio, hard drive firmware, and Alps touchpad, e.g.:

Intel_multi-device_A09_R307992.exe (65MB) Recommended 09/02/2012 v. 8.15.10.2418, A09
Applies to: Dell Device Intel GMA HD

Dell-Driver DELL_MULTI-TOUCH-TOUCHPAD_A10_R315893.exe

(one effect seems to be that the touchpad works better on windows but for some reason scrolling by touch no longer works on F17 -- I'll have to investigate that separately.)

After the installation I repeated my hibernate+resume tests a few times and everything seems to be working, though I may not have tried all the contexts for testing restart from windows.

I'll continue testing and report again later before confirming that everything works.

Summary: problems with hibernate+resume and switching between linux and windows while hibernated seem to be resolved in Fedora 17 kernel 3.5.1-1.fc17.i686. Tested using graphical boot + Openbox window manager, and (my favourite) boot to runlevel 3, then login and, after updates if required, launch X with startx and use the Ctwm window manager. This has survived several hibernate/resume cycles.

So it's possible that users of gnome, kde, etc. (which I find monstrosities) may have different experiences.

For the record:
One thing I now miss is the grub menu reminding me that the machine was last hibernated, not shut down, so I shall use the technique of Comment #37, with a 'do-hibernate' script that switches between two versions of grub.cfg, one of which displays 'RESUME' and the other doesn't, except that I'll no longer use 'acpi=off' in the second.

Comment 48 Jaroslav Škarvada 2012-08-17 11:10:38 UTC
(In reply to comment #47)
> For the record:
> One thing I now miss is the grub menu reminding me that the machine was last
> hibernated not shut down, so I shall use the technique of Comment #37, with
> a 'do-hibernate' script that switches between two versions of grup.cfg, one
> of which displays 'RESUME' and the other doesn't, except that I'll no longer
> use 'acpi=off' in the second.
>
Interesting idea, and I think it is feasible (in a clean way) with the current pm-utils grub hook and grub2 environment. Please file an RFE bug about it against component pm-utils, version rawhide.

Comment 49 aaronsloman 2012-08-18 02:27:53 UTC
(In reply to comment #47)
I wrote previously:
> Summary: problems with hibernate+resume and switching between linux and
> windows while hibernated seem to be resolved in Fedora 17 kernel
> 3.5.1-1.fc17.i686 (tested using graphical boot+ Openbox window manager, and
> (my favourite) boot to runlevel 3, then login and after updates if required
> launch X with startx and use Ctwm window manager. This has survived several
> hibernate resume cycles.

Unfortunately, resume from hibernate without acpi=off still sometimes fails.

I've now installed kernel 3.5.2-1 and so far it resumed successfully two or three times and failed twice. I shall have to experiment to see if there's a pattern.

I wonder if there could be a connection with my switching grub.cfg to boot into level 3 instead of 5. I may be able to do a bit more testing over the weekend, but things are still changing as I gradually build up my previous collection of installed packages. I don't know if any of them could interact badly with hibernate.

Strangely, another user is reporting that kernel-3.5.2-1.fc17.x86_64 doesn't even hibernate, in Bug #788433 (Core i7 cannot pm-hibernate/pm-suspend/thaw properly), whereas it hibernates every time for me.

Comment 50 Jaroslav Škarvada 2012-08-20 07:26:05 UTC
(In reply to comment #49)
> I wonder if there could be connection with my switching grub.cfg to boot
> into level 3 instead of 5.
>
It shouldn't fail more in runlevel 3 (or whatever it is called now). If it fails in runlevel 5 and not 3, it is most likely an X.org driver issue.

Comment 51 aaronsloman 2012-08-21 00:38:11 UTC
After several more failures to resume (nearly completes resume, then reboots), using both 3.5.1-1.fc17.i686 and 3.5.2-1.fc17.i686, I've reluctantly decided to stop experimenting and go back to the previous solution, namely before hibernate modify grub.cfg so that acpi=off is added for use when resuming, and then make sure it's not there when doing a full reboot.

On my desktop PC that has now been working for 73 days without a full reboot, as shown by 'uptime', in fedora 16.

Comment 52 John Schmitt 2012-08-27 05:43:08 UTC
aaronsloman,

would your technique of editing grub and wrapping pm-hibernate work similarly for pm-suspend?  How much does it matter that you're not editing /etc/default/grub and invoking grub2-mkconfig?

Comment 53 aaronsloman 2012-08-27 21:41:29 UTC
Apologies for being slow to respond to comments/questions, owing to other commitments.

(In reply to comment #52)
> would your technique of editing grub and wrapping pm-hibernate work
> similarly for pm-suspend?  How much does it matter that you're not editing
> /etc/default/grub and invoking grub2-mkconfig?

I don't know, as I don't ever use suspend. That's because normally, except when travelling, if I hibernate my laptop (running 32-bit F17) I don't know whether I'll next resume in one or two days or one or two weeks, as most of my work uses a desktop PC (running 32-bit F16), and that hibernates and resumes so fast, that I would not see any benefit in using suspend. (I also don't know the energy costs).

Also I don't see how it would be possible to use acpi=off for resume from suspend since I don't think resume after suspend goes through the grub boot menu. I suspect the processes of resuming from hibernate and from suspend are so different that the cases of non-resuming would be very different. In my case if I don't use acpi=off resume from  hibernate often fails, but only at the last minute, after the compressed state has been read in and the display should switch back to the state in which the pm-hibernate command was given. I don't know whether at that point the resume from hibernate process is similar to the resume from suspend process. Perhaps an expert reading this will comment.

Comment 54 aaronsloman 2012-08-31 13:15:02 UTC
(In reply to comment #48 -- Jaroslav Škarvada 2012-08-17)
> (In reply to comment #47)
> > For the record:
> > One thing I now miss is the grub menu reminding me that the machine was last
> > hibernated not shut down, so I shall use the technique of Comment #37, with
> > a 'do-hibernate' script that switches between two versions of grup.cfg, one
> > of which displays 'RESUME' and the other doesn't, except that I'll no longer
> > use 'acpi=off' in the second.
> >
> Interesting idea and I think it is feasible (clean way) with the current
> pm-utils grub hook and grub2 environment. Please file RFE bug about it on
> component pm-utils, version rawhide.
 
I have now done that: Bug #853419, "RFE: After pm-hibernate grub menu should be changed with "RESUME" indicated for the relevant OS while allowing other options e.g. boot a different OS"

Comment 55 aaronsloman 2012-09-04 02:13:09 UTC
Just installed kernel 3.5.3-1.fc17.i686 #1 SMP Wed Aug 29 19:25:38 UTC 2012

So far I have managed to use pm-hibernate and successful resume without acpi=off 
four times. Could a resume bug have been fixed?

I'll go on trying and report back here later.

Comment 56 aaronsloman 2012-09-04 19:48:40 UTC
Three more successful resumes today, after pm-hibernate, without using acpi=off.

I wonder if there's some place I can read what changed in kernel-3.5.3-1.fc17.i686, to help me decide whether the sequence of successful resumes from pm-hibernate has an explanation and can be relied on, or whether I have just been lucky so far.

Using acpi=off after pm-hibernate has (for me) guaranteed successful resume for several weeks on my F17 laptop and for nearly three months on PC running F16 with kernel 3.3.7-1.fc16.i686

Comment 57 aaronsloman 2012-09-05 00:23:14 UTC
Two more successful resumes from pm-hibernate without acpi=off. I've begun to wonder whether this could be connected with differences between NetworkManager and wicd.

For some time (two or three years?) I have always removed or disabled NetworkManager and instead used wicd because it has a very much better, well integrated, user interface, which can be launched from a shell command.

However in Fedora 17 I've had problems with wicd apparently not saving or using security information I give it via wicd-client, so as an experiment I recently disabled wicd and started using only NetworkManager -- despite disliking its user interface intensely.

So perhaps pm-hibernate was expecting NM to be available during resume and that's why it often crashed near the final stages of resuming from hibernate.
Could acpi=off have prevented that?

If resume goes on working for some time, I may try switching back to using wicd, and then see if crashing during resume restarts.

Comment 58 aaronsloman 2012-09-05 10:59:55 UTC
(In reply to comment #57)
> ....
> If resume goes on working for some time, I may try switching back to using
> wicd, and then see if crashing during resume restarts.

Twice tried running pm-hibernate while wicd was running, and still managed to resume without acpi=off.

So it looks as if the previous resume failures were not caused by wicd.
So far successful resume/thaw 13 times since booting 3.5.3-1.fc17.i686

It really looks as if something has been fixed.

Comment 59 aaronsloman 2012-09-06 00:07:43 UTC
(In reply to comment #58)
Bad news!

I wrote:
> So it looks as if the previous resume failures were not caused by wicd.
> So far successful resume/thaw 13 times since booting 3.5.3-1.fc17.i686
> 
> It really looks as if something has been fixed.

I decided to investigate whether the apparent fix was due to the latest kernel or one of the other updates. So I rebooted into the previous kernel 3.5.2-3.fc17.i686. 

After the second attempt to resume it failed to complete, and rebooted. So I went back to the latest kernel, 3.5.3-1.fc17.i686, which had previously apparently been working perfectly.

Not any more: I made several attempts to resume after pm-hibernate, and only about half succeeded.

So it seems to be random behaviour. Not surprising on a 4-core intel core-i5.
The message file shows that booting on different occasions produces different sequences of events. But I don't know if that randomness is the source of the failure to resume.

I then wondered whether after a failed resume, the recovery boot state might in some way be affected by the previous failure. So after the last failed resume, I allowed the first boot to complete, then rebooted again.

Since that second reboot after the last resume failure the machine has hibernated and resumed successfully at least 6 times, without any reboot needed.

I'll report later whether that pattern continues. If anyone wants to look at the messages files for boot sequences followed quickly by resume failure and boot sequence followed by a string of successful resumes, ended by me trying something different, please let me know.

Comment 60 aaronsloman 2012-09-08 17:00:06 UTC
3.5.3-1.fc17.i686 resumed successfully several more times after pm-hibernate then resume failed for no apparent reason -- just before the final screen update.

So I conclude that my previous successful resumes with this kernel were just due to luck. There's something disconcertingly random involved.

I have reverted to using acpi=off for resume from hibernate.

Comment 61 aaronsloman 2012-09-23 00:48:23 UTC
Installed kernel 3.5.4-1.fc17.i686 and other updates, including xorg-x11-drv-intel-2.20.7-1.fc17.i686 

Result: no change -- mostly fails to complete resume after pm-hibernate

This is on Dell Latitude E6410 laptop, using intel graphics.

So at first I went back to my old solution, using this new kernel: have two grub.cfg entries, one for booting and one for resume (using acpi=off).

Decided to try a new solution.

I've noticed that acpi=off when resuming makes it boot with only one cpu instead of four, then when the resume/thaw is complete it goes back to 4 cpus, as shown by nproc or lscpu. Using only 1 cpu for decompression does not make much difference to the speed, and it is worth putting up with a little extra time required, in order to have successful resume.

I wonder if the reason acpi=off allows resume to complete is that the reduction to one cpu during boot+resume avoids synchronization problems that cause resume to fail often.

So I looked for a way to make it boot with only 1 cpu, and found that I could replace acpi=off with maxcpus=1 in the grub file for resume from hibernate. This required modifying the edit script referred to in comments #36 and #37
Replace the edit command: s/$/ acpi=off/p
with this
    s/$/ maxcpus=1/p

I have now tried that on my Dell Latitude E6410

I have run pm-hibernate/resume using this method 10 times and resume always worked (as it did reliably with acpi=off). So I'll switch to using maxcpus=1 as my default for resume and will report back later.
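
One side effect worth guarding against: if resume fails and the machine does a full boot from the resume config, that boot itself runs with maxcpus=1. A check like the following, run after a full boot, could warn about that. It is a sketch only and has_boot_flag is a hypothetical helper. After a successful resume /proc/cmdline belongs to the restored (hibernated) kernel, so the flag should not appear there.

```shell
#!/bin/sh
# Sketch only: check whether a kernel command line contains a given flag.
# has_boot_flag is a hypothetical helper.
#   $1 = flag to look for (e.g. maxcpus=1)
#   $2 = command line text (e.g. the contents of /proc/cmdline)
has_boot_flag() {
	# Pad with spaces so only whole words match (maxcpus=1, not maxcpus=16).
	case " $2 " in
		*" $1 "*) return 0 ;;
		*) return 1 ;;
	esac
}
```

For example: has_boot_flag maxcpus=1 "$(cat /proc/cmdline)" && echo "Booted from the resume config: only one cpu is in use, so restore grub.cfg and reboot"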

Note: on my desktop PC running fedora 16 with kernel 3.3.7-1.fc16.i686  the previous method, using acpi=off, has worked without resume failing for 106 days (as shown by 'uptime'). If the new method proves successful on the laptop, I'll install it on the PC

Comment 62 aaronsloman 2012-09-25 02:23:11 UTC
(In reply to comment #61)
> Installed kernel 3.5.4-1.fc17.i686 and other updates, including
> xorg-x11-drv-intel-2.20.7-1.fc17.i686 
> ....
> So I looked for a way to make it boot with only 1 cpu, and found that I
> could replace acpi=off with maxcpus=1 in the grub file for resume from
> hibernate. 
> ....
> I have run pm-hibernate/resume using this method 10 times and resume always
> worked (as it did reliably with acpi=off). So I'll switch to using maxcpus=1
> as my default for resume and will report back later.
> 
> Note: on my desktop PC running fedora 16 with kernel 3.3.7-1.fc16.i686  the
> previous method, using acpi=off, has worked without resume failing for 106
> days (as shown by 'uptime'). If the new method proves successful on the
> laptop, I'll install it on the PC.

It was successful on the laptop, and I did install it on the PC. So now both laptop and desktop machines use maxcpus=1 instead of acpi=off in grub file for resume from hibernate, and both resume successfully.

Maxcpus=1 affects only the resume process. Once the resume has completed 'nproc' and 'lscpu' show four cpus active. 

pm-hibernate continues to use 3 threads for compression.

The fact that this prevents resume/thaw crashing after hibernate, suggests that a bug in the multi-process decompression code was causing resume to fail.

That happens in both Fedora 16 and Fedora 17 on 32 bit linux with intel graphics and core i5 cpus.

Others have reported problems with thaw/resume using 64-bit linux. I don't know if maxcpus=1 solves the problem for any of them.

Comment 63 aaronsloman 2012-10-03 03:26:41 UTC
Because maxcpus=1 is a far more specific fix than acpi=off I have created a new bug report to replace this one:
Bug #862475 "Why do I need maxcpus=1 to resume from pm-hibernate in 32-bit Fedora 16 on Viglen Desktop PC, Fedora 17 on Dell E6410 laptop, both with intel core i5 cpu, intel graphics"

All future comments should go to Bug #862475

Comment 64 Fedora End Of Life 2013-01-16 22:53:00 UTC
This message is a reminder that Fedora 16 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 16. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '16'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 16's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 16 is end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" and open it against that version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 65 aaronsloman 2013-01-16 23:11:53 UTC
I have left this as F16, even though the problem referred to persists in F17. 

However a more targeted work-around using maxcpus=1 rather than acpi=off is described in this report on the bug:

Bug #862475 - Why do I need maxcpus=1 to resume from pm-hibernate in 32-bit Fedora 16 on Viglen Desktop PC, Fedora 17 on Dell E6410 laptop, both with intel core i5 cpu, intel graphics ?

In Fedora 17, there is still something causing resume from pm-hibernate to crash and reboot immediately after expanding the hibernate image, before restoring the display.
maxcpus=1 stops it crashing on resume, but should not be necessary.

Most recently tested on kernel 3.6.11-1.fc17.i686

Comment 66 Fedora End Of Life 2013-02-14 01:23:52 UTC
Fedora 16 changed to end-of-life (EOL) status on 2013-02-12. Fedora 16 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

