Bug 788433 (thaw)
Summary: | Core i7 cannot pm-hibernate/pm-suspend/thaw properly | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Arne Woerner <arne_woerner> |
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
Status: | CLOSED WONTFIX | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 17 | CC: | a.sloman, bug, burghardt, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, marmalodak, maurizio.antillon, mishu |
Target Milestone: | --- | Keywords: | Reopened |
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2013-08-01 17:19:33 UTC | Type: | --- |
Bug Blocks: | 781749 |
Description
Arne Woerner
2012-02-08 08:26:56 UTC
3.2.5-3.fc16.x86_64 has this bug, too:
i just did a "find /sys | grep fan" after 2 otherwise successful hibernate/thaw cycles, and my GNOME crashed (b&w text mode with panic messages) and i had to reboot...

-arne

3.2.7-1.fc16.x86_64 still allows that bug:
kernel BUG at fs/inode.c:429!
invalid opcode: 0000 [#1] SMP
CPU 1
Pid: 26157, comm: crond Tainted: G I 3.2.7-1.fc16.x86_64 #1
[...]

after i changed the following:
1. hdd (WDC WD10EARS-00Y5B1) write cache disabled 10+ seconds before pm-hibernate call (my theory was that the harddisc doesnt flush)
2. HIBERNATE_MODE="shutdown" (before it was "platform")
3. HIBERNATE_RESUME_POST_VIDEO="yes" (before it was commented out)
i was able to thaw 7 times without intermediate reboot/panic, which is much more than before (about every 3rd thaw crashed immediately and the others within some hours).

w00t

-arne

hum i forgot to mention, that i also emptied the swap space shortly before hibernate:
/sbin/swapoff LABEL=SWP1TB
/sbin/swapon -a

when i didnt, i got this again:
[39770.118546] ------------[ cut here ]------------
[39770.118552] WARNING: at lib/list_debug.c:26 __list_add+0x6d/0xa0()
[39770.118554] Hardware name: To Be Filled By O.E.M.
[39770.118556] list_add corruption. next->prev should be prev (ffff88022a04cbf8), but was (null). (next=ffff88022a04cbf8).
[39770.118557] Modules linked in: tcp_lp bnep bluetooth rfkill ppdev parport_pc lp parport fuse nfs fscache auth_rpcgss nfs_acl lockd nf_conntrack_tftp ipt_LOG nf_conntrack_ipv4 ip6t_REJECT nf_defrag_ipv4 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables coretemp w83627ehf hwmon_vid snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore i2c_i801 snd_page_alloc r8169 iTCO_wdt iTCO_vendor_support mii cdc_acm microcode sunrpc uinput i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan] [39770.118589] Pid: 455, comm: udevd Tainted: G I 3.2.9-1.fc16.x86_64 #1 [39770.118591] Call Trace: [39770.118597] [<ffffffff8106e53f>] warn_slowpath_common+0x7f/0xc0 [39770.118599] [<ffffffff8106e636>] warn_slowpath_fmt+0x46/0x50 [39770.118602] [<ffffffff812ca9bd>] __list_add+0x6d/0xa0 [39770.118605] [<ffffffff8118ece8>] __d_instantiate+0x58/0xe0 [39770.118607] [<ffffffff8118ef27>] d_instantiate+0x47/0x80 [39770.118610] [<ffffffff811ec473>] sysfs_lookup+0xf3/0x110 [39770.118613] [<ffffffff811845d5>] d_alloc_and_lookup+0x45/0x90 [39770.118615] [<ffffffff81190f05>] ? d_lookup+0x35/0x60 [39770.118618] [<ffffffff81186ba1>] do_lookup+0x2b1/0x3a0 [39770.118620] [<ffffffff81186ff1>] link_path_walk+0x141/0x880 [39770.118624] [<ffffffff8116562c>] ? kmem_cache_alloc_trace+0x10c/0x140 [39770.118627] [<ffffffff8126ea2a>] ? selinux_file_alloc_security+0x4a/0x80 [39770.118630] [<ffffffff81185b1d>] ? path_init+0x2cd/0x3a0 [39770.118633] [<ffffffff81188fa8>] path_openat+0xb8/0x3c0 [39770.118636] [<ffffffff811b5eab>] ? fsnotify_put_event+0x5b/0xa0 [39770.118639] [<ffffffff811893d2>] do_filp_open+0x42/0xa0 [39770.118641] [<ffffffff81184aeb>] ? getname_flags+0x3b/0x260 [39770.118644] [<ffffffff8119510f>] ? 
alloc_fd+0x4f/0x150
[39770.118647] [<ffffffff81178c57>] do_sys_open+0xf7/0x1d0
[39770.118650] [<ffffffff81178d50>] sys_open+0x20/0x30
[39770.118653] [<ffffffff815eaac2>] system_call_fastpath+0x16/0x1b
[39770.118655] ---[ end trace a7919e7f17c0a727 ]---

when i did it again last night, there was no oops...
i commented out the HIBERNATE_RESUME_POST_VIDEO line again...

-arne

that trace is very interesting. We've had other reports of it, and I suspected they might be hibernate related. The finger of blame is pointing at i915 right now, as many people have noted that booting with nomodeset makes their hibernate problems go away.

yup
but why does swapoff+swapon change anything here then?
or is it just coincidental, because the bug doesnt happen every time?

if you could run the kernel-debug build at http://koji.fedoraproject.org/koji/buildinfo?buildID=304798 that might turn up a different trace that might be helpful to us to track this down. (it's going to be considerably slower than the regular build, due to the extra checking).

i installed that kernel-debug package...
but i cant reboot before tomorrow (ongoing tv recording)... :-)
as far as i can c the cores r rather idle...

in next night i will test if my hard disc write cache is flushed properly before the disc is powered down... :-)

could it be, that "nomodeset" is the cause for an empty swap area?
on my box the swap space is almost unused (currently just 60KiB), because it has 8GiB main mem...
but it seems to be important that it is 100% unused, when hibernation begins...

could it be that my swap area is too big, so that it writes to the wrong parts?
# swapon -s
Filename Type Size Used Priority
/dev/sda3 partition 8388604 60 0

with 3.2.9-1.fc16.x86_64.debug thaw worked good (no oops/panic/crash since hours) without any trick (hdd write cache was on and swap space was not empty when hibernation started, but i used HIBERNATE_MODE "shutdown" instead of "platform")...
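
[Editorial sketch] The swap check in the comments above (how much of /dev/sda3 is in use before hibernating) could be scripted instead of read by eye. This is only a minimal sketch, assuming the whitespace-separated five-column `swapon -s` layout quoted above. The function names are invented for illustration and are not part of pm-utils or any other tool mentioned in this report.

```python
def parse_swapon_summary(text):
    """Parse `swapon -s` style output into a list of dicts (sizes in KiB).

    Assumption: five whitespace-separated columns, as in the output quoted
    in this report. Swap file names containing spaces are not handled.
    """
    entries = []
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a 5-column data row.
        if len(fields) != 5 or fields[0] == "Filename":
            continue
        name, kind, size, used, priority = fields
        entries.append({
            "name": name,
            "type": kind,
            "size_kib": int(size),
            "used_kib": int(used),
            "priority": int(priority),
        })
    return entries


def swap_is_empty(entries):
    """Return True only if no listed swap area has anything in use."""
    return all(e["used_kib"] == 0 for e in entries)


# Sample taken from the `swapon -s` output quoted in this report.
sample = """Filename Type Size Used Priority
/dev/sda3 partition 8388604 60 0"""

areas = parse_swapon_summary(sample)
print(areas[0]["name"], areas[0]["used_kib"], swap_is_empty(areas))
```

Later comments in this report conclude that emptying the swap space is not a workaround, so a check like this is useful for ruling the theory in or out on a given machine, not as a fix.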
with 3.2.9-2.fc16.x86_64.debug i could produce a bad kernel panic:
it didnt even log it to syslog and i couldnt c the head of the oops...
and i just terminated a process that was started shortly after thaw...
now i will again empty the swap space before hibernate and c if it crashes...

emptying the swap space is no workaround...
it crashed today...
now i test tuxonice from the atrpms repo...

tuxonice causes crashes, too...

[mass update] kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository. Please retest with this update.

[Mass hibernate bug update] Dave Airlied has found an issue causing some corruption in the i915 fbdev after a resume from hibernate. I have included his patch in this scratch build: http://koji.fedoraproject.org/koji/taskinfo?taskID=3940545

This will probably not solve all of the issues being tracked at the moment, but it is worth testing when the build completes. If this seems to clear up the issues you see with hibernate, please report your results in the bug.

seems to work on my box now...
no hibernate related crashes since i updated to 3.3.0-4...

it did it again (after i used a non-debug kernel again *blush*):
kernel:[212214.655575] ------------[ cut here ]------------
kernel:[212214.655603] kernel BUG at mm/huge_memory.c:2394!
kernel:[212214.655624] invalid opcode: 0000 [#1] SMP kernel:[212214.655645] CPU 1 kernel:[212214.655654] Modules linked in: binfmt_misc tcp_lp ppdev parport_pc lp parport fuse bnep bluetooth rfkill nfs fscache auth_rpcgss nfs_acl ip6t_REJECT nf_conntrack_tftp nf_conntrack_ipv6 nf_defrag_ipv6 ipt_LOG nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ip6table_filter ip6_tables lockd coretemp w83627ehf hwmon_vid snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd iTCO_wdt iTCO_vendor_support r8169 mii microcode soundcore snd_page_alloc i2c_i801 uinput sunrpc i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan] kernel:[212214.655926] kernel:[212214.655936] Pid: 9468, comm: chrome Not tainted 3.3.0-8.fc16.x86_64 #1 To Be Filled By O.E.M. To Be Filled By O.E.M./H61M-ITX kernel:[212214.655985] RIP: 0010:[<ffffffff81175aa3>] [<ffffffff81175aa3>] __split_huge_page_pmd+0xc3/0xf0 kernel:[212214.656028] RSP: 0018:ffff88017fa2fcc8 EFLAGS: 00010282 kernel:[212214.656050] RAX: 80000000028000e7 RBX: ffff88022f7e2d80 RCX: 0000000000000000 kernel:[212214.656079] RDX: 0000000000000001 RSI: ffff88022979ab98 RDI: 80000000028000e7 kernel:[212214.656108] RBP: ffff88017fa2fce8 R08: ffff8801590dcff0 R09: 0000000000000100 kernel:[212214.656136] R10: 0000000000000004 R11: 0000000000000206 R12: ffffea0001020000 kernel:[212214.656167] R13: ffff8802255ad390 R14: ffff8802255ad390 R15: 0000000000000000 kernel:[212214.656194] FS: 00007f78c5988980(0000) GS:ffff88023fa40000(0000) knlGS:0000000000000000 kernel:[212214.656215] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel:[212214.656231] CR2: 00007f78cc92ba24 CR3: 000000010b50c000 CR4: 00000000000406e0 kernel:[212214.656250] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 kernel:[212214.656270] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 kernel:[212214.656290] Process chrome (pid: 9468, 
threadinfo ffff88017fa2e000, task ffff8802344f4590) kernel:[212214.656311] Stack: kernel:[212214.656318] 0000000000000008 00007f78ce48f000 00007f78ce48f000 00007f78ce49e000 kernel:[212214.656342] ffff88017fa2fe08 ffffffff81145cda ffff88023fdece00 0000000000000000 kernel:[212214.656373] 000000007fa2fd38 ffffffff8112d0bd ffff88017fa2fd38 ffff88017fa2fe80 kernel:[212214.656404] Call Trace: kernel:[212214.656415] [<ffffffff81145cda>] unmap_vmas+0x8aa/0x900 kernel:[212214.656432] [<ffffffff8112d0bd>] ? update_page_reclaim_stat+0x2d/0x70 kernel:[212214.656450] [<ffffffff8112d87c>] ? __pagevec_lru_add+0x1c/0x20 kernel:[212214.656468] [<ffffffff81145dd2>] zap_page_range+0xa2/0xd0 kernel:[212214.656485] [<ffffffff815f7720>] ? do_page_fault+0x200/0x4f0 kernel:[212214.656501] [<ffffffff81142c96>] sys_madvise+0x296/0x740 kernel:[212214.656517] [<ffffffff815fbca9>] system_call_fastpath+0x16/0x1b kernel:[212214.657205] Code: 89 e7 e8 41 7e fb ff 49 8b 7d 00 48 89 f8 66 66 66 90 a8 80 75 15 48 8b 5d e8 4c 8b 65 f0 4c 8b 6d f8 c9 c3 66 83 43 6c 0 1 eb eb <0f> 0b e8 a0 70 47 00 4c 89 e7 0f 1f 00 e8 5b 7e fb ff 84 c0 75 kernel:[212214.658962] RIP [<ffffffff81175aa3>] __split_huge_page_pmd+0xc3/0xf0 kernel:[212214.659895] RSP <ffff88017fa2fcc8> kernel:[212214.711344] HDMI hot plug event: Codec=3 Pin=7 Presence_Detect=0 ELD_Valid=1 kernel:[212214.712166] HDMI status: Codec=3 Pin=7 Presence_Detect=0 ELD_Valid=0 kernel:[212214.883850] HDMI hot plug event: Codec=3 Pin=7 Presence_Detect=1 ELD_Valid=1 kernel:[212214.884647] HDMI status: Codec=3 Pin=7 Presence_Detect=1 ELD_Valid=1 kernel:[212215.093331] ---[ end trace aaf1961c9b232428 ]--- 3.3.0-8 still had the hibernate memory corruption bug. Update to the latest 3.3.1 build, and see if it's still reproducible. yup - i did that already before it was in updates-testing... :-) running 3.3.1-3.fc16.x86_64 currently... no crash or oops with 3.3.1 until today... btw: the cursor is still blinking when it goes to text mode... 
or is it turned off shortly before the power goes down?
or did i misunderstand sth?

(In reply to comment #19)
> yup - i did that already before it was in updates-testing... :-)
> running 3.3.1-3.fc16.x86_64 currently...
> no crash or oops with 3.3.1 until today...

Erm... but your crash in comment #17 was clearly not from 3.3.1. So do you have another crash you haven't reported with 3.3.1?

yes, my last crash was with a pre-3.3.1 kernel (3.3.0-8)...
my "yup" was to copy ur "update" counsel...
then i wanted to say, that there was _no_ crash/oops with 3.3.1 until today...
sorry - i m no native speaker...

there was no hibernate/thaw related crash since i updated to 3.3.1 until today (i hibernate 9 times per week).

no hibernate/thaw related crash since i updated to 3.3.1 until today...

it is back with 3.3.4-3.fc17.x86_64...
it didnt complete thaw today but rebooted automatically...

(In reply to comment #24)
> it is back with 3.3.4-3.fc17.x86_64...
> it didnt complete thaw today but rebooted automatically...

Your symptoms sound similar to my experiences with 32-bit fedora 16 on both a desktop PC (intel graphics and core i5) and dell latitude E6410 notebook (also intel graphics and core i5). I am currently using 3.3.4-3.fc16.i686 on both machines.

I depend a lot on pm-hibernate (previously tuxonice, but that no longer works for me). My normal mode of working is to boot up very rarely. When I do it's in level 3 (no graphics) which is useful for installing updates, or doing checking. I then run 'startx' to invoke window manager, either openbox or ctwm (I prefer the latter though some would find it old-fashioned). Then usually at least twice a day I use pm-hibernate instead of shutdown. That way I can go for weeks or months without a reboot, and all my ten virtual desktops with various unfinished tasks keep their state. The laptop is sometimes left hibernated for several days, as I mainly use it for seminars and when travelling.

That mode of working served me well for several years with two previous dell laptops and my previous desktop PC. But in the last couple of years there have been serious problems with hibernate, apparently related to the i915 module.

One major problem recently fixed was that every now and again hibernate would fail to complete, requiring a forcible shutdown and reboot. That problem went away a month or two ago -- a very great improvement.

Since then I have had your problem on both machines -- thaw after hibernate sometimes works, but not always: in the exceptional cases it gets very near the end of resuming (shown by the percentage display) and then the screen goes blank and it reboots. There's no record in /var/log to indicate what went wrong.

The frequency with which this unwanted reboot happens seems to change with new kernel releases. A couple of times, after a kernel update, I thought the problem had gone away because hibernate/resume had worked on both machines for several days. Then one, and later the other, machine would fail to resume. The problem seems to be worse on the laptop: failure to complete thaw is more common.

I have found a workaround that I can live with though it is a nuisance when there's a kernel upgrade. I have two menu entries in /boot/grub2/grub.cfg: the default, labelled RESUME, at the top of the list and the alternative, labelled BOOT, as the next option. The only difference is that in the RESUME case I add 'acpi=off' at the end of the 'linux ... ' line. I can't do that when booting as too many things stop working (e.g. screen brightness control, cpu throttling, checking state of battery, and others). But if I have it when resuming from hibernate there seem to be no detectable effects except that resume always works. (See also Bug #806315)

This is tolerable except that kernel updates are a great nuisance.
I have to manually edit grub.cfg to recreate the two entries, then boot into the new kernel, then ensure that the default is set to boot with the acpi=off switch before I first use pm-hibernate.

I have not found any documentation that helps me understand what's going on, and how acpi=off can help. (I discovered the tip buried in a file on the internet, but can't recall where. But it gave no explanation.)

Like you I thought for a while that the failure to resume properly might have something to do with the swap area being too small or too big or needing to be refreshed, but eventually ruled that out.

My ideal solution would be for someone to alter the resume code to check if i915 module is in use, and if so resume with acpi=off (or its equivalent) -- perhaps re-setting it after resume has completed. But I am not a kernel developer and could not contribute to that.

today thaw-ing crashed my box again...
it didnt even log a single syslog message and rebooted automatically...
but it seems like, it logged some messages during hibernation that should have been logged on thaw:

good hibernation+thaw:
May 22 23:15:34 vaako NetworkManager[562]: <info> (eth0): carrier now OFF (device state 10)
May 22 23:15:34 vaako dbus[586]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
May 22 23:15:34 vaako dbus-daemon[586]: dbus[586]: [system] Activating service name='org.freedesktop.nm_dispatcher' (using servicehelper)
May 22 23:15:34 vaako dbus-daemon[586]: dbus[586]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
May 22 23:15:37 vaako kernel:[711700.424187] HDMI hot plug event: Codec=3 Pin=7 Presence_Detect=0 ELD_Valid=1
May 22 23:15:37 vaako kernel:[711700.424232] HDMI status: Codec=3 Pin=7 Presence_Detect=0 ELD_Valid=0
May 22 23:15:37 vaako kernel:[711700.602662] HDMI hot plug event: Codec=3 Pin=7 Presence_Detect=1 ELD_Valid=1
May 22 23:15:37 vaako kernel:[711700.602705] HDMI status: Codec=3 Pin=7 Presence_Detect=1 ELD_Valid=1
May 23
06:01:10 vaako kernel:[711700.815770] PM: Syncing filesystems ... done.
May 23 06:01:10 vaako kernel:[711700.877306] Freezing user space processes ... (elapsed 0.01 seconds) done.
May 23 06:01:10 vaako kernel:[711700.888846] PM: Preallocating image memory...

bad hibernation+thaw:
May 23 23:42:35 vaako NetworkManager[562]: <info> (eth0): carrier now OFF (device state 10)
May 23 23:42:38 vaako kernel:[775213.309193] HDMI hot plug event: Codec=3 Pin=7 Presence_Detect=0 ELD_Valid=1
May 23 23:42:38 vaako kernel:[775213.309248] HDMI status: Codec=3 Pin=7 Presence_Detect=0 ELD_Valid=0
May 23 23:42:38 vaako kernel:[775213.484684] HDMI hot plug event: Codec=3 Pin=7 Presence_Detect=1 ELD_Valid=1
May 23 23:42:38 vaako kernel:[775213.484727] HDMI status: Codec=3 Pin=7 Presence_Detect=1 ELD_Valid=1
May 23 23:42:38 vaako kernel:[775213.697724] PM: Syncing filesystems ...
May 23 23:42:38 vaako kernel:[775213.784723] HDMI status: Codec=3 Pin=7 Presence_Detect=1 ELD_Valid=1
May 23 23:42:38 vaako kernel:[775213.788206] HDMI: detected monitor H22-1W at connection type HDMI
May 23 23:42:38 vaako kernel:[775213.788208] HDMI: available speakers: FL/FR
May 23 23:42:38 vaako kernel:[775213.788210] HDMI: supports coding type LPCM: channels = 2, rates = 32000 44100 48000, bits = 16 20 24
May 24 06:01:20 vaako kernel:imklog 5.8.10, log source = /proc/kmsg started.

today it crashed again during thaw...
after the first hibernation after a fresh boot to kernel 3.4.2-4.fc17.x86_64...

last night it failed to hibernate:
1. the left monitor went into standby,
2. but the right monitor presented a non-blinking cursor,
3. and the keyboard caps-lock lamp blinked... :-)

3.4.4-3.fc17.x86_64 crashed and auto-rebooted on thaw today...

3.4.5-2.fc17.x86_64 just failed to hibernate:
1. the left monitor went into standby,
2.
but the right monitor presented a non-blinking cursor (last night it showed funny error messages, that were not logged in /var/log/messages or /var/log/pm-suspend.log, but it hibernated+thawed properly),
3. and the keyboard caps-lock lamp blinked, :-)
4. pm-hibernate start-up coincided with the start-up of my every-5min-incremental-backup-script (mostly: tar, find and bzip2)...

today during hibernation it said on all ttys (pts/...)
"kernel:[313621.835438] do_IRQ: 0.108 No irq handler for vector (irq -1)"

but it thaw-ed properly...
i use that acpi=off on thaw (but on on boot) trick of aaronsloman, too, now... :-)

(In reply to comment #31)
> today during hibernation it said on all ttys (pts/...)
> "kernel:[313621.835438] do_IRQ: 0.108 No irq handler for vector (irq -1)"
>
> but it thaw-ed properly...
> i use that acpi=off on thaw (but on on boot) trick of aaronsloman, too,
> now... :-)

Noticed a typo: that should be "(but NOT on boot)". It has been working fine for me for some time, on both Dell Latitude E6410 and also Desktop PC, both running Fedora 16 (32 bit).

However I recently discovered a problem, reported in Bug #842291, namely the sequence
1. pm-hibernate
2. power up and boot into Windows 7
3. in windows do restart
4. resume linux from grub menu
prevents the next pm-hibernate from working. It just fails to hibernate and returns to the state before the hibernate command. Then the only option is to shutdown completely and reboot.

I eventually discovered that the failure to hibernate can be avoided by using shutdown instead of restart in windows in step 3. It took me a long time to work this out, so I thought I should pass on the warning!

when i wrote "(but on on boot)" i meant "(but acpi=on on boot)"... :-)
is there anybody who uses Windows 7? *rotfl*

(In reply to comment #33)
> when i wrote "(but on on boot)" i meant "(but acpi=on on boot)"... :-)

Sorry. Brain slow late at night (UK).

> is there anybody who uses Windows 7? *rotfl*

Two or three times a year!
This time I received a file in docx format, which Libreoffice could not read. Tried MS document reader on Win7, and it also failed, but offered to download a converter, which successfully converted to odt, to my surprise.

3.4.6-2.fc17.x86_64 just failed to hibernate...
1. the caps lock lamp blinked
2. the hard disc made funny noise
3. and just 1 of 2 monitors fell asleep
4. when i pressed the reset button it rebooted (it did not thaw)
inspite of that acpi=off-on-thaw trick...

3.5.0-2.fc17.x86_64 doesnt hibernate at all...
tried it 3 times...
after some time it had a kernel oops... e. g. about a LOCKUP of cpu7...

Sounds as if this may be a recurrence of, or related to, an old hibernate bug fixed a few months ago referenced in #789708 and #785384 .

I am still using 32 bit fedora 16 kernel 3.4.6-1.fc16.i686 (the latest available to me). I have no problems with hibernate (unless I forget to unplug my dvb-tv usb dongle before hibernating, in which case it hibernates without shutting down completely - power remains on). Resume still requires acpi=off

(In reply to comment #35)
> 3.4.6-2.fc17.x86_64 just failed to hibernate...
> 1. the caps lock lamp blinked
> 2. the hard disc made funny noise
> 3. and just 1 of 2 monitors fell asleep
> 4. when i pressed the reset button it rebooted (it did not thaw)
> inspite of that acpi=off-on-thaw trick...

I now have kernel 3.4.7-1.fc16.i686 on my core i5 laptop (Dell E6410) and hibernate works fine. Also resume/thaw with acpi=off. Without that, resume still fails and leads to reboot.

Perhaps there has been some change in FC17 not included in FC16 which interferes with hibernate.

'uptime' shows that my core i5 desktop PC, still on kernel 3.3.7-1.fc16.i686, has been hibernating and resuming (with acpi=off) without problems since 22 May 2012, often hibernating and resuming several times in one day, except for a few days when I've been away. Because of the convenience of never rebooting I've disabled kernel upgrades on that machine.
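
[Editorial sketch] The two-entry grub.cfg workaround described earlier in this report (a RESUME entry that differs from the BOOT entry only by an extra flag at the end of its 'linux' line) has to be recreated by hand after every kernel update. The text transformation itself is small. The sketch below is illustrative only: the function name, the sample menu entry, and the EXAMPLE paths are all made up, and it edits a string in memory, not the real /boot/grub2/grub.cfg.

```python
def add_kernel_arg(menuentry, arg):
    """Return a copy of a GRUB menu entry with `arg` appended to its 'linux' line.

    Hypothetical helper for illustration; it only rewrites text and never
    touches any file on disk.
    """
    out = []
    for line in menuentry.splitlines():
        words = line.split()
        # Only touch the kernel command line, and never add the flag twice.
        if words and words[0] == "linux" and arg not in words[1:]:
            line = line.rstrip() + " " + arg
        out.append(line)
    return "\n".join(out)


# Placeholder entry: the kernel and root device names are NOT from this report.
boot_entry = """menuentry 'BOOT' {
    linux /vmlinuz-EXAMPLE root=/dev/EXAMPLE ro
    initrd /initramfs-EXAMPLE.img
}"""

# Derive the RESUME variant: same entry, new label, 'acpi=off' appended.
resume_entry = add_kernel_arg(boot_entry, "acpi=off").replace("'BOOT'", "'RESUME'")
print(resume_entry)
```

As the commenters note, acpi=off on a normal boot breaks things such as brightness control and battery status, so the flag belongs only on the entry used for resuming.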
kernel 3.5.1-1.fc17.x86_64 cant hibernate here... *sob*
the kernel oops said something about the swapper process and the stack trace something about intel_idle() and then a lot of other functions about "idle"...

kernel-3.5.2-1.fc17.x86_64 cant hibernate...

That's strange. I now have 3.5.2-1.fc17.i686 installed on my core i5 laptop (on which I've recently upgraded the bios and a few other things provided on the Dell website), and pm-hibernate works perfectly for me, as it has been doing for months.

I am still having trouble resuming, however, unless I add acpi=off to the boot menu options (only for reboot).

1. u run a 32bit kernel on ur i5? isnt it a 64bit CPU?
2. with "(only for reboot)" u mean "(only for thaw)"?

(In reply to comment #42)
> 1. u run a 32bit kernel on ur i5? isnt it a 64bit CPU?

It's a 64 bit cpu, but I don't need a 64 bit linux -- for my usage it would waste memory and add to compatibility problems, so I use 32 bit fedora which runs fine on 64 bit cpu with 32 bit support.

> 2. with "(only for reboot)" u mean "(only for thaw)"?

Sorry, I mistyped 'resume' (=thaw) as 'reboot'. Apologies for confusion.

1. oki... maybe a 32bit kernel is easier... :-)
2. np

i found that it sometimes fails to thaw some CPUs:
vaako kernel:[40619.352527] Disabling non-boot CPUs ...
vaako kernel:[40619.354515] CPU 1 is now offline vaako kernel:[40619.356723] CPU 2 is now offline vaako kernel:[40619.358708] CPU 3 is now offline vaako kernel:[40619.361177] CPU 4 is now offline vaako kernel:[40619.362804] CPU 5 is now offline vaako kernel:[40619.363734] Broke affinity for irq 23 vaako kernel:[40619.363743] Broke affinity for irq 44 vaako kernel:[40619.364798] CPU 6 is now offline vaako kernel:[40619.366331] CPU 7 is now offline vaako kernel:[40619.366613] Extended CMOS year: 2000 vaako kernel:[40619.366694] PM: Creating hibernation image: vaako kernel:[40619.455782] PM: Need to copy 470555 pages vaako kernel:[40619.367849] Extended CMOS year: 2000 vaako kernel:[40619.368335] microcode: CPU0 updated to revision 0x28, date = 2012-04-24 vaako kernel:[40619.368367] Enabling non-boot CPUs ... vaako kernel:[40619.368429] Booting Node 0 Processor 1 APIC 0x2 vaako kernel:[40619.381744] NMI watchdog: enabled, takes one hw-pmu counter. vaako kernel:[40619.382179] microcode: CPU1 updated to revision 0x28, date = 2012-04-24 vaako kernel:[40619.382182] CPU1 is up vaako kernel:[40619.382267] Booting Node 0 Processor 2 APIC 0x4 vaako kernel:[40624.405679] CPU2: Not responding. vaako kernel:[40624.405913] Error taking CPU2 up: -5 vaako kernel:[40624.405994] Booting Node 0 Processor 3 APIC 0x6 vaako kernel:[40629.431685] CPU3: Not responding. vaako kernel:[40629.432149] Error taking CPU3 up: -5 vaako kernel:[40629.432320] Booting Node 0 Processor 4 APIC 0x1 vaako kernel:[40634.485766] CPU4: Not responding. vaako kernel:[40634.485909] Error taking CPU4 up: -5 vaako kernel:[40634.485980] Booting Node 0 Processor 5 APIC 0x3 vaako kernel:[40639.542151] CPU5: Not responding. vaako kernel:[40639.542270] Error taking CPU5 up: -5 vaako kernel:[40639.542347] Booting Node 0 Processor 6 APIC 0x5 vaako kernel:[40644.601686] CPU6: Not responding. 
vaako kernel:[40644.601804] Error taking CPU6 up: -5
vaako kernel:[40644.601888] Booting Node 0 Processor 7 APIC 0x7
vaako kernel:[40644.616176] NMI watchdog: enabled, takes one hw-pmu counter.
vaako kernel:[40644.616673] microcode: CPU7 updated to revision 0x28, date = 2012-04-24
vaako kernel:[40644.616677] CPU7 is up

kernel 3.5.3-1.fc17.x86_64 just tried to kill the idle task during pm-hibernate (before it fell asleep)...
that caused a kernel oops...
it doesnt do that when i try to hibernate without graphix from single user mode...

I thought that being able to insert "acpi=off" on thaw would allow my machine to hibernate but doing that uncovered new issues. Should these issues perhaps be broken into other bugs?

I haven't been able to isolate all the symptoms, but these are some of the things I've seen:
1. machine generates ext4 error messages, sometimes requires fsck on boot
2. sometimes doesn't actually powerdown the computer
3. operates very slowly when the GUI apps are being thawed

the 3.5.4-1.fc17.x86_64 kernel hates me 2...
does somebody know why that is?
intellinux refuses to say if they can reproduce it on their boxes... :-)

I am using 32-bit version of this kernel on Dell Latitude E6410. Hibernate/thaw had been working with acpi=off used in grub before thaw. Various people told me that was overkill. However, one of its effects seemed to be to allow only one cpu to be active during the resume process. So I've now changed grub for resume to include maxcpus=1 instead of acpi=off, and this seems to be successful, as described in bug #806315 comment 61

I have no idea whether this will generalise to Core i7 + 64 bits.

um... but it doesnt even hibernate with 3.5...
should i try to hibernate with 1 cpu?
i will try that maxcpu trick with 3.4...

i doesnt hibernate with 3.5, even when i turn off all but 1 cpus... :-)
then it complained about some exception in the lzo compression thingies...

(In reply to comment #50 and #51)
> um...
but it doesnt even hibernate with 3.5...

Apologies: I should have checked what you meant by Comment #48

> should i try to hibernate with 1 cpu?

I don't know how to give pm-hibernate an instruction to use only 1 cpu. 'maxcpus=1' is used as a boot flag, so it can work only for boot/thaw, not hibernate, unless I've misunderstood something. I guess if you use that flag for full boot that will restrict your machine to only 1 cpu. I don't know what effects that could have.

Anyhow, I have had no trouble with pm-hibernate completing successfully with all cpus available, since 24th May 2012, using Fedora 16 (on desktop and laptop machines) and more recently Fedora 17 (only on laptop). But I use 32 bit linux and have core i5, not i7.

I don't have enough expertise to know whether the difference between 32-bit fedora and 64-bit fedora or between i5 and i7 could explain your problems with hibernate. It could be a motherboard problem, or might be fixable with bios update, which others suggested to me when I was having trouble because hibernate failed.

There's extensive discussion of hibernate (not resume) problems in Bug #785384 reporting work done mainly by Bojan Smojver to make it work. The last complaint reported there was in July, but turned out to be a different problem. See comment 125 in that bug report.

> i will try that maxcpu trick with 3.4...

Just in case that wasn't a typo: it's 'maxcpus' not 'maxcpu'

(In reply to comment #51)
> i doesnt hibernate with 3.5, even when i turn off all but 1 cpus... :-)

I suspect anyone working on this problem will need to know exactly what you did to 'turn off all but 1 cpus'. Otherwise nobody can replicate your test.

> then it complained about some exception in the lzo compression thingies...

Since hibernate bugs seem to have been fixed for other Fedora users around May 2012, it may be a good idea for you to start a new bugreport dealing with pm-hibernate only (resume still has bugs others are reporting, and I don't know about suspend). If you give full details of your hardware configuration, bios revision, the boot parameters you are using, hibernate commands you use, kernels you have tried, any error reports you get, it may be possible for someone to work out which difference accounts for problems you have and others do not. Just a suggestion!

In my case pm-hibernate has worked flawlessly on PC and laptop. On PC running F16 with kernel
3.3.7-1.fc16.i686 #1 SMP Tue May 22 14:14:30 UTC 2012
'uptime' records 102 days without reboot, with a total of about 150 successful hibernates recorded in /var/log/messages*

During that time I have used acpi=off for resume except for the last few resumes, when I used maxcpus=1. Without one or other of those boot flags in grub.cfg when restarting from hibernate, resume sometimes succeeds and sometimes fails (always at the point where graphical screen should be restored). To me that suggests a synchronisation bug in resume. But pm-hibernate always reports using 3 threads to compress on both my machines, and seems to be flawless now.

oki
so this bug report is about "why does thaw need that acpi=off/maxcpus=1 trick?"?
and the new one will b about "why doesnt 3.5 hibernate?"?

See Bug #862475 - Why do I need maxcpus=1 to resume from pm-hibernate in 32-bit Fedora 16 on Viglen Desktop PC, Fedora 17 on Dell E6410 laptop, both with intel core i5 cpu, intel graphics?

I have no idea whether there's some difference between Core i5 and Core i7 that produces different behaviours. So I phrased the new bugreport entirely in terms of i5.

i dont have access to that box anymore...

-arne

This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 17. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 17 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

I reproduced this today.
$ uname -r
3.9.6-200.fc18.x86_64

Both pm-hibernate and suspend (usually invoked by shutting the lid) have worked consistently for me since mid June in F18 on a core i5 machine (Dell Latitude E6410)
3.9.6-200.fc18.x86_64 #1 SMP Thu Jun 13 18:56:55 UTC 2013
Perhaps there's a difference between core i5 and i7. [This is my first 64bit linux and I've been very impressed at how smoothly it also supports 32 bit programs.]
On the same machine suspend was very unreliable in 32-bit F17 (i.e. usually resume failed) and resume from hibernate worked only with 'maxcpus=1' in grub.cfg

I don't know if my issue is related to this bug. On a Toshiba Z930 laptop (i7) the second suspend fails, freezing the system, while the first suspend from a fresh boot goes fine.
After searching a bit, this seems related to an ACPICA bug:
https://github.com/acpica/acpica/commit/34f226fa2643f1d2e6527ea4edb24947cfe1fb6a
that was fixed in the 20130626 release. As far as I know, this release has been merged into neither the 3.9.9 nor the 3.10 kernel. Maybe the patch could be applied in the next Fedora kernel?

so this bug is still active? should we (a) bump the Fedora version of this bug to FC18? or (b) make a new bug report (because I do not have a Core i7 anymore)? :-) -Arne

In my experience the bug is alive across Fedora and kernel versions. I installed Fedora 18 a few months ago, upgraded regularly until the 19 release (via fedup) and it's still here. IMHO it does not seem to be tied to a particular processor, maybe to the chipset and/or the bios. Maybe the assignee should decide about the classification of the bug. Mario

(In reply to aaronsloman from comment #58 on 2013-07-05)
> Both pm-hibernate and suspend (usually invoked by shutting the lid) have
> worked consistently for me since mid June in F18 on a core i5 machine (Dell
> Latitude E6410)

This is still true. Now on kernel: 3.9.9-201.fc18.x86_64
Still no problem with either pm-hibernate (sometimes used several times a day) or suspend triggered by shutting lid. When I was using Fedora 17 (32 bit) suspend usually failed to resume, and resume from pm-hibernate required maxcpus=1, which was a nuisance, but made it totally reliable for me. But both just work normally now.

Could the persistent bug(s) be hardware dependent?

(Upgrading to F18 caused serious NetworkManager problems for me with Enterprise wifi, because of altered security mechanisms and wicd was unusable, but I think NM works now after I found out, by chance, which files to edit -- only partially tested. Everything else seems to be fine.)

Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is no longer maintained, which means that it will not receive any further security or bug fix updates.
As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.
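
Note for anyone retesting against a later release: the thread never records exactly how 'all but 1 cpus' were turned off before pm-hibernate. One possibility (an assumption, not confirmed by the reporter) is the kernel's CPU hotplug interface in sysfs, where writing 0 to /sys/devices/system/cpu/cpuN/online takes cpuN offline at runtime, whereas 'maxcpus=1' only takes effect at boot. A minimal Python sketch of that approach follows; the function name and the dry_run parameter are made up for illustration, and the real write needs root.

```python
import os
import re


def offline_all_but_cpu0(cpu_root="/sys/devices/system/cpu", dry_run=True):
    """Return the 'online' control files of cpu1..cpuN, in CPU order.

    With dry_run=False, also write "0" to each file (needs root on a real
    system). cpu0 is always left alone.
    """
    numbered = []
    for name in os.listdir(cpu_root):
        # Only directories named cpu<number> are CPUs; skip cpufreq, cpuidle, etc.
        match = re.fullmatch(r"cpu(\d+)", name)
        if match and int(match.group(1)) != 0:
            numbered.append((int(match.group(1)), name))

    targets = []
    for _, name in sorted(numbered):
        path = os.path.join(cpu_root, name, "online")
        if not os.path.exists(path):
            continue  # no control file: this CPU cannot be taken offline
        targets.append(path)
        if not dry_run:
            with open(path, "w") as handle:
                handle.write("0")
    return targets
```

Called with the defaults, offline_all_but_cpu0() only lists the files it would write to, so the exact steps can be pasted into a bug report before anything is changed. Writing 1 to the same files brings the CPUs back online.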