Bug 628897
Summary: | resuming from suspend doesn't work | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Rodd Clarkson <rodd> |
Component: | kernel | Assignee: | John Feeney <jfeeney> |
Status: | CLOSED WONTFIX | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | medium | Docs Contact: | |
Priority: | low | ||
Version: | 14 | CC: | anton, awilliam, dougsland, gansalmon, itamar, jfeeney, jonathan, kernel-maint, kmcmartin, madhu.chinakonda, mcepl, mishu, vabibiz, wdc, xgl-maint |
Hardware: | All | OS: | Linux |
Doc Type: | Bug Fix | Last Closed: | 2012-08-16 19:30:05 UTC |
Description
Rodd Clarkson
2010-08-31 10:26:50 UTC
I'm guessing this is a radeon driver problem, we now have multiple reports of suspend/resume broken on radeon in 2.6.34 and 2.6.35. Can you try to debug further using these instructions?
http://fedoraproject.org/wiki/KernelCommonProblems#Suspend.2FResume_failure

I've tried these instructions but the results I get are somewhat jumbled.

Firstly, the instructions lack clarity, and maybe this is my problem. They read: "After the hang, reboot, boot up again." but I'm not sure if this means: "After the hang, reboot the system" or am I meant to reboot the system and then boot again (and if so, shouldn't that be "reboot again"?). How do I do this?

What I did: I held down the power button until the system shut itself down and then started the system using the power button. This then booted back into f14. From here I logged in as a regular user and opened a terminal window.

Then I tried:

$ dmesg | grep "hash matches"
hash matches drivers/base/power/main.c:520
button PNP0C0D:00: hash matches

And finally:

$ find /sys/bus/pci/drivers/ -name "PNP0C0D:00:"

which didn't produce any results.

I'm on the couch all day today, so feel free to badger me for more feedback.

(In reply to comment #3)
> $ dmesg | grep "hash matches"
> hash matches drivers/base/power/main.c:520
> button PNP0C0D:00: hash matches
>
> And finally:
> $ find /sys/bus/pci/drivers/ -name "PNP0C0D:00:"

Yeah, that's the ACPI lid switch, in /sys/bus/acpi/devices

*** Bug 615560 has been marked as a duplicate of this bug. ***

This isn't an xorg-x11-drv-ati bug. I can trigger this from runlevel 3 (or its new equivalent).

I've tried removing quite a few modules with modprobe -r <module> and this has no benefit. The system still won't resume.

I have the same problem with hibernate/resume as I do with suspend/resume. In neither case does the system resume properly and the outcome has the same symptoms.

However, when resuming from hibernate I get a little output on the screen which might help isolate where the failure occurs. I haven't got a photo of the first 'clue' and I haven't seen it since, but I saw a bunch of output on the screen when resuming about problems with ucb. Unfortunately I can't be more specific than this.

Created attachment 443036 [details]
Text seen on resume from hibernate (image 1)
This is text seen before resume from hibernate failed.
Created attachment 443037 [details]
Text seen on resume from hibernate (image 1)
Text seen on resume after hibernate. This was just before the system locked up.
Hopefully it gives some clue to where the system gets to before failing.
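For the pm_trace lookup attempted earlier in this report, here is a minimal shell sketch. It rests on two observations from the comments: the trailing colon in the "hash matches" log line is a separator rather than part of the device name, and the device turned out to live under /sys/bus/acpi/devices rather than under the pci bus. The helper names `extract_hash_devices` and `find_device` are made up for illustration, and searching all of /sys/bus by default is an assumption, not something the wiki page prescribes.

```shell
#!/bin/sh
# Sketch only: helper names and the default search root are assumptions.

# Read kernel log text on stdin and print the device name from every
# "<driver> <device>: hash matches" line.
extract_hash_devices() {
    awk '/: hash matches$/ {
        dev = $(NF - 2)        # field just before "hash matches"
        sub(/:$/, "", dev)     # drop the separator colon after the name
        print dev
    }'
}

# Print every sysfs entry named after the device, searching all buses
# under the given root (default /sys/bus) instead of only the pci bus.
find_device() {
    find "${2:-/sys/bus}" -name "$1" 2>/dev/null
}

# Possible use after rebooting from a failed resume:
#   dmesg | extract_hash_devices | while read -r dev; do find_device "$dev"; done
```

Searching by the bare device name across all buses avoids both pitfalls of the original find command, at the cost of a slightly slower walk through sysfs.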
The time between the two images above was about 12 seconds.

In http://fedoraproject.org/wiki/KernelCommonProblems#Suspend.2FResume_failure it suggests:

* If the system fails to resume, see if the system is locked up completely by hitting the caps lock key.
* If the capslock light doesn't toggle, or the failure is during suspend, try again, but this time before suspending, activate the pm_trace functionality with echo 1 > /sys/power/pm_trace. <snip>

Actually, the problem isn't the system failing to suspend, it's failing to resume. Is this the same thing? I've tried this option anyway (as you can see above) and the only real result is that I have to fsck a bunch of /dev/sda[x] devices to get the timestamps right again.

* Try rmmod'ing various modules before doing the suspend. If this makes things work again, retry with a smaller set of modules unloaded. Keep retrying until you narrow down which module is to blame.

I've tried this too. Below is a list of all the modules running; after switching to runlevel 3 I am able to remove (using modprobe -r) all the starred modules. Regardless, the system freezes immediately after "Loading image data pages".

fuse
sunrpc
cpufreq_ondemand
acpi_cpufreq
freq_table
mperf
ip6t_REJECT
nf_conntrack_ipv6
ip6table_filter
ip6_tables
* uinput
* snd_hda_codec_atihdmi
* snd_hda_codec_idt
* snd_hda_intel
* snd_hda_codec
* snd_hwdep
* btusb
* bluetooth
* snd_seq
* snd_seq_device
* snd_pcm
* snd_timer
* uvcvideo
* snd
* tg3
* dell_laptop
* iTCO_wdt
* iTCO_vendor_support
* rfkill
* microcode
* videodev
* v4l1_compat
* v4l2_compat_ioctl32
* soundcore
* joydev
* i2c_i801
* snd_page_alloc
* dcdbas
* dell_wmi
* wmi
ipv6
* firewire_ohci
* sdhci_pci
* sdhci
* firewire_core
* crc_itu_t
* mmc_core
* video
* output
radeon
ttm
drm_kms_helper
drm
i2c_algo_bit
i2c_core

Can you try -52 from koji?
http://koji.fedoraproject.org/koji/buildinfo?buildID=193668

And do this as root before trying to suspend with that kernel:

echo 0 > /sys/power/pm_async

(pm_async will be disabled in -53)

Tried -52 and had no luck.

Just to be sure I did the right thing: I logged in as me (username rodd) and then opened a terminal. I then ran 'sudo su -' to become root and then I ran 'echo 0 > /sys/power/pm_async'. After this I selected System > Shutdown > Suspend.

Resume failed.

I also tried suspending from gdm but had no better luck (this time with async still enabled).

Do you want me to try being in runlevel 3, removing all the modules I can and turning off async, or is this just a waste of time?

Tried -54 and this doesn't work either.

What other feedback can I supply? I'm happy to work toward addressing this, but I'm going to need some idea of what more to tell you. Is there some backtrace I can run that will log to somewhere?

Any chance this bug has something to do with these problems?

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/553498

From the bug: "This particular "hang on resume" fix relates only to a specific problem relating to the SCI_EN bit which seems to be peculiar to Intel Core i3, Core i5, and Core i7 CPU's"

Has this been included in 2.6.34? I know that this bug mostly refers to 2.6.32, but some mention of 2.6.34 is made.

Or could it be related to this:

http://www.kernelpodcast.org/2010/05/10/20100418-linux-kernel-podcast/

From the page: "The implementation of this complex VMA tracking was suffering from a bug that Borislav Petkov kept hitting in performing a suspend/resume cycle on his system"

"This isn't an xorg-x11-drv-ati bug. I can trigger this from runlevel 3 (or its new equivalent)."

You're still using kernel mode setting. Test runlevel 3 with the 'nomodeset' kernel parameter.

(In reply to comment #16)
> Any chance this bug has something to do with these problems?
>
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/553498
>
> From the bug: "This particular "hang on resume" fix relates only to a specific
> problem relating to the SCI_EN bit which seems to be peculiar to Intel Core i3,
> Core i5, and Core i7 CPU's"
>
> Has this been included in 2.6.34? I know that this bug mostly refers to 2.6.32,
> but some mention of 2.6.34 is made.

That fix went into 2.6.32.17 and 2.6.34.2:
acpi-unconditionally-set-sci_en-on-resume.patch

(In reply to comment #17)
> Or could it be related to this:
>
> http://www.kernelpodcast.org/2010/05/10/20100418-linux-kernel-podcast/
>
> From the page: "The implementation of this complex VMA tracking was suffering
> from a bug that Borislav Petkov kept hitting in performing a suspend/resume
> cycle on his system"

And that went into 2.6.34-rc4:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff_plain;h=ea90002b0fa7bdee86ec22eba1d951f30bf043a6

How are you suspending the machine? By closing the lid and then opening it? Have you tried different methods than whatever you're using?

I haven't tried nomodeset so will report back on this soon.

As for how I've suspended the machine, the problem occurs with both suspend and hibernate and it doesn't seem to matter how I trigger it: close the lid, press the power button, or run pm-suspend or pm-hibernate as root from a terminal. It also fails whether I suspend as a logged-in user, from gdm, or from a tty in runlevel 3 or runlevel 5.

I've just tried nomodeset and it doesn't help.

Interestingly, I also can't log in from gdm. It starts to log in and then throws me back to gdm again.

That's not particularly interesting, I wouldn't expect X to work at all well in that configuration. Likely you're getting the vesa driver and it can't cope with some desktop acceleration thing you have enabled. It was purely for console suspend testing.

Created attachment 446581 [details]
Output from lspci: This configuration DOES suspend with the 2.6.34.6-54 kernel update.
For what it's worth, I've attached lspci output from my system.
The 2.6.34.6-54 kernel update corrects the suspend/resume problems I was having that began when 2.6.33.6-147 (working) was updated to 2.6.34.6-47 (broken).
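The two sysfs switches mentioned in the comments above (/sys/power/pm_async and /sys/power/pm_trace) can be set together before a test suspend. The sketch below is only an illustration: `set_pm_debug` is a hypothetical helper, setting both switches in one step is an assumption rather than something the maintainers asked for, and the optional root argument exists purely so the function can be tried against a scratch directory. On a real system it has to run as root.

```shell
#!/bin/sh
# Sketch only: set_pm_debug is a hypothetical helper. The two sysfs files
# are the ones named in this report; the root argument lets the function
# be pointed at a scratch directory instead of /sys.
set_pm_debug() {
    root=${1:-/sys}
    for knob in pm_async pm_trace; do
        if [ ! -w "$root/power/$knob" ]; then
            echo "cannot write $root/power/$knob (not root, or not supported)" >&2
            return 1
        fi
    done
    echo 0 > "$root/power/pm_async"   # suspend/resume devices one at a time
    echo 1 > "$root/power/pm_trace"   # record the last resume step in the RTC
}

# Possible use, as root, right before a test suspend:
#   set_pm_debug && pm-suspend
```

Because pm_trace stores its marker in the real-time clock, the system time is likely to be wrong after the next boot, which would fit the timestamp-related fsck runs the reporter describes.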
Tried -55 with no luck.

You really need to get the i915 driver completely out of the picture to see if that's the source of the problem. nomodeset + runlevel 3 should do it, or blacklisting the i915 driver. The system needs to be in plain VGA text mode.

I don't understand the reference to the i915 driver. I've got an ATI graphics card so the i915 driver shouldn't be loaded, should it?

I'll give it a try tomorrow anyhow and see how it goes.

Of course, it should be the radeon driver from the xorg-x11-drv-ati package. The rest of Chuck's comment is correct.

I started my system with nomodeset + runlevel 3 and it started with the old 'big text' style terminal, but I'm pretty sure that the radeon module was still loaded. Should I try to rmmod this, or can I start the system without this stuff loading?

Starting with 'nomodeset 3' didn't improve resuming at all, but maybe I need to have the radeon stuff not loaded.

I've loaded -56 with 'nomodeset 3'. I then ran 'modprobe -r radeon' and pm-suspend with no improvement.

I then loaded -56 again with 'nomodeset 3', removed as many modules as possible and ran pm-suspend with no improvement.

This is what was running when I tried to suspend:

Module                  Size  Used by
sunrpc                198573  1
cpufreq_ondemand        8764  4
acpi_cpufreq            7693  1
freq_table              3955  2 cpufreq_ondemand,acpi_cpufreq
ipt_MASQUERADE          2296  1
iptable_nat             4890  1
nf_nat                 19999  2 ipt_MASQUERADE,iptable_nat
ip6t_REJECT             4111  2
nf_conntrack_ipv6      17856  10
ip6table_filter         1671  1
ip6_tables             17580  1 ip6table_filter
ipv6                  275768  46 ip6t_REJECT,nf_conntrack_ipv6

Can I remove any of these modules and how?

I've just grabbed the opensuse-11.3 live x86_64 gnome CD and booted from this. On this distro, suspend and (more importantly) resume work fine on my laptop. opensuse is running 2.6.34-12-desktop.

What can we learn from this? What more information can I supply to help track down the issue? How do we compare these two kernels to see what's different?
Is it possible to boot from the opensuse kernel on my system to see if this resolves the issue?

When you see the black screen, can you change TTY sessions? I notice that my F14 machine gets stuck here, however I can ctl+alt+f2, and then ctl+alt+f1 back to gnome and my desktop appears.

My keyboard is unresponsive after resume. Hitting caps lock doesn't even toggle the light. You might not have the same issue as me.

I've tested the Ubuntu 10.10 Live CD, which is running 2.6.35-22, and suspend/resume works fine on my machine.

So, to recap:

OpenSUSE Live CD with kernel 2.6.34 suspends and resumes.
Ubuntu Live CD with kernel 2.6.35 suspends and resumes.
Fedora 14 Live CD with kernel 2.6.35 suspends BUT DOES NOT resume.

What is Fedora doing with its kernel (or could this be some other software) that is making resume fail where it works on other distros?

Can I run fedora using these other kernels to see if they work? Is this possible?

Hi Rodd,

There's really nothing interesting in 2.6.35 relating to Radeon DRM or ACPI that could explain why this would fail and other kernels would succeed. The closest we have to upstream right now is the rawhide kernel, could I get you to test that? I'll follow up your email to the list with a link to one if so.

--Kyle

Why are we limiting our thinking to radeon drm or acpi?

I'll try the F15 kernel, but this hasn't worked for me on any 2.6.34 or 2.6.35 fedora kernel. Earlier kernels had issues with suspend, but now the problem is with resuming.

Rodd: Kyle actually re-assigned the bug from xgl-maint to Matthew Garrett: his comment was an explanation for why, i.e., he agrees with you, it's probably not a radeon drm issue, so the bug should be assigned as a general suspend/resume bug and not assigned to the X team.

--
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Kyle,

I've tried the kernel-2.6.36-0.36.rc7.git3.fc15 (without perf as there were dependency issues) and my f14 system can now suspend AND resume.
Will this help figure out what is wrong so that a patch can be applied to the f13 and f14 kernels?

I've tried the same kernel (kernel-2.6.36-0.36.rc7.git3.fc15) on f13 and it works there too.

I've just tried kernel-2.6.35.6-50.fc14.x86_64 and it still has the same issues.

I would just like to report that I can join the "resume fail" team. I already filed a bug here

https://bugzilla.redhat.com/show_bug.cgi?id=650525

as I did not find yours before. So my bug is most likely a duplicate of yours. I did nearly the same tests that are described in your bug, except for testing kernel-2.6.36-0.36.rc7.git3.fc15.

Generally I see only one difference (apart from the HW): my system can resume with the 32-bit F14 distribution (only tested on one PC that does not have 64-bit support).

As I also ran tests from "single" mode, I think that the failure is not related to any Xorg driver.

Venca, welcome aboard (I think ;-]).

I've just tried kernel-2.6.35.6-51.fc14.x86_64 from koji with no better outcome. I was hoping this might work since the changelog said: "Backport polling fixes + radeon hang fixes from upstream"

No such luck.

Here are some more test results. The following kernels:

kernel-2.6.36-0.36.rc7.git3.fc15
kernel-2.6.36-1.fc15.x86_64

also do not work.

I have a question for those who know more about the suspend/resume procedures: is there something that could be wrongly set in the laptop's CMOS/permanent memory that can remain there and cause these problems?

The point is that my laptop has also started to have problems with rebooting/restarting the system. When I restart, the computer hangs with a black screen and I have to turn it off and on. I have tried the reboot=b/w kernel argument, but with no success.

Also, when I turn the laptop off and then on after it hangs on a failed resume, the screen flashes for a moment as if it were resuming, and then it starts the POST/boot procedure. I never observed that with my working Fedora 12.
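On the earlier question of which of the remaining modules can still be removed: in lsmod output the third column is the module's use count, and modprobe -r refuses to unload a module that is still in use. A minimal sketch for listing the modules that are currently unused follows; `unused_modules` is a made-up helper name, and reading lsmod-style text from standard input is a choice made here so the filter can be tried on saved output as well.

```shell
#!/bin/sh
# Sketch only: unused_modules is a hypothetical helper name.
# Reads lsmod-style text on stdin (a header line, then
# "name size count [users]") and prints modules whose use count is 0.
unused_modules() {
    awk 'NR > 1 && $3 == 0 { print $1 }'
}

# Possible use:
#   lsmod | unused_modules
```

Every module in the list posted above has a non-zero count, which is probably why none of them could be unloaded. A non-zero count with no module named after it usually means something outside the module list (a running service, firewall rules, an open device) is holding the module.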
I'm not sure that Venca's got exactly the same problem as me. I'm not having issues with 2.6.36 kernels (and in fact I'm relying on them to use f14 with suspend/resume).

However, after a failed resume I too experience slowness moving through the BIOS start up. It doesn't hang for me, but it's not the usual start up.

I've tried kernels 2.6.35.9-62 and 2.6.35.9-64 and both still fail to resume.

A couple of other things I've noticed.

When resuming successfully, the bluetooth and wireless lights on my laptop come on almost immediately after the HDD light flashes (they are all next to each other). When it fails, the bluetooth and wifi lights never come on. Does this help isolate where the resume is failing?

Also, I can confirm that this isn't a specific hardware problem because I've recently had my laptop replaced with an identical model (thanks Dell) and I've still got the problem.

I'm pretty convinced that this is related to a patch that Fedora is using with the 2.6.34 and 2.6.35 kernels, since I've been able to suspend and resume with these kernels on other distros and because 2.6.36 works on f14. Could this be the case?

This message is a notice that Fedora 14 is now at end of life. Fedora has stopped maintaining and issuing updates for Fedora 14. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At this time, all open bugs with a Fedora 'version' of '14' have been closed as WONTFIX.

(Please note: Our normal process is to give advanced warning of this occurring, but we forgot to do that. A thousand apologies.)

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, feel free to reopen this bug and simply change the 'version' to a later Fedora version.

Bug Reporter: Thank you for reporting this issue and we are sorry that we were unable to fix it before Fedora 14 reached end of life.
If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to click on "Clone This Bug" (top right of this page) and open it against that version of Fedora.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

The process we are following is described here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping