Bug 769657 - Resume from suspend fails, black screen [NEEDINFO]
Summary: Resume from suspend fails, black screen
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 16
Hardware: x86_64
OS: Linux
unspecified
unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-12-21 16:03 UTC by Stefan Kirrmann
Modified: 2015-06-04 18:31 UTC
CC List: 12 users

Fixed In Version: kernel-3.1.8-2.fc16
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-01-15 20:00:18 UTC
Type: ---
bhelgaas: needinfo? (ghoerner)


Attachments
dmidecode for SL510 (13.34 KB, text/plain)
2011-12-30 09:43 UTC, Stefan Kirrmann
no flags
dmidecode from Dell Studio 1557 (15.01 KB, text/plain)
2012-01-04 00:11 UTC, Gregory S. Hoerner
no flags
dmesg from Dell Studio 1557 (100.59 KB, text/plain)
2012-01-04 00:17 UTC, Gregory S. Hoerner
no flags
dmesg from SL510 (72.96 KB, text/plain)
2012-01-04 00:24 UTC, Stefan Kirrmann
no flags
Dell Studio 1557 dmesg 3.1.0-7 pci=crs (122.70 KB, text/plain)
2012-01-05 06:13 UTC, Gregory S. Hoerner
no flags
Dell Studio 1557 dmesg 3.1.6-1 pci=crs (81.90 KB, text/plain)
2012-01-05 06:13 UTC, Gregory S. Hoerner
no flags
Dell Studio 1557 dmesg 3.1.6-1 pci=nocrs (79.01 KB, text/plain)
2012-01-05 06:14 UTC, Gregory S. Hoerner
no flags
dmesg from SL510 without pci=nocrs (75.29 KB, text/plain)
2012-01-09 12:21 UTC, Stefan Kirrmann
no flags

Description Stefan Kirrmann 2011-12-21 16:03:31 UTC
Description of problem: Resume from suspend fails. I have had this problem since kernel 2.6.35 (Fedora 14, 15 and 16), possibly even in earlier versions, but I can't remember for sure.
After suspend, resume hangs: the screen stays black and none of the keyboard lights turn on. SysRq keys don't work, and I have to do a hardware reset.


How reproducible:
Suspend and try to resume

  
Actual results:
resume hangs and screen remains black.


Additional info:
Somewhere along the way to solving the problem, I tried configuring the kernel with the kernel config file used by Arch Linux x86_64, and to my surprise it worked and solved my problem. So it seems to be just a matter of the kernel configuration.
So I tried compiling the Fedora kernel without patches, etc., but that didn't help.
I also tried to figure out which kernel option(s) cause the problem, but so far I have had no success.

The hash matches in dmesg don't point to anything useful.

I have the same problem on 3 different machines (Thinkpad SL510, and 2 different desktop workstations with i7 cpus).

The graphics drivers also differ (I tried nouveau, nv and intel).

Of course, the pm log only shows a successful suspend.

Anything I could try or any log I should attach?

Comment 1 John Gotts 2011-12-26 00:24:26 UTC
Does booting with pci=nocrs fix the problem?

Comment 2 Stefan Kirrmann 2011-12-28 22:12:49 UTC
pci=nocrs does fix the problem for the SL510 Thinkpad, thanks! But it doesn't work for the two i7 machines. Any other kernel parameter I could try?

Comment 3 John Gotts 2011-12-28 23:03:52 UTC
Try hpet=disable for those.

Comment 4 Stefan Kirrmann 2011-12-29 13:26:33 UTC
Tried hpet=disable on both, but it doesn't help.

Comment 5 Dave Jones 2011-12-29 18:01:44 UTC
Stefan, can you attach the output of dmidecode for the SL510 please ?

Comment 6 Stefan Kirrmann 2011-12-30 09:43:44 UTC
Created attachment 550045 [details]
dmidecode for SL510

Comment 7 Gregory S. Hoerner 2012-01-02 05:48:00 UTC
Just wanted to add that pci=nocrs worked perfectly on a Dell Studio 1557 (i7 w/ATI HD4570) as well; thanks!

Comment 8 John Gotts 2012-01-02 16:51:23 UTC
Hi, Gregory, please check out this bug and try out Dave Jones' fixed kernel.

https://bugzilla.redhat.com/show_bug.cgi?id=770308

Comment 9 Gregory S. Hoerner 2012-01-02 20:01:47 UTC
John, here's what I've found (as far as resuming from sleep):

1. 3.1.6-1 with the pci=nocrs option *works* with the open-source ATI driver
2. 3.1.6-1 with the pci=nocrs option *does not work* with the proprietary ATI driver
3. Dave Jones' 3.1.6-2 *does not work* with the open source or proprietary ATI drivers, with OR without pci=nocrs
4. 3.1.6-1 *works* with the "hpet=disable" AND the proprietary ATI driver

I did not try "hpet=disable" with the open source drivers because this configuration does work, and the proprietary driver gives me up to 2 hours of additional battery life over the open source one.

If anyone is interested in any more information about my system, or configurations, I'd be happy to provide them.

Thanks All :)

Comment 10 Dave Jones 2012-01-03 16:30:32 UTC
That's very strange. 1 and 3 should have the same effect.
Can you attach your output of dmidecode please ? It's possible your BIOS is slightly different.

Comment 11 Dave Jones 2012-01-03 16:31:54 UTC
oh wait, 1557. Yes, I'll need that dmidecode. It'll be different from the 1536 that we already fixed.

Comment 12 Bjorn Helgaas 2012-01-03 20:05:32 UTC
Stefan & Gregory, can you please attach complete dmesg logs from your Thinkpad SL510 and Dell Studio 1557?

We can add blacklist entries to automatically set "pci=nocrs", but if we understand what the problem is, we may be able to make a better fix that also helps other machines.

Comment 13 Josh Boyer 2012-01-03 23:00:09 UTC
For everyone that doesn't find pci=nocrs helpful, there is a kernel-3.1.7-1.fc16 that will be in the next updates testing that contains a revert of a patch that broke resume for many people.  That might be worth trying if pci=nocrs isn't working for you.

Comment 14 Gregory S. Hoerner 2012-01-04 00:11:23 UTC
Created attachment 550580 [details]
dmidecode from Dell Studio 1557

Comment 15 Gregory S. Hoerner 2012-01-04 00:17:24 UTC
Created attachment 550581 [details]
dmesg from Dell Studio 1557

Note, this dmesg is from my current state with the "hpet=disable" kernel parameter (so sleep/resume works). It's also "docked" on my desk,  so there's a ridiculous amount of hardware attached. If you need one from a different state, please let me know.

Comment 16 Stefan Kirrmann 2012-01-04 00:24:03 UTC
Created attachment 550582 [details]
dmesg from SL510

Comment 17 Stefan Kirrmann 2012-01-04 00:25:30 UTC
Comment on attachment 550582 [details]
dmesg from SL510

booted with pci=nocrs

Comment 18 Bjorn Helgaas 2012-01-04 17:40:41 UTC
(In reply to comment #15)
> Created attachment 550581 [details]
> dmesg from Dell Studio 1557
> 
> Note, this dmesg is from my current state with the "hpet=disable" kernel
> parameter (so sleep/resume works).

This dmesg is with PCI _CRS enabled (the default), and sleep/resume works.  You mention in comment #7 that "pci=nocrs" works on your Dell 1557, but I'm not sure whether you meant "pci=nocrs makes my machine work better," or merely that "pci=nocrs works on my machine but doesn't solve any problem."  Can you clarify?

I suspect the latter, and that the suspend/resume problems on your machine are related to HPET, not to PCI _CRS.  If that's the case, we shouldn't add a blacklist entry for the Dell 1557.

Comment 19 Gregory S. Hoerner 2012-01-04 19:24:00 UTC
Bjorn,

I'm thinking at this point that the issues are entirely related to the video drivers, and not so much the kernel/hardware otherwise. I'll walk through my entire sequence of events:

1. I installed a stock F16, 3.1.6-1 kernel. Everything worked except sleep/resume and terrible battery life.
2. I carried the laptop around for a few days with the lid open and decided this was unacceptable and to nail this down. I found this bug.
3. I added the pci=nocrs kernel parameter (at this point still using the open source radeon driver) and sleep/resume worked perfectly.
4. I decided to tackle the battery life issue, and got the fglrx (catalyst) driver running from RPMFusion (what a hassle messing w/SELinux). At this point I had great battery life (extra 2 hours), but sleep/resume was broken again. The pci=nocrs parameter was still in place (added in /etc/default/grub).
5. I removed the parameter and still no sleep/resume.
6. John suggested Dave Jones' patched kernel, so I installed that, still no luck (w/o pci=nocrs), added it back, still no luck (still using catalyst at this point).
7. I decided to try the patched kernel with the radeon driver, so I killed catalyst and went back to stock (radeon). The patched kernel wouldn't work with or without the parameter (pci=nocrs).
8. Went back to the stock kernel and catalyst, added the "hpet=disable" parameter and now sleep/resume works fine again.

SO...
- With the 3.1.6-1 kernel and open source radeon driver, I needed "pci=nocrs" alone to make sleep/resume work. 
- With the 3.1.6-1 kernel and proprietary catalyst driver, I need "hpet=disable" alone to make sleep/resume work.
- No other combination has worked so far (I rebooted dozens of times, taking notes of which parameters/kernel/driver I was using each time).
- I'm also using "nomodeset" when using catalyst; otherwise catalyst causes an OOPS and I have no X. I don't think this would alter the other variables, I just don't get the "pretty" Fedora boot progress animation.
- There was no appreciable/noticeable difference in functionality/speed when the pci=nocrs parameter was added, and nothing else seemed to be affected.

Comment 20 John Gotts 2012-01-04 20:17:00 UTC
I don't know how productive a comment this is but the catalyst drivers have never been stable for me and my Studio 1536.

The way that I test video driver stability is to run all of the xscreensaver hacks. If any of them cause the machine to lock up then the driver is not ready for production. I evaluated the catalyst drivers several times and my machine reproducibly froze solid until I removed them.

Comment 21 Bjorn Helgaas 2012-01-04 20:30:10 UTC
(In reply to comment #19)
Wow, I'm impressed.  Not many people are so diligent and thorough.

So the comment #15 dmesg is with stock FC 16 3.1.6-1 kernel with "hpet=disable" and fglrx/catalyst.  Kernel uses _CRS and sleep/resume works fine.  You shouldn't need to use "hpet=disable", but that's material for a different bug report.

Let's go back to events 1-3:  With stock FC16 (3.1.6-1 and radeon), sleep/resume doesn't work on your Dell Studio 1557.  Adding "pci=nocrs" makes it work.  That's got to be fixed.

Can you temporarily revert to that stock FC16 config with radeon and collect dmesg logs with and without "pci=nocrs"?  There must be some important difference, but I don't see it yet.

Comment 22 Bjorn Helgaas 2012-01-04 21:51:09 UTC
(In reply to comment #17)
> Comment on attachment 550582 [details]
> dmesg from SL510
> 
> booted with pci=nocrs

Thanks, Stefan.  Would you mind also attaching a dmesg log without "pci=nocrs"?

Comment 23 Gregory S. Hoerner 2012-01-05 06:11:10 UTC
Bjorn,

I have some interesting test results for you. I've been tweaking my config, and didn't want to mess with it, so I decided to do a fresh install on my eSATA/USB Flash drive (this is important to note) to get the logs.

I've normally had great results with this drive (~90MB/s read & ~40MB/s write) when used with eSATA, but the install took FOREVER (i7 felt like an Atom). The activity light was blinking longer than the DVD was spinning.

Here are the kernel results:

* 3.1.0-7 (DVD version) resume works fine *without* the pci=nocrs (IE pci=crs)
* 3.1.6-1 (After Yum Update) resume *didn't work* (as before) without pci=nocrs
* 3.1.6-1 *with* pci=nocrs resume worked (as before)

I will attach the dmesg logs in a minute, the filenames are self-explanatory.

HOWEVER, you asked previously if anything seemed different with the pci=nocrs parameter added, and the answer this time is YES... the flash drive ran great, it booted as fast as my Corsair Performance 3 SSD; everything was as fast and responsive as it should be.

Now that I know it makes such a huge difference with the drive, I am even more interested in tracking down the issue.

P.S. - John, I would definitely agree the catalyst drivers aren't stable. The issues I've noticed are slow redraw after login with the Gnome3 Shell, random Gnome3 Shell crashes, and GTK+ based drop-downs are slanted. The crashes only take a second before the shell re-initializes, and the drop-downs are still readable, so to me it's a small price to pay to almost double the battery life :)

Comment 24 Gregory S. Hoerner 2012-01-05 06:13:22 UTC
Created attachment 550815 [details]
Dell Studio 1557 dmesg 3.1.0-7 pci=crs

Comment 25 Gregory S. Hoerner 2012-01-05 06:13:56 UTC
Created attachment 550816 [details]
Dell Studio 1557 dmesg 3.1.6-1 pci=crs

Comment 26 Gregory S. Hoerner 2012-01-05 06:14:24 UTC
Created attachment 550817 [details]
Dell Studio 1557 dmesg 3.1.6-1 pci=nocrs

Comment 27 Bjorn Helgaas 2012-01-05 19:51:50 UTC
(In reply to comment #23)
> Here are the kernel results:
> 
> * 3.1.0-7 (DVD version) resume works fine *without* the pci=nocrs (IE pci=crs)
> * 3.1.6-1 (After Yum Update) resume *didn't work* (as before) without pci=nocrs
> * 3.1.6-1 *with* pci=nocrs resume worked (as before)

Huh.  I'm stumped.  On your machine, "pci=nocrs" makes absolutely no difference as far as PCI resource allocation.  All three dmesg logs show exactly the same assignments.

There must be something else that accounts for the resume issues and the USB flash drive performance difference.  Can you double-check these results, maybe starting from power-off and varying the order in which you run them?

Comment 28 Bjorn Helgaas 2012-01-06 06:11:33 UTC
(In reply to comment #23)
> HOWEVER, you asked previously if anything seemed different with the pci=nocrs
> parameter added, and the answer this time is YES... the flash drive ran great,
> it booted as fast as my Corsair Performance 3 SSD; everything was as fast and
> responsive as it should be.

pci=nocrs only affects PCI resource allocation (MMIO and I/O port BARs).  That can make a device work or not work, but normally it doesn't affect *performance*.

Is it possible that the slow USB flash performance was with an install kernel booted with different options than the normal kernel?  For example, maybe the install kernel uses "safe" options that make USB interrupts not work as well, and the normal kernel has good performance regardless of "pci=nocrs".

Comment 29 Josh Boyer 2012-01-06 12:43:35 UTC
(In reply to comment #28)
> Is it possible that the slow USB flash performance was with an install kernel
> booted with different options than the normal kernel?  For example, maybe the
> install kernel uses "safe" options that make USB interrupts not work as well,
> and the normal kernel has good performance regardless of "pci=nocrs".

We don't build separate install kernels and the options on the command line are the same by default, or whatever a user has put there.  It's possible an install initramfs might be missing some modules that are present on an installed system, but it is still the same kernel.

Comment 30 Bjorn Helgaas 2012-01-07 15:43:34 UTC
(In reply to comment #29)
> We don't build separate install kernels and the options on the command line are
> the same by default, or whatever a user has put there.  It's possible an
> install initramfs might be missing some modules that are present on an
> installed system, but it is still the same kernel.

OK, so I guess we'll have to assume that the only difference between the fast & slow USB flash is "pci=nocrs".  Gregory, can you reproduce those fast & slow boots with the same kernel (only difference being "pci=nocrs") and collect the dmesg, "lspci -vv" and lsmod output, and /proc/interrupts?
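A minimal sketch of gathering the four items requested above in one pass, assuming a Linux system with `dmesg`, `lspci` and `lsmod` on the PATH (reading the full kernel log or all `lspci -vv` details may need root). The function name `collect_diagnostics` is hypothetical, not part of any Fedora tool; a command that is missing or fails is recorded as an error string instead of aborting the run.

```python
import subprocess
from pathlib import Path

# The diagnostics requested in comment #30, keyed by a short label.
COMMANDS = {
    "dmesg": ["dmesg"],
    "lspci": ["lspci", "-vv"],
    "lsmod": ["lsmod"],
}


def collect_diagnostics() -> dict[str, str]:
    """Return each diagnostic's output, or an "error: ..." string if it failed."""
    results: dict[str, str] = {}
    for name, argv in COMMANDS.items():
        try:
            proc = subprocess.run(
                argv, capture_output=True, text=True, errors="replace", timeout=60
            )
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Command missing, not executable, or hung.
            results[name] = f"error: {exc}"
            continue
        if proc.returncode == 0:
            results[name] = proc.stdout
        else:
            results[name] = f"error: {proc.stderr.strip()}"
    try:
        results["interrupts"] = Path("/proc/interrupts").read_text()
    except OSError as exc:
        # /proc/interrupts is Linux-specific.
        results["interrupts"] = f"error: {exc}"
    return results
```

Run it once per boot (with and without "pci=nocrs") and save each returned string to a file named after that boot, so the two sets can be compared side by side.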

Comment 31 Fedora Update System 2012-01-08 02:26:41 UTC
kernel-3.1.8-2.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/kernel-3.1.8-2.fc16

Comment 32 Gregory S. Hoerner 2012-01-08 07:18:18 UTC
Bjorn,

I finally got a chance to do some more testing. The 3.1.8-2 kernel seems to have fixed the issue with suspend/resume (works without adding pci=nocrs), so I think it's safe to say this bug is "closed" (at least for me).

I have, however, gotten some much more inconsistent results with the flash drive; with all 4 kernels tested, I have not been able to get any form of consistency. I think to be fair, I should open a new bug for it, but I have to gather a lot more data first. I will do this tomorrow and list the bug number in case you're interested in following it.

I am also interested in why I have to use hpet=disable in order for resume to work with the Catalyst drivers. I haven't tested those with this new kernel, but if the same holds true, do you think it's appropriate to submit a bug for that as well since it may be related to those closed-source drivers, or should I just cross my fingers and hope ATI fixes things?

Comment 33 Stefan Kirrmann 2012-01-09 12:21:48 UTC
Created attachment 551547 [details]
dmesg from SL510 without pci=nocrs

Sorry it took me a while, but it's my girlfriend's laptop ;-)

Comment 34 Bjorn Helgaas 2012-01-10 00:17:36 UTC
(In reply to comment #32, comment #33)
> The 3.1.8-2 kernel seems to have fixed the issue with suspend/resume
> (works without adding pci=nocrs),

That's just bizarre.

Stefan, can you try 3.1.8-2, too?  I looked at your dmesg logs and the only difference is this (- is with pci=nocrs, + is without):

    -pci 0000:00:1f.3: BAR 0: assigned [mem 0xb8100000-0xb81000ff 64bit]
    +pci 0000:00:1f.3: BAR 0: assigned [mem 0xc0100000-0xc01000ff 64bit]

This is the i801_smbus device.  I don't see any reason why it would work at 0xb8100000 but not at 0xc0100000.
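The log comparison above can be sketched in Python, assuming dmesg lines in the "pci <device>: BAR <n>: assigned [<resource>]" form quoted in this comment. The helpers `bar_assignments` and `diff_bars` are hypothetical names; they only look at final BAR assignments, not at any other resource messages in the log.

```python
import re
from typing import Optional

# Matches lines such as (an optional timestamp prefix is tolerated):
#   pci 0000:00:1f.3: BAR 0: assigned [mem 0xb8100000-0xb81000ff 64bit]
BAR_RE = re.compile(r"pci (\S+): BAR (\d+): assigned \[(.+?)\]")


def bar_assignments(dmesg: str) -> dict[tuple[str, int], str]:
    """Map (device, BAR number) to its assigned resource; the last assignment wins."""
    return {
        (m.group(1), int(m.group(2))): m.group(3)
        for m in BAR_RE.finditer(dmesg)
    }


def diff_bars(
    log_a: str, log_b: str
) -> list[tuple[str, int, Optional[str], Optional[str]]]:
    """List every (device, BAR) whose assignment differs between two logs."""
    a, b = bar_assignments(log_a), bar_assignments(log_b)
    diffs = []
    for key in sorted(a.keys() | b.keys()):
        if a.get(key) != b.get(key):
            # None means the BAR was not assigned in that log.
            diffs.append((key[0], key[1], a.get(key), b.get(key)))
    return diffs
```

Given the two lines quoted above (pci=nocrs log first), `diff_bars` returns a single entry for `0000:00:1f.3` BAR 0 with the 0xb8100000 and 0xc0100000 ranges; logs with identical assignments, as found for the Dell 1557 in comment #27, give an empty list.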

If 3.1.8-2 doesn't fix it, maybe we need to get a suspend/resume person to look at this.

> I am also interested in why I have to use hpet=disable in order for resume to
> work with the Catalyst drivers.

No harm in filing a bug (it at least gives a way to identify a specific issue).  It doesn't interest me, but I can't speak for Red Hat.

Comment 35 Fedora Update System 2012-01-11 06:19:25 UTC
Package kernel-3.1.8-2.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.1.8-2.fc16'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-0363/kernel-3.1.8-2.fc16
then log in and leave karma (feedback).

Comment 36 Bjorn Helgaas 2012-01-11 16:15:51 UTC
(In reply to comment #32)
> I finally got a chance to do some more testing. The 3.1.8-2 kernel seems to
> have fixed the issue with suspend/resume (works without adding pci=nocrs), so I
> think it's safe to say this bug is "closed" (at least for me).

I'm sorry, I missed the fact that 3.1.8-2.fc16 contains the quirk that automatically turns off PCI _CRS.  (You should be able to confirm this by looking at the dmesg.  It should contain "PCI: Ignoring host bridge windows from ACPI" even though you didn't supply "pci=nocrs").

Actually, just for completeness, can you attach the 3.1.8-2.fc16 dmesg?  It *should* be essentially identical to your "3.1.6-1 pci=nocrs" log.

We still need to get to the bottom of *why* "pci=nocrs" makes a difference.  On your machine (Dell 1557), it seems to make no difference in PCI allocations (see comment #27).
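A minimal sketch of the dmesg check described in this comment, classifying one boot from its dmesg text and kernel command line (for example the contents of /proc/cmdline). It assumes the "PCI: Ignoring host bridge windows from ACPI" message quoted above and a matching "Using" form when _CRS is in effect; the function names `pci_options` and `crs_status` and their return labels are hypothetical. As comment #36 implies, the "Ignoring" message appears whether or not "pci=nocrs" was supplied, so the command line is what tells a manual setting apart from an automatic one.

```python
# Message quoted in comment #36; printed when _CRS host bridge windows are ignored.
IGNORING_MESSAGE = "PCI: Ignoring host bridge windows from ACPI"
# Assumed counterpart printed when _CRS windows are used.
USING_MESSAGE = "PCI: Using host bridge windows from ACPI"


def pci_options(cmdline: str) -> list[str]:
    """Collect the comma-separated options of every pci= parameter."""
    options: list[str] = []
    for param in cmdline.split():
        if param.startswith("pci="):
            options.extend(param[len("pci="):].split(","))
    return options


def crs_status(dmesg: str, cmdline: str) -> str:
    """Classify one boot: is _CRS used, and if not, what turned it off?"""
    if IGNORING_MESSAGE in dmesg:
        if "nocrs" in pci_options(cmdline):
            return "nocrs-from-cmdline"
        # Ignored without pci=nocrs: an automatic kernel decision,
        # such as the DMI quirk described in comment #36.
        return "nocrs-automatic"
    if USING_MESSAGE in dmesg:
        return "crs-used"
    # Neither message found, e.g. the early boot log was truncated.
    return "unknown"
```

On a 3.1.8-2.fc16 boot without extra parameters this should report "nocrs-automatic"; booting the same machine with "pci=use_crs", which comment #39 describes as equivalent to removing the quirk, should report "crs-used".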

Comment 37 Fedora Update System 2012-01-15 20:00:18 UTC
kernel-3.1.8-2.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 38 Nayden Isapov 2012-04-01 18:31:08 UTC
Asus K50AB, Fedora 16, kernel 3.3.0-8.fc16:
The screen is black after wake from suspend, and no input device is working.

Regards!

Comment 39 Bjorn Helgaas 2015-06-04 18:31:36 UTC
Hi Gregory,

We added http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e702781fa846 for your machine.  But I'm not convinced that quirk really fixed anything, and I'd like to remove it.

If you still have that machine (Dell Studio 1557), I wonder if you could confirm that the quirk is unnecessary?  The quirk turns off _CRS, so booting with "pci=use_crs" is equivalent to removing the quirk.

From comment #19, with the 3.1.6-1 kernel and open source radeon driver, you needed "pci=nocrs" to make sleep/resume work.  I reviewed the dmesg logs from comment #25 and comment #26 again, and I don't see anything relevant that was changed by "pci=nocrs", although I don't think those logs included a sleep/resume cycle.

When booting a recent kernel, e.g., v4.0, if you can find any problem that happens with "pci=use_crs" but doesn't happen otherwise, I'd really appreciate a report at http://bugzilla.kernel.org.  If you find one, please attach both dmesg logs to the bugzilla.

