Bug 754043

Summary: Thinkpad R61 sometimes fails to suspend (regression)
Product: [Fedora] Fedora Reporter: Pierre Ossman <pierre-bugzilla>
Component: kernel Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 18 CC: gansalmon, itamar, jonathan, jpazdziora, kernel-maint, madhu.chinakonda, pierre-bugzilla, sgraf
Target Milestone: --- Keywords: Reopened
Target Release: --- Flags: pierre-bugzilla: needinfo-
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-11-27 16:11:03 UTC Type: ---
Attachments:
Description Flags
/var/log/messages none

Description Pierre Ossman 2011-11-15 08:46:10 UTC
I've recently upgraded from Fedora 14 to Fedora 16. Unfortunately that caused a regression in that suspend sometimes hangs. It has begun the suspend procedure (BIOS has started blinking the suspend LED), but it stops along the way. This has happened twice now since I upgraded two weeks ago. I generally suspend twice a day (once to work, and once from work).

The machine is entirely unresponsive when this happens. The screen has switched VT and contains a few dmesg lines with pnp suspend info.

Hardware is a Thinkpad R61 with an Nvidia Quadro NVS 140M.

The machine suspended like clockwork whilst on Fedora 14. I was running Compiz at that time, so the graphics card should have seen similar loads (I mention this since graphics drivers at least used to be a common suspect).

Comment 1 Pierre Ossman 2011-11-17 09:48:06 UTC
I'm also seeing occasional problems resuming. The machine will finish most of the resume (the BIOS emits a beep and turns off the blinking suspend LED), but then it just hangs. The screen is never turned on and it doesn't respond to any buttons (except the power button for four seconds).

TBH, it is a bit disappointing to see problems like this re-emerge. I was hoping that suspend was more or less a solved problem under Linux. :/

Comment 2 Chuck Ebbert 2011-11-18 07:11:37 UTC
(In reply to comment #1)
> 
> TBH, it is a bit disappointing to see problems like this re-emerge. I was
> hoping that suspend was more or less a solved problem under Linux. :/

The problem is that patches that fix broken suspend on one machine break others, then the patches that fix _that_ break still more machines, and so on forever and ever.

Comment 3 Pierre Ossman 2011-11-18 08:32:32 UTC
A never ending battle, huh...

So what can I do to help pinpoint the problem? Backing out the kernel is not possible, I guess, as that would cross not just one but two distro boundaries. And generally you don't get squat in messages for suspend failures.

I did notice this though, during a successful suspend/resume:

Nov 16 09:29:38 mjolnir kernel: [83275.401145] [drm] nouveau 0000:01:00.0: Idling channels...
Nov 16 09:29:38 mjolnir kernel: [83275.401387] [drm] nouveau 0000:01:00.0: Suspending GPU objects...
Nov 16 09:29:38 mjolnir kernel: [83275.405361] snd_hda_intel 0000:00:1b.0: PCI INT B disabled
Nov 16 09:29:38 mjolnir kernel: [83277.527759] [drm] nouveau 0000:01:00.0: And we're gone!
Nov 16 09:29:38 mjolnir kernel: [83277.527795] nouveau 0000:01:00.0: PCI INT A disabled
Nov 16 09:29:38 mjolnir kernel: [83277.527816] NMI: PCI system error (SERR) for reason a0 on CPU 0.
Nov 16 09:29:38 mjolnir kernel: [83277.527818] Dazed and confused, but trying to continue
Nov 16 09:29:38 mjolnir kernel: [83277.538052] nouveau 0000:01:00.0: power state changed by ACPI to D3
Nov 16 09:29:38 mjolnir kernel: [83277.538137] PM: suspend of devices complete after 2278.770 msecs

Minor warning, or something to be concerned about?
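
For anyone who wants to check their own logs for the same pattern, here is a minimal sketch that scans log lines for the NMI/SERR report and the kernel's "suspend of devices complete" summary. It assumes the message wording is exactly as in the excerpt above; the function name `summarize_suspend` and the `SAMPLE` list are made up for illustration and are not part of any tool.

```python
import re

# Kernel summary line, e.g. "PM: suspend of devices complete after 2278.770 msecs"
SUSPEND_DONE = re.compile(r"PM: suspend of devices complete after ([\d.]+) msecs")
# NMI report as worded in the excerpt above (assumed stable across kernels).
NMI_SERR = re.compile(
    r"NMI: PCI system error \(SERR\) for reason ([0-9a-f]+) on CPU (\d+)"
)


def summarize_suspend(lines):
    """Return (nmi_events, suspend_msecs) found in an iterable of log lines.

    nmi_events is a list of (reason_hex, cpu) tuples; suspend_msecs is the
    last reported device-suspend duration, or None if no such line was seen.
    """
    nmi_events = []
    suspend_msecs = None
    for line in lines:
        match = NMI_SERR.search(line)
        if match:
            nmi_events.append((match.group(1), int(match.group(2))))
            continue
        match = SUSPEND_DONE.search(line)
        if match:
            suspend_msecs = float(match.group(1))
    return nmi_events, suspend_msecs


# Three lines copied from the excerpt in this comment.
SAMPLE = [
    "Nov 16 09:29:38 mjolnir kernel: [83277.527795] nouveau 0000:01:00.0: PCI INT A disabled",
    "Nov 16 09:29:38 mjolnir kernel: [83277.527816] NMI: PCI system error (SERR) for reason a0 on CPU 0.",
    "Nov 16 09:29:38 mjolnir kernel: [83277.538137] PM: suspend of devices complete after 2278.770 msecs",
]

if __name__ == "__main__":
    print(summarize_suspend(SAMPLE))
```

To run it against a real log, pass an open file object instead of `SAMPLE`, for example `summarize_suspend(open("/var/log/messages"))`.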

Comment 4 Pierre Ossman 2011-12-20 10:05:49 UTC
This seemed to resolve itself with a kernel update, but now it's back. So it would seem that 3.1.4-1 works, but 3.1.5-2 does not. Testing 3.1.5-1 right now, in case it is a RH patch that is causing issues.

Comment 5 Pierre Ossman 2011-12-22 10:03:43 UTC
Oddly enough, 3.1.5-1 seems to work fine. I just got 3.1.5-6, we'll see how that goes.

Comment 6 Pierre Ossman 2012-02-08 09:10:50 UTC
Has been working fine for the kernels so far, but kernel-3.2.3-2.fc16.x86_64 breaks things again. It seems to reliably hang every other resume. It gets stuck somewhere in the early resume process as the suspend LED is still blinking (which it only does when it's in the process of suspending or resuming).

Comment 7 Pierre Ossman 2012-02-17 17:48:28 UTC
And works again with kernel-3.2.5-3.fc16.x86_64. Are you guys having a commit war? :)
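
To keep track of which builds flip between working and broken, a small sketch like the following may help. The `REPORTS` table only contains the results stated explicitly in comments 4 to 7, using the short version-release form (for example `3.1.5-2`, not the full package name); the helper names `version_key` and `flips` are hypothetical.

```python
import re

# Results as reported in comments 4-7 (True = suspend/resume worked).
# 3.1.5-6 is left out because no explicit result was reported for it.
REPORTS = {
    "3.1.4-1": True,
    "3.1.5-1": True,
    "3.1.5-2": False,
    "3.2.3-2": False,
    "3.2.5-3": True,
}


def version_key(version):
    """Split '3.1.5-2' into (3, 1, 5, 2) so versions sort numerically."""
    return tuple(int(part) for part in re.split(r"[.-]", version))


def flips(reports):
    """Return adjacent (older, newer) pairs where the outcome changed."""
    ordered = sorted(reports, key=version_key)
    return [
        (older, newer)
        for older, newer in zip(ordered, ordered[1:])
        if reports[older] != reports[newer]
    ]


if __name__ == "__main__":
    for older, newer in flips(REPORTS):
        state = "works" if REPORTS[newer] else "broken"
        print(f"{older} -> {newer}: now {state}")
```

Each pair returned by `flips` marks a window between two builds where the behaviour changed, which is where a changelog comparison would be most useful.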

Comment 8 Dave Jones 2012-03-22 17:04:58 UTC
[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 9 Dave Jones 2012-03-22 17:08:07 UTC
[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 10 Dave Jones 2012-03-22 17:18:58 UTC
[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 11 Jan Pazdziora 2012-07-26 17:13:54 UTC
I see what seems to be the same behaviour on my T410 with Fedora 17, kernels 3.3 and 3.4.

Basically, two suspends work just fine -- the moon sign flashed a couple of times and then the machine turned off. The third suspend however never finishes properly -- it just keeps flashing and the machine never switches off.

Comment 12 Pierre Ossman 2012-08-20 11:09:54 UTC
Things were nice and calm for a while, but 3.4.7 seems to have broken things again. It generally locks up on the second resume.

Comment 13 Dave Jones 2012-10-23 15:30:32 UTC
# Mass update to all open bugs.

Kernel 3.6.2-1.fc16 has just been pushed to updates.
This update is a significant rebase from the previous version.

Please retest with this kernel, and let us know if your problem has been fixed.

In the event that you have upgraded to a newer release and the bug you reported
is still present, please change the version field to the newest release you have
encountered the issue with.  Before doing so, please ensure you are testing the
latest kernel update in that release and attach any new and relevant information
you may have gathered.

If you are not the original bug reporter and you still experience this bug,
please file a new report, as it is possible that you may be seeing a
different problem. 
(Please don't clone this bug, a fresh bug referencing this bug in the comment is sufficient).

Comment 14 Pierre Ossman 2012-10-25 07:57:18 UTC
Still broken. Hangs on resume.

Comment 15 Jan Pazdziora 2012-10-25 18:11:13 UTC
(In reply to comment #13)
> # Mass update to all open bugs.
> 
> Kernel 3.6.2-1.fc16 has just been pushed to updates.
> This update is a significant rebase from the previous version.

On my T410, with kernel-PAE-3.6.2-4.fc17.i686, I am able to suspend by closing the lid and it really suspends (the moon stops flashing) and resumes, about ten times over the course of the last 48 hours -- something I was not able to do for months. Fingers crossed, hopefully it's permanent.

Comment 16 Pierre Ossman 2012-11-01 12:42:50 UTC
Okay, I think I'm starting to see a pattern here. A key factor seems to be the laptop dock. Every time it has failed to resume, it has been when I've docked it in suspended mode and tried to resume.

Note that it does not always hang when resuming in the dock. But I've so far not seen it fail to resume outside the dock, at least not with the recent kernels.

Comment 17 Fedora End Of Life 2013-01-16 14:31:07 UTC
This message is a reminder that Fedora 16 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 16. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '16'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 16's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 16 is end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" and open it against that version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 18 Fedora End Of Life 2013-02-13 15:34:47 UTC
Fedora 16 changed to end-of-life (EOL) status on 2013-02-12. Fedora 16 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 19 Pierre Ossman 2013-03-06 08:17:13 UTC
kernel-3.8.1-201.fc18.x86_64 brought this delightful issue back. Locked up twice so far. Once was outside of the dock, so it's not related to that.

One difference is that the display is restored now, so it seems to get through the low level resume fine.

Some of /var/log/messages from the resume is intact, and contains things like this:

Mar  5 11:19:16 mjolnir kernel: [133586.363108] nouveau E[  PGRAPH][0000:01:00.0] PGRAPH TLB flush idle timeout fail
Mar  5 11:19:16 mjolnir kernel: [133586.363108] nouveau E[  PGRAPH][0000:01:00.0] PGRAPH_STATUS  : 0x01000001 BUSY ROP
Mar  5 11:19:16 mjolnir kernel: [133586.363108] nouveau E[  PGRAPH][0000:01:00.0] PGRAPH_VSTATUS0: 0x00000000
Mar  5 11:19:16 mjolnir kernel: [133586.363108] nouveau E[  PGRAPH][0000:01:00.0] PGRAPH_VSTATUS1: 0x00000000
Mar  5 11:19:16 mjolnir kernel: [133586.363108] nouveau E[  PGRAPH][0000:01:00.0] PGRAPH_VSTATUS2: 0x00200000 ROP
Mar  5 11:19:16 mjolnir kernel: [133587.839008] nouveau E[  PGRAPH][0000:01:00.0] PGRAPH TLB flush idle timeout fail
Mar  5 11:19:16 mjolnir kernel: [133587.839008] nouveau E[  PGRAPH][0000:01:00.0] PGRAPH_STATUS  : 0x01000001 BUSY ROP
Mar  5 11:19:16 mjolnir kernel: [133587.839008] nouveau E[  PGRAPH][0000:01:00.0] PGRAPH_VSTATUS0: 0x00000000
Mar  5 11:19:16 mjolnir kernel: [133587.839008] nouveau E[  PGRAPH][0000:01:00.0] PGRAPH_VSTATUS1: 0x00000000
Mar  5 11:19:16 mjolnir kernel: [133587.839008] nouveau E[  PGRAPH][0000:01:00.0] PGRAPH_VSTATUS2: 0x00200000 ROP
Mar  5 11:19:16 mjolnir kernel: [133589.840007] nouveau E[  PGRAPH][0000:01:00.0] PGRAPH TLB flush idle timeout fail
Mar  5 11:19:16 mjolnir kernel: [133589.840007] nouveau E[  PGRAPH][0000:01:00.0] PGRAPH_STATUS  : 0x01000001 BUSY ROP
Mar  5 11:19:16 mjolnir kernel: [133589.840007] nouveau E[  PGRAPH][0000:01:00.0] PGRAPH_VSTATUS0: 0x00000000
Mar  5 11:19:16 mjolnir kernel: [133589.840007] nouveau E[  PGRAPH][0000:01:00.0] PGRAPH_VSTATUS1: 0x00000000
Mar  5 11:19:16 mjolnir kernel: [133589.840007] nouveau E[  PGRAPH][0000:01:00.0] PGRAPH_VSTATUS2: 0x00200000 ROP
Mar  5 11:19:18 mjolnir kernel: [133591.853479] nouveau E[  PGRAPH][0000:01:00.0] PGRAPH TLB flush idle timeout fail
Mar  5 11:19:18 mjolnir kernel: [133591.853479] nouveau E[  PGRAPH][0000:01:00.0] PGRAPH_STATUS  : 0x01000001 BUSY ROP
Mar  5 11:19:18 mjolnir kernel: [133591.853479] nouveau E[  PGRAPH][0000:01:00.0] PGRAPH_VSTATUS0: 0x00000000
Mar  5 11:19:18 mjolnir kernel: [133591.853479] nouveau E[  PGRAPH][0000:01:00.0] PGRAPH_VSTATUS1: 0x00000000
Mar  5 11:19:18 mjolnir kernel: [133591.853479] nouveau E[  PGRAPH][0000:01:00.0] PGRAPH_VSTATUS2: 0x00200000 ROP
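
The repeating blocks above are easier to compare once reduced to one timestamp per timeout. This is a rough sketch, assuming the nouveau message format is exactly as logged here; `tlb_timeout_times` and `SAMPLE` are illustrative names, not part of nouveau or any log tool.

```python
import re

# Kernel timestamp in brackets, then a nouveau PGRAPH error message.
# The format is taken from the lines above and may differ on other kernels.
PGRAPH_LINE = re.compile(
    r"\[\s*(\d+\.\d+)\] nouveau E\[\s*PGRAPH\]\[[0-9a-f:.]+\] (.*)$"
)
TIMEOUT_TEXT = "PGRAPH TLB flush idle timeout fail"


def tlb_timeout_times(lines):
    """Return the kernel timestamps (seconds) of each TLB flush timeout."""
    times = []
    for line in lines:
        match = PGRAPH_LINE.search(line.rstrip("\n"))
        if match and match.group(2).startswith(TIMEOUT_TEXT):
            times.append(float(match.group(1)))
    return times


# Three lines copied from the log excerpt above.
SAMPLE = [
    "Mar  5 11:19:16 mjolnir kernel: [133586.363108] nouveau E[  PGRAPH][0000:01:00.0] PGRAPH TLB flush idle timeout fail",
    "Mar  5 11:19:16 mjolnir kernel: [133586.363108] nouveau E[  PGRAPH][0000:01:00.0] PGRAPH_STATUS  : 0x01000001 BUSY ROP",
    "Mar  5 11:19:16 mjolnir kernel: [133587.839008] nouveau E[  PGRAPH][0000:01:00.0] PGRAPH TLB flush idle timeout fail",
]

if __name__ == "__main__":
    times = tlb_timeout_times(SAMPLE)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    print(times, gaps)
```

Fed the full excerpt, this gives four timeouts spaced roughly 1.5 to 2 seconds apart by kernel timestamp, even though syslog stamped most of them within the same second.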

Comment 20 Pierre Ossman 2013-03-06 08:19:26 UTC
Created attachment 705804 [details]
/var/log/messages

Here's what's left in /var/log/messages from the two failed resumes.

Comment 21 Justin M. Forbes 2013-10-18 21:04:50 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 18 kernel bugs.

Fedora 18 has now been rebased to 3.11.4-101.fc18.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 19, and are still experiencing this issue, please change the version to Fedora 19.

If you experience different issues, please open a new bug report for those.

Comment 22 Justin M. Forbes 2013-11-27 16:11:03 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  

It has been over a month since we asked you to test the 3.11 kernel updates and let us know if your issue has been resolved or is still a problem. When this happened, the bug was set to needinfo.  Because the needinfo is still set, we assume either this is no longer a problem, or you cannot provide additional information to help us resolve the issue.  As a result we are closing with insufficient data. If this is still a problem, we apologize; feel free to reopen the bug and provide more information so that we can work towards a resolution.

If you experience different issues, please open a new bug report for those.