Bug 695034

Summary: GPU hangs, reboot required i915_hangcheck_elapsed errors
Product: [Fedora] Fedora
Reporter: Leek Soup <nerdgal7501>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: unspecified
Version: 14
CC: airlied, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, p, sgallagh
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2012-08-16 13:48:34 UTC
Bug Blocks: 717210

Description Leek Soup 2011-04-09 22:56:20 UTC
Description of problem:
A user was playing Neverball (from Fedora games) when the system froze. I was able to get to a TTY login and kill the process. However, we then got hundreds of error messages. On switching back to the GUI, we found it was still frozen and unusable.

Version-Release number of selected component (if applicable):
kernel 2.6.35.6-45.fc14

How reproducible:
intermittent

Steps to Reproduce:
1. play Neverball Hard, "Pipe"
2. freezes
3. go to a TTY (Ctrl-Alt-F2), log in, and kill the process
  
Actual results:
Hundreds of messages on screen, similar to these, where each n represents a digit:
[nnn.nnnnnn] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting nnnnn at nnnnn)
[nnn.nnnnnn] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
...

Switching back to the GUI is possible, but no work can be done; it is still frozen.

From the TTY, I was able to log in as root and do a shutdown -r to reboot the system.

Expected results:
No GPU hang. If the process has to be killed, the system should return to normal afterwards.

Additional info:
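A rough way to quantify the messages quoted under "Actual results" is to count them in a saved log. The following is only a sketch: the function names are made up for illustration and are not part of any existing tool, and the patterns are modeled solely on the two message shapes shown above, so other kernels may word them differently. The return value -5 corresponds to the errno EIO (I/O error), which the helper looks up.

```python
import errno
import re
from collections import Counter

# Patterns modeled on the two message shapes quoted in this report.
# Other kernel versions may format these messages differently (assumption).
HANGCHECK_RE = re.compile(
    r"\[drm:i915_hangcheck_elapsed\] \*ERROR\* Hangcheck timer elapsed"
)
WAIT_RE = re.compile(
    r"\[drm:i915_do_wait_request\] \*ERROR\* "
    r"i915_do_wait_request returns (-?\d+)"
)


def summarize_i915_errors(lines):
    """Count hangcheck messages and wait-request return codes.

    `lines` is any iterable of log lines (a list or an open file).
    Returns (hangcheck_count, Counter mapping return code -> count).
    """
    hangchecks = 0
    wait_codes = Counter()
    for line in lines:
        if HANGCHECK_RE.search(line):
            hangchecks += 1
            continue
        match = WAIT_RE.search(line)
        if match:
            wait_codes[int(match.group(1))] += 1
    return hangchecks, wait_codes


def errno_name(code):
    """Map a negative kernel return code such as -5 to its errno name."""
    return errno.errorcode.get(abs(code), "unknown")
```

To use it, pass an open log file (for example /var/log/messages, or saved dmesg output) to summarize_i915_errors and print the two results; reading the system log usually requires root.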

Comment 1 Stephen Gallagher 2011-06-22 16:56:16 UTC
I am also receiving the "[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed" message in /var/log/messages on Fedora 15.

Every time this happens, I experience significant lag (sometimes up to 2 seconds) when performing GNOME Shell actions such as changing workspaces or entering/exiting the overlay.

There is a patch to fix this available on the linux-kernel mailing list:
http://marc.info/?l=linux-kernel&m=130833614308275&w=2

Please provide a kernel update with this fix ASAP. This bug is making Gnome 3 nearly unusable on Sandy Bridge hardware.

Comment 2 Pádraig Brady 2011-06-25 16:01:04 UTC
I've tested the patch in comment #1 as part of 2.6.38.8-34 (not pushed to bodhi yet):
https://koji.fedoraproject.org/koji/buildinfo?buildID=250085

It fixes the easily reproducible issue for me.

thanks!

Comment 3 Stephen Gallagher 2011-06-27 00:28:32 UTC
I have also been testing kernel-2.6.38.8-34.fc15 successfully.

Comment 4 Josh Boyer 2011-08-31 19:54:49 UTC
Dave, it seems the patches you added to the koji build referenced in comment #2 solved the issues for people on F15. Are those (or just the patch in comment #1) safe to bring back to F14, which is 2.6.35-based?

Comment 5 Fedora End Of Life 2012-08-16 13:48:39 UTC
This message is a notice that Fedora 14 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 14. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained.  At this time, all open bugs with a Fedora 'version'
of '14' have been closed as WONTFIX.

(Please note: Our normal process is to give advance warning of this
occurring, but we forgot to do that. A thousand apologies.)

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, feel free to reopen 
this bug and simply change the 'version' to a later Fedora version.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we were unable to fix it before Fedora 14 reached end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" (top right of this page) and open it against that 
version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping