Bug 1306937 - GPU hangs after i915 backport
Summary: GPU hangs after i915 backport
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: xorg-x11-drv-intel
Version: 23
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Adam Jackson
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-02-12 08:57 UTC by frans
Modified: 2016-12-20 18:39 UTC
CC List: 10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-12-20 18:39:21 UTC
Type: Bug



Links
System           ID     Private  Priority  Status  Summary  Last Updated
FreeDesktop.org  94101  0        None      None    None     2016-05-12 07:18:12 UTC

Description frans 2016-02-12 08:57:54 UTC
At intervals ranging from 5 seconds to several minutes, the whole system freezes for about 5 seconds. Sometimes it freezes permanently; sometimes gnome-shell restarts.

dmesg then tells me:

[  184.013519] [drm] stuck on render ring
[  184.014589] [drm] GPU HANG: ecode 9:0:0x87f99ff9, in gnome-shell [2466], reason: Ring hung, action: reset
[  184.014596] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  184.014600] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[  184.014604] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  184.014608] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[  184.014613] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[  184.016433] drm/i915: Resetting chip after gpu hang
[  186.008063] [drm] RC6 on
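
For anyone collecting these messages from several machines or boots, the "GPU HANG" lines can be pulled out of dmesg output with a small script. The following is only a sketch: the pattern is modelled on the single log line quoted above, the function name find_gpu_hangs is made up for illustration, and other kernel versions may word the message differently.

```python
import re

# Pattern modelled on the "[drm] GPU HANG" line quoted in this report.
# Assumption: other kernel versions may format this message differently.
HANG_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+\[drm\] GPU HANG: "
    r"ecode (?P<ecode>\S+), in (?P<process>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\w+)"
)


def find_gpu_hangs(dmesg_text):
    """Return one dict per GPU HANG line found in the given dmesg text."""
    hangs = []
    for line in dmesg_text.splitlines():
        match = HANG_RE.search(line)
        if match:
            hang = match.groupdict()
            hang["time"] = float(hang["time"])  # seconds since boot
            hang["pid"] = int(hang["pid"])
            hangs.append(hang)
    return hangs


if __name__ == "__main__":
    # Sample taken from the dmesg excerpt in this report.
    sample = (
        "[  184.013519] [drm] stuck on render ring\n"
        "[  184.014589] [drm] GPU HANG: ecode 9:0:0x87f99ff9, in gnome-shell [2466], "
        "reason: Ring hung, action: reset\n"
        "[  186.008063] [drm] RC6 on\n"
    )
    for hang in find_gpu_hangs(sample):
        print(hang)
```

The script only reads text that has already been captured, for example the saved output of dmesg; it does not replace the GPU crash dump that the kernel asks for.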

I have already filed a bug, with a GPU crash dump, here:
https://bugs.freedesktop.org/show_bug.cgi?id=94101
I wanted to let you know that the problem was most likely introduced in Fedora with the kernel update from 4.3.3-300 to 4.3.3-303 (the i915 backport). 4.3.3-300 works for me.

The graphics chip is an Intel Iris 540 inside an i5-6260U.


Version-Release number of selected component (if applicable):

  Fedora 23 Workstation, fully updated, running kernel 4.3.3-303 or later


How reproducible:

The behavior appears right after GNOME has started. It recurs every few seconds until the system freezes permanently, so the machine is unusable.

Steps to Reproduce:
Just start Fedora 23 on an Intel NUC6i5SY

Comment 1 frans 2016-02-12 09:05:06 UTC
This bug report looks very similar but refers to an i910 and is much older:
https://bugzilla.redhat.com/show_bug.cgi?id=1227202

Comment 2 David H. Gutteridge 2016-05-12 07:12:08 UTC
Please note the external FreeDesktop.org bug ID referenced is incorrect.

Comment 3 frans 2016-05-12 07:18:50 UTC
Thanks - changed it

Comment 4 Milos Kaurin 2016-05-29 12:18:48 UTC
I confirm that I have the same hardware and the same issue.

Copied from the related FreeDesktop bug report:

I have exactly the same hardware:
[  +0.000006] Hardware name:                  /NUC6i5SYB, BIOS SYSKLi35.86A.0044.2016.0512.1734 05/12/2016

I can reproduce the mini-hangs/complete system freezes as frans described on kernels 4.3.3-303.fc23+. Issues are always reported in dmesg.

Kernel 4.3.3-300 gives me various i915-related stack traces, but with no impact on usability (no hangs or freezes).

The best method of reproducing mini-hangs in kernels 4.3.3-303.fc23+ that I've found so far is: run Kodi (requires the RPM Fusion repos), then start Firefox. The issues start appearing almost immediately.

I also confirm what Jani Nikula said: Setting "i915.i915_enable_rc6=0" has no impact on the kernels pertaining to this issue. You can look into dmesg where it will be clearly stated that this particular kernel parameter is unknown.

Comment 5 Milos Kaurin 2016-05-30 10:53:46 UTC
(In reply to Milos Kaurin from comment #4)
> I also confirm what Jani Nikula said: Setting "i915.i915_enable_rc6=0" has
> no impact on the kernels pertaining to this issue. You can look into dmesg
> where it will be clearly stated that this particular kernel parameter is
> unknown.

I was wrong. From the other bug report, copy/pasting again:


Just tried it - "4.4.9-300.fc23.x86_64" with "i915.enable_rc6=0". No i915 stack traces (or any other) with kodi and firefox running for several minutes now.

The only thing I see in dmesg regarding rc6 is this line close to boot:

[  +0.088745] Setting dangerous option enable_rc6 - tainting kernel


For those who have the issue and want to use the latest kernel:

1. As root, edit /etc/default/grub
2. Find the line that starts with GRUB_CMDLINE_LINUX
   * Append i915.enable_rc6=0
   * Example: GRUB_CMDLINE_LINUX="rhgb quiet i915.enable_rc6=0"
3a. If you have a BIOS system, follow this guide[1]
3b. If you have a UEFI system, follow this guide[2]
4. Reboot
5. Run "uname -a" to confirm the running kernel, and look at dmesg to verify that the option was set correctly
6. Test

NOTE: The rc6 option for the i915 appears to control power saving settings[3], so expect higher power consumption.
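
Step 2 above can also be scripted. This is a minimal sketch, not a tested tool: append_kernel_param is a hypothetical helper name, and it assumes the GRUB_CMDLINE_LINUX value is on one line and wrapped in double quotes, as in the example above. It works on the file contents as a string and leaves writing the file, and regenerating the GRUB configuration per guide [1] or [2], to you.

```python
import re


def append_kernel_param(grub_text, param="i915.enable_rc6=0"):
    """Append param to the GRUB_CMDLINE_LINUX="..." line unless already present.

    Assumption: the value is a single double-quoted line. Lines such as
    GRUB_CMDLINE_LINUX_DEFAULT are deliberately left untouched.
    """
    out = []
    for line in grub_text.splitlines():
        match = re.match(r'^(GRUB_CMDLINE_LINUX=")(.*)(")\s*$', line)
        if match and param not in match.group(2).split():
            args = (match.group(2) + " " + param).strip()
            line = match.group(1) + args + match.group(3)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical file contents; the real file is /etc/default/grub.
    sample = 'GRUB_TIMEOUT=5\nGRUB_CMDLINE_LINUX="rhgb quiet"\n'
    print(append_kernel_param(sample), end="")
```

Running the function twice gives the same result as running it once, so it should be safe to re-apply after a package update rewrites the file.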

I hope this helps.


[1] Updating GRUB 2 on BIOS systems: https://fedoraproject.org/wiki/GRUB_2?rd=Grub2#Updating_GRUB_2_configuration_on_BIOS_systems
[2] Updating GRUB 2 on UEFI systems: https://fedoraproject.org/wiki/GRUB_2?rd=Grub2#Updating_GRUB_2_configuration_on_UEFI_systems
[3] http://blog.vivi.eng.br/?p=162

Comment 6 Milos Kaurin 2016-06-03 20:17:28 UTC
No longer occurs with the new 4.5.5-201.fc23 kernel:

$ cat /proc/cmdline 
BOOT_IMAGE=/vmlinuz-4.5.5-201.fc23.x86_64 root=UUID=638709e8-d595-4259-8a1c-4525083174ee ro rhgb quiet
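
To confirm from a script that a boot really ran without the workaround (or with it), the command line can be split into parameters. This is a rough sketch with made-up helper names (parse_cmdline, rc6_disabled); it splits on whitespace only, so it assumes no parameter value contains quoted spaces.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None.

    Assumption: no parameter value contains quoted whitespace.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")  # split on the first "=" only
        params[name] = value if sep else None
    return params


def rc6_disabled(cmdline):
    """True if i915.enable_rc6=0 was passed on the kernel command line."""
    return parse_cmdline(cmdline).get("i915.enable_rc6") == "0"


if __name__ == "__main__":
    # Command line quoted in this comment; on a live system, read /proc/cmdline.
    sample = (
        "BOOT_IMAGE=/vmlinuz-4.5.5-201.fc23.x86_64 "
        "root=UUID=638709e8-d595-4259-8a1c-4525083174ee ro rhgb quiet"
    )
    print(rc6_disabled(sample))
```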

Comment 7 frans 2016-06-11 09:19:39 UTC
The machine still hangs! It has not crashed yet (I updated and re-enabled RC6 yesterday), but every couple of seconds it stops responding and logs the same error:


[95618.288809] [drm] stuck on render ring
[95618.289426] [drm] GPU HANG: ecode 9:0:0x85dfbfff, in gnome-shell [2271], reason: Ring hung, action: reset
[95618.289841] ------------[ cut here ]------------
[95618.289890] WARNING: CPU: 1 PID: 32114 at drivers/gpu/drm/i915/intel_display.c:11440 intel_mmio_flip_work_func+0x45f/0x470 [i915]()
...
...
[95618.290176]  [<ffffffff810d7980>] ? kthread_create_on_node+0x250/0x250
[95618.290183]  [<ffffffff818b641f>] ret_from_fork+0x3f/0x70
[95618.290187]  [<ffffffff810d7980>] ? kthread_create_on_node+0x250/0x250
[95618.290192] ---[ end trace 700d22f5ceb77f1b ]---
[95618.291837] drm/i915: Resetting chip after gpu hang
[95620.276896] [drm] RC6 on

$ cat /proc/cmdline 
BOOT_IMAGE=/vmlinuz-4.5.6-200.fc23.x86_64+debug root=/dev/mapper/luks-3fa0d5af-c698-432c-bffc-946a8d6e6017 ro rd.luks.uuid=luks-3fa0d5af-c698-432c-bffc-946a8d6e6017 rhgb quiet LANG=en_US.UTF-8

Comment 8 frans 2016-06-11 09:27:34 UTC
In case it helps: I have an Intel NUC with a Skylake 6260U processor (with Iris 540 on board) and a ThinkPad X260 with a Skylake 6200U processor (with HD Graphics 520).

The problem does not occur on the ThinkPad (HD Graphics 520).

Comment 9 Fedora End Of Life 2016-11-24 15:31:25 UTC
This message is a reminder that Fedora 23 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 23. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '23'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 23 reaches end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 10 Fedora End Of Life 2016-12-20 18:39:21 UTC
Fedora 23 changed to end-of-life (EOL) status on 2016-12-20. Fedora 23 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

