Bug 1041906 - i915 *ERROR* Failed to reset chip
Summary: i915 *ERROR* Failed to reset chip
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: xorg-x11-drv-intel
Version: 20
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Adam Jackson
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-12-12 20:00 UTC by Jaroslaw Gorny
Modified: 2015-06-29 13:29 UTC
CC List: 10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-06-29 13:29:42 UTC
Type: Bug
Embargoed:


Attachments
output of /sys/class/drm/card0/error from google chrome crash on Macbook (851.86 KB, text/plain)
2014-09-21 05:46 UTC, Steven Ellis
gpu crash dump in /sys/class/drm/card0/error on Dell Inspiron 1420 (845.99 KB, text/plain)
2014-11-22 04:50 UTC, Christopher Tubbs


Links
System: FreeDesktop.org | ID: 83423 | Private: 0 | Priority: None | Status: None | Summary: None | Last Updated: Never

Description Jaroslaw Gorny 2013-12-12 20:00:49 UTC
Description of problem:

It occurred for the first time on the current F20 kernel (3.11.10-300.fc20.x86_64). I never saw this on previous kernels, either on F20 or on earlier releases.
While I was working on my laptop, the screen suddenly went black, and there was no way to recover. I closed the lid (the machine suspended properly) and opened it again after a few seconds; the screen was back.
In the journal I found the following:

Dec 10 14:08:59 harissa kernel: [10617.707091] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
Dec 10 14:08:59 harissa kernel: [10617.707125] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state
Dec 10 14:08:59 harissa kernel: [10617.708889] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0xb58c000 ctx 0) at 0xb58cbf4
Dec 10 14:09:00 harissa kernel: [10618.211057] [drm:i915_reset] *ERROR* Failed to reset chip.



Version-Release number of selected component (if applicable):
3.11.10-300.fc20.x86_64

How reproducible:
It has occurred only once so far (I have been using this kernel for about 16 hours).

Steps to Reproduce:
1. Work normally and wait for the issue to happen

Additional info:

00:02.0 VGA compatible controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (primary) (rev 0c) (prog-if 00 [VGA controller])
        Subsystem: Lenovo ThinkPad T61/R61
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 43
        Region 0: Memory at f8000000 (64-bit, non-prefetchable) [size=1M]
        Region 2: Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Region 4: I/O ports at 1800 [size=8]
        Expansion ROM at <unassigned> [disabled]
        Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
                Address: fee0200c  Data: 41d1
        Capabilities: [d0] Power Management version 3
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
                Bridge: PM- B3+
        Kernel driver in use: i915

Comment 1 Mykola Dvornik 2013-12-22 07:49:02 UTC
Happens to me as well.

kernel-3.12.5-302.fc20.x86_64
xorg-x11-drv-intel-2.21.15-5.fc20.x86_64

Dmesg says:

[  179.901023] [drm] stuck on render ring
[  179.901033] [drm] capturing error event; look for more information in /sys/class/drm/card0/error
[  185.896402] [drm] stuck on render ring
[  192.912349] [drm] stuck on render ring
[  198.911731] [drm] stuck on render ring
[  204.920976] Watchdog[2137]: segfault at 0 ip 00007f3913ebcc3e sp 00007f38fd307670 error 6 in chrome[7f391079d000+59b1000]
[  223.929759] [drm] stuck on render ring
[  229.929123] [drm] stuck on render ring
[  230.518478] [drm:i915_reset] *ERROR* Failed to reset chip.

Comment 2 Mykola Dvornik 2013-12-22 08:43:24 UTC
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
	Subsystem: Dell Device 04b8
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 47
	Region 0: Memory at f2400000 (64-bit, non-prefetchable) [size=4M]
	Region 2: Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Region 4: I/O ports at 5000 [size=64]
	Expansion ROM at <unassigned> [disabled]
	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
		Address: fee0200c  Data: 4142
	Capabilities: [d0] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [a4] PCI Advanced Features
		AFCap: TP+ FLR+
		AFCtrl: FLR-
		AFStatus: TP-
	Kernel driver in use: i915

Comment 3 Steven Ellis 2014-09-20 04:36:55 UTC
I'm also seeing this under Fedora 20

Kernel - 3.15.10-201.fc20.x86_64

cat /sys/class/drm/card0/error

GPU HANG: ecode 0:0x9f47f9fd, in chrome [3065], reason: Ring hung, action: reset
Time: 1411184453 s 809298 us
Kernel: 3.15.10-201.fc20.x86_64
Active process (on ring render): chrome [3065]
Reset count: 0
Suspend count: 0
PCI ID: 0x2a02
EIR: 0x00000000
IER: 0x00028053
PGTBL_ER: 0x00000000
FORCEWAKE: 0x00000000
DERRMR: 0x00000000
CCID: 0x00000000
Missed interrupts: 0x00000000
  fence[0] = d2390000d22200d
  fence[1] = ed8d0000ed4e01d
  fence[2] = d7100000d6d101d
  fence[3] = ef880000ef4901d
  fence[4] = d7d00000d79101d
  fence[5] = d1ba0000d17b00d
  fence[6] = b5f80000b5b901d
  fence[7] = d6d00000d69101d
  fence[8] = 87ec0000879d02d
  fence[9] = d2120000d1fb00d
  fence[10] = 3f7e00003b7f09d
  fence[11] = c06c0000c02d01d
  fence[12] = d4100000d3d101d
  fence[13] = d2790000d23a00d
  fence[14] = d1fa0000d1bb00d
  fence[15] = a9030000a8c401d
  INSTDONE_0: 0xff45f8fd
  INSTDONE_1: 0x000fffdf
  INSTDONE_2: 0x00000000
  INSTDONE_3: 0x00000000
render command stream:
  HEAD: 0x69c1aa18
  TAIL: 0x0001aaf8
  CTL: 0x0001f001
  HWS: 0x00000000
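The first line of the error state dump identifies the hang: the error code, the process, the reason, and the action taken. The plain "Key: value" lines below it record counters such as the reset count and PCI ID. The following is a minimal Python sketch for extracting both, for example to compare dumps from different machines. The names `GpuHang`, `parse_hang_header` and `parse_fields` are hypothetical, and the layout they expect is an assumption based only on the dump quoted above.

```python
import re
from typing import NamedTuple, Optional


class GpuHang(NamedTuple):
    ecode: str
    process: str
    pid: int
    reason: str
    action: str


# Assumed header layout, copied from the dump above, e.g.
# "GPU HANG: ecode 0:0x9f47f9fd, in chrome [3065], reason: Ring hung, action: reset"
HANG_HEADER_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[0-9a-fx:]+), in (?P<process>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\w+)"
)


def parse_hang_header(text: str) -> Optional[GpuHang]:
    """Return the parsed GPU HANG header, or None if no header is found."""
    m = HANG_HEADER_RE.search(text)
    if m is None:
        return None
    return GpuHang(m["ecode"], m["process"], int(m["pid"]), m["reason"], m["action"])


def parse_fields(text: str) -> dict:
    """Collect unindented "Key: value" lines; keeps the first value per key."""
    fields = {}
    for line in text.splitlines():
        # Indented lines (fence registers, ring state) are skipped.
        if line.startswith(" ") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields.setdefault(key.strip(), value.strip())
    return fields
```

Because `search` is used rather than `match`, the same header parser also accepts the kernel log form of the line, which carries a "[drm]" prefix (see comment 6).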

Comment 4 Steven Ellis 2014-09-20 04:39:04 UTC
It looks like one way to reproduce the bug is to run TweetDeck in Google Chrome.

Comment 5 Steven Ellis 2014-09-21 05:46:30 UTC
Created attachment 939699
output of /sys/class/drm/card0/error from google chrome crash on Macbook

Comment 6 Steven Ellis 2014-09-21 05:47:44 UTC
I identified the same bug in the Intel Mesa Bugzilla.

Most recent crash output:

Sep 21 17:31:42 macdora kernel: [drm] stuck on render ring
Sep 21 17:31:42 macdora kernel: [drm] GPU HANG: ecode 0:0x9f47f9fd, in chrome [3036], reason: Ring hung, action: reset
Sep 21 17:31:42 macdora kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Sep 21 17:31:42 macdora kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Sep 21 17:31:42 macdora kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Sep 21 17:31:42 macdora kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Sep 21 17:31:42 macdora kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Sep 21 17:31:43 macdora kernel: [drm:i915_reset] *ERROR* Failed to reset chip: -110
Sep 21 17:31:47 macdora kernel: Watchdog[3043]: segfault at 0 ip 00007f59e58239de sp 00007f59cd944670 error 6 in chrome[7f59e16cd000+547d000]
Sep 21 17:31:52 macdora kernel: [drm:i915_gem_wait_for_error] *ERROR* Timed out waiting for the gpu reset to complete
Sep 21 17:31:52 macdora kernel: [drm] GMBUS [i915 gmbus vga] timed out, falling back to bit banging on pin 2
Sep 21 17:31:52 macdora kernel: ------------[ cut here ]------------
Sep 21 17:31:52 macdora kernel: WARNING: CPU: 1 PID: 1833 at drivers/gpu/drm/i915/intel_display.c:931 assert_pll+0x68/0x70 [i915]()
Sep 21 17:31:52 macdora kernel: PLL state assertion failure (expected on, current off)
Sep 21 17:31:52 macdora kernel: Modules linked in: tcp_lp fuse ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_CHECKSUM iptable_mangle tun bridge stp llc ebtable_nat ebtables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter xt_physdev ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack wl(POE) cfg80211 snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device iTCO_wdt iTCO_vendor_support joydev snd_pcm appletouch hid_appleir coretemp applesmc input_polldev snd_timer microcode i2c_i801 rfkill sky2 snd shpchp lpc_ich mfd_core sbs acpi_cpufreq sbshc soundcore apple_bl binfmt_misc nfsd auth_rpcgss nfs_acl lockd sunrpc firewire_ohci i915 firewire_core i2c_algo_bit crc_itu_t drm_kms_helper ata_generic pata_acpi drm i2c_core video
Sep 21 17:31:52 macdora kernel: 
Sep 21 17:31:52 macdora kernel: CPU: 1 PID: 1833 Comm: upowerd Tainted: P        W  OE 3.15.10-201.fc20.x86_64 #1
Sep 21 17:31:52 macdora kernel: Hardware name: Apple Inc. MacBook4,1/Mac-F22788A9, BIOS     MB41.88Z.00C1.B00.0802091535 02/09/08
Sep 21 17:31:52 macdora kernel:  0000000000000000 0000000072dd7e7a ffff8801373ff9a8 ffffffff816ef848
Sep 21 17:31:52 macdora kernel:  ffff8801373ff9f0 ffff8801373ff9e0 ffffffff8108927d 0000000000000001
Sep 21 17:31:52 macdora kernel:  000000000000a800 ffff8800b7ceb000 ffff880037db5000 0000000000000001
Sep 21 17:31:52 macdora kernel: Call Trace:
Sep 21 17:31:52 macdora kernel:  [<ffffffff816ef848>] dump_stack+0x45/0x56
Sep 21 17:31:52 macdora kernel:  [<ffffffff8108927d>] warn_slowpath_common+0x7d/0xa0
Sep 21 17:31:52 macdora kernel:  [<ffffffff810892fc>] warn_slowpath_fmt+0x5c/0x80
Sep 21 17:31:52 macdora kernel:  [<ffffffffa00da55b>] ? gen4_read32+0x4b/0xc0 [i915]
Sep 21 17:31:52 macdora kernel:  [<ffffffffa00e8008>] assert_pll+0x68/0x70 [i915]
Sep 21 17:31:52 macdora kernel:  [<ffffffffa00ed931>] intel_crtc_load_lut+0x1c1/0x1e0 [i915]
Sep 21 17:31:52 macdora kernel:  [<ffffffffa00edb66>] i9xx_crtc_enable+0x216/0x420 [i915]
Sep 21 17:31:52 macdora kernel:  [<ffffffffa00f0947>] __intel_set_mode+0x827/0x1640 [i915]
Sep 21 17:31:52 macdora kernel:  [<ffffffffa00f3e16>] intel_set_mode+0x16/0x30 [i915]
Sep 21 17:31:52 macdora kernel:  [<ffffffffa00f404c>] intel_get_load_detect_pipe+0x21c/0x4c0 [i915]
Sep 21 17:31:52 macdora kernel:  [<ffffffffa011d4fd>] intel_tv_detect+0x10d/0x560 [i915]
Sep 21 17:31:52 macdora kernel:  [<ffffffff811f6666>] ? path_openat+0x176/0x670
Sep 21 17:31:52 macdora kernel:  [<ffffffffa002fd5e>] status_show+0x3e/0x80 [drm]
Sep 21 17:31:52 macdora kernel:  [<ffffffff8145a050>] dev_attr_show+0x20/0x60
Sep 21 17:31:52 macdora kernel:  [<ffffffff816f53c2>] ? mutex_lock+0x12/0x2f
Sep 21 17:31:52 macdora kernel:  [<ffffffff81262fcc>] sysfs_kf_seq_show+0xcc/0x1e0
Sep 21 17:31:52 macdora kernel:  [<ffffffff81261963>] kernfs_seq_show+0x23/0x30
Sep 21 17:31:52 macdora kernel:  [<ffffffff8120a4fa>] seq_read+0x16a/0x3b0
Sep 21 17:31:52 macdora kernel:  [<ffffffff812621b5>] kernfs_fop_read+0xf5/0x160
Sep 21 17:31:52 macdora kernel:  [<ffffffff811e65bb>] vfs_read+0x9b/0x160
Sep 21 17:31:52 macdora kernel:  [<ffffffff811e7225>] SyS_read+0x55/0xd0
Sep 21 17:31:52 macdora kernel:  [<ffffffff816ff969>] system_call_fastpath+0x16/0x1b
Sep 21 17:31:52 macdora kernel: ---[ end trace de7443cb8fe1173c ]---
Sep 21 17:31:52 macdora kernel: ------------[ cut here ]------------
Sep 21 17:31:52 macdora kernel: WARNING: CPU: 1 PID: 1833 at drivers/gpu/drm/i915/intel_display.c:931 assert_pll+0x68/0x70 [i915]()
Sep 21 17:31:52 macdora kernel: PLL state assertion failure (expected on, current off)
Sep 21 17:31:52 macdora kernel: Modules linked in: tcp_lp fuse ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_CHECKSUM iptable_mangle tun bridge stp llc ebtable_nat ebtables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter xt_physdev ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack wl(POE) cfg80211 snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device iTCO_wdt iTCO_vendor_support joydev snd_pcm appletouch hid_appleir coretemp applesmc input_polldev snd_timer microcode i2c_i801 rfkill sky2 snd shpchp lpc_ich mfd_core sbs acpi_cpufreq sbshc soundcore apple_bl binfmt_misc nfsd auth_rpcgss nfs_acl lockd sunrpc firewire_ohci i915 firewire_core i2c_algo_bit crc_itu_t drm_kms_helper ata_generic pata_acpi drm i2c_core video
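On this kernel the reset failure carries a return value ("Failed to reset chip: -110"). Kernel functions conventionally return a negated errno. The following minimal Python sketch maps such a value to its errno name. The helper name is hypothetical. Errno numbers are architecture-specific and the lookup uses the host's table, so the result only applies if it is run on the same kind of system as the log (x86_64 Linux here).

```python
import errno
import os


def describe_kernel_error(code: int) -> str:
    """Map a negative kernel return value such as -110 to "NAME (message)".

    Uses the host's errno table, which matches the kernel's only on the
    same architecture and OS.
    """
    num = -code if code < 0 else code
    name = errno.errorcode.get(num, "UNKNOWN")
    return f"{name} ({os.strerror(num)})"
```

On x86_64 Linux, -110 is ETIMEDOUT. This is consistent with the "Timed out waiting for the gpu reset to complete" message that follows a few seconds later in the same log.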

Comment 7 Christopher Tubbs 2014-11-22 04:44:33 UTC
I'm seeing nearly the same issue in F20 when playing YouTube videos in Google Chrome on my Dell Inspiron 1420 (i915 GPU). It now happens multiple times a day, on nearly every other HTML5 video I play. The worst part is that Chrome doesn't even crash; it keeps playing the video in the background, but the GNOME "fail whale" screen blocks the display and forces me to log out anyway.

Comment 8 Christopher Tubbs 2014-11-22 04:50:43 UTC
Created attachment 960067
gpu crash dump in /sys/class/drm/card0/error on Dell Inspiron 1420

Comment 9 Fedora End Of Life 2015-05-29 09:58:41 UTC
This message is a reminder that Fedora 20 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 20. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '20'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 20 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 10 Fedora End Of Life 2015-06-29 13:29:42 UTC
Fedora 20 changed to end-of-life (EOL) status on 2015-06-23. Fedora 20 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

