Bug 1457669

Summary: [drm] GPU HANG: ecode 8:0:0x86dffffd, in gnome-shell
Product: Red Hat Enterprise Linux 7 Reporter: Han Han <hhan>
Component: xorg-x11-drv-intel Assignee: Adam Jackson <ajax>
Status: CLOSED WONTFIX QA Contact: Desktop QE <desktop-qa-list>
Severity: high Docs Contact:
Priority: unspecified    
Version: 7.4 CC: ayadav, corsaro, dyuan, rhel, tpelka, xuzhang
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-01-15 07:37:22 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
dmesg, system log, gpu crash dump none
GPU_crash_dump.gz none
crash dump of Intel Iris Graphics 6100 none

Description Han Han 2017-06-01 06:49:09 UTC
Created attachment 1284007
dmesg, system log, gpu crash dump

Description of problem:
The GPU hangs when gnome-shell is launched with 'intel_iommu=on' on the kernel command line.

Version-Release number of selected component (if applicable):
xorg-x11-drv-intel-2.99.917-26.20160929.el7.x86_64
gnome-shell-3.22.3-10.el7.x86_64
kernel-3.10.0-675.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot the host with the kernel option 'intel_iommu=on'. The GPU hangs when gnome-shell is launched or during ordinary desktop use.

2. dmesg shows the following errors:
[   59.815042] DMAR: DRHD: handling fault status reg 2
[   59.815049] DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr fad13000
DMAR:[fault reason 23] Unknown
[   71.960492] [drm] GPU HANG: ecode 8:0:0x86dffffd, in gnome-shell [3687], reason: Hang on render ring, action: reset
[   71.960494] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[   71.960496] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[   71.960497] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[   71.960497] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[   71.960498] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[   71.961039] drm/i915: Resetting chip after gpu hang
[   85.922269] drm/i915: Resetting chip after gpu hang
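
For triage across several reports, the "GPU HANG" line above can be split into its parts. The following is a minimal sketch, not part of the original report: the function name parse_gpu_hang and the field names gen and engine are assumptions of this example (the report does not say what the three colon-separated ecode values mean), and the regular expression is derived only from the log lines quoted in this bug.

```python
import re
from typing import NamedTuple, Optional

# Pattern derived from the "[drm] GPU HANG" lines quoted in this report.
# The ", in <process> [<pid>]" part is optional because some hang
# messages omit it.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<gen>\d+):(?P<engine>-?\d+):(?P<ecode>0x[0-9a-fA-F]+)"
    r"(?:, in (?P<comm>\S+) \[(?P<pid>\d+)\])?"
    r", reason: (?P<reason>.*?), action: (?P<action>\w+)"
)


class GpuHang(NamedTuple):
    gen: int             # first ecode field (name is an assumption)
    engine: int          # second ecode field (name is an assumption)
    ecode: int           # third ecode field, parsed from hex
    comm: Optional[str]  # process name, if the message carries one
    pid: Optional[int]   # process id, if the message carries one
    reason: str
    action: str


def parse_gpu_hang(line: str) -> Optional[GpuHang]:
    """Return the parsed hang record, or None if the line is not a hang message."""
    m = HANG_RE.search(line)
    if m is None:
        return None
    pid = m.group("pid")
    return GpuHang(
        gen=int(m.group("gen")),
        engine=int(m.group("engine")),
        ecode=int(m.group("ecode"), 16),
        comm=m.group("comm"),
        pid=int(pid) if pid is not None else None,
        reason=m.group("reason"),
        action=m.group("action"),
    )


if __name__ == "__main__":
    sample = (
        "[   71.960492] [drm] GPU HANG: ecode 8:0:0x86dffffd, in gnome-shell "
        "[3687], reason: Hang on render ring, action: reset"
    )
    print(parse_gpu_hang(sample))
```

Lines that are not hang messages, such as the DMAR fault lines, return None, so the function can be mapped over a whole dmesg capture.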


Actual results:
As above

Expected results:
The GPU does not hang.

Additional info:
My GPU is 'VGA compatible controller: Intel Corporation HD Graphics 5500 (rev 09)'.
Without 'intel_iommu=on', the bug does not reproduce.
I have attached the dmesg output, system log, and GPU crash dump.
See also:
https://bugs.freedesktop.org/show_bug.cgi?id=89360
https://bugzilla.redhat.com/show_bug.cgi?id=1436908
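
Because the hang depends on whether 'intel_iommu=on' was passed at boot, it can help to confirm the option on the running kernel before comparing results. This is a minimal sketch, not part of the original report: the helper names has_kernel_option and read_cmdline are hypothetical, and the sample command line in the usage section is invented for illustration. /proc/cmdline is the standard Linux file that exposes the boot command line.

```python
def has_kernel_option(cmdline: str, option: str) -> bool:
    """Return True if `option` appears as a whole whitespace-separated token."""
    return option in cmdline.split()


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the boot command line of the running kernel (Linux only)."""
    with open(path, encoding="ascii", errors="replace") as f:
        return f.read()


if __name__ == "__main__":
    # Hypothetical command line, for illustration only.
    sample = "BOOT_IMAGE=/vmlinuz-3.10.0-675.el7.x86_64 ro intel_iommu=on"
    print(has_kernel_option(sample, "intel_iommu=on"))
    # On a real host, check the running kernel instead:
    # print(has_kernel_option(read_cmdline(), "intel_iommu=on"))
```

Matching whole tokens rather than substrings avoids treating a different value, such as 'intel_iommu=off', as a match for a shorter prefix.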

Comment 2 GV 2017-09-07 18:26:45 UTC
Sep  7 18:36:48 xxxxxx kernel: [drm] GPU HANG: ecode 9:0:0x84dfbefc, in X [1877], reason: Hang on render ring, action: reset
Sep  7 18:36:48 xxxxxx kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Sep  7 18:36:48 xxxxxx kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Sep  7 18:36:48 xxxxxx kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Sep  7 18:36:48 xxxxxx kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Sep  7 18:36:48 xxxxxx kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Sep  7 18:36:48 xxxxxx kernel: drm/i915: Resetting chip after gpu hang
Sep  7 18:36:48 xxxxxx kernel: [drm] RC6 on
Sep  7 18:36:48 xxxxxx kernel: [drm] GuC firmware load skipped
Sep  7 18:37:01 xxxxxx kernel: drm/i915: Resetting chip after gpu hang
Sep  7 18:37:01 xxxxxx kernel: [drm] RC6 on
Sep  7 18:37:01 xxxxxx kernel: [drm] GuC firmware load skipped
Sep  7 18:37:16 xxxxxx kernel: drm/i915: Resetting chip after gpu hang
Sep  7 18:37:16 xxxxxx kernel: [drm] RC6 on
Sep  7 18:37:16 xxxxxx kernel: [drm] GuC firmware load skipped
Sep  7 18:37:29 xxxxxx kernel: drm/i915: Resetting chip after gpu hang
Sep  7 18:37:29 xxxxxx kernel: [drm] RC6 on
Sep  7 18:37:29 xxxxxx kernel: [drm] GuC firmware load skipped
Sep  7 18:37:41 xxxxxx kernel: drm/i915: Resetting chip after gpu hang
Sep  7 18:37:41 xxxxxx kernel: [drm] RC6 on
Sep  7 18:37:41 xxxxxx kernel: [drm] GuC firmware load skipped
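
The log above shows the chip being reset repeatedly. As a rough way to quantify that, the sketch below computes the gaps between successive "Resetting chip after gpu hang" messages. It is an illustration only, not part of the original comment: the function name reset_intervals is hypothetical, the year is an assumption (syslog timestamps omit it; 2017 is taken from the comment date), and month parsing relies on English month abbreviations.

```python
import re
from datetime import datetime

# Matches syslog-style lines such as:
# "Sep  7 18:36:48 host kernel: drm/i915: Resetting chip after gpu hang"
RESET_RE = re.compile(
    r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"drm/i915: Resetting chip after gpu hang"
)


def reset_intervals(lines, year=2017):
    """Return the seconds between consecutive reset messages.

    `year` is an assumption because syslog timestamps do not carry one.
    """
    stamps = []
    for line in lines:
        m = RESET_RE.match(line)
        if m:
            # Collapse the padded day ("Sep  7") before parsing.
            text = " ".join(m.group(1).split())
            stamps.append(datetime.strptime(f"{year} {text}", "%Y %b %d %H:%M:%S"))
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    sample = [
        "Sep  7 18:36:48 xxxxxx kernel: drm/i915: Resetting chip after gpu hang",
        "Sep  7 18:36:48 xxxxxx kernel: [drm] RC6 on",
        "Sep  7 18:37:01 xxxxxx kernel: drm/i915: Resetting chip after gpu hang",
        "Sep  7 18:37:16 xxxxxx kernel: drm/i915: Resetting chip after gpu hang",
        "Sep  7 18:37:29 xxxxxx kernel: drm/i915: Resetting chip after gpu hang",
        "Sep  7 18:37:41 xxxxxx kernel: drm/i915: Resetting chip after gpu hang",
    ]
    print(reset_intervals(sample))  # [13, 15, 13, 12]
```

For the lines quoted in this comment, the resets recur every 12 to 15 seconds.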

# lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530 (rev 06)

Comment 3 GV 2017-09-07 18:27:33 UTC
Created attachment 1323409
GPU_crash_dump.gz

Comment 4 Giovanni Grieco 2018-04-08 11:54:40 UTC
Description of problem:
The GPU hangs when running 'systemctl start graphical.target' with the kernel flag "intel_iommu=on".

Version-Release number of selected component (if applicable):
xorg-x11-drv-intel-2.99.917-31.20171025.fc27.x86_64
gnome-shell-3.26.2-4.fc27.x86_64
kernel-4.15.14-300.fc27.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot the host with the kernel option 'intel_iommu=on'.
2. The GPU hangs as the GDM login screen starts.

Actual results:
dmesg:
[  354.039739] [drm] GPU HANG: ecode 8:-1:0x00000000, reason: Kicking stuck wait on rcs0, action: continue
[  354.039740] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  354.039740] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[  354.039741] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  354.039741] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[  354.039742] [drm] GPU crash dump saved to /sys/class/drm/card0/error

Expected results:
The GPU does not hang.

Additional info:
Hardware: Apple MacBook Pro Early 2015 13-inch

Comment 5 Giovanni Grieco 2018-04-08 11:58:52 UTC
Created attachment 1418889
crash dump of Intel Iris Graphics 6100

Comment 7 RHEL Program Management 2021-01-15 07:37:22 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.