1843274 – i915 GPU Hang with kernel 5.7 on Haswell (Acer C720P Chromebook)

Keywords:
Status:	CLOSED WORKSFORME
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	rawhide
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-06-02 23:56 UTC by Dale Turner
Modified:	2022-11-30 00:43 UTC
CC List:	38 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2022-11-30 00:43:50 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments
Tail of dmesg showing gpu hang (1.04 KB, text/plain) 2020-06-02 23:56 UTC, Dale Turner	no flags	Details
GPU Crash Dump (27.42 KB, text/plain) 2020-06-02 23:58 UTC, Dale Turner	no flags	Details
sys class drm card0 error (3.10 KB, text/plain) 2020-07-20 17:16 UTC, tankey	no flags	Details
sys class drm card0 error on rc3 vanilla (26.77 KB, text/plain) 2020-07-20 17:24 UTC, tankey	no flags	Details
dmesg with lots of information (+boot context +i915 context) with disable_power_well=0 (1.03 MB, text/plain) 2020-07-20 19:11 UTC, tankey	no flags	Details
with 5.7.8 fedora kernel (30.92 KB, text/plain) 2020-07-20 20:45 UTC, tankey	no flags	Details
gpu logs on 5.7.9-200 Fedora kernel with default boot options (24.70 KB, text/plain) 2020-07-21 21:07 UTC, tankey	no flags	Details
(for memory : on vanilla kernel) 5.8.3 (19.85 KB, text/plain) 2020-08-24 19:09 UTC, tankey	no flags	Details
(vanilla kernel) 5.8.3 without initrd (21.47 KB, text/plain) 2020-08-24 19:10 UTC, tankey	no flags	Details
(vanilla kernel) 5.8.3 + boot options i915_enable_dc=0 (19.12 KB, text/plain) 2020-08-24 19:11 UTC, tankey	no flags	Details
(vanilla kernel) 5.8.3 + boot options i915_enable_dc=0 + cstate=1 (19.13 KB, text/plain) 2020-08-24 19:11 UTC, tankey	no flags	Details
(vanilla) now without any i915 firmwares (19.16 KB, text/plain) 2020-08-24 19:15 UTC, tankey	no flags	Details
5.9.0-rc7 gpu hangs reports (84.10 KB, text/plain) 2020-10-04 12:03 UTC, tankey	no flags	Details
5.9.9 + patch screenshot (61.94 KB, image/png) 2020-11-20 23:32 UTC, tankey	no flags	Details
[stock stable kernel up to date] sys class drm card0 error (17.30 KB, text/plain) 2020-11-24 02:55 UTC, tankey	no flags	Details
[rawhide kernel] drm error (21.61 KB, text/plain) 2020-11-24 02:57 UTC, tankey	no flags	Details
[rawhide] dmesg (115.23 KB, text/plain) 2020-11-24 02:57 UTC, tankey	no flags	Details
[rawhide] drm state (2.93 KB, text/plain) 2020-11-24 02:58 UTC, tankey	no flags	Details

Description Dale Turner 2020-06-02 23:56:59 UTC

Created attachment 1694628 [details]
Tail of dmesg showing gpu hang

1. Please describe the problem:
Any time I boot into a kernel in the 5.7 series (RCs up to the current stable) I get a GPU hang (with both sway and openbox). I can start the WM, and usually a terminal, but when I start anything else (firefox, caja, a game, etc.) the GUI hangs. I can kill the GUI via a VT and restart it, but the problem continues. To use my computer, I must boot into a kernel in the 5.6 series. I filed a bug upstream: https://gitlab.freedesktop.org/drm/intel/-/issues/1805

2. What is the Version-Release number of the kernel:
5.7.0-1.fc33.x86_64 and all 5.7 release candidates before it

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
The first kernel 5.7 RC I tried

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:
Always with a 5.7 kernel, as illustrated in #1 above.

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:
Yes as stated above.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
No

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Comment 1 Dale Turner 2020-06-02 23:58:39 UTC

Created attachment 1694629 [details]
GPU Crash Dump

Comment 2 tankey 2020-07-20 17:10:35 UTC

Identical issue here with all kernels ≥ 5.7:
the Fedora 32 kernel update;
and the vanilla kernel.

As a result, I cannot apply kernel updates on my machine (Acer Aspire C720P with Coreboot, running Fedora for 6 years) :-(
(Same issue on Xorg and Wayland.)

i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
i915 0000:00:02.0: [drm] GPU HANG: ecode 7:1:8edcfc7b

Comment 3 tankey 2020-07-20 17:16:57 UTC

Created attachment 1701790 [details]
sys class drm card0 error

Comment 4 tankey 2020-07-20 17:24:43 UTC

Created attachment 1701791 [details]
sys class drm card0 error on rc3 vanilla

Comment 5 tankey 2020-07-20 17:39:32 UTC

Comment on attachment 1701791 [details]
sys class drm card0 error on rc3 vanilla

This is from rc5 vanilla (sorry for the typo: this attachment is the rc5 drm dump, not rc3).
Same issue on all Fedora stable kernel updates, rawhide, and all ≥ 5.7 RCs from kernel.org: all tested (along with many workarounds).

Comment 6 tankey 2020-07-20 18:37:58 UTC

i915.enable_rc6=0: does not seem to be honored anymore (systool -m i915 -v reports nothing about it, and /sys/class/drm/card0/power/rc6_enable always returns 1; maybe I am doing something wrong)
i915.enable_guc=0: does not seem to have any effect
i915.enable_hangcheck=0: the machine freezes completely
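
One way to check whether the loaded driver exposes a given option is to look under /sys/module/i915/parameters, the standard sysfs location for module parameters. The following is a minimal sketch under that assumption; the helper name read_module_params is hypothetical, and a missing file only suggests that the parameter is not exposed (it may be unknown to this driver version, or simply not exported to sysfs).

```python
from pathlib import Path


def read_module_params(names, base="/sys/module/i915/parameters"):
    """Return {name: value} for each requested module parameter.

    A value of None means no file with that name exists under `base`.
    Some parameters are readable only by root, so a permission error
    is reported with a placeholder string instead of being raised.
    """
    result = {}
    for name in names:
        path = Path(base) / name
        try:
            result[name] = path.read_text().strip()
        except FileNotFoundError:
            result[name] = None
        except PermissionError:
            result[name] = "(permission denied)"
    return result


if __name__ == "__main__":
    # Parameter names taken from the comment above.
    wanted = ["enable_rc6", "enable_guc", "enable_hangcheck"]
    for key, value in read_module_params(wanted).items():
        print(f"{key} = {value}")
```

Running it as root avoids the permission placeholder for parameters that are not world-readable.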

Comment 7 tankey 2020-07-20 19:11:57 UTC

Created attachment 1701804 [details]
dmesg with lots of information (+boot context +i915 context) with disable_power_well=0

Hope this helps upstream.
(I am at your disposal for any tests you require.)

Comment 8 tankey 2020-07-20 20:45:12 UTC

Created attachment 1701810 [details]
with 5.7.8 fedora kernel

dmesg i915 + drm/card0/error + i915 options
on the stock Fedora 32 kernel

(Same issue with or without the intel_iommu=on boot option.)

Comment 9 tankey 2020-07-20 21:11:33 UTC

drm @freedesktop : 
https://cgit.freedesktop.org/drm-tip/commit/?id=4b665377a8730e8882d976dce641ff7d8391dd98
https://gitlab.freedesktop.org/drm/intel/-/issues/2024

(The same issue also appears in the older bug https://bugzilla.redhat.com/show_bug.cgi?id=1457669, which may still be active.)

Comment 10 tankey 2020-07-20 21:30:15 UTC

An interesting discussion of the same (or a similar) problem: https://bbs.archlinux.org/viewtopic.php?id=256520&p=2
Unfortunately, despite the work of Dario and Loqs on the Clear kernel options, the issue is the same here (with the stock 5.7 Fedora kernel or vanilla 5.8-rc5) with the intel_iommu=on,igf_off boot option.

I think I have explored all the possibilities available to me, from 5.7 Fedora to 5.8 vanilla, with many attempts and workarounds each time. I hope I don't have to throw this (beautiful) computer in the trash, because it works perfectly (with 6 hours of battery life) and it is not obsolete!

Comment 11 tankey 2020-07-21 21:07:37 UTC

Created attachment 1702001 [details]
gpu logs on 5.7.9-200 Fedora kernel with default boot options

Comment 12 tankey 2020-08-24 19:09:28 UTC

Created attachment 1712424 [details]
(for memory : on vanilla kernel) 5.8.3

Comment 13 tankey 2020-08-24 19:10:02 UTC

Created attachment 1712425 [details]
(vanilla kernel) 5.8.3 without initrd

Comment 14 tankey 2020-08-24 19:11:19 UTC

Created attachment 1712426 [details]
(vanilla kernel) 5.8.3 + boot options i915_enable_dc=0

Comment 15 tankey 2020-08-24 19:11:57 UTC

Created attachment 1712427 [details]
(vanilla kernel) 5.8.3 + boot options i915_enable_dc=0 + cstate=1

Comment 16 tankey 2020-08-24 19:15:52 UTC

Created attachment 1712428 [details]
(vanilla) now without any i915 firmwares

Same bug without any i915 firmware.
(Kernels < 5.7 work smoothly without firmware, but with ≥ 5.7 the GPU hangs again and again; this is on a vanilla kernel, and it is the same on the Fedora stable and Fedora rawhide kernels.)

Comment 17 Emilio 2020-09-26 23:10:06 UTC

Same here, remaining on 5.6 until this bug is fixed in the kernel. Latest attempt was 5.8.10.

Adding more information from lspci:

00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06) (prog-if 00 [VGA controller])
        Subsystem: Gigabyte Technology Co., Ltd Device d000
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 26
        Region 0: Memory at f7800000 (64-bit, non-prefetchable) [size=4M]
        Region 2: Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Region 4: I/O ports at f000 [size=64]
        Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
        Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
                Address: fee01004  Data: 4021
        Capabilities: [d0] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [a4] PCI Advanced Features
                AFCap: TP+ FLR+
                AFCtrl: FLR-
                AFStatus: TP-
        Kernel driver in use: i915
        Kernel modules: i915

Comment 18 tankey 2020-10-04 12:03:29 UTC

Created attachment 1718750 [details]
5.9.0-rc7 gpu hangs reports

Comment 19 Vittorio 2020-10-06 09:31:10 UTC

Emilio, I have the same experience as you on a desktop with the same
VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)

on an ASROCK H81M-DGS motherboard.
And I am stuck with kernel 5.6.

Comment 20 Ondřej Kolín 2020-10-07 16:53:32 UTC

I am affected by this bug as well. Had to go back to 5.6.x Fedora 32, both Wayland and Xorg

00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)

The related syslog entries say (https://pastebin.com/zh8x74R6, I can provide more information if anyone is interested):
Oct 07 18:25:17 localhost.localdomain kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 7:1:8edcfc79, in gnome-shell [1672]
Oct 07 18:25:17 localhost.localdomain kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Oct 07 18:25:17 localhost.localdomain kernel: i915 0000:00:02.0: [drm] gnome-shell[1672] context reset due to GPU hang
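
The hang lines quoted in this report share one shape, so they can be collected mechanically when comparing machines. The sketch below is based only on the lines quoted in this thread; the field names gen, engine_mask and code are assumptions about what the three parts of the ecode mean, and parse_gpu_hang is a hypothetical helper.

```python
import re

# Matches lines such as:
#   ... [drm] GPU HANG: ecode 7:1:8edcfc79, in gnome-shell [1672]
# The ", in <process> [<pid>]" suffix is optional because some quoted
# lines in this report do not carry it.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<gen>\d+):(?P<engine_mask>[0-9a-f]+):(?P<code>[0-9a-f]+)"
    r"(?:, in (?P<process>.+?) \[(?P<pid>\d+)\])?"
)


def parse_gpu_hang(line):
    """Return a dict describing a GPU HANG log line, or None if absent."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    pid = match.group("pid")
    return {
        "gen": int(match.group("gen")),
        "engine_mask": int(match.group("engine_mask"), 16),
        "code": match.group("code"),
        "process": match.group("process"),
        "pid": int(pid) if pid is not None else None,
    }


if __name__ == "__main__":
    import sys

    # Example: journalctl --no-hostname -k | python3 this_script.py
    for raw in sys.stdin:
        info = parse_gpu_hang(raw)
        if info is not None:
            print(info)
```

Grouping the results by the code field gives a quick view of whether different reporters are hitting the same ecode.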

Comment 21 Zachary 2020-10-08 19:08:03 UTC

Me too. Various 5.6.* all fine. Various 5.7.* & 5.8.* all fail ..

kernel: i915 0000:00:02.0: GPU HANG: ecode 7:1:85ddfffd, in Xorg [9919]
kernel: i915 0000:00:02.0: Resetting chip for stopped heartbeat on rcs0
kernel: i915 0000:00:02.0: Xorg[9919] context reset due to GPU hang

lspci ..

00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06) (prog-if 00 [VGA controller])
        DeviceName:  Onboard IGD
        Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7851
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 30
        Region 0: Memory at f7800000 (64-bit, non-prefetchable) [size=4M]
        Region 2: Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Region 4: I/O ports at f000 [size=64]
        Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
        Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
                Address: fee02004  Data: 4026
        Capabilities: [d0] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [a4] PCI Advanced Features
                AFCap: TP+ FLR+
                AFCtrl: FLR-
                AFStatus: TP-
        Kernel driver in use: i915
        Kernel modules: i915

Comment 22 Nigel J. Terry 2020-11-03 02:58:46 UTC

Same problem on my iMac. 

00:02.0 VGA compatible controller: Intel Corporation Device 0d22 (rev 08) (prog-if 00 [VGA controller])
        Subsystem: Apple Inc. Device 0122
        Flags: bus master, fast devsel, latency 0, IRQ 39
        Memory at 98000000 (64-bit, non-prefetchable) [size=4M]
        Memory at 90000000 (64-bit, prefetchable) [size=128M]
        I/O ports at 2000 [size=64]
        Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
        Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Capabilities: [d0] Power Management version 2
        Capabilities: [a4] PCI Advanced Features
        Kernel driver in use: i915
        Kernel modules: i915

Comment 23 Nigel J. Terry 2020-11-03 03:55:10 UTC

I don't know if this helps, maybe eliminates some potential causes. It is WORKING on my laptop, which has different Intel graphics. At least some Intel graphics work :-)

Linux localhost.localdomain 5.8.16-300.fc33.x86_64 #1 SMP Mon Oct 19 13:18:33 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 620 (rev 07) (prog-if 00 [VGA controller])
        DeviceName: Intel Kabylake UHD Graphics ULT GT2
        Subsystem: Hewlett-Packard Company Device 83fa
        Flags: bus master, fast devsel, latency 0, IRQ 128
        Memory at b0000000 (64-bit, non-prefetchable) [size=16M]
        Memory at a0000000 (64-bit, prefetchable) [size=256M]
        I/O ports at 4000 [size=64]
        Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
        Capabilities: [40] Vendor Specific Information: Len=0c <?>
        Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
        Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Capabilities: [d0] Power Management version 2
        Capabilities: [100] Process Address Space ID (PASID)
        Capabilities: [200] Address Translation Service (ATS)
        Capabilities: [300] Page Request Interface (PRI)
        Kernel driver in use: i915
        Kernel modules: i915

Comment 24 Vittorio 2020-11-03 09:18:23 UTC

Driver i915 1.6.0 works fine with build 20200114.
After that it fails, as in builds 20200313 and 20200515.
What changed from 20200114 to 20200313?

Comment 25 Menno 2020-11-12 08:39:24 UTC

Hi folks,

I have also suffered from this issue for a long time. Still on a 5.6 kernel.

Saw this issue upstream: https://gitlab.freedesktop.org/drm/intel/-/issues/2413
which is resolved with this patch: https://patchwork.freedesktop.org/patch/395580/?series=82783&rev=1

However, I am not in a position to test it myself on Fedora right now.

Maybe it is related and can help you.

Comment 26 Vittorio 2020-11-12 10:10:39 UTC

I believe the culprit is in a modification of driver i915 version 1.6.0

Build 20200114 is fine, and I am using it on kernel 5.6

Build 20200313 and later builds do not work, starting with kernel 5.7

Comment 27 Nigel J. Terry 2020-11-13 14:22:05 UTC

I have built both F32 & F33 versions of the 5.9.8 kernel with the above patch https://patchwork.freedesktop.org/patch/395580/?series=82783&rev=1

Both are running fine (so far). At least they both boot, which the earlier ones did not. It seems that the patch has fixed my problems on my iMac.

Comment 28 Menno 2020-11-18 12:56:51 UTC

Last Sunday I did the same thing (built the F33 version 5.9.8-200 with the patch https://patchwork.freedesktop.org/patch/395580/?series=82783&rev=1) on my hardware:


00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06) (prog-if 00 [VGA controller])
	Subsystem: ASRock Incorporation Device 0402
	Flags: bus master, fast devsel, latency 0, IRQ 29
	Memory at f0000000 (64-bit, non-prefetchable) [size=4M]
	Memory at e0000000 (64-bit, prefetchable) [size=256M]
	I/O ports at f000 [size=64]
	Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
	Capabilities: [d0] Power Management version 2
	Capabilities: [a4] PCI Advanced Features
	Kernel driver in use: i915
	Kernel modules: i915


Can confirm that this patch fixes the GPU HANG issues on my system.

Comment 29 Nerijus Baliūnas 2020-11-18 13:32:29 UTC

Could anyone please share the builds?

Comment 30 Menno 2020-11-18 15:25:04 UTC

(In reply to Nerijus Baliūnas from comment #29)
> Could anyone please share the builds?

My build (without the debug rpms) is here: http://www.mediafire.com/folder/hrtjd0b1logi8/x86_64

Also note that the version number of my build is not accurate: it is the same version number as the official F33 kernel.

Comment 31 Nerijus Baliūnas 2020-11-19 13:12:05 UTC

Unfortunately your build did not help here:
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 620 (rev 07) (prog-if 00 [VGA controller])
	Subsystem: Lenovo Device 225c
	Flags: bus master, fast devsel, latency 0, IRQ 142
	Memory at 2ffa000000 (64-bit, non-prefetchable) [size=16M]
	Memory at b0000000 (64-bit, prefetchable) [size=256M]
	I/O ports at e000 [size=64]
	Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
	Capabilities: [40] Vendor Specific Information: Len=0c <?>
	Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
	Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
	Capabilities: [d0] Power Management version 2
	Capabilities: [100] Process Address Space ID (PASID)
	Capabilities: [200] Address Translation Service (ATS)
	Capabilities: [300] Page Request Interface (PRI)
	Kernel driver in use: i915
	Kernel modules: i915

/sys/class/drm/card0/error:
GPU HANG: ecode 9:1:85dffffb, in Xwayland [2587]
Kernel: 5.9.8-200.fc33.x86_64 x86_64
Driver: 20200715
Time: 1605790670 s 675207 us
Boottime: 3541 s 672470 us
Uptime: 3539 s 87386 us
Capture: 4298208768 jiffies; 615242 ms ago
Active process (on ring rcs0): Xwayland [2587]
Reset count: 0
Suspend count: 0
Platform: KABYLAKE
Subplatform: 0x0
PCI ID: 0x5917
PCI Revision: 0x07
PCI Subsystem: 17aa:225c
IOMMU enabled?: 1
DMC loaded: yes
DMC fw version: 1.4
RPM wakelock: yes
PM suspended: no
GT awake: yes

Comment 32 Vittorio 2020-11-20 10:45:22 UTC

(In reply to Menno from comment #30)

Menno, your build works on my PC, but my VGA is the same as yours:

00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)

Comment 33 tankey 2020-11-20 23:32:05 UTC

Created attachment 1731512 [details]
5.9.9 + patch screenshot

5.9.9 + patch screenshot : no more error collected / huge thanks to Chris Wilson (and Menno for pointing here !)

Comment 34 tankey 2020-11-20 23:40:20 UTC

(In reply to tankey from comment #33)
> Created attachment 1731512 [details]
>  
> 5.9.9 + patch screenshot : no more error collected / huge thanks to Chris
> Wilson (and Menno for pointing here !)

text :
[tankey@localhost ~]$ cat /etc/fedora-release
Fedora release 32 (Thirty Two)
[tankey@localhost ~]$
[tankey@localhost ~]$ free -m
              total        used        free      shared  buff/cache   available
Mem:           1801         371         911          74         518        1152
Swap:          3930           0        3930
[tankey@localhost ~]$
[tankey@localhost ~]$ systemd-analyze
Startup finished in 1.073s (kernel) + 4.198s (userspace) = 5.272s
graphical.target reached after 4.169s in userspace
[tankey@localhost ~]$
[tankey@localhost ~]$ grep "model name" /proc/cpuinfo  |uniq
model name      : Intel(R) Celeron(R) 2955U @ 1.40GHz
[tankey@localhost ~]$
[tankey@localhost ~]$ su -
Mot de passe :
[root@localhost ~]# cat /sys/class/drm/card0/error
No error state collected
[root@localhost ~]#
[root@localhost ~]# uname -rv
5.9.9-BZ1843274 #1 SMP Fri Nov 20 21:34:42 CET 2020
[root@localhost ~]#
[root@localhost ~]# date
sam. 21 nov. 2020 00:24:37 CET
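
The check in the transcript above (reading /sys/class/drm/card0/error and seeing "No error state collected") can be scripted, for example to run after each kernel test. A minimal sketch, assuming the marker string is exactly the one shown in the transcript; both function names are hypothetical.

```python
NO_ERROR_MARKER = "No error state collected"


def gpu_error_recorded(state_text):
    """Return True when the error-state text holds something other than the marker.

    An empty string is treated as 'nothing recorded'.
    """
    stripped = state_text.strip()
    return bool(stripped) and not stripped.startswith(NO_ERROR_MARKER)


def read_error_state(path="/sys/class/drm/card0/error"):
    """Read the error-state file; the transcript above did this as root."""
    with open(path, "r", errors="replace") as handle:
        return handle.read()


if __name__ == "__main__":
    if gpu_error_recorded(read_error_state()):
        print("GPU error state present")
    else:
        print("no GPU error state recorded")
```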

Comment 35 tankey 2020-11-21 04:33:34 UTC

In case someone wants a build for the C720P: https://equilibriste.org/index.php/s/txdaszMDyjnC5Dk
Please read the readme.txt.

Comment 36 Vittorio 2020-11-21 06:04:47 UTC

Comment on attachment 1731512 [details]
5.9.9 + patch screenshot

Most important is /sbin/lspci|grep VGA

Comment 37 tankey 2020-11-21 07:55:29 UTC

(In reply to Vittorio from comment #36)
> Comment on attachment 1731512 [details]
> 5.9.9 + patch screenshot
> 
> Most important is /sbin/lspci|grep VGA

Once again: Haswell-ULT Integrated Graphics Controller

Comment 38 tankey 2020-11-24 02:55:37 UTC

Created attachment 1732801 [details]
[stock stable kernel up to date] sys class drm card0 error

Comment 39 tankey 2020-11-24 02:57:16 UTC

Created attachment 1732802 [details]
[rawhide kernel] drm error

Comment 40 tankey 2020-11-24 02:57:53 UTC

Created attachment 1732803 [details]
[rawhide] dmesg

Comment 41 tankey 2020-11-24 02:58:28 UTC

Created attachment 1732804 [details]
[rawhide] drm state

Comment 42 Nigel J. Terry 2020-11-26 22:01:58 UTC

I just updated to vanilla 5.9.10-200.fc33.x86_64 and the problem appears resolved. Seems the fix has made it into the kernel

Happy Thanksgiving!!

Comment 43 Vittorio 2020-11-27 06:24:50 UTC

After installing kernel-5.9.10-200.fc33.x86_64 the issue is still there.

My VGA is 

VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)

Comment 44 Menno 2020-11-27 10:41:02 UTC

Same here,

Nov 27 11:30:45 hoppie.home kernel: Linux version 5.9.10-200.fc33.x86_64 (mockbuild.fedoraproject.org) (gcc (GCC) 10.2.1 20201016 (Red Hat 10.2.1-6), GNU ld version 2.35-14.fc33) #1 SMP Mon Nov 23 18:12:50 UTC 2020
Nov 27 11:31:52 hoppie.home kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 7:1:85ddfffd, in gnome-shell [4166]

lspci -v -s 00:02.0

00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06) (prog-if 00 [VGA controller])
	Subsystem: ASRock Incorporation Device 0402
	Flags: bus master, fast devsel, latency 0, IRQ 29
	Memory at f0000000 (64-bit, non-prefetchable) [size=4M]
	Memory at e0000000 (64-bit, prefetchable) [size=256M]
	I/O ports at f000 [size=64]
	Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
	Capabilities: [d0] Power Management version 2
	Capabilities: [a4] PCI Advanced Features
	Kernel driver in use: i915
	Kernel modules: i915

Comment 45 Rocco 2020-12-04 07:01:27 UTC

Same thing happening to me.

I had to load kernel 5.9.9-200.fc33.x86_64 because 5.9.10-200 was causing my system to hang.

I do not see a journal log for the previous boot.


$ lspci -v -s 0:02
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 515 (rev 07) (prog-if 00 [VGA controller])
        Subsystem: ASUSTeK Computer Inc. Device 1cfd
        Flags: bus master, fast devsel, latency 0, IRQ 125
        Memory at de000000 (64-bit, non-prefetchable) [size=16M]
        Memory at c0000000 (64-bit, prefetchable) [size=256M]
        I/O ports at f000 [size=64]
        Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
        Capabilities: <access denied>
        Kernel driver in use: i915
        Kernel modules: i915

$ sudo dmidecode -t SYSTEM
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.

Handle 0x0001, DMI type 1, 27 bytes
System Information
        Manufacturer: ASUSTeK COMPUTER INC.
        Product Name: UX305CA
        Version: 1.0       
        Serial Number: REDACTED     
        UUID: REDACTED
        Wake-up Type: Power Switch
        SKU Number: ASUS-NotebookSKU
        Family: UX

Handle 0x000C, DMI type 32, 20 bytes
System Boot Information
        Status: No errors detected

Comment 46 Rocco 2020-12-04 20:44:03 UTC

FYI, I just installed 5.9.11-200 and I am no longer affected by this bug.

$ uname -rv
5.9.11-200.fc33.x86_64 #1 SMP Tue Nov 24 18:18:01 UTC 2020

Comment 47 leonid naumenko 2020-12-05 12:46:00 UTC

Processor: Intel G3260, integrated video only, Fedora 32 (KDE Plasma). Updated to kernel 5.9.11-100.fc32. After I enter my password in the login manager (SDDM), I only see a black screen with a cursor.

Dec 05 15:05:24 localhost.localdomain kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 7:1:85ddfffd, in plasmashell [1477]
Dec 05 15:05:24 localhost.localdomain kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Dec 05 15:05:24 localhost.localdomain kernel: i915 0000:00:02.0: [drm] plasmashell[1477] context reset due to GPU hang
Dec 05 15:05:24 localhost.localdomain kernel: [drm:intel_gt_verify_workarounds [i915]] *ERROR* GT workaround lost on init! (e184=0/0, expected 2000200)

Comment 48 Ondřej Kolín 2020-12-18 21:38:24 UTC

Unfortunately, this bug seems to be present on kernel-core-5.9.14-100.fc32.x86_64 (more specs are in my previous comment), so I have to freeze the kernel updates again.

Comment 49 Emilio 2020-12-18 23:27:38 UTC

Thanks for all the support/help in this bug report. I've moved on to AMD Ryzen 5 3400G with Radeon Vega Graphics

Comment 50 Davide Cesari 2020-12-23 11:41:53 UTC

This is just to report that the same bug just appeared on CentOS 8 when upgrading from kernel 4.18.0-193.28.1.el8_2.x86_64 to 4.18.0-240.1.1.el8_3.x86_64; dmesg in the two cases says respectively:

 - Initialized i915 1.6.0 20190619
 - Initialized i915 1.6.0 20200114

Why was this buggy driver backported? Need to stick with the old kernel.
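
Because the i915 version string stays at 1.6.0 in both cases, the build date is the only part of the dmesg line that tells the drivers apart. The sketch below extracts it for comparison; it assumes the "Initialized i915 <version> <YYYYMMDD>" wording quoted in this comment, and i915_build is a hypothetical helper.

```python
import re
from datetime import datetime

# Based on the dmesg wording quoted above, e.g.
#   Initialized i915 1.6.0 20200114
INIT_RE = re.compile(r"Initialized i915 (\d+\.\d+\.\d+) (\d{8})")


def i915_build(dmesg_text):
    """Return (version, build_date) from the first matching line, or None."""
    match = INIT_RE.search(dmesg_text)
    if match is None:
        return None
    build_date = datetime.strptime(match.group(2), "%Y%m%d").date()
    return match.group(1), build_date


if __name__ == "__main__":
    old = i915_build(" - Initialized i915 1.6.0 20190619")
    new = i915_build(" - Initialized i915 1.6.0 20200114")
    # Same version string, different build date.
    print(old[0] == new[0], old[1] < new[1])
```

Comparing the parsed dates rather than the version string is what makes the two kernels distinguishable here.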

Comment 51 Dale Turner 2021-01-10 23:34:15 UTC

So, this bug has been getting some press lately. I finally built the rawhide kernel with the proposed patch ( https://lists.freedesktop.org/archives/intel-gfx/2021-January/257559.html ).

I put it in my COPR if anyone is interested. I'm using it as I type this. https://copr.fedorainfracloud.org/coprs/dturner/TOS/build/1874571/

Thanks everyone!

Comment 52 Dale Turner 2021-01-18 22:26:29 UTC

The fix for this has been included in the kernel-5.11 rc4, which is now available on koji. I can confirm it works for me (Acer C720P - Haswell).

Comment 53 Domenico Ferrari 2021-02-15 13:44:33 UTC

Using kernel 5.11.0-0.rc7.149.fc34.x86_64 on Intel G3420.
So far, so good.

thanks

Comment 54 Hans de Goede 2021-02-15 13:55:27 UTC

Some issues have also been reported with the i915 mitigation stuff in bug 1925346.

Testing has shown that, to fully fix the issues with the new i915 mitigation stuff on Haswell, the following commits are necessary on top of 5.11:

e627d5923cae ("drm/i915/gt: One more flush for Baytrail clear residuals")
d30bbd62b1bf ("drm/i915/gt: Flush before changing register state")
1914911f4aa0 ("drm/i915/gt: Correct surface base address for renderclear")

Comment 55 Steeve McCauley 2021-02-28 16:13:49 UTC

Similar issue here.  I've also tried 5.11.2 but was still seeing significant problems.  Adding i915.mitigations=off fixed the problem.

00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06) (prog-if 00 [VGA controller])
	DeviceName:  Onboard IGD
	Subsystem: Dell Device 05a5
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 29
	Region 0: Memory at f7800000 (64-bit, non-prefetchable) [size=4M]
	Region 2: Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Region 4: I/O ports at f000 [size=64]
	Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
		Address: fee00018  Data: 0000
	Capabilities: [d0] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [a4] PCI Advanced Features
		AFCap: TP+ FLR+
		AFCtrl: FLR-
		AFStatus: TP-
	Kernel driver in use: i915
	Kernel modules: i915

Installing 5.10.19-200 from updates-testing seems to have fixed the problem for me,

sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-79396b21b2

Comment 56 Hans de Goede 2021-02-28 16:19:33 UTC

(In reply to Steeve McCauley from comment #55)
> Similar issue here.  I've also tried 5.11.2 but was still seeing significant problems

Hmm, where did you get your 5.11.2 build from ?

The 5.11.2-300 from Fedora: https://koji.fedoraproject.org/koji/buildinfo?buildID=1715703

Has the same fixes that were added to 5.10.19-200, so if you are still seeing issues with that specific build (we added some fixes as downstream patches), then I need to go over the 5.10.y changelog to see if there are somehow fixes there which are not in 5.11.2 .

Comment 57 Steeve McCauley 2021-02-28 17:28:49 UTC

It was from kernel.org.

No problems with the Fedora 5.10.19-200 so far, so it's looking good.

Sorry for the confusion.

Comment 58 Hans de Goede 2021-02-28 20:07:13 UTC

(In reply to Steeve McCauley from comment #57)
> It was from kernel.org.
> 
> No problems with the Fedora 5.10.19-200 so far, so it's looking good.
> 
> Sorry for the confusion.

No problem. It would be good if you can give the Fedora 5.11.2 kernel a try, it should work, but if it does not now would be a good time to find out (before we start pushing 5.11.y kernels to the updates repo), you can grab it here:

https://koji.fedoraproject.org/koji/buildinfo?buildID=1715703

Generic install instructions for installing a kernel from koji (the Fedora buildsystem) are here:

https://fedorapeople.org/~jwrdegoede/kernel-test-instructions.txt

Note since this is an official build, there should be no need to disable secure-boot in this case.

Comment 59 Steeve McCauley 2021-02-28 22:25:16 UTC

No problems with it so far; it seems stable and the GPU isn't hanging (same as with 5.10.19-200).

$ sudo rpm -ivh --oldpackage kernel-core-5.11.2-300.fc34.x86_64.rpm kernel-modules-5.11.2-300.fc34.x86_64.rpm

reboot

$ cat /proc/cmdline 
BOOT_IMAGE=(hd2,gpt2)/vmlinuz-5.11.2-300.fc34.x86_64 root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap rhgb quiet
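
When comparing results it helps to confirm whether the i915.mitigations=off workaround from comment 55 is still on the command line, since a kernel only counts as fixed when it works without it. A minimal sketch that parses /proc/cmdline text; both function names are hypothetical, and the simple whitespace split does not handle quoted values that contain spaces.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {key: value}; bare flags map to None.

    If a key is repeated, the last occurrence wins.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def i915_mitigations_disabled(cmdline):
    """True when i915.mitigations=off is present on the command line."""
    return parse_cmdline(cmdline).get("i915.mitigations") == "off"


if __name__ == "__main__":
    with open("/proc/cmdline") as handle:
        line = handle.read()
    print("i915.mitigations=off in use:", i915_mitigations_disabled(line))
```

For the command line shown above, the check reports that the workaround is not in use.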

Comment 60 Steeve McCauley 2021-02-28 23:01:21 UTC

Just as an aside, my kernel build time went from 44+ minutes to 8 minutes with these fixes!

$ grep "build time" *.out
build_20210224.out:00:46:13 linux-5.11.1> INFO 2021-02-24 18:12:51> Kernel build time 46m 13s - sudo wait 0s
build_20210227.out:00:44:40 linux-5.11.2> INFO 2021-02-27 09:24:17> Kernel build time 44m 40s - sudo wait 0s
build_20210228.out:00:08:24 linux-5.11.2> INFO 2021-02-28 13:32:35> Kernel build time 8m 24s - sudo wait 0s

this was after doing "make distclean"

And the CPU fan doesn't go insane during the rebuild.
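
The difference can be put in numbers by converting the logged times to seconds. This sketch assumes the "Kernel build time <M>m <S>s" wording produced by the reporter's own build script, as seen in the grep output above; build_seconds is a hypothetical helper.

```python
import re

BUILD_TIME_RE = re.compile(r"Kernel build time (\d+)m (\d+)s")


def build_seconds(line):
    """Return the build time in seconds from one log line, or None."""
    match = BUILD_TIME_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)) * 60 + int(match.group(2))


if __name__ == "__main__":
    before = build_seconds("INFO 2021-02-27 09:24:17> Kernel build time 44m 40s - sudo wait 0s")
    after = build_seconds("INFO 2021-02-28 13:32:35> Kernel build time 8m 24s - sudo wait 0s")
    print(f"{before} s -> {after} s, ratio {before / after:.1f}")
```

For the two 5.11.2 builds this works out to 2680 seconds against 504 seconds, roughly 5.3 times faster.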

Comment 61 Christian Kujau 2022-11-29 23:08:10 UTC

I haven't seen this happening in two years. Does this really apply to rawhide?

Comment 62 Dale Turner 2022-11-29 23:35:24 UTC

(In reply to Christian Kujau from comment #61)
> I haven't seen this happening in two years. Does this really apply to
> rawhide?

Yes. This seems to be fixed, for sure. This bug should probably be closed. 

Thanks, everyone!
