Bug 1780800 - [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out
Summary: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 31
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-12-07 00:50 UTC by Chris Murphy
Modified: 2020-01-20 12:45 UTC
CC List: 37 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:


Attachments
dmesg (132.39 KB, text/plain)
2019-12-07 00:50 UTC, Chris Murphy


Links
System ID Priority Status Summary Last Updated
freedesktop.org Gitlab drm intel issues 673 None None None 2019-12-10 21:24:53 UTC

Description Chris Murphy 2019-12-07 00:50:29 UTC
Created attachment 1642785
dmesg

1. Please describe the problem:

Complete GUI hang, cannot switch to tty, and cannot ssh into the machine either.


2. What is the Version-Release number of the kernel:
5.4.2-300.fc31.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

I never had a full lockup like this before 5.4.2-300.fc31.x86_64, though I did only minimal testing with 5.4.0-2.fc32.x86_64.



4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

Uncertain so far.


5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

Uncertain of scope.


6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No.

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Comment 1 Chris Murphy 2019-12-07 00:51:25 UTC
00:02.0 VGA compatible controller [0300]: Intel Corporation Skylake GT2 [HD Graphics 520] [8086:1916] (rev 07) (prog-if 00 [VGA controller])
	Subsystem: Hewlett-Packard Company Device [103c:81a0]

model name	: Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz

Comment 2 Chris Murphy 2019-12-07 01:09:29 UTC
Excerpts for search.


Dec 06 17:39:54 flap.local kernel: i915 0000:00:02.0: GPU HANG: ecode 9:1:0x00000000, hang on rcs0
Dec 06 17:39:54 flap.local kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 06 17:39:54 flap.local kernel: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Dec 06 17:39:57 flap.local kernel: Asynchronous wait on fence i915:gnome-shell[1470]:6952 timed out (hint:intel_atomic_commit_ready+0x0/0x50 [i915])
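
For anyone searching their own logs for the same signature, here is a minimal Python sketch that scans kernel log lines for the hang and reset-timeout messages excerpted above. The function name `scan_kernel_log` and both regular expressions are illustrative assumptions written only from the lines quoted in this report; they are not part of any i915 tooling, and other kernel versions may word these messages differently.

```python
import re

# Assumed patterns, derived only from the log lines quoted in this report.
HANG_RE = re.compile(
    r"i915 (?P<device>[0-9a-f:.]+): GPU HANG: ecode (?P<ecode>[0-9a-fx:]+),"
    r".*?(?P<reason>hang|stopped heartbeat) on (?P<engine>\w+)"
)
RESET_TIMEOUT_RE = re.compile(
    r"\*ERROR\* (?P<engine>\w+) reset request timed out"
)


def scan_kernel_log(lines):
    """Return (hangs, reset_timeouts) found in an iterable of log lines.

    hangs is a list of dicts with device, ecode, reason and engine;
    reset_timeouts is a list of engine names whose reset request timed out.
    """
    hangs, reset_timeouts = [], []
    for line in lines:
        hang = HANG_RE.search(line)
        if hang:
            hangs.append(hang.groupdict())
            continue
        timeout = RESET_TIMEOUT_RE.search(line)
        if timeout:
            reset_timeouts.append(timeout.group("engine"))
    return hangs, reset_timeouts
```

It could be fed from the `dmesg.txt` file produced by the `journalctl --no-hostname -k` command mentioned in the template above, for example with `scan_kernel_log(open("dmesg.txt"))`.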

Comment 3 Chris Murphy 2019-12-10 21:24:53 UTC
Remove dup bug ID, replace with actual.

Comment containing commit reference that fixes this:
https://gitlab.freedesktop.org/drm/intel/issues/673#note_359912

Comment 4 Chris Murphy 2019-12-20 03:28:37 UTC
Still happens with 5.4.5-300.fc31.x86_64, but not 5.5.0rc1 or rc2.

Comment 5 Justin M. Forbes 2019-12-23 15:33:45 UTC
The patch was submitted to stable and rejected because it doesn't apply to 5.4.  I will give it a little time to see if it is properly backported before doing a 5.4.6 build.

Comment 6 youling257 2019-12-25 03:40:33 UTC
I have a similar problem with kernel 5.5-rc3.

[  541.644847] Asynchronous wait on fence i915:surfaceflinger[1495]:119c8 timed out (hint:intel_atomic_commit_ready+0x0/0x50 [i915])
[  546.268573] i915 0000:00:02.0: GPU HANG: ecode 8:1:0x84dfbffe, in surfaceflinger [1495], stopped heartbeat on rcs0
[  546.268622] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  546.268689] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[  546.268755] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  546.268821] The GPU crash dump is required to analyze GPU hangs, so please always attach it.
[  546.268887] GPU crash dump saved to /sys/class/drm/card0/error
[  546.372596] i915 0000:00:02.0: Resetting rcs0 for stopped heartbeat on rcs0
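
Since the kernel message asks for the crash dump to be attached, a small Python sketch for copying it out of sysfs before a reboot may help. Only the path `/sys/class/drm/card0/error` comes from the log above; the function name `save_gpu_error_state` and the output file naming are hypothetical, and the note that reading the file may need root is an assumption, not something confirmed in this report.

```python
import time
from pathlib import Path

# Path taken from the kernel message above; everything else is illustrative.
ERROR_STATE = Path("/sys/class/drm/card0/error")


def save_gpu_error_state(source=ERROR_STATE, dest_dir="."):
    """Copy the i915 error state to a timestamped file and return its path."""
    # Assumption: this read may raise PermissionError when not run as root.
    data = Path(source).read_bytes()
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_dir) / f"i915-error-{stamp}.txt"
    dest.write_bytes(data)
    return dest
```

Calling `save_gpu_error_state()` with no arguments writes the copy into the current directory; the resulting file is what would be attached to an upstream report.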

Comment 7 Ingo Weiss 2020-01-06 16:00:10 UTC
Hi,

I'm experiencing the same issue on Fedora 31 with kernel 5.4.7-200.

Computer: Lenovo ThinkPad T580
GPU: Intel UHD 620
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 620 (rev 07) (prog-if 00 [VGA controller])
	Subsystem: Lenovo Device 225a
	Flags: bus master, fast devsel, latency 0, IRQ 153
	Memory at eb000000 (64-bit, non-prefetchable) [size=16M]
	Memory at a0000000 (64-bit, prefetchable) [size=256M]
	I/O ports at e000 [size=64]
	[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: i915
	Kernel modules: i915
CPU: Intel Core i7-8650U
BIOS version: N27ET36W (1.22)

Reverting to 5.3.16-300 for the time being since it doesn't have this issue.

Comment 8 Chris Murphy 2020-01-06 16:43:01 UTC
The upstream issue reports that backporting the fix from 5.5 to 5.4 is non-trivial. There have now been a few attempts at reverting the change that introduced the problem, so even the revert is apparently not straightforward. Skylake and Kaby Lake CPUs are affected, but I'm not sure whether it's all of them or only a subset.

Comment 9 Diego Vasconcelos 2020-01-07 23:50:24 UTC
kernel: 5.4.8-200.fc31.x86_64
CPU: i5-8400
GPU: Intel UHD 630

i915 0000:00:02.0: GPU HANG: ecode 9:1:0x00000000, hang on rcs0
GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
drm/i915 developers can then reassign to the right component if it's not a kernel issue.
The GPU crash dump is required to analyze GPU hangs, so please always attach it.
GPU crash dump saved to /sys/class/drm/card0/error
i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
i915 0000:00:02.0: Resetting chip for hang on rcs0
[drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}

Comment 10 Mirek Svoboda 2020-01-08 14:16:43 UTC
I'm experiencing the same issue. It never happened with the 5.3 kernel.

My kernel:

```
Jan 08 10:04:23 localhost.localdomain kernel: microcode: microcode updated early to revision 0xca, date = 2019-09-26
Jan 08 10:04:23 localhost.localdomain kernel: Linux version 5.4.7-200.fc31.x86_64 (mockbuild@bkernel04.phx2.fedoraproject.org) (gcc version 9.2.1 20190827 (Red Hat 9.2.1-1) (GCC)) #1 SMP Tue Dec 31 22:25:12 UTC 2019
Jan 08 10:04:23 localhost.localdomain kernel: Command line: BOOT_IMAGE=(hd0,gpt5)/vmlinuz-5.4.7-200.fc31.x86_64 root=/dev/mapper/luks-b6994190-43c4-42f1-bc49-ab5cd4717038 ro rd.luks.uuid=luks-b6994190-43c4-42f1-bc49-ab5cd4717038 rd.lvm.lv=outer/fedora scsi_mod.use_blk_mq=1 noibrsnoibpb nopti nospectre_v2 nospectre_v1 l1tf=off nospec_store_bypass_disable no_stf_barrier mds=off mitigations=off
```

My hardware is i5-7200U.

Comment 11 Jakub Jankiewicz 2020-01-10 20:47:26 UTC
I have a similar issue on Fedora 30.

kernel: 5.4.8-100.fc30.x86_64.
Hardware: Laptop Dell Inspiron 15 5570 i7-8550U 

In my case, though, I was able to switch to a tty after a few tries. It sometimes freezes on the screensaver and sometimes doesn't (it's random). I upgraded from Fedora 29 to Fedora 30 just a few days ago and had no such issues with Fedora 29.

Comment 12 Rafal 2020-01-14 16:16:15 UTC
I experienced a complete hang, without being able to do anything, a few times over the past weeks with several 5.4.x kernel versions. Today, after updating to 5.4.10, I experienced a hang that cleared after a few seconds. Logs:

Jan 14 16:56:00 x1 kernel: i915 0000:00:02.0: Resetting rcs0 for stuck wait on rcs0
Jan 14 16:56:03 x1 kernel: Asynchronous wait on fence i915:gnome-shell[2073]:2672e timed out (hint:intel_atomic_commit_ready+0x0/0x50 [i915])
Jan 14 16:56:08 x1 kernel: i915 0000:00:02.0: GPU HANG: ecode 9:1:0x85dfbfff, in code [3378], hang on rcs0
Jan 14 16:56:08 x1 kernel: GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Jan 14 16:56:08 x1 kernel: Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Jan 14 16:56:08 x1 kernel: drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Jan 14 16:56:08 x1 kernel: The GPU crash dump is required to analyze GPU hangs, so please always attach it.
Jan 14 16:56:08 x1 kernel: GPU crash dump saved to /sys/class/drm/card0/error
Jan 14 16:56:08 x1 kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0

Hardware:
Lenovo ThinkPad X1 Carbon 5th Gen

$ lspci -vs 00:02
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 620 (rev 02) (prog-if 00 [VGA controller])
	Subsystem: Lenovo ThinkPad X1 Carbon 5th Gen
	Flags: bus master, fast devsel, latency 0, IRQ 130
	Memory at eb000000 (64-bit, non-prefetchable) [size=16M]
	Memory at 60000000 (64-bit, prefetchable) [size=256M]
	I/O ports at e000 [size=64]
	[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: i915
	Kernel modules: i915

$ lscpu | grep 'Model name'
Model name:                      Intel(R) Core(TM) i5-7300U CPU @ 2.60GHz

Comment 13 Tadej Janež 2020-01-15 12:15:49 UTC
AFAICS, the backport patch "drm/i915/gt: Detect if we miss WaIdleLiteRestore" was added to F31 2 days ago:
https://src.fedoraproject.org/rpms/kernel/c/9607b5faaa81022ed8b97f517c766202f9680744?branch=f31

It should be part of kernel-5.4.11-202.fc31:
https://bodhi.fedoraproject.org/updates/FEDORA-2020-3738c94456

And the new kernel-5.4.12-200.fc31:
https://bodhi.fedoraproject.org/updates/FEDORA-2020-e328697628

Comment 14 Chris Murphy 2020-01-16 20:50:03 UTC
In my opinion this patch should be reverted in Fedora kernels. It makes the problem unquestionably worse: it takes longer to experience the problem, but once it happens, it's a hard crash. I can't ssh in. I can't switch to a VT. System gets hot, fans go to max, and I have to force power off.

Comment 15 Jakub Jankiewicz 2020-01-16 22:37:21 UTC
I always install new kernels, so I'm not sure which one it was (I think 5.4.10-100.fc30.x86_64; I had installed 5.4.11-102.fc30.x86_64 but hadn't rebooted for it to take effect), but I also got a hard crash. This time I was not able to switch to a TTY with a few key presses as I could previously.

Comment 16 Rafal 2020-01-17 09:14:50 UTC
I also experienced hard crashes exactly like the ones you described (including overheating) with earlier versions, I think both 5.4.7 and 5.4.8.

