Bug 2155242

Summary: [drm] GPU HANG: ecode
Product: [Fedora] Fedora
Reporter: Ivan <rqs44471>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: NEW
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent
Priority: unspecified
Version: 37
CC: acaringi, adscvr, airlied, alciregi, bskeggs, hdegoede, hpa, jarodwilson, jglisse, josef, jskarvad, kernel-maint, lgoncalv, linville, masami256, mchehab, ptalbert, steved
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Type: Bug

Description Ivan 2022-12-20 13:50:36 UTC
1. Please describe the problem:
Total system freeze at random moments.

2. What is the Version-Release number of the kernel:
Linux host 6.0.12-300.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Dec 8 16:58:47 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
Since 5.19.

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:
Open Nautilus and double-click the top of the window to maximize it.


5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:
I don't know.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
No

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Dec 20 14:21:37 host ModemManager[1444]: <info>  [sleep-monitor] system is about to suspend
Dec 20 14:21:37 host NetworkManager[1472]: <info>  [1671542497.1886] manager: sleep: sleep requested (sleeping: no  enabled: yes)
Dec 20 14:21:37 host kernel: Lockdown: systemd-logind: hibernation is restricted; see man kernel_lockdown.7
Dec 20 14:21:37 host systemd-logind[1383]: Suspending...
Dec 20 14:21:37 host systemd-logind[1383]: Lid closed.
Dec 20 14:21:24 host kernel: i915 0000:00:02.0: [drm] GuC SLPC enabled
Dec 20 14:21:24 host kernel: i915 0000:00:02.0: [drm] GuC submission enabled
Dec 20 14:21:24 host kernel: i915 0000:00:02.0: [drm] HuC authenticated
Dec 20 14:21:24 host kernel: i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc_7.9.3.bin version 7.9
Dec 20 14:21:24 host kernel: i915 0000:00:02.0: [drm] GuC firmware i915/adlp_guc_70.1.1.bin version 70.1
Dec 20 14:21:24 host kernel: i915 0000:00:02.0: [drm] nautilus[6965] context reset due to GPU hang
Dec 20 14:21:24 host kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Dec 20 14:21:24 host kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Dec 20 14:21:24 host kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Dec 20 14:21:24 host kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:84dffffb, in nautilus [6965]
Dec 20 14:21:20 host kernel: Asynchronous wait on fence 0000:00:02.0:gnome-shell[1746]:bcfa timed out (hint:intel_atomic_commit_ready [i915])
Dec 20 14:21:07 host audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-hostnamed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=succe>
Dec 20 14:21:07 host systemd[1]: Started systemd-hostnamed.service - Hostname Service.
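
The "GPU HANG" lines in a log like the one above follow a fixed pattern, so they can be pulled out of a saved dmesg.txt mechanically. The following is a minimal Python sketch, not part of the original report. The names GpuHang, HANG_RE and find_gpu_hangs are hypothetical, and reading the three ecode fields as graphics generation, engine mask and hash is an assumption about the i915 message format, not something this report states.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class GpuHang(NamedTuple):
    gen: int          # first ecode field (assumed: graphics generation)
    engine_mask: int  # second ecode field, hex (assumed: engine mask)
    code: int         # third ecode field, hex (assumed: error hash)
    process: str      # process name reported after "in"
    pid: int          # PID in square brackets


# Matches e.g. "GPU HANG: ecode 12:1:84dffffb, in nautilus [6965]"
HANG_RE = re.compile(
    r"GPU HANG: ecode (\d+):([0-9a-fA-F]+):([0-9a-fA-F]+), in (.+?) \[(\d+)\]"
)


def find_gpu_hangs(lines: Iterable[str]) -> Iterator[GpuHang]:
    """Yield one GpuHang per log line that contains an i915 hang message."""
    for line in lines:
        m = HANG_RE.search(line)
        if m:
            gen, mask, code, proc, pid = m.groups()
            yield GpuHang(int(gen), int(mask, 16), int(code, 16), proc, int(pid))


# Two hang lines copied from this report (journal and dmesg formats).
SAMPLE = [
    "Dec 20 14:21:24 host kernel: i915 0000:00:02.0: [drm] GPU HANG: "
    "ecode 12:1:84dffffb, in nautilus [6965]",
    "[   49.011518] i915 0000:00:02.0: [drm] GPU HANG: "
    "ecode 7:1:85dffffc, in compiz [2570]",
]

for hang in find_gpu_hangs(SAMPLE):
    print(f"{hang.process}[{hang.pid}]: "
          f"ecode {hang.gen}:{hang.engine_mask:x}:{hang.code:08x}")
```

To scan a real log, pass an open file instead of SAMPLE, for example find_gpu_hangs(open("dmesg.txt", errors="replace")).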

Comment 1 Jaroslav Škarvada 2022-12-22 21:31:18 UTC
For me the problem seems to have started with some recent kernel update (in December). I currently have:
kernel-6.0.14-300.fc37.x86_64

I am also affected by this problem. My GPU:
00:02.0 VGA compatible controller: Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 0b)

When the problem happens, the video freezes for a moment. Example dmesg output:
[   49.011518] i915 0000:00:02.0: [drm] GPU HANG: ecode 7:1:85dffffc, in compiz [2570]
[   49.012547] i915 0000:00:02.0: [drm] Resetting rcs0 for stopped heartbeat on rcs0
[   49.012559] i915 0000:00:02.0: [drm] compiz[2570] context reset due to GPU hang
[   86.441754] i915 0000:00:02.0: [drm] GPU HANG: ecode 7:1:85dffffc, in compiz [2570]
[   86.442767] i915 0000:00:02.0: [drm] Resetting rcs0 for stopped heartbeat on rcs0
[   86.442779] i915 0000:00:02.0: [drm] compiz[2570] context reset due to GPU hang

I am running X.org, not Wayland.

Comment 2 Jaroslav Škarvada 2022-12-22 22:06:40 UTC
In my case the cause seems to be mesa-dri-drivers rather than the kernel.
Affected version:
mesa-dri-drivers-22.2.3-1.fc37
Unaffected version:
mesa-dri-drivers-22.2.2-1.fc37
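
To check which of the two builds a machine has, the output of ``rpm -q mesa-dri-drivers`` can be compared against the versions listed above. The following is a rough Python sketch under stated assumptions: the names PACKAGE, REPORTED and classify are hypothetical, and only the two builds named in this comment are classified. Any other build is reported as "unknown", because nothing here says how it behaves.

```python
import re

PACKAGE = "mesa-dri-drivers"

# The only two data points given in this comment.
REPORTED = {
    "22.2.3-1.fc37": "affected",
    "22.2.2-1.fc37": "unaffected",
}


def classify(rpm_q_output: str) -> str:
    """Classify a line such as 'mesa-dri-drivers-22.2.3-1.fc37.x86_64'.

    Returns 'affected', 'unaffected', or 'unknown' for anything else,
    including the 'is not installed' message from rpm.
    """
    pattern = re.escape(PACKAGE) + r"-(.+?)(?:\.(?:x86_64|i686|noarch))?"
    m = re.fullmatch(pattern, rpm_q_output.strip())
    if not m:
        return "unknown"
    # Exact match on version-release only; no ordering is assumed.
    return REPORTED.get(m.group(1), "unknown")


print(classify("mesa-dri-drivers-22.2.3-1.fc37.x86_64"))
```

The lookup is deliberately an exact match and does not compare version numbers, since two builds are not enough to conclude that every later build is affected.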

Comment 3 Jaroslav Škarvada 2022-12-22 22:58:53 UTC
I opened an upstream issue for my problem:
https://gitlab.freedesktop.org/drm/intel/-/issues/7718