Bug 2224121 - Radeon RX6600 GPU hang leading to Xserver crash
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: mesa
Version: 42
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Adam Jackson
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-07-19 21:02 UTC by cb-rhbugz
Modified: 2025-04-26 21:55 UTC
CC List: 13 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2024-05-28 13:33:13 UTC
Type: ---
Embargoed:


Attachments
full dmesg output (302.91 KB, text/plain)
2023-07-19 21:03 UTC, cb-rhbugz
Xorg log of crashed session (102.06 KB, text/plain)
2023-07-20 08:36 UTC, cb-rhbugz

Description cb-rhbugz 2023-07-19 21:02:33 UTC
Every 2-3 days or so, my X server freezes for about a minute and then exits. It seems to be triggered by new output causing scrolling in xterm, although I have also seen it occasionally in Ghidra.

My hardware is an old i7-3770K with a recently fitted Radeon RX6600, so I don't know if this is a regression. It did not happen with the Intel iGPU.


Reproducible: Sometimes

Steps to Reproduce:
1. Run the XFCE desktop with extensive use of xterm (the real old-fashioned X11 xterm).
2. Use the desktop for web browsing, YouTube, and software development for a couple of days, using suspend-to-RAM overnight.
3. Every so often, do something which results in the output scrolling in xterm.

Actual Results:
The X server freezes and after a minute or so, crashes back to the greeter/user login prompt.

In the frozen state, there is almost always an xterm in the process of scrolling where the new line is a corrupted black+white pattern instead of new text.

The dmesg has a lot of repeated instances of this:
[458041.735598] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:7 pasid:32772, for process Xorg pid 390300 thread Xorg:cs0 pid 390320)
[458041.735611] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000800109e12000 from client 0x1b (UTCL2)
[458041.735615] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00701031
[458041.735618] amdgpu 0000:03:00.0: amdgpu:     Faulty UTCL2 client ID: TCP (0x8)
[458041.735621] amdgpu 0000:03:00.0: amdgpu:     MORE_FAULTS: 0x1
[458041.735623] amdgpu 0000:03:00.0: amdgpu:     WALKER_ERROR: 0x0
[458041.735625] amdgpu 0000:03:00.0: amdgpu:     PERMISSION_FAULTS: 0x3
[458041.735627] amdgpu 0000:03:00.0: amdgpu:     MAPPING_ERROR: 0x0
[458041.735629] amdgpu 0000:03:00.0: amdgpu:     RW: 0x0
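
For anyone comparing dumps: the fields printed under GCVM_L2_PROTECTION_FAULT_STATUS look like plain bit slices of the raw register value. Below is a minimal Python sketch that decodes them. The bit offsets and widths are an assumption (a GFX10-style layout), chosen because they reproduce the values logged in this report (0x00701031 gives client ID 0x8, PERMISSION_FAULTS 0x3, vmid 7). They are not taken from the report itself, and the function name is only illustrative.

```python
# Assumed field layout as (bit offset, width). This is an assumption that
# happens to reproduce the fields logged in this report; it is not a
# definitive register description.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),    # printed in dmesg as "Faulty UTCL2 client ID"
    "RW": (18, 1),
    "VMID": (20, 4),
}


def decode_fault_status(status: int) -> dict[str, int]:
    """Slice the raw status value into the named fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


if __name__ == "__main__":
    for name, value in decode_fault_status(0x00701031).items():
        print(f"{name}: {value:#x}")
```

With this layout, a status of 0x00000000 decodes to all-zero fields, including client ID 0x0.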


Expected Results:  
The desktop should not crash.

$ uname -a
Linux stando.fishzet.co.uk 6.3.11-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Sun Jul  2 13:17:31 UTC 2023 x86_64 GNU/Linux

$ rpm -qa | grep Xorg
xorg-x11-server-Xorg-1.20.14-23.fc38.x86_64

Comment 1 cb-rhbugz 2023-07-19 21:03:30 UTC
Created attachment 1976606
full dmesg output

Comment 2 Olivier Fourdan 2023-07-20 07:32:08 UTC
Can you please also post the Xorg logs, for completeness?

Comment 3 cb-rhbugz 2023-07-20 08:36:21 UTC
Created attachment 1976661
Xorg log of crashed session

Comment 4 Olivier Fourdan 2023-07-20 09:27:35 UTC
To my untrained eyes, the error looks like https://gitlab.freedesktop.org/drm/amd/-/issues/1598

There are a few other similar reports around as well.

Comment 5 Michel Dänzer 2023-07-20 11:04:17 UTC
It's a GPU hang, likely caused by a Mesa issue.

Comment 6 Olivier Fourdan 2023-07-20 11:37:16 UTC
(In reply to Michel Dänzer from comment #5)
> It's a GPU hang, likely caused by a Mesa issue.

Alright, let's move the bug to Mesa then.

Comment 7 cb-rhbugz 2023-11-08 17:27:56 UTC
This bug is still present in Fedora 39 (Mesa 23.2.1-2)

Comment 8 Aoife Moloney 2024-05-28 13:33:13 UTC
Fedora Linux 38 entered end-of-life (EOL) status on 2024-05-21.

Fedora Linux 38 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 9 cb-rhbugz 2024-07-06 00:12:16 UTC
This is still happening in Fedora 40, although less frequently.

[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:3 pasid:32771)
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu:  in process firefox pid 587218 thread firefox:cs0 pid 587305
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000800107093000 from client 0x1b (UTCL2)
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00301031
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu:          MORE_FAULTS: 0x1
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu:          WALKER_ERROR: 0x0
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu:          MAPPING_ERROR: 0x0
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu:          RW: 0x0
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:3 pasid:32771)
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu:  in process firefox pid 587218 thread firefox:cs0 pid 587305
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000800107093000 from client 0x1b (UTCL2)
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu:          Faulty UTCL2 client ID: CB/DB (0x0)
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu:          MORE_FAULTS: 0x0
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu:          WALKER_ERROR: 0x0
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu:          MAPPING_ERROR: 0x0
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu:          RW: 0x0
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:3 pasid:32771)
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu:  in process firefox pid 587218 thread firefox:cs0 pid 587305
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000800107093000 from client 0x1b (UTCL2)
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu:          Faulty UTCL2 client ID: CB/DB (0x0)
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu:          MORE_FAULTS: 0x0
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu:          WALKER_ERROR: 0x0
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu:          MAPPING_ERROR: 0x0
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu:          RW: 0x0
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:3 pasid:32771)
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu:  in process firefox pid 587218 thread firefox:cs0 pid 587305
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000800107093000 from client 0x1b (UTCL2)
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu:          Faulty UTCL2 client ID: CB/DB (0x0)
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu:          MORE_FAULTS: 0x0
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu:          WALKER_ERROR: 0x0
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu:          MAPPING_ERROR: 0x0
[Sat Jul  6 01:02:42 2024] amdgpu 0000:03:00.0: amdgpu:          RW: 0x0
[Sat Jul  6 01:02:53 2024] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, but soft recovered
[Sat Jul  6 01:03:03 2024] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=55752498, emitted seq=55752501
[Sat Jul  6 01:03:03 2024] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process firefox pid 587218 thread firefox:cs0 pid 587305
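
Since the same fault repeats many times, a short script can summarise which process and page address are involved. Below is a rough Python sketch that handles only the two message variants quoted in this report ("for process ..." inline and "in process ..." on its own line). The regular expressions and the function name are illustrative assumptions, not a format the amdgpu driver guarantees. It also reports the gap between the emitted and signaled sequence numbers from the timeout line, which presumably reflects work submitted but not yet completed.

```python
import re
from collections import Counter

# Illustrative patterns for the message variants quoted in this report.
PROC_RE = re.compile(r"(?:for|in) process (\S+) pid (\d+) thread (\S+) pid (\d+)")
ADDR_RE = re.compile(r"in page starting at address (0x[0-9a-fA-F]+)")
SEQ_RE = re.compile(r"signaled seq=(\d+), emitted seq=(\d+)")


def summarize(lines):
    """Count faulting processes and page addresses; collect fence gaps."""
    processes, addresses, fence_gaps = Counter(), Counter(), []
    for line in lines:
        if m := PROC_RE.search(line):
            processes[(m.group(1), int(m.group(2)))] += 1
        if m := ADDR_RE.search(line):
            addresses[m.group(1)] += 1
        if m := SEQ_RE.search(line):
            fence_gaps.append(int(m.group(2)) - int(m.group(1)))
    return processes, addresses, fence_gaps


SAMPLE = [
    "amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:7 "
    "pasid:32772, for process Xorg pid 390300 thread Xorg:cs0 pid 390320)",
    "amdgpu 0000:03:00.0: amdgpu:   in page starting at address "
    "0x0000800109e12000 from client 0x1b (UTCL2)",
    "amdgpu 0000:03:00.0: amdgpu:  in process firefox pid 587218 thread "
    "firefox:cs0 pid 587305",
    "amdgpu 0000:03:00.0: amdgpu:   in page starting at address "
    "0x0000800107093000 from client 0x1b (UTCL2)",
    "[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, "
    "signaled seq=55752498, emitted seq=55752501",
]

if __name__ == "__main__":
    procs, addrs, gaps = summarize(SAMPLE)
    print(procs.most_common())
    print(addrs.most_common())
    print(gaps)
```

The same function can be fed the lines of a saved dmesg file instead of SAMPLE.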


$ rpm -qa | grep mesa
mesa-libGLU-9.0.3-4.fc40.x86_64
mesa-libGLU-devel-9.0.3-4.fc40.x86_64
mesa-libGLU-9.0.3-4.fc40.i686
mesa-va-drivers-freeworld-24.1.2-1.fc40.x86_64
mesa-vdpau-drivers-freeworld-24.1.2-1.fc40.x86_64
mesa-filesystem-24.1.2-7.fc40.x86_64
mesa-libglapi-24.1.2-7.fc40.x86_64
mesa-dri-drivers-24.1.2-7.fc40.x86_64
mesa-libgbm-24.1.2-7.fc40.x86_64
mesa-libEGL-24.1.2-7.fc40.x86_64
mesa-libGL-24.1.2-7.fc40.x86_64
mesa-filesystem-24.1.2-7.fc40.i686
mesa-libGL-devel-24.1.2-7.fc40.x86_64
mesa-libEGL-devel-24.1.2-7.fc40.x86_64
mesa-libOpenCL-24.1.2-7.fc40.x86_64
mesa-libgbm-devel-24.1.2-7.fc40.x86_64
mesa-libOSMesa-24.1.2-7.fc40.x86_64
mesa-vulkan-drivers-24.1.2-7.fc40.x86_64
mesa-libxatracker-24.1.2-7.fc40.x86_64
mesa-libglapi-24.1.2-7.fc40.i686
mesa-dri-drivers-24.1.2-7.fc40.i686
mesa-libgbm-24.1.2-7.fc40.i686
mesa-libEGL-24.1.2-7.fc40.i686
mesa-libGL-24.1.2-7.fc40.i686
mesa-libOSMesa-24.1.2-7.fc40.i686
mesa-vulkan-drivers-24.1.2-7.fc40.i686

Comment 10 Juan Ra 2024-10-30 13:24:25 UTC
A similar bug is still present on F40 using GNOME on Wayland. I have the 6600M and see a similar issue: a few moments after I log in, the system crashes and GNOME is killed automatically.
I reported it here: https://bugzilla.redhat.com/show_bug.cgi?id=2322159

Comment 11 Aoife Moloney 2025-04-25 10:06:13 UTC
This message is a reminder that Fedora Linux 40 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 40 on 2025-05-13.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '40'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version. Note that the version field may be hidden.
Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 40 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 12 cb-rhbugz 2025-04-26 21:55:26 UTC
Confirming this bug still happens with Fedora 42

