Created attachment 1788410 [details]
kernel oops at time of hard lockup

1. Please describe the problem:

On a roughly daily basis, sometimes more often, the display locks up and the machine becomes unresponsive. The only way to recover is to hold down the power button. This is on a ThinkPad T495, "AMD Ryzen 7 PRO 3700U w/ Radeon Vega Mobile Gfx".

2. What is the Version-Release number of the kernel:

5.12.8-200.fc33.x86_64, but it may have started with 5.12.7. Definitely not seen with the 5.11 series.

3. Did it work previously in Fedora? If so, what kernel version did the issue *first* appear? Old kernels are available for download at https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

Yes. Before this kernel series the problem had been extremely sporadic (~1/month or even less), nothing approaching the current frequency.

4. Can you reproduce this issue? If so, please provide the steps to reproduce the issue below:

Seems to happen when I'm on a Firefox page with a certain kind of animation, specifically the GitHub "spinner" icons used to show running GitHub Actions.

5. Does this problem occur with the latest Rawhide kernel? To install the Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by ``sudo dnf update --enablerepo=rawhide kernel``:

Haven't been able to test yet.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No

7. Please attach the kernel logs. You can get the complete kernel log for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the issue occurred on a previous boot, use the journalctl ``-b`` flag.
Also found this in the logs: May 31 20:08:15 angua firefox-wayland.desktop[6063]: amdgpu: amdgpu_cs_query_fence_status failed.
I believe I'm also seeing this problem with my Lenovo E585, AMD Ryzen 7 2700U with Radeon Vega Mobile Gfx:

Linux 5.12.7-300.fc34.x86_64 #1 SMP Wed May 26 12:58:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

I'm running the default GNOME Shell and Wayland. The problem began after the upgrade to Fedora 34, and perhaps just in the past week or so.

I see that Arch users are also experiencing similar problems:
https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&cad=rja&uact=8&ved=2ahUKEwjiqsnp0_fwAhWRbs0KHczaDWEQFjAAegQIBBAD&url=https%3A%2F%2Fbbs.archlinux.org%2Fviewtopic.php%3Fid%3D266358&usg=AOvVaw3QdsrbUMFqzrEIjYF4wiHP

They seem to think it's mesa or linux-firmware related. In my case, I have:

linux-firmware.noarch           20210511-120.fc34   @updates
linux-firmware-whence.noarch    20210511-120.fc34
mesa-dri-drivers.i686           21.1.1-1.fc34       @updates
mesa-dri-drivers.x86_64         21.1.1-1.fc34       @updates
mesa-filesystem.i686            21.1.1-1.fc34       @updates
mesa-filesystem.x86_64          21.1.1-1.fc34       @updates
mesa-libEGL.x86_64              21.1.1-1.fc34       @updates
mesa-libGL.i686                 21.1.1-1.fc34       @updates
mesa-libGL.x86_64               21.1.1-1.fc34       @updates
mesa-libgbm.x86_64              21.1.1-1.fc34       @updates
mesa-libglapi.i686              21.1.1-1.fc34       @updates
mesa-libglapi.x86_64            21.1.1-1.fc34       @updates
mesa-libxatracker.x86_64        21.1.1-1.fc34       @updates
mesa-vulkan-drivers.i686        21.1.1-1.fc34       @updates
mesa-vulkan-drivers.x86_64      21.1.1-1.fc34       @updates

The most recent instance occurred when I moved the mouse cursor after leaving the machine idle for many minutes, perhaps 30. The screen froze, then went black, and I was forced to power down and reboot. On previous occasions, the lockup was preceded by severe performance degradation and a corrupted screen image. This has happened several times today alone.

The most recent log contains the following (newest entries first):

19:00:18 kernel: amdgpu 0000:05:00.0: amdgpu: GPU reset begin!
19:00:18 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
19:00:08 kernel: amdgpu 0000:05:00.0: amdgpu: GPU reset end with ret = -110
19:00:08 kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v9_0> failed -110
19:00:08 kernel: amdgpu 0000:05:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)
19:00:08 kernel: [drm] kiq ring mec 2 pipe 1 q 0
19:00:07 kernel: amdgpu 0000:05:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
19:00:07 kernel: [drm] reserve 0x400000 from 0xf40fc00000 for PSP TMR
19:00:07 kernel: amdgpu 0000:05:00.0: amdgpu: GPU reset succeeded, trying to resume
19:00:07 kernel: [drm] free PSP TMR buffer
19:00:07 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x0000 address=0x10dc40000 flags=0x0070]
19:00:07 kernel: amd_iommu_report_page_fault: 21 callbacks suppressed
19:00:07 kernel: amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x10dc40000 flags=0x0070]
19:00:07 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process firefox pid 3042 thread firefox:cs0 pid 3115
19:00:07 kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x1
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x1
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x5
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800110c07000 from client 27
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32774, for process firefox pid 3042 thread firefox:cs0 pid 3115)
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x1
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x5
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800110c09000 from client 27
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32774, for process firefox pid 3042 thread firefox:cs0 pid 3115)
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x1
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x5
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800110c04000 from client 27
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32774, for process firefox pid 3042 thread firefox:cs0 pid 3115)
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x1
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x5
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800110c05000 from client 27
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32774, for process firefox pid 3042 thread firefox:cs0 pid 3115)
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x1
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x5
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800110c06000 from client 27
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32774, for process firefox pid 3042 thread firefox:cs0 pid 3115)
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x1
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x5
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800110c02000 from client 27
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32774, for process firefox pid 3042 thread firefox:cs0 pid 3115)
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x1
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x5
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800110c08000 from client 27
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32774, for process firefox pid 3042 thread firefox:cs0 pid 3115)
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x1
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x5
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800110c03000 from client 27
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32774, for process firefox pid 3042 thread firefox:cs0 pid 3115)
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x1
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x5
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800110c00000 from client 27
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32774, for process firefox pid 3042 thread firefox:cs0 pid 3115)
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x1
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x5
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800110c01000 from client 27
18:59:57 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32774, for process firefox pid 3042 thread firefox:cs0 pid 3115)
18:59:57 kernel: gmc_v9_0_process_interrupt: 105 callbacks suppressed
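The decoded fields in the fault lines above (MORE_FAULTS, WALKER_ERROR, PERMISSION_FAULTS, MAPPING_ERROR, client ID, RW) look like bit slices of the raw VM_L2_PROTECTION_FAULT_STATUS value. Here is a minimal Python sketch for decoding such a value by hand. The bit layout in FIELDS is an assumption inferred from the values printed in this report, not checked against the amdgpu source, and the names FIELDS and decode_fault_status are made up for this example.

```python
# Assumed bit layout of VM_L2_PROTECTION_FAULT_STATUS, inferred from the
# decoded values the driver printed in this report (an assumption, not taken
# from the amdgpu source or AMD documentation).
FIELDS = [
    # (field name, bit offset, width in bits)
    ("MORE_FAULTS", 0, 1),
    ("WALKER_ERROR", 1, 3),
    ("PERMISSION_FAULTS", 4, 4),
    ("MAPPING_ERROR", 8, 1),
    ("CID", 9, 9),    # "Faulty UTCL2 client ID" in the log
    ("RW", 18, 1),
    ("VMID", 20, 4),  # matches the "vmid:" value in the retry page fault line
]


def decode_fault_status(status: int) -> dict:
    """Split a raw fault status word into its named bit fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, shift, width in FIELDS
    }


# The two status words that appear in this report.
for raw in (0x00641051, 0x00101031):
    fields = decode_fault_status(raw)
    summary = ", ".join(f"{name}=0x{value:x}" for name, value in fields.items())
    print(f"0x{raw:08x}: {summary}")
```

If the layout holds, a status word from another log can be compared field by field, for example to check whether the same client ID or permission bits show up on a different machine.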
Created attachment 1789296 [details]
Another instance a few minutes ago

Just captured one more instance, see attachment. It looks very similar to the one by billgrzanich.
This kernel.org bug seems related and has a possible workaround: https://bugzilla.kernel.org/show_bug.cgi?id=211157
FWIW the workaround reported there doesn't work. I added this line to the TLP conf:

RUNTIME_PM_DRIVER_BLACKLIST="mei_me nouveau nvidia pcieport radeon"

but I'm still experiencing this. Kind of makes sense: I don't see why or how *enabling* PM on the driver would have improved stability.
"Better", identical really, including machine type (ThinkPad T495) upstream bug here: https://bugzilla.kernel.org/show_bug.cgi?id=213391
Same issue on F34, under swaywm, on a ThinkPad X395:

$ rpm -qa | grep mesa
mesa-libGLU-9.0.1-4.fc34.x86_64
mesa-libglapi-21.1.1-2.fc34.x86_64
mesa-libgbm-21.1.1-2.fc34.x86_64
mesa-filesystem-21.1.1-2.fc34.x86_64
mesa-dri-drivers-21.1.1-2.fc34.x86_64
mesa-libEGL-21.1.1-2.fc34.x86_64
mesa-libGL-21.1.1-2.fc34.x86_64
mesa-libxatracker-21.1.1-2.fc34.x86_64
mesa-vulkan-drivers-21.1.1-2.fc34.x86_64

$ rpm -qa | grep linux-firmware
linux-firmware-whence-20210511-120.fc34.noarch
linux-firmware-20210511-120.fc34.noarch

$ glxinfo
...
    Vendor: AMD (0x1002)
    Device: AMD Radeon(TM) Vega 10 Graphics (RAVEN, DRM 3.40.0, 5.12.9-300.fc34.x86_64, LLVM 12.0.0) (0x15d8)
    Version: 21.1.1
    Accelerated: yes
    Video memory: 2048MB
    Unified memory: no
    Preferred profile: core (0x1)
    Max core profile version: 4.6
    Max compat profile version: 4.6
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.2
...

$ journalctl -b-1
...
kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=36189, emitted seq=36190
kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
kernel: amdgpu 0000:05:00.0: amdgpu: GPU reset begin!
kernel: [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:67:crtc-0] flip_done timed out

$ journalctl -b-4
...
kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process sway pid 1913 thread sway:cs0 pid 1917)
kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800104a00000 from client 27
kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x0
kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process sway pid 1913 thread sway:cs0 pid 1917)
kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800104a01000 from client 27
kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x0
...

$ journalctl -b-5
...
kernel: amdgpu_cs_ioctl: 5 callbacks suppressed
kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
...

$ journalctl -b-9
...
kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:67:crtc-0] flip_done timed out
kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CONNECTOR:78:eDP-1] flip_done timed out
kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:55:plane-3] flip_done timed out
kernel: ------------[ cut here ]------------
kernel: WARNING: CPU: 4 PID: 243367 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:7960 amdgpu_dm_atomic_commit_tail+0x2529/0x25a0 [amdgpu]
kernel: Modules linked in: uas usb_storage uinput rfcomm snd_seq_dummy snd_hrtimer xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp nf_conntrack_tftp bridge stp llc ccm cmac uv>
kernel: kvm snd_seq snd_seq_device irqbypass iwlwifi rapl snd_pcm squashfs joydev pcspkr loop snd_rn_pci_acp3x wmi_bmof k10temp cfg80211 i2c_piix4 snd_pci_acp3x thinkpad_acpi snd_timer pla>
kernel: CPU: 4 PID: 243367 Comm: kworker/4:2 Tainted: G W 5.12.9-300.fc34.x86_64 #1
kernel: Hardware name: LENOVO 20NM000FAU/20NM000FAU, BIOS R13ET49W(1.23 ) 11/24/2020
kernel: Workqueue: events console_callback
kernel: RIP: 0010:amdgpu_dm_atomic_commit_tail+0x2529/0x25a0 [amdgpu]
kernel: Code: b8 fd ff ff 01 c7 85 b4 fd ff ff 37 00 00 00 c7 85 bc fd ff ff 20 00 00 00 e8 83 94 12 00 e9 08 fb ff ff 0f 0b e9 33 f9 ff ff <0f> 0b e9 a5 f9 ff ff 0f 0b 0f 0b e9 bc f9 ff ff>
kernel: RSP: 0018:ffffb2834b0bf8b8 EFLAGS: 00010002
kernel: RAX: 0000000000000002 RBX: 0000000000003f1d RCX: ffff894a4e92c918
kernel: RDX: 0000000000000001 RSI: 0000000000000297 RDI: ffff894a4eb80178
kernel: RBP: ffffb2834b0bfba0 R08: ffffb2834b0bf80c R09: 0000000000000000
kernel: R10: ffffb2834b0bf838 R11: ffffb2834b0bf83c R12: 0000000000000206
kernel: R13: ffff894a4e92c800 R14: ffff894a73d12600 R15: ffff894b5709b100
kernel: FS: 0000000000000000(0000) GS:ffff894cf0b00000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000561da559d188 CR3: 00000001238ba000 CR4: 00000000003506e0
kernel: Call Trace:
kernel: commit_tail+0x94/0x120 [drm_kms_helper]
kernel: drm_atomic_helper_commit+0x113/0x140 [drm_kms_helper]
kernel: drm_client_modeset_commit_atomic+0x1c4/0x200 [drm]
kernel: drm_client_modeset_commit_locked+0x56/0x150 [drm]
kernel: drm_fb_helper_pan_display+0xdc/0x210 [drm_kms_helper]
kernel: fb_pan_display+0x83/0x100
kernel: bit_update_start+0x1a/0x40
kernel: fbcon_switch+0x31d/0x4c0
kernel: redraw_screen+0xd7/0x210
kernel: ? fbcon_cursor+0x109/0x130
kernel: complete_change_console+0x3a/0x120
kernel: console_callback+0x14b/0x150
kernel: ? __cond_resched+0x16/0x40
kernel: process_one_work+0x1ec/0x380
kernel: worker_thread+0x53/0x3e0
kernel: ? process_one_work+0x380/0x380
kernel: kthread+0x11b/0x140
kernel: ? kthread_associate_blkcg+0xa0/0xa0
kernel: ret_from_fork+0x22/0x30
kernel: ---[ end trace ae9524d7c29cb9eb ]---
...
kernel: [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:67:crtc-0] flip_done timed out
...

As you can see, I get a fun variety of different error messages in my logs, but the same result each time - a frozen or black screen which requires either a hard power down, or SSHing into the laptop to try and reboot it.
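Since the same freeze shows up under several different kernel messages, it can help to count which signatures appear in a given boot's journal. Below is a rough Python sketch that does this. The message fragments are copied from the excerpts in this report; the category labels and the classify function are made-up names for illustration, and the list of signatures is certainly not exhaustive.

```python
import re
from collections import Counter

# Message fragments copied from the journal excerpts in this report.
# The labels on the left are hypothetical names chosen for this sketch.
SIGNATURES = {
    "fence_timeout": re.compile(r"Waiting for fences timed out"),
    "ring_timeout": re.compile(r"ring \w+ timeout"),
    "gpu_reset": re.compile(r"GPU reset begin"),
    "vm_fault": re.compile(r"VM_L2_PROTECTION_FAULT_STATUS"),
    "cs_parser": re.compile(r"Failed to initialize parser"),
    "flip_done": re.compile(r"flip_done timed out"),
}


def classify(lines):
    """Count how many lines match each known failure signature."""
    counts = Counter()
    for line in lines:
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


# A few lines taken from the excerpts above, used as sample input.
sample = [
    "kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!",
    "kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=36189, emitted seq=36190",
    "kernel: amdgpu 0000:05:00.0: amdgpu: GPU reset begin!",
    "kernel: [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:67:crtc-0] flip_done timed out",
]
for label, count in classify(sample).most_common():
    print(f"{label}: {count}")
```

In practice the lines would come from a saved kernel log, such as the dmesg.txt produced by the journalctl command in the bug template, read with open("dmesg.txt") instead of the sample list.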
After downgrading linux-firmware and running an older kernel, I've not had a crash now for about a week. The versions I am running are:

kernel-5.11.12-300.fc34.x86_64
linux-firmware-20210315-119.fc34.noarch
Created attachment 1797116 [details]
journalctl after screen freeze happened

Same issue here with the latest Fedora 34 kernels on a desktop PC with a Ryzen 3 3200G using integrated graphics, although not that often, perhaps once a week. For no obvious reason the screen freezes (although the mouse pointer still moves) and sometimes the screen goes black. The last occurrence, captured in the enclosed attachment, happened during a banking session in Firefox on Xfce with kernel 5.12.13-300.fc34.x86_64. I had not encountered any stability issues with this PC since I got it a year ago and installed Fedora 32.
Same issue here with a ThinkPad E595 running Fedora 33 with 5.12.12-200.fc33.x86_64. I'll try downgrading the kernel and firmware and see if it improves things.
(In reply to Norbert Jurkeit from comment #9)

The issue seems to be related to firmware rather than the kernel, at least in my case with Picasso hardware. It started around the time when linux-firmware-20210511-120.fc34 became available and has not occurred since the upgrade to linux-firmware-20210818-122.fc34 three weeks ago; the latter reverted some amdgpu files to those of linux-firmware-20210315-119.fc34.
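For anyone comparing their own system against the builds mentioned here, this small Python sketch pulls the snapshot date out of a linux-firmware package name and checks it against the 20210511 build. Treating that build as "suspect" only reflects what commenters in this thread observed, not a confirmed root cause, and the names firmware_datestamp and is_suspect are hypothetical.

```python
import re

# Build that commenters in this thread associated with the freezes.
# This is an observation from the reports, not a confirmed root cause.
SUSPECT_BUILD = "20210511"


def firmware_datestamp(package: str) -> str:
    """Return the YYYYMMDD snapshot date from a linux-firmware package name."""
    match = re.search(r"linux-firmware-(\d{8})-", package)
    if match is None:
        raise ValueError(f"not a linux-firmware package name: {package!r}")
    return match.group(1)


def is_suspect(package: str) -> bool:
    """True if the package is the build reported as problematic in this thread."""
    return firmware_datestamp(package) == SUSPECT_BUILD


# Package names exactly as they appear in the comments above.
for name in (
    "linux-firmware-20210315-119.fc34.noarch",
    "linux-firmware-20210511-120.fc34.noarch",
    "linux-firmware-20210818-122.fc34.noarch",
):
    print(name, "suspect" if is_suspect(name) else "not flagged")
```

The package name to feed in can be taken from the output of rpm -q linux-firmware.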
I'm reluctant to say it's fixed, I can say that I have not experienced the problem in several days, perhaps even since the firmware package update that Norbert describes in comment 11. Fingers crossed.
(In reply to billgrzanich from comment #12)
> I'm reluctant to say it's fixed, I can say that I have not experienced the
> problem in several days, perhaps even since the firmware package update
> that Norbert describes in comment 11. Fingers crossed.

I was just about to uncork the champagne as well, but I'm still seeing intermittent freezes with linux-firmware-20210818-122.fc34.noarch. They are less common, but they don't seem to soft recover like before either.

These crashes are all similar (identical?) to the "flip_done timed out" trace shown in comment #7.
(In reply to Aaron Sowry from comment #13)
>
> I was just about to uncork the champagne as well, but I'm still seeing
> intermittent freezes with linux-firmware-20210818-122.fc34.noarch. They are
> less common, but they don't seem to soft recover like before either.
>
> These crashes are all similar (identical?) to the "flip_done timed out"
> trace shown in comment #7.

With the questionable firmware I only got "VM_L2_PROTECTION_FAULT_STATUS:0x00101031" or "VM_L2_PROTECTION_FAULT_STATUS:0x00141051" in the journal, but nothing with "flip_done timed out", although my graphics hardware looks similar to yours according to glxinfo:

    Vendor: AMD (0x1002)
    Device: AMD Radeon(TM) Vega 8 Graphics (RAVEN, DRM 3.41.0, 5.13.16-200.fc34.x86_64, LLVM 12.0.1) (0x15d8)
    Version: 21.1.8
    Accelerated: yes
    Video memory: 2048MB
    Unified memory: no
    Preferred profile: core (0x1)
    Max core profile version: 4.6
    Max compat profile version: 4.6
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.2

The desktop I use is Xfce, which might also make a difference.

It might help to post your comprehensive information from comment #7 to gitlab.freedesktop.org, where it can get the attention of upstream maintainers. See e.g. https://gitlab.freedesktop.org/drm/amd/-/issues/1609.
This message is a reminder that Fedora 33 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora 33 on 2021-11-30. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '33'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 33 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 33 changed to end-of-life (EOL) status on 2021-11-30. Fedora 33 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.