Bug 2299031

Summary: mesa 24.1.4 breaks AV1 video playback in firefox on amdgpu
Product: [Fedora] Fedora
Reporter: Gurenko Alex <agurenko>
Component: mesa
Assignee: José Expósito <jexposit>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 40
CC: ajax, bskeggs, decathorpe, dimitris.on.linux, gilbert.fernandes, igor.raits, jexposit, j, lnicola, lyude, maztaim, misc.widely812, mkjp, rhughes, rstrode, sevmek, suraj.ghimire7, thibaulltt+fedora, tstellar, walter.pete, zing
Target Milestone: ---
Keywords: Regression
Target Release: ---
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version: mesa-24.1.4-3.fc40
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2024-07-24 15:46:50 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Gurenko Alex 2024-07-20 15:10:08 UTC
As reported at https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29400#note_2432971, AV1 playback in Firefox causes crashes and GPU resets on the latest mesa. It seems there are three possible fixes: a mesa patch that is available in mesa 24.2.0-rc1, a fix on the Firefox side in Firefox 130 (releasing August 5th), or, alternatively, a patched ffmpeg (fixed in ffmpeg 7.0.1)?

Mesa patch: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29400#note_2432971
ffmpeg patch: https://patchwork.ffmpeg.org/project/ffmpeg/list/?series=11693

Reproducible: Always

Steps to Reproduce:
1. Open YouTube
2. Move the mouse around with preview videos enabled
Actual Results:  
Playback in Firefox is corrupted, and opening a corrupted video also shows artifacts. After a few videos, a GPU reset is initiated and various processes crash (Xwayland, plasma-shell, RDD, firefox).

Expected Results:  
No GPU reset or artifacts should occur. This is a regression since 24.1.2-8.fc40.

Comment 1 Eric 2024-07-21 17:44:45 UTC
It is fixed already over on RPMFusion. Download the latest build from koji or wait till next Friday to get the new -freeworld package
https://bugzilla.rpmfusion.org/show_bug.cgi?id=7007

Comment 2 José Expósito 2024-07-22 07:52:05 UTC
As mentioned by Neal here:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29400#note_2496638

This update should fix it:
https://bodhi.fedoraproject.org/updates/FEDORA-2024-810afc5c2e

Comment 3 José Expósito 2024-07-22 08:33:58 UTC
*** Bug 2299025 has been marked as a duplicate of this bug. ***

Comment 4 Fabio Valentini 2024-07-22 11:51:14 UTC
> This update should fix it:
> https://bodhi.fedoraproject.org/updates/FEDORA-2024-810afc5c2e

This update doesn't fix the issue for me.
I still get garbled AV1 decode in firefox.

firefox-128.0-2.fc40.x86_64
mesa-va-drivers-freeworld-24.1.4-1.fc40.x86_64
ffmpeg-free-6.1.1-19.fc40.x86_64
libavcodec-freeworld-6.1.1-14.fc40.x86_64

The issue only goes away after manually installing the builds from
https://koji.rpmfusion.org/koji/buildinfo?buildID=29235

Which has revert-6746d4df-to-fix-av1-slice_data_offset.patch

So something is definitely still broken with mesa 24.1.4.

Comment 5 Gilbert Fernandes 2024-07-22 11:52:59 UTC
I have that issue and can test if needed.
I opened a bugzilla report with messages from the kernel here : https://bugzilla.redhat.com/show_bug.cgi?id=2299241
But the real issue is in Mesa.

Comment 6 Gilbert Fernandes 2024-07-22 11:56:13 UTC
What should I do if I'm not using Fusion but the standard Fedora 40 repos ?

Versions I have are :

mesa-va-drivers-24.1.4-2.fc40.x86_64
firefox-128.0-2.fc40.x86_64
libavcodec-free-6.1.1-19.fc40.x86_64

Comment 7 leigh scott 2024-07-22 12:10:09 UTC
This commit should be applied to fedora mesa https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30255

Comment 8 José Expósito 2024-07-22 13:41:51 UTC
> > This update should fix it:
> > https://bodhi.fedoraproject.org/updates/FEDORA-2024-810afc5c2e
> 
> This update doesn't fix the issue for me.
> I still get garbled AV1 decode in firefox.

Oh, I misread the MR discussion. I'm generating a new build including the fix.

Comment 9 Fedora Update System 2024-07-22 14:43:21 UTC
FEDORA-2024-face82e699 (mesa-24.1.4-3.fc40) has been submitted as an update to Fedora 40.
https://bodhi.fedoraproject.org/updates/FEDORA-2024-face82e699

Comment 10 Fedora Update System 2024-07-23 02:01:21 UTC
FEDORA-2024-face82e699 has been pushed to the Fedora 40 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-face82e699`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-face82e699

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 11 Tim Bosse 2024-07-23 18:11:36 UTC
Still seeing the same behavior after `sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-face82e699`.

Jul 23 13:50:25 thelina.timbos.se rtkit-daemon[1577]: Successfully made thread 184538 of process 184391 (/usr/lib64/firefox/firefox) owned by '1000' RT at priority 10.
Jul 23 13:50:25 thelina.timbos.se gnome-shell[3053]: meta_window_set_stack_position_no_sync: assertion 'window->stack_position >= 0' failed
Jul 23 13:50:38 thelina.timbos.se kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=4175580, emitted seq=4175582
Jul 23 13:50:38 thelina.timbos.se kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process firefox pid 184391 thread firefox:cs0 pid 184478
Jul 23 13:50:38 thelina.timbos.se kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset begin!
Jul 23 13:50:39 thelina.timbos.se kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
Jul 23 13:50:39 thelina.timbos.se kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Jul 23 13:50:39 thelina.timbos.se kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
Jul 23 13:50:39 thelina.timbos.se kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Jul 23 13:50:39 thelina.timbos.se kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
Jul 23 13:50:39 thelina.timbos.se kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Jul 23 13:50:39 thelina.timbos.se kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
Jul 23 13:50:39 thelina.timbos.se kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Jul 23 13:50:39 thelina.timbos.se kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
Jul 23 13:50:39 thelina.timbos.se kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Jul 23 13:50:39 thelina.timbos.se kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
Jul 23 13:50:39 thelina.timbos.se kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Jul 23 13:50:39 thelina.timbos.se kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
Jul 23 13:50:39 thelina.timbos.se kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Jul 23 13:50:39 thelina.timbos.se kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
Jul 23 13:50:39 thelina.timbos.se kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Jul 23 13:50:40 thelina.timbos.se kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
Jul 23 13:50:40 thelina.timbos.se kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Jul 23 13:50:40 thelina.timbos.se kernel: [drm:gfx_v11_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
Jul 23 13:50:40 thelina.timbos.se kernel: amdgpu 0000:c1:00.0: amdgpu: MODE2 reset
Jul 23 13:50:40 thelina.timbos.se kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset succeeded, trying to resume
Jul 23 13:50:40 thelina.timbos.se kernel: [drm] PCIE GART of 512M enabled (table at 0x000000807FD00000).
Jul 23 13:50:40 thelina.timbos.se kernel: amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
Jul 23 13:50:40 thelina.timbos.se kernel: amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
Jul 23 13:50:40 thelina.timbos.se kernel: [drm] DMUB hardware initialized: version=0x08003D00
Jul 23 13:50:40 thelina.timbos.se kernel: [drm] kiq ring mec 3 pipe 1 q 0
Jul 23 13:50:40 thelina.timbos.se kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
Jul 23 13:50:40 thelina.timbos.se kernel: amdgpu 0000:c1:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
Jul 23 13:50:40 thelina.timbos.se kernel: amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
Jul 23 13:50:40 thelina.timbos.se kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Jul 23 13:50:40 thelina.timbos.se kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Jul 23 13:50:40 thelina.timbos.se kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
Jul 23 13:50:40 thelina.timbos.se kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
Jul 23 13:50:40 thelina.timbos.se kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
Jul 23 13:50:40 thelina.timbos.se kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
Jul 23 13:50:40 thelina.timbos.se kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
Jul 23 13:50:40 thelina.timbos.se kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
Jul 23 13:50:40 thelina.timbos.se kernel: amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Jul 23 13:50:40 thelina.timbos.se kernel: amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
Jul 23 13:50:40 thelina.timbos.se kernel: amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
Jul 23 13:50:40 thelina.timbos.se kernel: amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
Jul 23 13:50:40 thelina.timbos.se kernel: amdgpu 0000:c1:00.0: amdgpu: recover vram bo from shadow start
Jul 23 13:50:40 thelina.timbos.se kernel: amdgpu 0000:c1:00.0: amdgpu: recover vram bo from shadow done
Jul 23 13:50:40 thelina.timbos.se kernel: amdgpu 0000:c1:00.0: amdgpu: GPU reset(10) succeeded!
Jul 23 13:50:40 thelina.timbos.se gnome-shell[3053]: meta_wayland_buffer_process_damage: assertion 'buffer->resource' failed
Jul 23 13:50:40 thelina.timbos.se org.mozilla.firefox.desktop[184391]: [GFX1-]: Failed to compile vertex shader: cs_linear_gradient
Jul 23 13:50:40 thelina.timbos.se org.mozilla.firefox.desktop[184391]: [ERROR webrender::device::gl] Failed to compile vertex shader: cs_linear_gradient
Jul 23 13:50:40 thelina.timbos.se org.mozilla.firefox.desktop[184391]:
Jul 23 13:50:40 thelina.timbos.se kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jul 23 13:50:40 thelina.timbos.se org.mozilla.firefox.desktop[184391]: [GFX1-]: wr_renderer_render: Shader(Compilation("cs_linear_gradient", ""))
Jul 23 13:50:40 thelina.timbos.se org.mozilla.firefox.desktop[184391]: [GFX1-]: Handling webrender error 2
Jul 23 13:50:40 thelina.timbos.se org.mozilla.firefox.desktop[184391]: [GFX1-]: Fallback WR to SW-WR
Jul 23 13:50:40 thelina.timbos.se org.mozilla.firefox.desktop[184391]: [GFX1-]: Detect DeviceReset DeviceResetReason::DRIVER_ERROR DeviceResetDetectPlace::WR_POST_UPDATE in Parent process
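
For anyone comparing their own logs against the one above, a minimal sketch of a filter that keeps only the amdgpu ring-timeout and GPU-reset lines. The function name gpu_reset_events is made up for illustration, and the pattern is an assumption based only on the messages shown in this comment; other kernels or GPUs may word these messages differently.

```shell
# Hypothetical helper: keep only the amdgpu lines that mark a ring timeout
# or a GPU reset. The pattern is derived from the log lines quoted in this
# bug and is not an exhaustive list of amdgpu error messages.
gpu_reset_events() {
    grep -E 'amdgpu.*(ring .* timeout|GPU reset)'
}

# Typical use on a live system (kernel messages from the current boot):
#   journalctl -k -b --no-pager | gpu_reset_events
```

Lines such as "ring sdma0 uses VM inv eng 12 on hub 0" are dropped on purpose, since they are part of the normal resume sequence rather than the failure itself.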

Comment 12 Fedora Update System 2024-07-24 15:46:50 UTC
FEDORA-2024-face82e699 (mesa-24.1.4-3.fc40) has been pushed to the Fedora 40 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 13 thibaulltt+fedora 2024-08-13 14:55:11 UTC
Hi, the problem still seems to be an issue on my system.

Currently running a Framework 13 with an AMD 7840U processor (with Radeon 780M Graphics) and mesa-24.1.5-2.fc40 installed. YouTube AV1 playback will work fine for a second, then show garbled and green patches all over the screen for another second, before completely and irrecoverably crashing my desktop environment another couple seconds later. Running the 'sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-face82e699' command gave me no updates (dnf did not perform any upgrade). For completeness, I'm running the KDE spin of Fedora 40, but AFAIK that should not be an issue (?)

What's a bit troubling is that I'm still running the *-2 version of mesa drivers, when the *-3 version seems to be the one where this patch is applied ? I cannot seem to get this version installed on my system right now. I'd appreciate any info on when this patch will be released on the main updates repository, or how to install it if I did not realize it already was ;)

Comment 14 Jason Tibbitts 2024-08-13 19:50:31 UTC
There is no F40 release that is newer than mesa-24.1.5-2.fc40.  The "*-3" package you mention is the older version mesa-24.1.4-3.fc40.  Note "24.1.5" versus "24.1.4".  It is possible that the issue was fixed in 24.1.4-3 and then regressed in later versions.
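
The ordering of the two builds can be checked with a short sketch like the one below. It assumes GNU coreutils and uses "sort -V" as a stand-in for RPM's own version comparison; the two agree for simple strings like these, but "sort -V" is not the rpm algorithm. The helper name newer_of is made up for illustration.

```shell
# Hypothetical helper: print the higher of two version-release strings.
# Assumption: GNU `sort -V` ordering approximates RPM's comparison here.
newer_of() {
    printf '%s\n' "$1" "$2" | sort -V | tail -n 1
}

newer_of "24.1.4-3.fc40" "24.1.5-2.fc40"   # prints 24.1.5-2.fc40
```

The version (24.1.5 versus 24.1.4) is compared before the release number, so the "-3" build is the older of the two.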

You can find every version of mesa ever built at https://koji.fedoraproject.org/koji/packageinfo?packageID=184, and it is possible to download old versions of packages from there.  You could try pulling whichever 24.1.4-3 packages you might need from https://koji.fedoraproject.org/koji/buildinfo?buildID=2513480 and downgrading to them, but there might be dependency issues that prevent such a downgrade.  If it works, then you have a useful data point and can report the regression.

Comment 15 Dimitris 2024-08-14 00:21:47 UTC
@devillelethibault I'm also on a FW13/7840U.  This seems fixed here as of 24.1.5-2.fc40, including with the example YouTube video where I first encountered it.  See the now-closed upstream issue at https://gitlab.freedesktop.org/mesa/mesa/-/issues/11533#note_2496699.

Note, did you restart your browser (or even better reboot) so that new mesa libs are loaded/used?
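
One way to check this without rebooting is sketched below. It relies on the Linux behaviour that a mapped file which has since been replaced on disk is listed with a " (deleted)" suffix in /proc/<pid>/maps. The helper name stale_libs is made up for illustration, and the assumption that the process is called "firefox" may not hold for every packaging (Flatpak, for instance).

```shell
# Hypothetical helper: read /proc/<pid>/maps-style text on stdin and list
# shared libraries that are still mapped although the file on disk has been
# replaced or removed (the kernel marks these with " (deleted)").
stale_libs() {
    grep -E '\.so[^ ]* \(deleted\)$' | awk '{print $6}' | sort -u
}

# Typical use (assumes the processes are named "firefox"):
#   for pid in $(pgrep firefox); do stale_libs < "/proc/$pid/maps"; done
```

If any mesa library shows up in the output, the browser is still running with the pre-update driver and needs a restart.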

Fedora's packaging seems to have cleanly removed the "temporary" patch used with 24.1.4, matching upstream as far as this issue/fix is concerned:

https://src.fedoraproject.org/rpms/mesa

$ git status
On branch f40
Your branch is up to date with 'origin/f40'.

nothing to commit, working tree clean
$ git log --stat -4
commit 2f038b212a4dc122f370102677d0fe1144ace004 (HEAD -> f40, origin/f40)
Author: José Expósito <jexposit>
Date:   Thu Aug 1 16:49:10 2024 +0200

    Remove unused patch

 0001-Revert-frontends-va-Fix-AV1-slice_data_offset-with-m.patch | 123 -----------------------------------
 mesa.spec                                                       |   1 -
 2 files changed, 124 deletions(-)

commit 275e0e8ebea8cb99869d8bc8be58d388063de7fe
Author: José Expósito <jexposit>
Date:   Thu Aug 1 16:39:33 2024 +0200

    Update to 24.1.5

 mesa.spec | 2 +-
 sources   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

commit ce89d320284bef0df5d61f47d37d1b7a1f25c753
Merge: 654b25e bd66117
Author: José Expósito <jexposit>
Date:   Mon Jul 22 16:16:55 2024 +0200

    Merge branch 'rawhide' into f40

commit bd66117503acfb8a1e8cc7961f1e8185805a3c0b
Author: José Expósito <jexposit>
Date:   Mon Jul 22 15:40:07 2024 +0200

    Backport AV1 fix
    
    Upstream MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30255
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2299031

 0001-Revert-frontends-va-Fix-AV1-slice_data_offset-with-m.patch | 123 +++++++++++++++++++++++++++++++++++
 mesa.spec                                                       |   1 +
 2 files changed, 124 insertions(+)