Bug 2220888 - updating to kernel 6.3.11 breaks amd gpu drivers
Summary: updating to kernel 6.3.11 breaks amd gpu drivers
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: linux-firmware
Version: 38
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-07-06 12:34 UTC by oli
Modified: 2023-08-22 17:34 UTC
CC List: 26 users

Fixed In Version: linux-firmware-20230804-152.fc38 linux-firmware-20230804-153.fc37
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-11 00:41:47 UTC
Type: ---
Embargoed:


Attachments
broken latest 6.3.12 kernel (169.74 KB, image/png)
2023-07-20 18:52 UTC, oli
the new error on kernel 6.4.4 (478.87 KB, image/jpeg)
2023-07-26 13:49 UTC, oli


Links
System: freedesktop.org Gitlab
ID: drm amd issues 2666
Private: 0
Priority: None
Status: opened
Summary: Raphael DCN 3.1.5 firmware regression
Last Updated: 2023-07-09 18:55:08 UTC

Description oli 2023-07-06 12:34:23 UTC
I use Fedora 38 Xfce with full-disk encryption (FDE). I updated the kernel with "dnf update -y", and when I boot the newest one (6.3.11-200), my system hangs during boot.
I have two screens: one connected to the iGPU of my 7950X3D and one connected to my 7900 XTX graphics card.
I can enter the FDE password, and then I get hundreds of these messages:
[drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
[drm] perform_link_training_with_retries: Link(2) training attempt 1 of 4 failed @rate(10) x lane(4...)



Reproducible: Always

Steps to Reproduce:
1. Update to the newest kernel.
Actual Results:  
The system hangs and does not continue to boot.

Expected Results:  
A normal boot.

I suspect this is related to: https://gitlab.freedesktop.org/drm/amd/-/issues/2666
Kernel 6.3.8 works.

Comment 1 oli 2023-07-09 06:38:49 UTC
The current "workaround" is to delete the file /lib/firmware/amdgpu/dcn_3_1_5_dmcub.bin.xz, then regenerate the initrd with "dracut --regenerate-all --force". The iGPU is dead, but at least the system "works".
I tried the workaround from the freedesktop.org URL above, and after updating that DCN 3.1.5 file my system was completely broken: the terminal was spammed with these DMUB messages, logging in through LightDM was impossible, and the system was extremely slow even in a text-only terminal.
It's a 7950X3D with a 7900 XTX.
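The workaround above can be sketched as a pair of small Bash helpers. This is a hypothetical variant, not what the reporter ran: instead of deleting dcn_3_1_5_dmcub.bin.xz, it renames the file with a ".disabled" suffix (an assumed naming choice) so it can be put back later. The function names park_dcn315_firmware and restore_dcn315_firmware are illustrative; the default directory /lib/firmware/amdgpu is taken from the comment, and a different directory can be passed for testing.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the DCN 3.1.5 workaround: rename the DMCUB
# firmware instead of deleting it, so it can be restored later.
# The directory defaults to /lib/firmware/amdgpu; pass another path to test.

FW_NAME="dcn_3_1_5_dmcub.bin.xz"

park_dcn315_firmware() {
    local dir="${1:-/lib/firmware/amdgpu}"
    local src="$dir/$FW_NAME"
    if [ ! -e "$src" ]; then
        echo "nothing to park: $src not found" >&2
        return 1
    fi
    if [ -e "$src.disabled" ]; then
        echo "refusing to overwrite existing $src.disabled" >&2
        return 1
    fi
    # The kernel looks firmware up by file name, so the renamed copy is not found.
    mv -- "$src" "$src.disabled"
}

restore_dcn315_firmware() {
    local dir="${1:-/lib/firmware/amdgpu}"
    local parked="$dir/$FW_NAME.disabled"
    if [ ! -e "$parked" ]; then
        echo "nothing to restore: $parked not found" >&2
        return 1
    fi
    if [ -e "$dir/$FW_NAME" ]; then
        # e.g. a package update already reinstalled the file
        echo "refusing to overwrite existing $dir/$FW_NAME" >&2
        return 1
    fi
    mv -- "$parked" "$dir/$FW_NAME"
}
```

Run the helpers as root, then regenerate the initrd with "dracut --regenerate-all --force" exactly as described above, so the initrd is rebuilt from the current contents of the firmware directory. A later amd-gpu-firmware update will most likely reinstall the file under its original name, which is why the restore helper refuses to overwrite an existing file.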

Comment 2 Vitezslav Zivota 2023-07-09 07:35:23 UTC
On my system (F38, Ryzen 7900X, iGPU driving two displays):

After updating the kernel to 6.3.11 and rebooting, both monitors were black. After a hard reset, the primary display (connected via DP) showed some graphical artifacts during boot and then GDM. I was able to log in and the system seemed to work, but the secondary display (connected via HDMI) stayed black with no signal.

I switched back to 6.3.8.

Comment 3 oli 2023-07-09 07:47:05 UTC
If you edit the boot parameters and remove "rhgb quiet", you will probably also see the DMUB screen of death ;)
6.3.8 was working for me too, more or less (some stutters, small freezes), but with newer kernels it's totally broken.
Unfortunately, I don't know how to go back. I ran "rm -rf /lib/firmware/amdgpu/dcn*", reinstalled amd-gpu-firmware, then ran "dracut --regenerate-all --force", but it didn't help: it still shows me the newest version, which is completely broken.

Comment 4 Peter Robinson 2023-07-09 18:54:00 UTC
> /lib/firmware/amdgpu/dcn* and did a reinstall of amd-gpu-firmware, then
> dracut regenerate all with force, but no help, it still shows me the newest
> version that is fully broken

A reinstall just reinstalls the latest version, so that is expected if the latest version is broken. You need to run "dnf downgrade amd-gpu-firmware", which will take you back to the Fedora GA release (20230310-148.fc38), containing the older firmware revision that previously shipped in Fedora.

Comment 5 Peter Robinson 2023-07-09 18:55:08 UTC
It looks like upstream is updating the firmware (or reverting it, or something similar), but the change hasn't landed upstream yet.

Comment 6 Peter Robinson 2023-07-09 18:56:58 UTC
(In reply to Peter Robinson from comment #5)
> It looks like upstream is updating the firmware (or reverting it or
> something) but it hasn't landed upstream as yet.

Actually it has (but a dnf downgrade will have the same effect in the short term if you're affected):

commit d3f66064cf43bd7338a79174bd0ff60c4ecbdf6d (HEAD -> main, origin/main)
Author: Hamza Mahfooz <hamza.mahfooz>
Date:   Wed Jul 5 16:56:35 2023 -0400

    Partially revert "amdgpu: DMCUB updates for DCN 3.1.4 and 3.1.5"
    
    This partially reverts commit ade163aaaeae0c1ad20cb3dd8ce878bf61c91b3a.
    
    The DCN315 DMCUB firmware update provided by the aforementioned commit
    wasn't thoroughly tested before being sent for public consumption and as
    such there are a number of issues with it. So, revert to the previous
    version until it can be fixed properly.
    
    Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2666
    Signed-off-by: Hamza Mahfooz <hamza.mahfooz>
    Signed-off-by: Josh Boyer <jwboyer>

Comment 7 oli 2023-07-10 05:04:48 UTC
It seems downgrading broke something else:

[oli@DESKTOP-SJIB21T ~]$ dmesg | grep DMUB
[    5.480892] [drm] Loading DMUB firmware via PSP: version=0x07001900
[    5.919524] [drm] DMUB hardware initialized: version=0x07001900
[    6.280890] [drm:dm_early_init [amdgpu]] *ERROR* DMUB firmware loading failed: -19


The second screen is still dead. I now have this installed: amd-gpu-firmware-20230310-148.fc38.noarch

Comment 8 Vitezslav Zivota 2023-07-10 07:26:53 UTC
I now have amd-gpu-firmware-20230625-151.fc38 and kernel 6.3.8; both displays work without problems.

dmesg | grep DMUB
[    4.277782] [drm] Loading DMUB firmware via PSP: version=0x05000500
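Comments 7 and 8 compare the DMUB firmware version reported in the kernel log. A small Bash filter can pull that version out and also flag the two error messages quoted in this bug. This is an illustrative sketch: the function name summarize_dmub and its one-line output format are hypothetical, and it only matches the exact message texts that appear in this report.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise amdgpu DMUB messages from a kernel log on stdin.
# It matches only the message texts quoted in this bug report.

summarize_dmub() {
    local line version="unknown" status="ok" idle_errors=0
    while IFS= read -r line; do
        case "$line" in
            *"Loading DMUB firmware"*"version="*)
                # Keep whatever follows the last "version=", e.g. 0x07001900.
                version="${line##*version=}"
                ;;
            *"DMUB firmware loading failed"*)
                status="failed"
                ;;
            *"Error waiting for DMUB idle"*)
                idle_errors=$((idle_errors + 1))
                ;;
        esac
    done
    echo "version=$version status=$status idle_errors=$idle_errors"
}
```

Usage would be "dmesg | summarize_dmub" for the current boot, or "journalctl -k -b -1 | summarize_dmub" to inspect the previous boot after a hang, provided the journal is persistent. Fed the three lines from comment 7, it reports version=0x07001900 with status=failed.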

Comment 9 Volker Braun 2023-07-11 10:30:54 UTC
The latest working version is amd-gpu-firmware-20230515-150.fc38.noarch.rpm, which you can download from https://koji.fedoraproject.org/koji/buildinfo?buildID=2201258; it fixes the "Error waiting for DMUB idle" messages.

Don't forget to regenerate the initramfs (dracut --regenerate-all --force) after the firmware downgrade.


PS: This is a different issue, but FYI: if you get a white screen instead of the GUI, add the amdgpu.sg_display=0 kernel parameter (https://gitlab.freedesktop.org/drm/amd/-/issues/2354).

Comment 10 Vitezslav Zivota 2023-07-12 08:58:17 UTC
Today's update to kernel 6.3.12 didn't fix the issue. I then downgraded the amdgpu firmware to amd-gpu-firmware-20230515-150.fc38 from Koji, and it works with 6.3.12.

Thanks for the workaround!

$ sudo rpm -Uvh --oldpackage amd-gpu-firmware-20230515-150.fc38.noarch.rpm
$ sudo dracut --regenerate-all --force

Comment 11 joshua 2023-07-12 16:05:22 UTC
I'm also affected by broken dual-monitor support on F38. I noticed it with the 6.3 kernel.

Downgrading to amd-gpu-firmware-20230515-150.fc38.noarch.rpm and regenerating the initrd doesn't change anything for me.

I've also tried kernel-6.5.0-0.rc1.20230711git3f01e9fed845.12 just for kicks; no luck there either.

For now I'm stuck on 6.2.15-300.fc38.x86_64 until something gets fixed or reverted upstream.

Comment 12 Peter Robinson 2023-07-13 08:52:38 UTC
(In reply to joshua from comment #11)
> I too suffer from breakage of Dual Monitor capabilities on F38.   I noticed
> it with the 6.3 kernel.

This bug is purely about a regression in a single firmware file for a single class of devices, as described in the first comment. If you're not hitting that problem, it's a different bug, so please file a new bug against the kernel for it.

Comment 13 oli 2023-07-20 18:52:31 UTC
Created attachment 1976815
broken latest 6.3.12 kernel

This is how it looks with the latest kernel; it's unusable.
The .11 kernel seems to work, but some apps, like Chromium, are broken, I guess because of hardware acceleration.

Comment 14 oli 2023-07-26 13:49:18 UTC
Created attachment 1980132
the new error on kernel 6.4.4

Updating to 6.4.4-200 breaks the system even with the old AMD GPU firmware.
It's getting worse and worse.

Comment 15 Peter Robinson 2023-07-26 14:08:05 UTC
(In reply to oli from comment #14)
> Created attachment 1980132 [details]
> the new error on kernel 6.4.4
> 
> updating to 6.4.4-200 breaks the system even with old amd gpu drivers.
> its getting worse and worse.

I think that is a different bug, but we should have a new linux-firmware shortly.

Comment 16 oli 2023-07-26 14:11:27 UTC
Is there a roadmap or something similar for linux-firmware? I checked the online Git repo, but it looks like it has been dead for 3 or 4 weeks.

Comment 17 Peter Robinson 2023-07-26 14:17:25 UTC
(In reply to oli from comment #16)
> is there some roadmap or so for the linux firmware? i checked the online git
> repo but it looks like its dead since 3 or 4 weeks.

It releases basically monthly; you can actually see that from the version numbers. There have been a lot of recent commits upstream this week, so I have no idea where you're looking.

Comment 18 oli 2023-07-26 14:20:30 UTC
Yeah, you're right, I didn't re-check it. It had been quiet for about 2 weeks when I last checked. Thanks.

Comment 19 Fedora Update System 2023-08-06 14:48:33 UTC
FEDORA-2023-85168977a9 has been submitted as an update to Fedora 37. https://bodhi.fedoraproject.org/updates/FEDORA-2023-85168977a9

Comment 20 Fedora Update System 2023-08-06 14:48:43 UTC
FEDORA-2023-d15f5a186a has been submitted as an update to Fedora 38. https://bodhi.fedoraproject.org/updates/FEDORA-2023-d15f5a186a

Comment 21 Fedora Update System 2023-08-07 01:15:02 UTC
FEDORA-2023-d15f5a186a has been pushed to the Fedora 38 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-d15f5a186a`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-d15f5a186a

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 22 Fedora Update System 2023-08-07 01:44:51 UTC
FEDORA-2023-85168977a9 has been pushed to the Fedora 37 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-85168977a9`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-85168977a9

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 23 joshua 2023-08-07 05:07:48 UTC
Can anyone who was suffering from this problem verify that these F37 and F38 fixes resolve the issue on their system?

Comment 24 oli 2023-08-07 05:45:04 UTC
It is much better than before. Before that update I got DMCUB errors 100% of the time, which led to an unstable system; now I can even boot the latest 6.4 kernel.
But about 1 in 10 times the system goes haywire again with DMCUB errors; I'm not sure what exactly causes that.
I had disabled hardware acceleration in Chromium (because the DMCUB errors made hardware acceleration unusable in any application); I will re-enable it now.

Comment 25 Vitezslav Zivota 2023-08-07 07:40:48 UTC
Firmware update FEDORA-2023-d15f5a186a works on my Ryzen 7900X iGPU. Both displays are OK.

I did the following; I suppose that's enough:

sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-d15f5a186a
sudo dracut --force

Thanks

Comment 26 Fedora Update System 2023-08-11 00:41:47 UTC
FEDORA-2023-d15f5a186a has been pushed to the Fedora 38 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 27 Fedora Update System 2023-08-11 02:27:01 UTC
FEDORA-2023-eabbf4ca4d has been pushed to the Fedora 37 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-eabbf4ca4d`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-eabbf4ca4d

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 28 Fedora Update System 2023-08-22 17:34:52 UTC
FEDORA-2023-eabbf4ca4d has been pushed to the Fedora 37 stable repository.
If problem still persists, please make note of it in this bug report.

