Bug 2390638 - amd graphics fail on laptop, due to faulty amd-gpu firmware file.
Summary: amd graphics fail on laptop, due to faulty amd-gpu firmware file.
Keywords:
Status: ASSIGNED
Alias: None
Product: Fedora
Classification: Fedora
Component: linux-firmware
Version: 42
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: dracut-maint-list
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2025-08-24 23:26 UTC by customercare
Modified: 2026-04-11 17:12 UTC

Fixed In Version: linux-firmware-20260110-1.fc42 linux-firmware-20260110-1.fc43
Clone Of:
Environment:
Last Closed: 2026-01-15 00:52:26 UTC
Type: Bug
Embargoed:


Attachments
lshw on Laptop HW (36.22 KB, text/plain)
2025-08-24 23:26 UTC, customercare
working dmesg output (107.01 KB, text/plain)
2025-09-14 14:07 UTC, customercare
dmesg not working up to GDM blackscreen (110.54 KB, text/plain)
2025-09-14 14:08 UTC, customercare
dmesg not working up to full desktop (115.69 KB, text/plain)
2025-09-14 14:09 UTC, customercare
full lspci with pci ids (3.79 KB, text/plain)
2025-09-14 14:10 UTC, customercare


Links
System ID Private Priority Status Summary Last Updated
freedesktop.org Gitlab drm amd issues 4553 0 None opened Regression in amd-gpu-firmware 2024, boot ends in black screen. 2025-09-04 19:17:25 UTC

Description customercare 2025-08-24 23:26:13 UTC
Created attachment 2104592 [details]
lshw on Laptop HW

This will be a bug report of a different kind, so please be patient.

We have to start with dracut as the component, because you are the only ones who know what puts which files into the initramfs. Other components will need to be involved as well, but you will have to identify them, because we couldn't.

I chose Fedora 42 because the out-of-the-box live image still has the same bug.
--

It all started with Fedora 39. It was installed on two HP business laptops with an AMD GPU and a discrete NVIDIA GPU (not a 3D accelerator). We can forget about NVIDIA, though, because our tests revealed that it has nothing to do with the issue.

After installing Fedora 39 on the laptop, everything worked as expected (kernel 6.5.6-300). Some time after kernel 6.8.4 was released, the system developed a bug where the amdgpu driver no longer initialized the hardware correctly.
The system still boots, but the GDM login greeter is just a black screen, because the screen is OFF. It is not a rendering failure; the display itself is dark.

A suspend/resume cycle brings up a perfectly drawn and working GDM, and you can log into the system. That is where we ran a lot of tests, really a lot. The incorrect init of the amdgpu causes follow-up problems: the display is incorrectly identified, so the resolution and scaling options do not work properly. The display type and capabilities are misinterpreted due to missing edms. It's a major failure.

--

We did the following tests on a freshly installed system:

We started with Fedora 42; the live image itself already had the same bug.
We did not need to install it.

We booted the Fedora 39 Live image as we knew it was working and installed it.
The system was working fine after booting it.

We did a full os upgrade to the latest Fedora 39 packages and rebooted.

Result: the above described bug with the black screen. 

We knew that kernel 6.8.4 was working on that laptop, so we downgraded the kernels one by one down to the original 6.5.6, and ALL kernels showed the issue described.

There was only one valid conclusion for this scenario: it's not the kernel, it's the rest of the OS that is messing things up in the initramfs.

To test this, we reinstalled the bare F39 from the live image and installed 10 kernels up to 6.9.12 (which is not the latest possible, but sufficed, as it did not work on the original installation). We DID NOT upgrade the OS yet, just the kernels.

We tested one kernel after the other and DID NOT HAVE ANY ISSUES with the amdgpu.
So we can rule out a kernel issue.

We checked with -M which modules dracut puts into the initramfs and created several versions of the initramfs with updates of systemd, udev and dracut itself. Important: we had not installed any additional software yet, so it's a blank fresh install, reproducible by anyone.

We can rule them out, because the created initramfs worked perfectly.

I will attach 3 initramfs to this bugreport.

The working ones have something in common: they are around 56MB. 
The failing one is 100MB.

If someone who has a clue about what could go wrong here with amdgpu diffs the two initramfs images, we will get answers.
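A minimal sketch of such a diff, assuming both images have already been unpacked into two directories (for example with dracut's `lsinitrd --unpack <image>` run inside an empty directory; check the man page on your release). The function names and directory names are made up for illustration.

```shell
#!/bin/bash
# Sketch: show which files were added, removed, or changed between two
# unpacked initramfs trees. Unpacking is assumed to have happened already.

# Print "sha256  ./relative/path" for every regular file below $1,
# in path order so that two manifests line up.
tree_manifest() {
    ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum )
}

# Print only the differing manifest lines:
# "<" lines come from the old tree, ">" lines from the new tree.
compare_trees() {
    old_list=$(mktemp)
    new_list=$(mktemp)
    tree_manifest "$1" > "$old_list"
    tree_manifest "$2" > "$new_list"
    diff "$old_list" "$new_list" | grep '^[<>]'
    rm -f "$old_list" "$new_list"
}

# Usage (placeholder directories):
#   compare_trees ./old ./new
```

A file that appears on both a "<" and a ">" line exists in both trees with different content; a file on only one side was added or removed.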

After the OS upgrade to the latest Fedora 39 packages, we tested the original untouched initramfs and it still worked. So the problem is definitely inside the rebuilt initramfs.

This bug has kept the laptop owner and the support busy for 3 full working days and brought the rpmfusion guys to their limit. It's worth analysing and fixing, such a mess it is.

Comment 1 customercare 2025-08-24 23:32:03 UTC
Bugzilla does not like uploads of more than 19.5 MB ;)

Here is a link to all 4 initramfs:

http://static.bloggt-in-braunschweig.de/initramfs.tar

( no further compressing required => it's just an archive )

initramfs-6.5.6-300.fc39.x86_64.img      The original one from the liveimage install
initramfs-6.9.12-100.fc39.x86_64.img     The original working fs of the kernel 6.9.12
initramfs-6.9.12-100-neu.fc39.x86_64.img This contains updates for systemd,udev and dracut
initramfs-6.9.12-100-FAIL-FULLY-UPGRADED.fc39.x86_64.img This one does not work properly anymore.

Comment 2 customercare 2025-08-27 13:15:08 UTC
From further tests:

Inside a GNOME Boxes VM the problem cannot be reproduced. What ends up in the initramfs depends on the detected hardware.

In case someone ever needs this kind of script again (it runs for about 5 hours):

#!/bin/bash
# Update the installed packages one at a time, rebuild the initramfs after
# each real update, and stop at the first package after which the image
# has grown to 100M or more.

IMG=/boot/initramfs-6.5.6-300.fc39.x86_64.img
COUNTER=0

# One package name per line; package names contain no whitespace, so the
# unquoted expansion in the for loop below is safe.
ALL=$(rpm -qa --qf '%{NAME}\n')
MAX=$(printf '%s\n' "$ALL" | wc -l)
echo "$MAX"

for package in $ALL
do
    COUNTER=$((COUNTER + 1))
    echo "$COUNTER"
    echo "$package"

    # LC_ALL=C forces English dnf messages; the original run matched the
    # German "Nichts zu tun." ("Nothing to do.") instead.
    if LC_ALL=C dnf -y update "$package" | grep -qF "Nothing to do."
    then
        echo "$package skipped, because it is already up to date"
        continue
    fi

    echo "Building INITRAMFS"
    dracut -f

    # Image size in whole megabytes, e.g. "100M" -> 100.
    ergebnis=$(ls -l --block-size=M "$IMG" | awk '{print $5}' | tr -d 'M')
    echo "${ergebnis}M"

    if [ "$ergebnis" -ge 100 ]; then
        echo " ############################################################################# "
        echo " ############################################################################# "
        echo " ############################################################################# "
        echo " Package identified: "
        echo "$package"
        exit
    fi
done
printf "The value of the counter is COUNTER=%d\n" "$COUNTER"

Comment 3 customercare 2025-08-28 09:11:04 UTC
The 40M increase in size comes from a difference in the firmware folder:

59M	new/usr/lib/firmware
19M	old/usr/lib/firmware

a comparison:

164K	new/usr/lib/firmware/amd
20M	new/usr/lib/firmware/amdgpu
39M	new/usr/lib/firmware/nvidia

156K	old/usr/lib/firmware/amd
17M	old/usr/lib/firmware/amdgpu
1,5M	old/usr/lib/firmware/nvidia

For the record: As there is no amdgpu in the test VM, these firmware files are not included in its initramfs.

So it boils down to a change in the amdgpu firmware files.
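The comparison above can be reproduced with a small helper, sketched here under the assumption that the two images were unpacked into directories such as `old` and `new`. The function name `fw_sizes` is hypothetical; the three subdirectories are the ones listed in this comment.

```shell
#!/bin/bash
# Sketch: print the size of the amd, amdgpu and nvidia firmware folders of
# two unpacked initramfs trees side by side (name, old size, new size).
# A "-" means the folder does not exist in that tree.
fw_sizes() {
    for d in amd amdgpu nvidia; do
        old=$(du -sh "$1/usr/lib/firmware/$d" 2>/dev/null | cut -f1)
        new=$(du -sh "$2/usr/lib/firmware/$d" 2>/dev/null | cut -f1)
        printf '%s\t%s\t%s\n' "$d" "${old:--}" "${new:--}"
    done
}

# Usage (placeholder directories):
#   fw_sizes ./old ./new
```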

Comment 4 customercare 2025-08-29 23:13:40 UTC
Component identified: amd/nvidia-gpu-firmware

Today we got the laptop working again normally, by downgrading the amd and nvidia-gpu-firmwares to a working F39 version and rebuilding the initramfs files.

O=== known faulty version AND ANY NEWER UP TO TODAY (an older one could be faulty too)

Name        : amd-gpu-firmware
Version     : 20241110
Release     : 1.fc39

O=== tested WORKING version: (a newer one could work too)

amd-gpu-firmware-20230804-152.fc39.noarch.rpm
nvidia-gpu-firmware-20230804-152.fc39.noarch.rpm 

O=== Device:

OMEN by HP Gaming Laptop 16-xf0xxx (8C012EA#ABD)

family=103C_5335M7 HP OMEN sku=8C012EA#ABD
CPU AMD Ryzen 7 7840HS w/ Radeon 780M Graphics


O=== HOW TO WORKAROUND for ANY Fedora F39++ release:

1. Download both files from Koji -> search for the "linux-firmware" package
2. become root
3. rpm -e --nodeps amd-gpu-firmware nvidia-gpu-firmware
4. rpm -i amd-gpu-firmware-20230804-152.fc39.noarch.rpm nvidia-gpu-firmware-20230804-152.fc39.noarch.rpm

OPTIONAL, IF AMDGPU got blacklisted to have a clean boot:
5. set /etc/default/grub back to normal (amdgpu was blacklisted)
6. rework /boot/loader/entries/* and remove the blacklist there too for easier booting.

7. dracut -f 
8. echo "exclude=amd-gpu* nvidia-gpu*" >> /etc/dnf/dnf.conf

9. reboot

NOTE: because we excluded them from dnf, we can do sysupgrades until the issue is FIXED BY AMD!
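Running step 8 more than once appends the same line again. A minimal sketch of an idempotent variant follows; the function name `ensure_exclude` is made up, and it assumes that `[main]` is the last (or only) section in the file, so that an appended line still belongs to it.

```shell
#!/bin/bash
# Sketch: append the exclude line to a dnf config file only if that exact
# line is not present yet.
# $1: config file, e.g. /etc/dnf/dnf.conf as used in step 8 above.
ensure_exclude() {
    conf=$1
    line='exclude=amd-gpu* nvidia-gpu*'
    grep -qxF -- "$line" "$conf" 2>/dev/null || printf '%s\n' "$line" >> "$conf"
}

# Usage (as root):
#   ensure_exclude /etc/dnf/dnf.conf
```

To go back to the distribution firmware later, the line has to be removed again before the packages can be updated.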

Comment 5 customercare 2025-09-03 22:32:37 UTC
Kanotix suggested to add this:

01:00.0 VGA compatible controller: NVIDIA Corporation AD106M [GeForce RTX 4070 Max-Q / Mobile] (rev a1)
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Phoenix1 (rev c2)

Comment 6 Peter Robinson 2025-09-05 08:10:54 UTC
Thanks for all the detail.

> A Power-Suspend and Revoke-Cycle brings the perfectly draw and working GDM and you can log into the system.

This stood out to me. Are you saying it boots to a black screen, but if you suspend/resume it works as expected?

> But we can forget about Nvidia, because our tests revealed that it has nothing to do with the issue.

So why do you need to downgrade the nvidia firmware packages?

> brought the rpmfusion guys to theire limit

So you're using the NV binary drivers?

> The working ones have something in common: they are around 56MB. 
> The failing one is 100MB.

This is unsurprising, the NVIDIA firmware supporting GSP are very large and when they pushed them upstream their firmware package grew a lot, I strongly suspect it's unrelated to this problem and purely a coincidence.

Comment 7 Peter Robinson 2025-09-05 08:14:30 UTC
> The 40M increase in size comes from a difference in the firmware folder:

Unsurprising TBH. 

> a comparison:
> 
> 164K	new/usr/lib/firmware/amd
> 20M	new/usr/lib/firmware/amdgpu
> 39M	new/usr/lib/firmware/nvidia
> 
> 156K	old/usr/lib/firmware/amd
> 17M	old/usr/lib/firmware/amdgpu
> 1,5M	old/usr/lib/firmware/nvidia

And that shows really the only one that noticeably increased was the NV firmware, that's expected as NV pushed their GSP generation firmwares upstream and they're massive.

> For the record: As no amdgpu is in the test vm, we do not get these
> firmwarefiles included in our initramfs.
> 
> so it boils down to a change in the amdgpu firmware files

How do you correlate an issue with amd firmware files not being included in the initrd on a VM (as expected) with the problem?

Comment 8 customercare 2025-09-05 08:21:12 UTC
(In reply to Peter Robinson from comment #6)
> Thanks for all the detail.
> 
> > A Power-Suspend and Revoke-Cycle brings the perfectly draw and working GDM and you can log into the system.
> 
> This stood out to me. Are you saying it boots to a black screen, but if you
> suspend/resume it works as expected?

Let's put it this way: You can work with it, but you don't have the full set of display features, i.e. different resolutions and different framerates.


> 
> > But we can forget about Nvidia, because our tests revealed that it has nothing to do with the issue.
> 
> So why do you need to downgrade the nvidia firmware packages?

I downgraded the nvidia firmware to bring it to the same version level as the amd one, in case they relate to one another. Whether it was really necessary to downgrade nvidia too was not checked.

> 
> > brought the rpmfusion guys to theire limit
> 
> So you're using the NV binary drivers?

On the live system: yes.
For the tests described here: NO. No nvidia driver was installed. Just a plain Fedora live image install.


> 
> > The working ones have something in common: they are around 56MB. 
> > The failing one is 100MB.
> 
> This is unsurprising, the NVIDIA firmware supporting GSP are very large and
> when they pushed them upstream their firmware package grew a lot, I strongly
> suspect it's unrelated to this problem and purely a coincidence.

Most likely.

Comment 9 Peter Robinson 2025-09-05 08:31:31 UTC
> > > A Power-Suspend and Revoke-Cycle brings the perfectly draw and working GDM and you can log into the system.
> > 
> > This stood out to me. Are you saying it boots to a black screen, but if you
> > suspend/resume it works as expected?
> 
> Lets say it that way: You can work with it, but you don't have the full set
> of display features i.e. different resolutions, different framerates.

So is that yes, it works with a suspend/resume? Please be clear/concise.

I'm not a AMD GPU expert, but I bet it reloads the firmware on resume, and that would be coming from disk not initrd which suggests to me there's nothing wrong with the actual firmware (if there was you'd never get any graphics output) but rather the one in the initrd.

Can you also attach a good/bad dmesg output.

One thing to possibly try to assist with the initrd firmware issue is to enable ssh and once the machine is booted ssh in and from root do a 'rmmod amdgpu; sleep 10; modprobe amdgpu' and see if the screen comes up then. Also attach the dmesg from that.
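A rough sketch of that test as a script, meant to be run as root over ssh. The function names and the output file names are invented for illustration; note that `rmmod` will refuse to unload the module while it is still in use.

```shell
#!/bin/bash
# Sketch: reload amdgpu and keep the kernel log from before and after.

# Reduce a saved dmesg file to the amdgpu/drm lines for easier comparison.
gpu_lines() {
    grep -iE 'amdgpu|\[drm\]' "$1"
}

# Save dmesg, reload the driver as suggested above, save dmesg again.
# $1: directory for the two log files (hypothetical file names).
reload_and_log() {
    out_dir=$1
    dmesg > "$out_dir/dmesg-before.txt"
    rmmod amdgpu && sleep 10 && modprobe amdgpu
    dmesg > "$out_dir/dmesg-after.txt"
}

# Usage:
#   reload_and_log /root
#   gpu_lines /root/dmesg-after.txt
```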

Comment 10 customercare 2025-09-05 08:48:38 UTC
> > so it boils down to a change in the amdgpu firmware files
> 
> How do you correlate an issue with amd firmware files not being included in
> the initrd on a VM (as expected) with the problem?


The fact that something was NOT included in the VM was a hint that what we were looking for is specific to the system and included in the initramfs.


And because I already fixed it on the client's laptops by downgrading only the two firmware files and recreating the initramfs afterwards. The client is now on F42 with those F39 firmware files, working flawlessly again.

That we had a change in the nv firmware at that time was a lucky hint that led to the right solution. I had tons of other "usual suspects" on the list (X, Wayland, systemd and so on) before I thought about the firmware files.

NOW, after that debugging session, it's easy and logical, but when it happened, the nvidia kernel driver was the first candidate, then the amd kernel driver, then something else. In 25 years of Linux, it was never the firmware, until now ;)

> So is that yes, it works with a suspend/resume? Please be clear/concise.

After a suspend/resume cycle, it was usable with some features missing.


> I'm not a AMD GPU expert, but I bet it reloads the firmware on resume, and that would be coming from disk not initrd which suggests to me there's nothing wrong with the actual firmware (if there was you'd never get any graphics output) but rather the one in the initrd.

Are you sure? What if NOTHING is re-uploaded after the resume? Then the chip would be running its working built-in firmware release again.

This would make more sense than the resume loading the firmware from disk, which is the same firmware as in the initramfs and therefore not working.

> Can you also attach a good/bad dmesg output.


This will take a while; I have to arrange a date with the client and test on the client's hardware.


> One thing to possibly try to assist with the initrd firmware issue is to enable ssh and once the machine is booted ssh in and from root do a 'rmmod amdgpu; sleep 10; modprobe amdgpu' and see if the screen comes up
> then. Also attach the dmesg from that.

OK. Does the kernel driver load the firmware, or is the firmware loaded independently by a service on boot?

Comment 11 Peter Robinson 2025-09-05 08:59:20 UTC
> Are you sure? What if NOTHING is reuploaded after the resume, than the chip
> would have the working build-in firmware release again.

I stated above I'm not an expert in AMD GPUs, but most HW reloads the firmware on resume. What built in firmware are you referring to?

> This would make more sense, than that the resume loads the firmware from
> disk, which is the same firmware as in the initramfs and therefor not
> working. 

Can you confirm that the sha256sum is the same if you extract the initrd?

> > One thing to possibly try to assist with the initrd firmware issue is to enable ssh and once the machine is booted ssh in and from root do a 'rmmod amdgpu; sleep 10; modprobe amdgpu' and see if the screen comes up
> > then. Also attach the dmesg from that.
> 
> ok. Does the kernel driver load the firmware or is the firmware
> independently loaded by a service on boot ?

The kernel driver loads the firmware. If you 'modinfo amdgpu' you'll see a list of all the firmware files the driver may load from disk (obv HW dependent), same goes for wifi or any other HW that loads firmware.
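The checksum question can be answered with a small helper, sketched below. It assumes the initrd has been unpacked somewhere first; the function name `same_sha256`, the `./unpacked` directory and the `.xz` suffix in the usage line are assumptions, since the exact file name depends on how the release ships its firmware.

```shell
#!/bin/bash
# Sketch: compare one firmware file on disk with its copy from an
# unpacked initrd.
# Returns 0 if the SHA-256 sums match, 1 if they differ, and 2 if either
# file cannot be read.
same_sha256() {
    a=$(sha256sum < "$1") || return 2
    b=$(sha256sum < "$2") || return 2
    [ "$a" = "$b" ]
}

# Usage (placeholder paths):
#   same_sha256 /usr/lib/firmware/amdgpu/dcn_3_1_4_dmcub.bin.xz \
#               ./unpacked/usr/lib/firmware/amdgpu/dcn_3_1_4_dmcub.bin.xz \
#       && echo identical
# The names the driver may request can be listed with:
#   modinfo -F firmware amdgpu
```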

Comment 12 customercare 2025-09-05 09:19:37 UTC
(In reply to Peter Robinson from comment #11)
> 
> Can you confirm that the sha256sum is the same if you extract the initrd?

If not, someone tampered with it. We can check that, no problem.
 
> > > One thing to possibly try to assist with the initrd firmware issue is to enable ssh and once the machine is booted ssh in and from root do a 'rmmod amdgpu; sleep 10; modprobe amdgpu' and see if the screen comes up
> > > then. Also attach the dmesg from that.
> > 
> > ok. Does the kernel driver load the firmware or is the firmware
> > independently loaded by a service on boot ?
> 
> The kernel driver loads the firmware. If you 'modinfo amdgpu' you'll see a
> list of all the firmware that the driver files the driver may load from disk
> (obv HW dependent), same goes for wifi or any other HW that loads firmware.

I would also expect this to happen on each resume, in case the chips lost power.

But logically it makes no sense with regard to the issue at hand, because it raises the question:

Why would the first init attempt not work while the second attempt, with the same data and the same routine initializing it, does work?

Assuming that the procedure to load the firmware and the driver is always the same, only a RACE condition regarding timing would make sense.


STOP here. 

We will check the hardware, see what, if anything, dmesg has to say about it (I did not find anything related when I checked it last autumn) and get back to you. OK?

Comment 13 customercare 2025-09-14 14:07:45 UTC
Created attachment 2106594 [details]
working dmesg output

Comment 14 customercare 2025-09-14 14:08:35 UTC
Created attachment 2106595 [details]
dmesg not working up to GDM blackscreen

Comment 15 customercare 2025-09-14 14:09:19 UTC
Created attachment 2106596 [details]
dmesg not working up to full desktop

Comment 16 customercare 2025-09-14 14:10:04 UTC
Created attachment 2106597 [details]
full lspci with pci ids

Comment 17 JAlberto 2025-12-12 10:02:21 UTC
is this bug related to this? https://github.com/ROCm/ROCm/issues/5724#issuecomment-3642517574

Comment 18 customercare 2025-12-12 10:37:37 UTC
(In reply to JAlberto from comment #17)
> is this bug related to this?
> https://github.com/ROCm/ROCm/issues/5724#issuecomment-3642517574

No.

Comment 19 Fedora Update System 2026-01-11 04:14:05 UTC
FEDORA-2026-1d240112ff (linux-firmware-20260110-1.fc42) has been submitted as an update to Fedora 42.
https://bodhi.fedoraproject.org/updates/FEDORA-2026-1d240112ff

Comment 20 Fedora Update System 2026-01-11 04:14:18 UTC
FEDORA-2026-2cebf295af (linux-firmware-20260110-1.fc43) has been submitted as an update to Fedora 43.
https://bodhi.fedoraproject.org/updates/FEDORA-2026-2cebf295af

Comment 21 Fedora Update System 2026-01-12 01:34:04 UTC
FEDORA-2026-2cebf295af has been pushed to the Fedora 43 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2026-2cebf295af`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2026-2cebf295af

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 22 Fedora Update System 2026-01-12 01:55:39 UTC
FEDORA-2026-1d240112ff has been pushed to the Fedora 42 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2026-1d240112ff`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2026-1d240112ff

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 23 Fedora Update System 2026-01-15 00:52:26 UTC
FEDORA-2026-1d240112ff (linux-firmware-20260110-1.fc42) has been pushed to the Fedora 42 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 24 Fedora Update System 2026-01-15 01:12:47 UTC
FEDORA-2026-2cebf295af (linux-firmware-20260110-1.fc43) has been pushed to the Fedora 43 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 25 customercare 2026-01-19 16:39:22 UTC
*** NOT FIXED ***

Comment 26 customercare 2026-01-19 16:42:03 UTC
If you have contact with a relevant AMD developer, direct communication would help a lot.

Comment 27 Peter Robinson 2026-01-20 06:06:32 UTC
Mario are you aware of any issues surrounding this?

Comment 28 Mario Limonciello 2026-01-21 04:29:16 UTC
There's a few things I'll mention.  I know there was a race condition with GDM and simpledrm that could lead to a black screen at the login screen.  This was fixed a few GNOME releases ago (a year or two ago IIRC).  If you can still reproduce this issue on a current Fedora release I would say that's not your issue though.

Looking through some of your dmesg above a few things I'll note.

> amdgpu: ATOM BIOS: 113-PHXGENERIC-001
> [drm] Display Core v3.2.241 initialized on DCN 3.1.4

This is a Phoenix or Hawk Point system.

> Loading DMUB firmware via PSP: version=0x08002300

Your "good" run (attachment 2106594 [details]) included a DMUB microcode 0x08002300.

> [drm] Loading DMUB firmware via PSP: version=0x08004500

Your "bad" run (attachment 2106595 [details] and 2106596) included a DMUB microcode 0x08004500.
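For anyone comparing their own logs, the version can be pulled out of a saved dmesg file with a one-liner along these lines. This is only a sketch; the function name `dmub_version` and the file name in the usage line are made up, and the match string is the message quoted above.

```shell
#!/bin/bash
# Sketch: print the DMUB firmware version(s) reported in a saved dmesg file.
dmub_version() {
    grep -o 'Loading DMUB firmware via PSP: version=0x[0-9A-Fa-f]*' "$1" |
        sed 's/.*version=//'
}

# Usage:
#   dmub_version dmesg-good.txt
```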


A few leading questions to try to figure out what's going on:

1) If you remove 'quiet' from the kernel command line do you have graphics up until a certain point and then it turns black?
2) Is this just a case of brightness being too dark?  IE, can you press brightness down followed by brightness up and it recovers?
3) Is an external monitor affected?  If you plug one in before you turn on the system, does it work at all during boot?  During login?
4) Would you be able to isolate binaries one by one between the two firmware packages to figure out which one causes the issue?

These are the binaries applicable to your system that you would need to check.

dcn_3_1_4_dmcub.bin
gc_11_0_1_imu.bin
gc_11_0_1_me.bin
gc_11_0_1_mec.bin
gc_11_0_1_mes1.bin
gc_11_0_1_mes_2.bin
gc_11_0_1_pfp.bin
gc_11_0_1_rlc.bin
psp_13_0_4_ta.bin
psp_13_0_4_toc.bin
sdma_6_0_1.bin
vcn_4_0_2.bin

Basically start with your "good" firmware and then copy a binary in, rebuild your initramfs and reboot.  Once you are at a boot that fails, let me know which binary failed.
Once we can narrow down the binary that failed, we can cross reference it against upstream to see what git hashes it matches and work further on it.
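The copy step of this procedure could be sketched as below. It assumes the "good" firmware is the installed tree and the "bad" files were extracted into a separate directory; the function name `swap_in`, the `./bad/amdgpu` path and the handling of compressed variants (such as a `.xz` suffix) are assumptions, not something confirmed in this report.

```shell
#!/bin/bash
# Sketch: copy one candidate binary from the "bad" firmware directory into
# the "good" one. The trailing glob also picks up compressed variants of
# the same file name (assumption about how the files are shipped).
# Returns non-zero if no matching file was found in the bad directory.
swap_in() {
    good_dir=$1
    bad_dir=$2
    name=$3
    found=0
    for f in "$bad_dir/$name"*; do
        [ -e "$f" ] || continue
        cp -f -- "$f" "$good_dir/" && found=1
    done
    [ "$found" -eq 1 ]
}

# Usage (as root, one binary per boot; paths are placeholders):
#   swap_in /usr/lib/firmware/amdgpu ./bad/amdgpu dcn_3_1_4_dmcub.bin
#   dracut -f
#   reboot
```

Keeping a note of which binaries have already been swapped in makes it easy to name the one that first produces a failing boot.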

Comment 29 Peter Robinson 2026-04-11 13:41:47 UTC
Did you get a chance to debug this further?

Comment 30 customercare 2026-04-11 17:12:56 UTC
I'm not the owner of that laptop, but he has been informed and, if he can, he will invest the time.

