Created attachment 2104592 [details]
lshw on Laptop HW

This will be a bug report of a different kind, so please be patient. We have to start with dracut as the component, because you are the only ones who know who puts what into the initramfs. Other components will need to be involved as well, but you will have to identify them, because we could not. I chose Fedora 42 because the out-of-the-box live image still has the same bug.

--

It all started with Fedora 39. It was installed on two HP business laptops with an AMD GPU and a discrete NVIDIA GPU (not a 3D accelerator). We can forget about NVIDIA, though, because our tests revealed that it has nothing to do with the issue.

After installing Fedora 39 on the laptop, everything worked as expected (kernel 6.5.6-300). Some time after kernel 6.8.4 was released, the system developed a bug where the amdgpu driver no longer initializes the hardware the way it needs to be initialized. The system still boots, but the GDM login greeter is just a black screen, because the screen is OFF. It is not a rendering failure; the display itself is dark. A suspend/resume cycle brings up a perfectly drawn and working GDM, and you can log in to the system. There we ran a lot of tests... really a lot.

The incorrect amdgpu initialization causes a follow-on problem: the display is identified incorrectly, so the resolution and scaling options do not work properly. Because of missing EDID data, the display type and capabilities are misinterpreted. It's a major failure.

--

We did the following tests on a freshly installed system:

We started with Fedora 42... the live image itself already had the same bug, so we did not need to install it.

We booted the Fedora 39 live image, which we knew was working, and installed it. The system worked fine after booting it.

We did a full OS upgrade to the latest Fedora 39 packages and rebooted. Result: the black-screen bug described above.
We knew that kernel 6.8.4 was working on that laptop, so we downgraded the kernels one by one down to the original 6.5.6, and ALL kernels showed the issue described. Only one conclusion was valid for this scenario: it's not the kernel, it's the rest of the OS that messes things up in the initramfs.

To test this, we reinstalled the bare F39 from the live image and installed 10 kernels up to 6.9.12 (not the latest possible, but sufficient, as it did not work on the original installation). We DID NOT upgrade the OS yet, just the kernels. We tested one kernel after the other and DID NOT HAVE ANY ISSUES with amdgpu. So we can rule out a kernel issue.

We checked with -M which modules dracut puts into the initramfs and created several states of the initramfs with updates of systemd, udev and dracut itself. Important: we had not installed any additional software yet, so it is a blank fresh install, reproducible by anyone. We can rule those three out, because the resulting initramfs worked perfectly.

I will attach 3 initramfs images to this bug report. The working ones have something in common: they are around 56MB. The failing one is 100MB. If someone who has a clue about what could go wrong here with amdgpu diffs the two initramfs images, we will get answers.

After the OS upgrade to the latest Fedora 39 packages, we tested the original untouched initramfs and it still worked. So the problem is definitely inside the regenerated initramfs.

This bug has kept the laptop owner and the support busy for 3 full working days and brought the rpmfusion guys to their limit. It's worth analysing and fixing, such a mess it is.
Bugzilla does not like uploads of more than 19.5MB ;)

Here is a link to all 4 initramfs images:

http://static.bloggt-in-braunschweig.de/initramfs.tar (no further compression required => it's just an archive)

initramfs-6.5.6-300.fc39.x86_64.img
    The original one from the live image install
initramfs-6.9.12-100.fc39.x86_64.img
    The original working initramfs of kernel 6.9.12
initramfs-6.9.12-100-neu.fc39.x86_64.img
    This contains updates for systemd, udev and dracut
initramfs-6.9.12-100-FAIL-FULLY-UPGRADED.fc39.x86_64.img
    This one does not work properly anymore.
From further tests: inside a GNOME Boxes VM the problem cannot be reproduced. It is related to what the detected hardware causes to end up in the initramfs.

IN CASE someone ever needs this kind of script again (it runs for 5 hours):

#!/bin/bash
# Update the installed packages one at a time and rebuild the initramfs after
# every real update; stop at the first package that makes the image grow to 100M.
# dracut -f rebuilds the image of the running kernel (6.5.6-300 in this test).
IMG=/boot/initramfs-6.5.6-300.fc39.x86_64.img
COUNTER=0
ALL=$(rpm -qa --qf "%{NAME}\n")
MAX=$(printf '%s\n' "$ALL" | wc -l)
echo "$MAX"

for package in $ALL
do
    COUNTER=$((COUNTER + 1))
    echo "$COUNTER"
    echo "$package"

    # LC_ALL=C makes dnf answer in English; the original run matched the
    # German message "Nichts zu tun." ("Nothing to do.")
    update=$(LC_ALL=C dnf -y update "$package" | grep -c "Nothing to do.")
    if [ "$update" -gt 0 ]; then
        echo "$package skipped, because it is already up to date"
    else
        echo "Building INITRAMFS"
        dracut -f
        # human-readable size as printed by ls -h, e.g. "56M" -> 56
        ergebnis=$(ls -lah "$IMG" | awk '{print $5}' | sed -e "s/M//g")
        echo "${ergebnis}M"
        if [ "$ergebnis" -eq 100 ]; then
            echo " ############################################################################# "
            echo " ############################################################################# "
            echo " ############################################################################# "
            echo " Package identified: "
            echo "$package"
            exit
        fi
    fi
done
printf "The value of the counter is COUNTER=%d\n" "$COUNTER"
The 40M increase in size comes from a difference in the firmware folder:

59M   new/usr/lib/firmware
19M   old/usr/lib/firmware

A comparison:

164K  new/usr/lib/firmware/amd
20M   new/usr/lib/firmware/amdgpu
39M   new/usr/lib/firmware/nvidia

156K  old/usr/lib/firmware/amd
17M   old/usr/lib/firmware/amdgpu
1,5M  old/usr/lib/firmware/nvidia

For the record: as there is no amdgpu in the test VM, these firmware files are not included in its initramfs.

So it boils down to a change in the amdgpu firmware files.
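A comparison like the one above can be scripted. The following is only a minimal sketch: it assumes both images have already been unpacked into two directories (for example with `lsinitrd --unpack`, if the installed dracut provides that option), and the function names `fw_sizes` and `fw_compare` are made up for illustration.

```shell
#!/bin/sh
# Print "name KiB" for each firmware subdirectory inside an unpacked initramfs tree.
fw_sizes() {
    fw="$1/usr/lib/firmware"
    [ -d "$fw" ] || { echo "no firmware directory below $1" >&2; return 1; }
    # sorted by name so that join(1) can pair the two lists
    ( cd "$fw" && du -sk -- */ ) |
        awk '{ sub("/$", "", $2); print $2, $1 }' | LC_ALL=C sort
}

# Print "name old_KiB new_KiB delta_KiB" for every subdirectory found in either tree.
fw_compare() {
    old_list=$(mktemp) || return 1
    new_list=$(mktemp) || return 1
    fw_sizes "$1" > "$old_list"
    fw_sizes "$2" > "$new_list"
    # -a 1 -a 2 keeps directories that exist on one side only; -e 0 fills the gap
    LC_ALL=C join -a 1 -a 2 -e 0 -o 0,1.2,2.2 "$old_list" "$new_list" |
        awk '{ printf "%-10s %8d %8d %+8d\n", $1, $2, $3, $3 - $2 }'
    rm -f "$old_list" "$new_list"
}
```

Usage, with `old` and `new` being the two unpacked trees: `fw_compare old new`. The delta column shows at a glance which firmware directory grew.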
Component identified: amd/nvidia-gpu-firmware

Today we got the laptop working normally again by downgrading the amd-gpu-firmware and nvidia-gpu-firmware packages to a working F39 version and rebuilding the initramfs files.

O=== known faulty version AND ANY NEWER UP TO TODAY (an older one could be faulty too):

Name    : amd-gpu-firmware
Version : 20241110
Release : 1.fc39

O=== tested WORKING version (there could be a newer working one):

amd-gpu-firmware-20230804-152.fc39.noarch.rpm
nvidia-gpu-firmware-20230804-152.fc39.noarch.rpm

O=== Device:

OMEN by HP Gaming Laptop 16-xf0xxx (8C012EA#ABD)
family=103C_5335M7 HP OMEN
sku=8C012EA#ABD
CPU AMD Ryzen 7 7840HS w/ Radeon 780M Graphics

O=== HOW TO WORK AROUND for ANY Fedora F39++ release:

1. Download both files from Koji -> search for the "linux-firmware" package
2. become root
3. rpm -e --nodeps amd-gpu-firmware nvidia-gpu-firmware
4. rpm -i amd-gpu-firmware-20230804-152.fc39.noarch.rpm nvidia-gpu-firmware-20230804-152.fc39.noarch.rpm

OPTIONAL (steps 5 and 6), if amdgpu was blacklisted to get a clean boot:

5. set /etc/default/grub back to normal (amdgpu was blacklisted)
6. rework /boot/loader/entries/* and remove the blacklist there too, for easier booting.

7. dracut -f
8. echo "exclude=amd-gpu* nvidia-gpu*" >> /etc/dnf/dnf.conf
9. reboot

NOTE: because we excluded them from dnf, we can do system upgrades until the issue is FIXED BY AMD!
Kanotix suggested adding this:

01:00.0 VGA compatible controller: NVIDIA Corporation AD106M [GeForce RTX 4070 Max-Q / Mobile] (rev a1)
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Phoenix1 (rev c2)
Thanks for all the detail.

> A Power-Suspend and Revoke-Cycle brings the perfectly draw and working GDM and you can log into the system.

This stood out to me. Are you saying it boots to a black screen, but if you suspend/resume it works as expected?

> But we can forget about Nvidia, because our tests revealed that it has nothing to do with the issue.

So why do you need to downgrade the nvidia firmware packages?

> brought the rpmfusion guys to theire limit

So you're using the NV binary drivers?

> The working ones have something in common: they are around 56MB.
> The failing one is 100MB.

This is unsurprising: the NVIDIA firmware files supporting GSP are very large, and when they pushed them upstream their firmware package grew a lot. I strongly suspect it's unrelated to this problem and purely a coincidence.
> The 40M increase in size comes from a difference in the firmware folder:

Unsurprising TBH.

> a comparison:
>
> 164K new/usr/lib/firmware/amd
> 20M new/usr/lib/firmware/amdgpu
> 39M new/usr/lib/firmware/nvidia
>
> 156K old/usr/lib/firmware/amd
> 17M old/usr/lib/firmware/amdgpu
> 1,5M old/usr/lib/firmware/nvidia

And that shows that really the only one that noticeably increased was the NV firmware; that's expected, as NV pushed their GSP generation firmwares upstream and they're massive.

> For the record: As no amdgpu is in the test vm, we do not get these
> firmwarefiles included in our initramfs.
>
> so it boils down to a change in the amdgpu firmware files

How do you correlate an issue with amd firmware files not being included in the initrd on a VM (as expected) with the problem?
(In reply to Peter Robinson from comment #6)
> Thanks for all the detail.
>
> > A Power-Suspend and Revoke-Cycle brings the perfectly draw and working GDM and you can log into the system.
>
> This stood out to me. Are you saying it boots to a black screen, but if you
> suspend/resume it works as expected?

Let's put it this way: you can work with it, but you don't have the full set of display features, i.e. different resolutions, different frame rates.

> > But we can forget about Nvidia, because our tests revealed that it has nothing to do with the issue.
>
> So why do you need to downgrade the nvidia firmware packages?

I downgraded the nvidia firmware to bring it to the same level as the amd one, in case they depend on one another. Whether it was really necessary to downgrade nvidia too was not checked.

> > brought the rpmfusion guys to theire limit
>
> So you're using the NV binary drivers?

On the live system: yes. For the tests described here: NO. No nvidia driver was installed. Just a plain Fedora live image install.

> > The working ones have something in common: they are around 56MB.
> > The failing one is 100MB.
>
> This is unsurprising, the NVIDIA firmware supporting GSP are very large and
> when they pushed them upstream their firmware package grew a lot, I strongly
> suspect it's unrelated to this problem and purely a coincidence.

Most likely.
> > > A Power-Suspend and Revoke-Cycle brings the perfectly draw and working GDM and you can log into the system.
> >
> > This stood out to me. Are you saying it boots to a black screen, but if you
> > suspend/resume it works as expected?
>
> Lets say it that way: You can work with it, but you don't have the full set
> of display features i.e. different resolutions, different framerates.

So is that yes, it works with a suspend/resume? Please be clear/concise.

I'm not an AMD GPU expert, but I bet it reloads the firmware on resume, and that would be coming from disk, not the initrd, which suggests to me there's nothing wrong with the actual firmware (if there was you'd never get any graphics output) but rather with the one in the initrd.

Can you also attach good/bad dmesg output?

One thing to possibly try to assist with the initrd firmware issue is to enable ssh and, once the machine is booted, ssh in and as root do a 'rmmod amdgpu; sleep 10; modprobe amdgpu' and see if the screen comes up then. Also attach the dmesg from that.
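The reload test suggested above could be wrapped so that the kernel log is saved in the same step. This is a sketch only: the function name `reload_amdgpu`, the default output file name and the `RUN` dry-run hook are assumptions made for illustration. It is meant to be run as root over ssh.

```shell
#!/bin/sh
# Reload the amdgpu module and save the kernel log afterwards.
# RUN is a hypothetical dry-run hook: with RUN=echo the commands are only printed.
reload_amdgpu() {
    out="${1:-dmesg-after-reload.txt}"
    ${RUN:-} rmmod amdgpu || return 1
    ${RUN:-} sleep 10
    ${RUN:-} modprobe amdgpu || return 1
    if [ -n "${RUN:-}" ]; then
        $RUN "dmesg > $out"
    else
        dmesg > "$out"
    fi
}
```

Calling `reload_amdgpu /root/dmesg-reload.txt` leaves a log file that can be attached to the bug; setting `RUN=echo` first shows what would be executed without touching the driver.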
> > so it boils down to a change in the amdgpu firmware files
>
> How do you correlate an issue with amd firmware files not being included in
> the initrd on a VM (as expected) with the problem?

The fact that something was NOT included in the VM was a hint that what we were looking for is specific to the system and gets included in the initramfs. And I had already fixed it on the client's laptops by downgrading only the two firmware packages and recreating the initramfs afterwards. The client is now on F42 with those F39 firmware files. Working flawlessly again.

That we had a change in the NV firmware at that time was a lucky hint that led to the right solution. I had tons of other "usual suspects" on the list (X, Wayland, systemd and so on) before I thought about the firmware files. NOW, after that debugging session, it's easy and logical, but when it happened, the nvidia kernel driver was the first candidate, then the amd kernel driver, then something else. In 25 years of Linux it was never the firmware, until now ;)

> So is that yes, it works with a suspend/resume? Please be clear/concise.

After a suspend/resume cycle, it was usable with some features missing.

> I'm not a AMD GPU expert, but I bet it reloads the firmware on resume, and that would be coming from disk not initrd which suggests to me there's nothing wrong with the actual firmware (if there was you'd never get
> any graphics output) but rather the one in the initrd.

Are you sure? What if NOTHING is re-uploaded after the resume? Then the chip would have the working built-in firmware release again. This would make more sense than the resume loading the firmware from disk, which is the same firmware as in the initramfs and therefore not working.

> Can you also attach a good/bad dmesg output.

That will take a while; I have to arrange a date with the client and test on his hardware.
> One thing to possibly try to assist with the initrd firmware issue is to enable ssh and once the machine is booted ssh in and from root do a 'rmmod amdgpu; sleep 10; modprobe amdgpu' and see if the screen comes up
> then. Also attach the dmesg from that.

OK. Does the kernel driver load the firmware, or is the firmware loaded independently by a service on boot?
> Are you sure? What if NOTHING is reuploaded after the resume, than the chip
> would have the working build-in firmware release again.

I stated above I'm not an expert in AMD GPUs, but most HW reloads the firmware on resume. What built-in firmware are you referring to?

> This would make more sense, than that the resume loads the firmware from
> disk, which is the same firmware as in the initramfs and therefor not
> working.

Can you confirm that the sha256sum is the same if you extract the initrd?

> > One thing to possibly try to assist with the initrd firmware issue is to enable ssh and once the machine is booted ssh in and from root do a 'rmmod amdgpu; sleep 10; modprobe amdgpu' and see if the screen comes up
> > then. Also attach the dmesg from that.
>
> ok. Does the kernel driver load the firmware or is the firmware
> independently loaded by a service on boot ?

The kernel driver loads the firmware. If you 'modinfo amdgpu' you'll see a list of all the firmware files the driver may load from disk (obv HW dependent); the same goes for wifi or any other HW that loads firmware.
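The checksum question above can be answered file by file. The following is a rough sketch, assuming the initrd has already been unpacked into a directory; `same_sha256` and `compare_fw_trees` are illustrative names, not existing tools, and only the amdgpu firmware directory is checked.

```shell
#!/bin/sh
# Succeed if both files exist and have the same SHA-256 digest.
same_sha256() {
    [ -f "$1" ] && [ -f "$2" ] || return 2
    a=$(sha256sum < "$1" | cut -d ' ' -f 1)
    b=$(sha256sum < "$2" | cut -d ' ' -f 1)
    [ "$a" = "$b" ]
}

# compare_fw_trees INITRD_ROOT DISK_ROOT
# Check every amdgpu firmware file of the unpacked initrd against the copy on disk.
# Returns 1 if any file is missing on disk or differs.
compare_fw_trees() {
    status=0
    for f in "$1"/usr/lib/firmware/amdgpu/*; do
        [ -f "$f" ] || continue
        name=${f##*/}
        disk="$2/usr/lib/firmware/amdgpu/$name"
        if [ ! -f "$disk" ]; then
            printf '%-10s %s\n' MISSING "$name"; status=1
        elif same_sha256 "$f" "$disk"; then
            printf '%-10s %s\n' SAME "$name"
        else
            printf '%-10s %s\n' DIFFERENT "$name"; status=1
        fi
    done
    return $status
}
```

With the initrd unpacked into, say, `/tmp/initrd`, `compare_fw_trees /tmp/initrd /` lists each file as SAME, DIFFERENT or MISSING, so a mismatch between the initrd and the installed firmware would show up directly.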
(In reply to Peter Robinson from comment #11)
>
> Can you confirm that the sha256sum is the same if you extract the initrd?

If not, someone tampered with it. We can check that, no problem.

> > > One thing to possibly try to assist with the initrd firmware issue is to enable ssh and once the machine is booted ssh in and from root do a 'rmmod amdgpu; sleep 10; modprobe amdgpu' and see if the screen comes up
> > > then. Also attach the dmesg from that.
> >
> > ok. Does the kernel driver load the firmware or is the firmware
> > independently loaded by a service on boot ?
>
> The kernel driver loads the firmware. If you 'modinfo amdgpu' you'll see a
> list of all the firmware that the driver files the driver may load from disk
> (obv HW dependent), same goes for wifi or any other HW that loads firmware.

I would also expect this to happen on each resume, in case the chips lost power. But logically it makes no sense with regard to the issue at hand, because it raises the question: why would the first init attempt not work while the second attempt, with the same data and the same routine initializing it, does work? Assuming that it is always the same procedure that loads the firmware and the driver, only a RACE condition regarding timing would make sense.

STOP here. We will check the hardware, see what, if anything, dmesg has to say about it (I did not find anything related when I checked it last autumn) and get back to you. OK?
Created attachment 2106594 [details] working dmesg output
Created attachment 2106595 [details] dmesg not working up to GDM blackscreen
Created attachment 2106596 [details] dmesg not working up to full desktop
Created attachment 2106597 [details] full lspci with pci ids
is this bug related to this? https://github.com/ROCm/ROCm/issues/5724#issuecomment-3642517574
(In reply to JAlberto from comment #17)
> is this bug related to this?
> https://github.com/ROCm/ROCm/issues/5724#issuecomment-3642517574

No.
FEDORA-2026-1d240112ff (linux-firmware-20260110-1.fc42) has been submitted as an update to Fedora 42. https://bodhi.fedoraproject.org/updates/FEDORA-2026-1d240112ff
FEDORA-2026-2cebf295af (linux-firmware-20260110-1.fc43) has been submitted as an update to Fedora 43. https://bodhi.fedoraproject.org/updates/FEDORA-2026-2cebf295af
FEDORA-2026-2cebf295af has been pushed to the Fedora 43 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2026-2cebf295af` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2026-2cebf295af See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2026-1d240112ff has been pushed to the Fedora 42 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2026-1d240112ff` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2026-1d240112ff See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2026-1d240112ff (linux-firmware-20260110-1.fc42) has been pushed to the Fedora 42 stable repository. If problem still persists, please make note of it in this bug report.
FEDORA-2026-2cebf295af (linux-firmware-20260110-1.fc43) has been pushed to the Fedora 43 stable repository. If problem still persists, please make note of it in this bug report.
*** NOT FIXED ***
If you have contact with a relevant AMD developer, direct communication would help a lot.
Mario, are you aware of any issues surrounding this?
There's a few things I'll mention. I know there was a race condition with GDM and simpledrm that could lead to a black screen at the login screen. This was fixed a few GNOME releases ago (a year or two ago IIRC). If you can still reproduce this issue on a current Fedora release I would say that's not your issue though.

Looking through some of your dmesg above, a few things I'll note.

> amdgpu: ATOM BIOS: 113-PHXGENERIC-001
> [drm] Display Core v3.2.241 initialized on DCN 3.1.4

This is a Phoenix or Hawk Point system.

> Loading DMUB firmware via PSP: version=0x08002300

Your "good" run (attachment 2106594 [details]) included a DMUB microcode 0x08002300.

> [drm] Loading DMUB firmware via PSP: version=0x08004500

Your "bad" run (attachment 2106595 [details] and 2106596) included a DMUB microcode 0x08004500.

A few leading questions to try to figure out what's going on:

1) If you remove 'quiet' from the kernel command line, do you have graphics up until a certain point and then it turns black?
2) Is this just a case of brightness being too dark? I.e., can you press brightness down followed by brightness up and it recovers?
3) Is an external monitor affected? If you plug one in before you turn on the system, does it work at all during boot? During login?
4) Would you be able to isolate binaries one by one between the two firmware packages to figure out which one causes the issue? These are the binaries applicable to your system that you would need to check.

dcn_3_1_4_dmcub.bin
gc_11_0_1_imu.bin
gc_11_0_1_me.bin
gc_11_0_1_mec.bin
gc_11_0_1_mes1.bin
gc_11_0_1_mes_2.bin
gc_11_0_1_pfp.bin
gc_11_0_1_rlc.bin
psp_13_0_4_ta.bin
psp_13_0_4_toc.bin
sdma_6_0_1.bin
vcn_4_0_2.bin

Basically start with your "good" firmware and then copy a binary in, rebuild your initramfs and reboot. Once you are at a boot that fails, let me know which binary failed.
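The DMUB version comparison above can be repeated on any saved dmesg log. This is a small sketch; the function names `dmub_version` and `dmub_compare` are assumptions, and the pattern relies only on the "Loading DMUB firmware via PSP: version=" message quoted above.

```shell
#!/bin/sh
# Print the DMUB firmware version(s) reported in a saved dmesg log.
dmub_version() {
    sed -n 's/.*Loading DMUB firmware via PSP: version=\(0x[0-9A-Fa-f]*\).*/\1/p' "$1" |
        sort -u
}

# Show the versions of a good and a bad log side by side.
# Succeeds only if both logs report the same version.
dmub_compare() {
    good=$(dmub_version "$1")
    bad=$(dmub_version "$2")
    echo "good: ${good:-not found}"
    echo "bad:  ${bad:-not found}"
    [ "$good" = "$bad" ]
}
```

For example, `dmub_compare dmesg-good.txt dmesg-bad.txt` (file names are placeholders) prints both versions, which makes it easy to confirm after each firmware swap which DMUB microcode was actually loaded.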
Once we can narrow down the binary that failed, we can cross reference it against upstream to see what git hashes it matches and work further on it.
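One step of the binary-by-binary isolation described above could be sketched like this. It is only an illustration under assumptions: `stage_candidate` and the directory names are hypothetical, both firmware sets are assumed to be unpacked into flat directories, and the name is matched as a prefix because packaged firmware files may carry a compression suffix.

```shell
#!/bin/sh
# stage_candidate GOOD_DIR BAD_DIR STAGE_DIR NAME
# Fill STAGE_DIR with the known-good files, then overwrite only the file(s)
# whose name starts with NAME with the version from the failing firmware set.
# Returns non-zero if BAD_DIR holds no file matching NAME.
stage_candidate() {
    good=$1 bad=$2 stage=$3 name=$4
    mkdir -p "$stage" || return 1
    cp -p "$good"/* "$stage"/ || return 1
    found=1
    for f in "$bad/$name"*; do
        [ -f "$f" ] || continue
        cp -p "$f" "$stage"/ && found=0
    done
    return $found
}
```

A test round would then be: `stage_candidate good-amdgpu bad-amdgpu stage dcn_3_1_4_dmcub.bin`, copy the staged files over the installed amdgpu firmware directory, run `dracut -f`, reboot, and note whether the black screen appears before moving on to the next binary in the list.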
Did you get a chance to debug this further?
I'm not the owner of that laptop, but he has been informed, and if he can, he will invest the time.