Created attachment 1576228
journalctl -ab-1 for 5.1.5-300.fc30.x86_64 boot fail

Description of problem:

Jun 02 12:40:04 fowl kernel: fb0: switching to amdgpudrmfb from EFI VGA
Jun 02 12:40:04 fowl kernel: amdgpu 0000:08:00.0: vgaarb: deactivate vga console
Jun 02 12:40:04 fowl kernel: amdgpu 0000:08:00.0: enabling device (0006 -> 0007)
Jun 02 12:40:04 fowl kernel: [drm] initializing kernel modesetting (RAVEN 0x1002:0x15DD 0x1002:0x15DD 0xC6).
Jun 02 12:40:04 fowl kernel: [drm] register mmio base: 0xFCD00000
Jun 02 12:40:04 fowl kernel: [drm] register mmio size: 524288
Jun 02 12:40:04 fowl kernel: [drm] add ip block number 0 <soc15_common>
Jun 02 12:40:04 fowl kernel: [drm] add ip block number 1 <gmc_v9_0>
Jun 02 12:40:04 fowl kernel: [drm] add ip block number 2 <vega10_ih>
Jun 02 12:40:04 fowl kernel: [drm] add ip block number 3 <psp>
Jun 02 12:40:04 fowl kernel: [drm] add ip block number 4 <gfx_v9_0>
Jun 02 12:40:04 fowl kernel: [drm] add ip block number 5 <sdma_v4_0>
Jun 02 12:40:04 fowl kernel: [drm] add ip block number 6 <powerplay>
Jun 02 12:40:04 fowl kernel: [drm] add ip block number 7 <dm>
Jun 02 12:40:04 fowl kernel: [drm] add ip block number 8 <vcn_v1_0>
Jun 02 12:40:04 fowl kernel: amdgpu 0000:08:00.0: Direct firmware load for amdgpu/raven_gpu_info.bin failed with error -2
Jun 02 12:40:04 fowl kernel: amdgpu 0000:08:00.0: Failed to load gpu_info firmware "amdgpu/raven_gpu_info.bin"
Jun 02 12:40:04 fowl kernel: amdgpu 0000:08:00.0: Fatal error during GPU init
Jun 02 12:40:04 fowl kernel: [drm] amdgpu: finishing device.
Jun 02 12:40:04 fowl kernel: amdgpu: probe of 0000:08:00.0 failed with error -2

Version-Release number of selected component (if applicable):
amdgpu-5.1.5-300.fc30.x86_64

How reproducible:
After a kernel update to 5.1.5-300.fc30.x86_64, boot fails with a blank screen. Rebooting to the previous kernel version 5.0.17-300.fc30.x86_64 and checking the logs for the previous boot shows the log above.

Actual results:
Blank screen - cannot log in because the amdgpu driver fails to load

Expected results:
Normal boot

Additional info:
The said file exists in the filesystem but fails to load.

# diff -y amdgpu-5.0.17-300.fc30.x86_64.conf amdgpu-5.1.5-300.fc30.x86_64.conf
add_drivers+=" amdgpu"                              add_drivers+=" amdgpu"
fw_dir+="/lib/firmware/5.0.17-300.fc30.x86_64"    | fw_dir+="/lib/firmware/5.1.5-300.fc30.x86_64"

ls -l /lib/firmware/5.1.5-300.fc30.x86_64/amdgpu/raven_gpu_info.bin
-rw-r--r--. 1 root root 316 Apr 8 12:42 /lib/firmware/5.1.5-300.fc30.x86_64/amdgpu/raven_gpu_info.bin

ls -l /lib/firmware/amdgpu/raven_gpu_info.bin
-rw-r--r--. 1 root root 316 May 15 02:17 /lib/firmware/amdgpu/raven_gpu_info.bin

ls -l /usr/lib/firmware/amdgpu/raven_gpu_info.bin
-rw-r--r--. 1 root root 316 May 15 02:17 /usr/lib/firmware/amdgpu/raven_gpu_info.bin

ls /usr/lib/firmware/5.1.5-300.fc30.x86_64/amdgpu/raven_gpu_info.bin -l
-rw-r--r--. 1 root root 316 Apr 8 12:42 /usr/lib/firmware/5.1.5-300.fc30.x86_64/amdgpu/raven_gpu_info.bin
On the side: why are these firmware files installed in 4 different places without symlinks (including the "/lib/firmware/amdgpu/" path; that makes 4 locations with the previous kernel path included)? I saw that "/usr/lib/firmware/5.1.5-300.fc30.x86_64/" had older files dated from April, so I copied over the latest May firmware files from "/lib/firmware/amdgpu/", but that did not have any effect. This path was not updated even when I re-installed linux-firmware... G
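On the "4 different places" question: on a merged-/usr Fedora install, /lib is normally a symlink to /usr/lib, so the /lib/firmware and /usr/lib/firmware listings may well be the same files seen through two paths (the identical sizes and dates in the ls output are consistent with that). A minimal sketch for checking this, assuming a shell whose test builtin supports -ef (bash and dash do); same_file and report_pair are hypothetical helper names used only here:

```shell
#!/usr/bin/env bash
# same_file PATH_A PATH_B
# Succeeds when both paths resolve to the same file (same device and inode),
# for example because one parent directory is a symlink to the other.
same_file() {
    [ "$1" -ef "$2" ]
}

# report_pair PATH_A PATH_B
# Prints a one-line verdict for a pair of paths.
report_pair() {
    if same_file "$1" "$2"; then
        echo "same file: $1 == $2"
    else
        echo "distinct:  $1 != $2"
    fi
}

# Example with the paths from the report (left commented out so that
# sourcing this file only defines the functions):
# report_pair /lib/firmware/amdgpu/raven_gpu_info.bin \
#             /usr/lib/firmware/amdgpu/raven_gpu_info.bin
```

If the pairs report as the same file, there are really only two copies on disk: the shared one under firmware/amdgpu/ and the kernel-versioned one.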
I ran a dracut --force to rebuild the initramfs and now even the 5.0.17-300 kernel will not load amdgpu. Luckily I had a custom 5.0.17-300 grub menu entry set up as a last-known-good boot option, for which I had copied the /boot/ files including the initramfs, so I am able to boot Fedora at least. Googling around I see this issue has come and gone since 2017, but I am not clear why the error happens in this instance even with the files present in the */firmware/* folders. G
I fixed the problem with this command:

#>dracut --force --kver 5.1.5-300.fc30.x86_64 --install "/lib/firmware/amdgpu/*"

My boot system is now
5.1.5-300.fc30.x86_64 #1 SMP Sat May 25 18:00:11 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Thanks for your help on this. G
Hi Folks

The problem came back with the next kernel update.

Jun 06 11:37:39 fowl kernel: amdgpu 0000:08:00.0: Direct firmware load for amdgpu/raven_gpu_info.bin failed with error -2
Jun 06 11:37:39 fowl kernel: amdgpu 0000:08:00.0: Failed to load gpu_info firmware "amdgpu/raven_gpu_info.bin"
Jun 06 11:37:39 fowl kernel: amdgpu 0000:08:00.0: Fatal error during GPU init

Running 'lsinitrd -k 5.1.6-300.fc30.x86_64 | grep amdgpu' did not show any Raven Ridge drivers. Adding amdgpu to dracut.conf and rebuilding with dracut does not work either.

cat /etc/dracut.conf
# PUT YOUR CONFIG IN separate files
# in /etc/dracut.conf.d named "<name>.conf"
# SEE man dracut.conf(5) for options
#add_dracutmodules+=" amdgpu "
add_modules+=" amdgpu "
add_drivers+=" amdgpu "
omit_dracutmodules+=" dmraid "
install_items+=" /lib/firmware/edid/BenQ2710Q.bin /lib/firmware/edid/BenQ2711U.bin "

I had to manually run 'dracut --force --kver 5.1.6-300.fc30.x86_64 --install "/lib/firmware/amdgpu/raven*"' before I could boot to the login screen.

Is something broken in the toolchain or is it something specific to my setup? What else can I check or fix to ensure the next Fedora kernel update builds the initrd with amdgpu seamlessly, without a manual redo like the one I did?

Thanks
G
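Error -2 in the log is -ENOENT (no such file). That fits a driver loaded from the initramfs looking for its firmware inside the initramfs image, where it is missing even though it exists on the root filesystem. The lsinitrd check above can be wrapped in a small helper and rerun after each kernel update. A minimal sketch, assuming each packed file appears in the lsinitrd listing with its path; initrd_has_firmware is a hypothetical name, not part of dracut:

```shell
#!/usr/bin/env bash
# initrd_has_firmware NAME
# Reads an lsinitrd listing on stdin and succeeds if any line contains the
# path fragment "/NAME" (fixed-string match, so dots are not wildcards).
initrd_has_firmware() {
    grep -qF -- "/$1"
}

# Example, using the kernel version and firmware name from this report
# (left commented out so that sourcing this file only defines the function):
# if lsinitrd -k 5.1.6-300.fc30.x86_64 | initrd_has_firmware amdgpu/raven_gpu_info.bin; then
#     echo "firmware present in initramfs"
# else
#     echo "firmware MISSING from initramfs"
# fi
```

Checking for the specific firmware file avoids the ambiguity of 'grep amdgpu', which also matches the kernel module itself.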
I have been having to manually run dracut --regenerate-all --force --install=/lib/firmware/amdgpu/* with every kernel update. I did some digging around and apparently the problem was an old dkms configuration left over from a previous ROCm install. Since I wasn't using ROCm anymore, I deleted the files from /var/lib/dkms/amdgpu and it stopped generating bad /etc/dracut.conf.d/amdgpu-KERNEL-VERSION.conf files.
Hi

You are right, yes - I had ROCm for a week before the problem started. But uninstalling all ROCm packages a few weeks back did not fix it. My /var/lib/dkms does not have an amdgpu folder. Instead I worked around the problem by adding this line to dracut.conf:

install_items+=" /lib/firmware/amdgpu/raven* "

G
My observation is that if there aren't any amdgpu-*.conf files in the /etc/dracut.conf.d directory, dracut does "the right thing" and includes the /usr/lib/firmware/MODULENAME files. Your workaround would just make the initrd slightly smaller by only including the raven* files instead of all amd/ati firmwares. Using lsinitrd /boot/initramfs-KERNEL-VERSION helped me debug this problem immensely. In the end, an empty /etc/dracut.conf file and an empty /etc/dracut.conf.d/ fixed my issues.
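One possible explanation, which is an assumption and not confirmed anywhere in this thread: dracut falls back to its default firmware search path only when fw_dir is unset, so leftover amdgpu-*.conf files that assign fw_dir could narrow the search to the kernel-versioned directory, or concatenate into a path that does not exist when several such files are read. A minimal sketch for spotting such files before deleting anything; list_fw_dir_overrides is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# list_fw_dir_overrides [DIR]
# Prints "file:line" for every line in DIR/*.conf that assigns fw_dir
# (either fw_dir=... or fw_dir+=...). DIR defaults to /etc/dracut.conf.d.
# Returns non-zero when nothing matches or no *.conf files exist.
list_fw_dir_overrides() {
    fw_conf_dir=${1:-/etc/dracut.conf.d}
    grep -H -E '^[[:space:]]*fw_dir(\+)?=' "$fw_conf_dir"/*.conf 2>/dev/null
}

# Example (left commented out so that sourcing this file only defines the function):
# list_fw_dir_overrides || echo "no fw_dir overrides found"
```

Any file it reports is a candidate for the cleanup described above, followed by a dracut rebuild and an lsinitrd check.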
I deleted the files in /etc/dracut.conf.d as described in the above post and also removed the entries in dracut.conf and it has passed two kernel upgrades successfully. Thanks for your help. G
I can reproduce the bug on Fedora 31.

1. Please describe the problem:
My Radeon RX 5700 XT causes a black screen because the firmware amdgpu/navi10_gpu_info.bin does not load at boot.

2. What is the Version-Release number of the kernel:
kernel : 5.3.7

3. Can you reproduce this issue? If so, please provide the steps to reproduce the issue below:
At each start

4. Please attach the kernel logs. You can get the complete kernel log for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the issue occurred on a previous boot, use the journalctl ``-b`` flag.

Oct 28 19:50:12 zeus5 kernel: amdgpu 0000:28:00.0: Direct firmware load for amdgpu/navi10_gpu_info.bin failed with error -2
Oct 28 19:50:12 zeus5 kernel: amdgpu 0000:28:00.0: Failed to load gpu_info firmware "amdgpu/navi10_gpu_info.bin"
Oct 28 19:50:12 zeus5 kernel: amdgpu 0000:28:00.0: Fatal error during GPU init
Oct 28 19:50:12 zeus5 kernel: [drm] amdgpu: finishing device.
Oct 28 19:50:12 zeus5 kernel: ------------[ cut here ]------------
Oct 28 19:50:12 zeus5 kernel: sysfs group 'fw_version' not found for kobject '0000:28:00.0'
Oct 28 19:50:12 zeus5 kernel: WARNING: CPU: 2 PID: 490 at fs/sysfs/group.c:278 sysfs_remove_group+0x74/0x80
Oct 28 19:50:12 zeus5 kernel: Modules linked in: fjes(-) amdgpu(+) hid_logitech_hidpp(+) amd_iommu_v2 gpu_sched ttm drm_kms_helper drm igb crc32c_intel dca i2c_algo_bit nvme hid_logitech nvme_core ff_memless hid_logitech_dj wmi pinctrl_amd fuse i2c_dev
Oct 28 19:50:12 zeus5 kernel: CPU: 2 PID: 490 Comm: systemd-udevd Not tainted 5.3.7-301.fc31.x86_64 #1
Oct 28 19:50:12 zeus5 kernel: Hardware name: Micro-Star International Co., Ltd. MS-7B78/X470 GAMING PRO CARBON (MS-7B78), BIOS 2.A0 07/27/2019
Oct 28 19:50:12 zeus5 kernel: RIP: 0010:sysfs_remove_group+0x74/0x80
Oct 28 19:50:12 zeus5 kernel: Code: ff 5b 48 89 ef 5d 41 5c e9 29 bc ff ff 48 89 ef e8 41 b9 ff ff eb cc 49 8b 14 24 48 8b 33 48 c7 c7 78 ca 15 94 e8 0a 4d d4 ff <0f> 0b 5b 5d 41 5c c3 0f 1f 44 00 00 0f 1f 44 00 00 48 85 f6 74 31
Oct 28 19:50:12 zeus5 kernel: RSP: 0018:ffffba7000b079f8 EFLAGS: 00010282
Oct 28 19:50:12 zeus5 kernel: RAX: 0000000000000000 RBX: ffffffffc096abc0 RCX: 0000000000000006
Oct 28 19:50:12 zeus5 kernel: RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff91157e697900
Oct 28 19:50:12 zeus5 kernel: RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000449
Oct 28 19:50:12 zeus5 kernel: R10: 00000000000173a8 R11: 0000000000000003 R12: ffff911579f540b0
Oct 28 19:50:12 zeus5 kernel: R13: ffff91156b090018 R14: ffff91156d6ee4a0 R15: 0000000000000000
Oct 28 19:50:12 zeus5 kernel: FS: 00007f25015df940(0000) GS:ffff91157e680000(0000) knlGS:0000000000000000
Oct 28 19:50:12 zeus5 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 28 19:50:12 zeus5 kernel: CR2: 00005642af3dedc8 CR3: 00000007ef7e0000 CR4: 00000000003406e0
Oct 28 19:50:12 zeus5 kernel: Call Trace:
Oct 28 19:50:12 zeus5 kernel: amdgpu_device_fini+0x441/0x475 [amdgpu]
Oct 28 19:50:12 zeus5 kernel: amdgpu_driver_unload_kms+0x4a/0x90 [amdgpu]
Oct 28 19:50:12 zeus5 kernel: amdgpu_driver_load_kms.cold+0x8f/0xb1 [amdgpu]
Oct 28 19:50:12 zeus5 kernel: drm_dev_register+0x111/0x150 [drm]
Oct 28 19:50:12 zeus5 kernel: amdgpu_pci_probe+0xbd/0x120 [amdgpu]
Oct 28 19:50:12 zeus5 kernel: ? __pm_runtime_resume+0x58/0x80
Oct 28 19:50:12 zeus5 kernel: local_pci_probe+0x42/0x80
Oct 28 19:50:12 zeus5 kernel: pci_device_probe+0x107/0x1a0
Oct 28 19:50:12 zeus5 kernel: really_probe+0xf0/0x380
Oct 28 19:50:12 zeus5 kernel: driver_probe_device+0x59/0xd0
Oct 28 19:50:12 zeus5 kernel: device_driver_attach+0x53/0x60
Oct 28 19:50:12 zeus5 kernel: __driver_attach+0x8a/0x150
Oct 28 19:50:12 zeus5 kernel: ? device_driver_attach+0x60/0x60
Oct 28 19:50:12 zeus5 kernel: bus_for_each_dev+0x78/0xc0
Oct 28 19:50:12 zeus5 kernel: bus_add_driver+0x14a/0x1e0
Oct 28 19:50:12 zeus5 kernel: driver_register+0x6c/0xb0
Oct 28 19:50:12 zeus5 kernel: ? 0xffffffffc0ba1000
Oct 28 19:50:12 zeus5 kernel: do_one_initcall+0x46/0x1f4
Oct 28 19:50:12 zeus5 kernel: ? _cond_resched+0x15/0x30
Oct 28 19:50:12 zeus5 kernel: ? kmem_cache_alloc_trace+0x162/0x220
Oct 28 19:50:12 zeus5 kernel: ? do_init_module+0x23/0x230
Oct 28 19:50:12 zeus5 kernel: do_init_module+0x5c/0x230
Oct 28 19:50:12 zeus5 kernel: load_module+0x27b1/0x2990
Oct 28 19:50:12 zeus5 kernel: ? __do_sys_init_module+0x16e/0x1a0
Oct 28 19:50:12 zeus5 kernel: ? _cond_resched+0x15/0x30
Oct 28 19:50:12 zeus5 kernel: __do_sys_init_module+0x16e/0x1a0
Oct 28 19:50:12 zeus5 kernel: do_syscall_64+0x5f/0x1a0
Oct 28 19:50:12 zeus5 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Oct 28 19:50:12 zeus5 kernel: RIP: 0033:0x7f250263509e
Oct 28 19:50:12 zeus5 kernel: Code: 48 8b 0d ed fd 0b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ba fd 0b 00 f7 d8 64 89 01 48
Oct 28 19:50:12 zeus5 kernel: RSP: 002b:00007ffdd96c0f98 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
Oct 28 19:50:12 zeus5 kernel: RAX: ffffffffffffffda RBX: 00005642ae2db490 RCX: 00007f250263509e
Oct 28 19:50:12 zeus5 kernel: RDX: 00007f250225684d RSI: 00000000008435ce RDI: 00005642aeb9b7f0
Oct 28 19:50:12 zeus5 kernel: RBP: 00005642aeb9b7f0 R08: 0000000000000006 R09: 00007ffdd96c040e
Oct 28 19:50:12 zeus5 kernel: R10: 0000000000000007 R11: 0000000000000246 R12: 00007f250225684d
Oct 28 19:50:12 zeus5 kernel: R13: 0000000000000007 R14: 00005642ae2e5bd0 R15: 00005642ae2db490
Oct 28 19:50:12 zeus5 kernel: ---[ end trace 694806b803847eca ]---
This message is a reminder that Fedora 30 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora 30 on 2020-05-26. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '30'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 30 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 30 changed to end-of-life (EOL) status on 2020-05-26. Fedora 30 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.