Created attachment 1857568 [details] Kernel 5.16.3 dmesg of the failed boot attempt
1. Please describe the problem:
Computer crashes ~2 seconds after login every time, 100% reproducible. I am using the AMD Radeon RX 6500 XT as GPU, which was just released last week.
2. What is the Version-Release number of the kernel:
Defect with 5.16.3
Works with 5.16.2
3. Did it work previously in Fedora? If so, what kernel version did the issue *first* appear? Old kernels are available for download at https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
5.16.3
4. Can you reproduce this issue? If so, please provide the steps to reproduce the issue below:
- Boot the computer
- Login (I am using KDE Plasma 5.23.5)
5. Does this problem occur with the latest Rawhide kernel? To install the Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by ``sudo dnf update --enablerepo=rawhide kernel``:
-
6. Are you running any modules that are not shipped directly with Fedora's kernel?:
No
7. Please attach the kernel logs. You can get the complete kernel log for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the issue occurred on a previous boot, use the journalctl ``-b`` flag.
File is attached and shows some errors about amdgpu
Created attachment 1857907 [details] Kernel 5.16.4 dmesg of the failed boot attempt Just tested Kernel 5.16.4, still can't get the computer to boot properly. But I think the error changed: "[drm:amdgpu_dm_init.isra.0.cold [amdgpu]] *ERROR* Failed to register vline0 irq 30!"
Created attachment 1858728 [details] Kernel 5.16.5 dmesg of the failed boot attempt For Kernel 5.16.5 the error slightly changed again: 1. error message: "Feb 02 18:29:36 kernel: __common_interrupt: 8.55 No irq handler for vector" 2. error message: "Feb 02 18:29:39 kernel: [drm:amdgpu_dm_init.isra.0.cold [amdgpu]] *ERROR* Failed to register vline0 irq 30!"
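For anyone comparing the attached logs across kernel versions, here is a minimal sketch of pulling out just the two error lines quoted above from a saved `dmesg.txt`. The function name `find_gpu_errors` and the pattern list are illustrative assumptions, not part of any tool mentioned in this report; the patterns only cover the messages seen so far and may need extending.

```python
import re

# Patterns derived from the two error messages quoted in this report.
# This list is an assumption and covers only those two messages.
ERROR_PATTERNS = [
    re.compile(r"\[amdgpu\]\] \*ERROR\*"),
    re.compile(r"No irq handler for vector"),
]


def find_gpu_errors(log_text):
    """Return the log lines that match any of the patterns above."""
    return [
        line
        for line in log_text.splitlines()
        if any(pattern.search(line) for pattern in ERROR_PATTERNS)
    ]


# The two error lines are copied from the comment above; the first line is
# made-up filler so the filter has something to skip.
sample = "\n".join([
    "Feb 02 18:29:35 kernel: (unrelated placeholder line)",
    "Feb 02 18:29:36 kernel: __common_interrupt: 8.55 No irq handler for vector",
    "Feb 02 18:29:39 kernel: [drm:amdgpu_dm_init.isra.0.cold [amdgpu]] "
    "*ERROR* Failed to register vline0 irq 30!",
])

for line in find_gpu_errors(sample):
    print(line)
```

To run it on a real log, read the file produced by ``journalctl --no-hostname -k > dmesg.txt`` and pass its contents to `find_gpu_errors`.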
Does this work with 5.17-rc2 from rawhide? Just curious whether the patch added to 5.16.3 was incomplete or whether it was a bad patch altogether. Unfortunately, 5.16.3 had over 1000 patches, 29 of them AMD-specific, so I am trying to narrow it down a bit. I am not seeing it on my rx580 system, which is the only AMD card I have at the moment.
Created attachment 1859208 [details] Kernel 5.17.0-rc2 dmesg of the failed boot attempt Just tried to boot with kernel-5.17.0-0.rc2.83.fc36.x86_64 , but my system is still crashing.
Unsure if strictly related, but I am also encountering a crash regression on boot. My gap was larger, 5.15.16 to 5.15.5. From fetch: CPU: AMD Ryzen 5 5600H with Radeon Graphics @ 12x 3.3GHz GPU: NVIDIA GeForce RTX 3060 Laptop GPU. I came searching after the first reboot into the previously installed kernel to see if anyone else had the same issue. I can try to check for an error message or try an rc kernel on future restarts.
Retracting my last comment. I took the opportunity to capture the boot log on a restart today, and my issue was merely that more waiting was needed at boot time; the machine only appeared to be hung when the spinning circle stopped.
Want to give https://koji.fedoraproject.org/koji/taskinfo?taskID=82513039 a try? It is not secure boot signed being a test kernel, but might help.
Tried Kernel 5.16.8rc2, but the computer still crashes a few seconds after login.
I have a similar issue with a work laptop. Try booting with ACPI off (acpi=off on the kernel command line). I suspect that a BIOS update should solve this...
Booting with acpi=off did not solve the crash. Did the crash you describe happen inside the AMDGPU driver code, or was it a different error message? I think I'll try to compile the kernel "commit for commit" myself and check which change introduced the regression. I guess this is currently the best way to find the bug (?).
No, I only have one machine with an amdgpu and it works just fine with mesa... The symptoms are similar to my work laptop (Intel, HP laptop) that freezes at login (basically a deadlock) when booted with newer kernels... But it's ACPI-related in that case.
But then your crash has nothing to do with this bug report. Please file another bug and only post here if it's related to the AMDGPU driver that throws that message "[drm:amdgpu_dm_init.isra.0.cold [amdgpu]] *ERROR* Failed to register vline0 irq 30!".
Created attachment 1861436 [details] Kernel 5.17.0-rc4 dmesg of the failed boot attempt Kernel 5.17.0-rc4 crashes while producing (probably) the most detailed dmesg output (the last 8 lines)
It is commit 620c32a9af98fec55f9f22e2dbeff10824c909e6 "drm/amdgpu/display: set vblank_disable_immediate for DC". When I compile the Kernel using 2f0fd2f941e88cfb56f1aa5e5ec7bd396576d2f3, everything works fine, but one commit later (620c32a9af98fec55f9f22e2dbeff10824c909e6) I can't boot anymore. This is all I can do for now, hope this helps to find/fix the bug.
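For reference, testing "commit for commit" can be shortened with a binary search over the commit range, which is the idea that `git bisect` automates. Below is a minimal sketch of that search, assuming a linear list of commits ordered oldest to newest, a known-good first entry, a known-bad last entry, and a regression that stays bad once introduced. The names `first_bad_commit` and `is_good` and the placeholder commit IDs are hypothetical; in practice `is_good` means building and booting that commit by hand.

```python
def first_bad_commit(commits, is_good):
    """Find the first bad commit in an ordered list.

    commits: ordered oldest to newest; commits[0] is assumed good and
             commits[-1] is assumed bad.
    is_good: callable that returns True if the given commit works.
    """
    good, bad = 0, len(commits) - 1
    # Invariant: commits[good] works, commits[bad] fails.
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_good(commits[mid]):
            good = mid
        else:
            bad = mid
    return commits[bad]


# Placeholder IDs, not real kernel commits. "c5" plays the role of the
# commit that introduces the regression.
commits = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
broken_from = commits.index("c5")

culprit = first_bad_commit(commits, lambda c: commits.index(c) < broken_from)
print(culprit)
```

Each round halves the remaining range, so a range of about 1000 commits needs roughly 10 build-and-boot cycles instead of up to 1000.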
As this is a problem with the upstream kernel, I am closing this one and will create an issue there.
Just for reference, here is the bug report on AMD's side: https://gitlab.freedesktop.org/drm/amd/-/issues/1933