Bug 2445615 - amdxdna probe failure leaves NPU in bad hardware state, causing s2idle blank screen on resume
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: linux-firmware
Version: 42
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: ---
Assignee: David Woodhouse
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 2447225
Depends On:
Blocks:
Reported: 2026-03-08 18:48 UTC by tom
Modified: 2026-03-14 02:19 UTC (History)
CC List: 13 users

Fixed In Version: linux-firmware-20260309-1.fc44 linux-firmware-20260309-1.fc43
Clone Of:
Environment:
Last Closed: 2026-03-14 00:16:34 UTC
Type: ---
Embargoed:



Description tom 2026-03-08 18:48:11 UTC
accel/amdxdna
Version 6.18
x86-64

Hardware: Framework Laptop 13 (AMD Ryzen AI 300 Series), revision A5
CPU: AMD Ryzen AI 5 340 w/ Radeon 840M
NPU: AMD Strix/Krackan Neural Processing Unit `[1022:17f0]` (rev 20)
GPU: AMD Krackan Radeon 840M/860M `[1002:1114]` (rev c3)
BIOS: Framework 03.05 (2025-10-30)
Kernel: 6.18.13 and 6.18.16 (both affected)
linux-firmware: 20260221
Sleep mode: s2idle (only mode supported; `ACPI: PM: (supports S0 S4 S5)`)

At boot, `amdxdna` fails to probe the NPU due to a firmware protocol mismatch:

amdxdna 0000:c2:00.1: enabling device (0000 -> 0002)
amdxdna 0000:c2:00.1: [drm] *ERROR* aie2_check_protocol: Incompatible firmware protocol major 7 minor 2
amdxdna 0000:c2:00.1: [drm] *ERROR* aie2_hw_start: firmware is not alive
amdxdna 0000:c2:00.1: [drm] *ERROR* aie2_smu_exec: smu cmd 4 failed, 0xff
amdxdna 0000:c2:00.1: [drm] *ERROR* aie2_smu_fini: Power off failed, ret -22
amdxdna 0000:c2:00.1: [drm] *ERROR* aie2_init: start npu failed, ret -22
amdxdna 0000:c2:00.1: [drm] *ERROR* amdxdna_probe: Hardware init failed, ret -22
amdxdna 0000:c2:00.1: probe with driver amdxdna failed with error -22

The probe error path calls `aie2_smu_fini()` as cleanup, but because `aie2_hw_start` failed before SMU initialization completed, `aie2_smu_fini` attempts to power off hardware that was never fully initialized — and itself fails (`Power off failed, ret -22`). This leaves the NPU in an unknown hardware state.
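The failure signature in the log above can be picked out mechanically. The following is a minimal sketch, assuming the message format matches the lines quoted in this report; the function name `summarize` and the returned dictionary keys are hypothetical and chosen for illustration only.

```python
import re

# Sample lines copied verbatim from the boot log quoted in this report.
LOG = """\
amdxdna 0000:c2:00.1: enabling device (0000 -> 0002)
amdxdna 0000:c2:00.1: [drm] *ERROR* aie2_check_protocol: Incompatible firmware protocol major 7 minor 2
amdxdna 0000:c2:00.1: [drm] *ERROR* aie2_hw_start: firmware is not alive
amdxdna 0000:c2:00.1: [drm] *ERROR* aie2_smu_exec: smu cmd 4 failed, 0xff
amdxdna 0000:c2:00.1: [drm] *ERROR* aie2_smu_fini: Power off failed, ret -22
amdxdna 0000:c2:00.1: [drm] *ERROR* aie2_init: start npu failed, ret -22
amdxdna 0000:c2:00.1: [drm] *ERROR* amdxdna_probe: Hardware init failed, ret -22
amdxdna 0000:c2:00.1: probe with driver amdxdna failed with error -22
"""

# Matches "amdxdna <pci-address>: [drm] *ERROR* <function>: <message>".
# No start-of-line anchor, so journal timestamp prefixes are tolerated.
ERROR_RE = re.compile(
    r"amdxdna (?P<dev>\S+): \[drm\] \*ERROR\* (?P<func>\w+): (?P<msg>.*)$"
)
PROTO_RE = re.compile(r"Incompatible firmware protocol major (\d+) minor (\d+)")


def summarize(log_text):
    """Collect amdxdna error lines and flag the failed power-off cleanup."""
    functions = []
    protocol = None
    for line in log_text.splitlines():
        match = ERROR_RE.search(line)
        if match is None:
            continue
        functions.append(match["func"])
        proto = PROTO_RE.search(match["msg"])
        if proto:
            protocol = (int(proto[1]), int(proto[2]))
    return {
        "protocol": protocol,          # (major, minor) the driver rejected
        "functions": functions,        # functions that logged an error, in order
        "cleanup_failed": "aie2_smu_fini" in functions,
    }


if __name__ == "__main__":
    print(summarize(LOG))
```

Feeding the output of `journalctl -b -k` to `summarize` instead of the embedded sample should work the same way, provided the driver messages are unchanged.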

Later, when the system suspends via s2idle, the last journal entry is `PM: suspend entry (s2idle)`.

The system never resumes. The display stays blank and the machine must be hard-reset. This is reproducible on every suspend cycle.
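A hang of this kind shows up in the journal as a suspend entry with no matching exit. The sketch below is one way to look for it, assuming the kernel logs `PM: suspend exit` on a successful resume; the function name `hung_suspends` is hypothetical, and the input is expected to be journal lines such as those from `journalctl -k -b -1`.

```python
def hung_suspends(journal_lines):
    """Return indices of 'suspend entry' lines never followed by 'suspend exit'."""
    pending = None  # index of the most recent entry still waiting for an exit
    hung = []
    for index, line in enumerate(journal_lines):
        if "PM: suspend entry" in line:
            # A new entry while one is still pending means the earlier
            # suspend never logged an exit.
            if pending is not None:
                hung.append(pending)
            pending = index
        elif "PM: suspend exit" in line:
            pending = None
    # An entry that is still pending at the end of the log never resumed.
    if pending is not None:
        hung.append(pending)
    return hung


if __name__ == "__main__":
    sample = [
        "kernel: PM: suspend entry (s2idle)",
        "kernel: PM: suspend exit",
        "kernel: PM: suspend entry (s2idle)",
    ]
    print(hung_suspends(sample))
```

When run against the journal of the previous boot, a non-empty result whose last index is the final line of the log corresponds to the behaviour described here.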



Reproducible: Always

Steps to Reproduce:
1. Boot a Framework Laptop 13 (Ryzen AI 300) with linux-firmware ≥ 20260221 and kernel 6.18.x
2. Observe `amdxdna` probe errors in `journalctl -b`
3. Suspend the system (close lid or `systemctl suspend`)
4. Attempt to wake — screen remains blank, system is unresponsive and must be hard-reset

Actual Results:
The screen stays blank after suspend and the system does not resume.

Expected Results:
The screen turns back on when the system wakes up.

Additional Information:
I'm trying to find the right place to report this bug. It seems like the kernel bug-tracker is not the right place. Maybe this should be filed with frame.work? Apologies if this isn't the appropriate place for this bug report.

Comment 2 Mario Limonciello 2026-03-09 13:08:34 UTC
I corrected the component. The fix shared is correct for this issue.

Comment 3 Peter Robinson 2026-03-09 15:42:30 UTC
(In reply to tom from comment #1)
> Here is the kernel fix

I think you mean the kernel fix.

Comment 4 Mario Limonciello 2026-03-09 16:27:37 UTC
I think he meant "linux-firmware fix". On the kernel side, the patches to enable support for the new binary are not stable material. That's why the binary was reverted and put under a new name, and newer kernels look for that name.

Comment 5 Mario Limonciello 2026-03-10 00:18:25 UTC
A new linux-firmware tag was published today, another option instead of cherry-picking is updating to that.

https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tag/?h=20260309

Comment 6 Peter Robinson 2026-03-10 09:39:41 UTC
(In reply to Mario Limonciello from comment #5)
> A new linux-firmware tag was published today, another option instead of
> cherry-picking is updating to that.

Yep, I don't cherry-pick updates, especially for quaint things like NPUs, because for every update we push it's gigabytes of updates across millions of devices.

This is not the first time in recent history that AMD has failed to properly test their changes.

Comment 7 Fedora Update System 2026-03-10 12:14:45 UTC
FEDORA-2026-3a58aeb68e (linux-firmware-20260309-1.fc44) has been submitted as an update to Fedora 44.
https://bodhi.fedoraproject.org/updates/FEDORA-2026-3a58aeb68e

Comment 8 Fedora Update System 2026-03-10 12:14:50 UTC
FEDORA-2026-16bdab3021 (linux-firmware-20260309-1.fc42) has been submitted as an update to Fedora 42.
https://bodhi.fedoraproject.org/updates/FEDORA-2026-16bdab3021

Comment 9 Fedora Update System 2026-03-10 12:14:51 UTC
FEDORA-2026-2c690b1558 (linux-firmware-20260309-1.fc43) has been submitted as an update to Fedora 43.
https://bodhi.fedoraproject.org/updates/FEDORA-2026-2c690b1558

Comment 10 Fedora Update System 2026-03-11 01:32:19 UTC
FEDORA-2026-3a58aeb68e has been pushed to the Fedora 44 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2026-3a58aeb68e`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2026-3a58aeb68e

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 11 Fedora Update System 2026-03-11 01:49:53 UTC
FEDORA-2026-2c690b1558 has been pushed to the Fedora 43 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2026-2c690b1558`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2026-2c690b1558

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 12 Fedora Update System 2026-03-11 02:12:02 UTC
FEDORA-2026-16bdab3021 has been pushed to the Fedora 42 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2026-16bdab3021`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2026-16bdab3021

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 13 lakhsa 2026-03-11 05:37:10 UTC
(In reply to Mario Limonciello from comment #5)
> A new linux-firmware tag was published today, another option instead of
> cherry-picking is updating to that.
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/
> tag/?h=20260309

Question : In the generated RPM for Fedora 43, 'amd-gpu-firmware-20260309-1.fc43.noarch.rpm', 2 files are unchanged (sha256sum identical) compared to fiwa-20260221:

- amdgpu/gc_11_5_0_imu.bin 
- amdgpu/psp_14_0_0_toc.bin

Is this as intended ?
Thanks for information.

Note 1 : The reason I'm asking is that with kernel-in-testing 6.19.6 + fiwa-20260221, I have spurious issues on my laptop (ThinkPad P14s AMD Ryzen AI 9 HX 370), incl. a few freezes where PSP showed up in the logs as not being available on resume. I did not observe any of these on 6.18.16 and lower.

Note 2 : RPM downloaded from https://koji.fedoraproject.org/koji/buildinfo?buildID=2956131
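
For anyone wanting to repeat the checksum comparison, here is a minimal sketch. It assumes both package versions have already been extracted into two local directories; the directory arguments and the function names `sha256_of` and `unchanged_files` are hypothetical, and only the two relative paths come from this comment.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unchanged_files(old_root, new_root, relative_paths):
    """List the relative paths whose contents are identical in both trees."""
    same = []
    for rel in relative_paths:
        old, new = Path(old_root) / rel, Path(new_root) / rel
        # Files missing from either tree are skipped rather than reported.
        if old.is_file() and new.is_file() and sha256_of(old) == sha256_of(new):
            same.append(rel)
    return same


if __name__ == "__main__":
    import sys

    # Usage (hypothetical paths): python compare.py OLD_DIR NEW_DIR
    files = ["amdgpu/gc_11_5_0_imu.bin", "amdgpu/psp_14_0_0_toc.bin"]
    if len(sys.argv) == 3:
        print(unchanged_files(sys.argv[1], sys.argv[2], files))
```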

Comment 14 Peter Robinson 2026-03-11 09:04:27 UTC
> Question : In the generated RPM for Fedora 43 
> 'amd-gpu-firmware-20260309-1.fc43.noarch.rpm' 2 files are unchanged
> (sha256sum identical) compared to fiwa-20260221:
> 
> - amdgpu/gc_11_5_0_imu.bin 
> - amdgpu/psp_14_0_0_toc.bin
> 
> Is this as intended ?

I believe so, I've not followed this closely upstream but I think it was purely a renaming to be able to deal with different versions of the FW against different kernels.

Comment 15 Mario Limonciello 2026-03-11 14:41:01 UTC
The relevant firmware binaries for this issue are all contained in amdnpu/ directory.  You might have a separate kernel regression.  I do know that 6.19.4/6.19.5 had a bad backport from Sasha's robot, it was reverted in 6.19.6.  I'm not personally aware of anything remaining problematic in 6.19.6.  I'd suggest opening another issue to work through it.

Comment 16 Peter Robinson 2026-03-13 10:36:29 UTC
*** Bug 2447225 has been marked as a duplicate of this bug. ***

Comment 17 Andre Costa 2026-03-13 13:36:57 UTC
(In reply to Mario Limonciello from comment #15)
> The relevant firmware binaries for this issue are all contained in amdnpu/
> directory.  You might have a separate kernel regression.  I do know that
> 6.19.4/6.19.5 had a bad backport from Sasha's robot, it was reverted in
> 6.19.6.  I'm not personally aware of anything remaining problematic in
> 6.19.6.  I'd suggest opening another issue to work through it.

Hi Mario (and Peter),

Since bug 2447225 is marked as a duplicate of this one, can I assume the updated version already includes the fixes for https://gitlab.freedesktop.org/drm/amd/-/issues/5049 ?

Comment 18 Mario Limonciello 2026-03-13 16:24:47 UTC
>Since bug 2447225 is marked as a duplicate of this one, can I assume the updated version already includes the fixes for https://gitlab.freedesktop.org/drm/amd/-/issues/5049 ?

You can certainly test it to confirm and tell us if it doesn't work.  If it doesn't work then we need an amd-s2idle report to confirm the state of everything during a failure.

Comment 19 Andre Costa 2026-03-13 18:25:34 UTC
(In reply to Mario Limonciello from comment #18)
> >Since bug 2447225 is marked as a duplicate of this one, can I assume the updated version already includes the fixes for https://gitlab.freedesktop.org/drm/amd/-/issues/5049 ?
> 
> You can certainly test it to confirm and tell us if it doesn't work.  If it
> doesn't work then we need an amd-s2idle report to confirm the state of
> everything during a failure.

Sure thing, I already did that, thanks to the instructions from comment #11. I can confirm it solves the suspend issue on my hardware (AMD AI 9 365 Strix Point) \o/

Comment 20 Fedora Update System 2026-03-14 00:16:34 UTC
FEDORA-2026-3a58aeb68e (linux-firmware-20260309-1.fc44) has been pushed to the Fedora 44 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 21 Fedora Update System 2026-03-14 02:19:32 UTC
FEDORA-2026-2c690b1558 (linux-firmware-20260309-1.fc43) has been pushed to the Fedora 43 stable repository.
If problem still persists, please make note of it in this bug report.

