Bug 2330681 - Recent debug kernels fail to boot with "failed to validate module" errors (BPF / BTF)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: RejectedBlocker AcceptedFreezeException
Duplicates: 2334643
Depends On:
Blocks: F42BetaFreezeException
 
Reported: 2024-12-05 22:49 UTC by Mikhail
Modified: 2025-03-15 00:43 UTC (History)

Fixed In Version: kernel-6.14.0-0.rc6.49.fc42
Clone Of:
Environment:
Last Closed: 2025-03-15 00:43:43 UTC
Type: ---
Embargoed:


Attachments
Terminal photo (1.11 MB, image/jpeg), 2024-12-05 22:50 UTC, Mikhail
Terminal photo (785.69 KB, image/jpeg), 2025-02-03 20:09 UTC, Mikhail
virsh console "dmesg" (224.34 KB, text/plain), 2025-02-19 20:07 UTC, Chris Murphy

Description Mikhail 2024-12-05 22:49:05 UTC
Something changed between kernel-6.13.0-0.rc0.20241126git7eef7e306d3c.10.fc42 and kernel-6.13.0-0.rc1.20241202gite70140ba0d2b.14.fc42: the latter and all subsequent kernels fail to boot.

Instead of booting, I see many "failed to validate module" messages in the terminal.

However, an upstream kernel that I built from the same commit with the same .config works fine.

Reproducible: Always

Comment 1 Mikhail 2024-12-05 22:50:42 UTC
Created attachment 2061419 [details]
Terminal photo

Comment 2 Fedora Blocker Bugs Application 2024-12-06 06:46:27 UTC
Proposed as a Blocker and Freeze Exception for 42-beta by Fedora user mikhail using the blocker tracking app because:

 Some changes in the RHEL patchset made all my systems completely unbootable.

Comment 3 Adam Williamson 2024-12-06 07:06:30 UTC
It boots fine on openQA (or else it wouldn't have passed gating, and all the Rawhide validation tests would fail).

On my system, kernel-6.13.0-0.rc1.20241203gitcdd30ebb1b9f.16.fc42.x86_64 doesn't bring up graphics, but it does at least get me to a console (I haven't had time to look into why yet). I'm not seeing these BPF errors.

Comment 4 Mikhail 2024-12-08 19:26:25 UTC
Adam, please test the debug kernel.
# dnf install kernel-debug kernel-debug-modules-extra


This issue only affects the debug kernel; the non-debug kernel works as intended.

Comment 5 Adam Williamson 2025-01-20 18:35:09 UTC
Mikhail, is this still happening?

Anyhow, if it only affects the debug kernel, I don't think it can be a blocker, as no install uses that by default...

Comment 6 Mikhail 2025-02-03 20:09:58 UTC
Created attachment 2075031 [details]
Terminal photo

(In reply to Adam Williamson from comment #5)
> Mikhail, is this still happening?
Yes, the latest build, https://koji.fedoraproject.org/koji/buildinfo?buildID=2649629, still does not work. The messages in the terminal have changed a bit, I suspect because of a problem with the dwarves package: https://bugzilla.redhat.com/show_bug.cgi?id=2342785
 
> Anyhow, if it only affects the debug kernel, I don't think it can be a
> blocker, as no install uses that by default...

The debug kernel allows you to see many problems that are usually hidden. That is why I use the debug kernel on a daily basis.

Comment 7 Adam Williamson 2025-02-03 23:03:07 UTC
That's a good reason to use it for testing, but it doesn't mean bugs in it are a release blocker.

Comment 8 Chris Murphy 2025-02-18 23:16:26 UTC
I'm hitting this too. The debug kernels listed below do not boot, but their non-debug equivalents boot fine. What I get is a series of "failed to validate module" messages and then an apparent hang: no Plymouth prompt to unlock the root volume, and the ESC key does nothing.

kernel-debug-6.14.0-0.rc3.29.fc42.x86_64
kernel-debug-6.13.3-200.fc41.x86_64

Comment 9 Chris Murphy 2025-02-19 20:07:17 UTC
Created attachment 2077149 [details]
virsh console "dmesg"

Reproduced it in qemu/kvm. Other than having UEFI enabled (without Secure Boot), it's a stock VMM VM.

Comment 10 Chris Murphy 2025-02-19 20:26:56 UTC
Fedora-Workstation-Live-Rawhide-20250219.n.0.x86_64.iso is using 6.14.0-0.rc3.29.fc43.x86_64, which is a non-debug kernel. This is probably why openQA hasn't caught this problem.

Comment 11 Chris Murphy 2025-02-19 20:34:02 UTC
Just to be extra sure, looking in /run/rootfsbase/usr/lib/modules/6.14.0-0.rc3.29.fc43.x86_64/config I see:

# CONFIG_KASAN is not set
# CONFIG_BTRFS_ASSERT is not set

At least those two options are set on Fedora debug kernels. Another way to check is the kernel image size: non-debug images are 16-17M, debug images are 31-32M.

root@localhost-live:~# ls -lsh /run/initramfs/live/boot/x86_64/loader/linux
17M -rwxr-xr-x. 1 root root 17M Feb 19 06:44 /run/initramfs/live/boot/x86_64/loader/linux
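
The config check above can be sketched in Python. This is only a rough heuristic, assuming that the two options named in this comment are a sufficient hint for "debug kernel"; the function names `enabled_options` and `looks_like_debug_kernel` are made up for illustration.

```python
import re

# Options named in this comment as being set on Fedora debug kernels.
# Treating them as the only hints is an assumption made for this sketch.
DEBUG_HINTS = ("CONFIG_KASAN", "CONFIG_BTRFS_ASSERT")


def enabled_options(config_text, names=DEBUG_HINTS):
    """Return the subset of `names` set to y or m in a kernel config file."""
    enabled = set()
    for line in config_text.splitlines():
        # Disabled options appear as "# CONFIG_FOO is not set" and do not match.
        match = re.match(r"^(CONFIG_\w+)=(y|m)$", line.strip())
        if match and match.group(1) in names:
            enabled.add(match.group(1))
    return enabled


def looks_like_debug_kernel(config_text):
    """Heuristic: report a debug build only if every hint option is enabled."""
    return enabled_options(config_text) == set(DEBUG_HINTS)
```

Given the two "is not set" lines quoted above, `looks_like_debug_kernel` returns False, which agrees with the conclusion that the live image ships a non-debug kernel.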

Comment 12 Chris Murphy 2025-02-19 20:46:46 UTC
Fails in both UEFI and BIOS qemu/kvm.

Comment 13 Adam Williamson 2025-02-19 20:58:06 UTC
> Fedora-Workstation-Live-Rawhide-20250219.n.0.x86_64.iso is using 6.14.0-0.rc3.29.fc43.x86_64 which is a no-debug kernel. This is probably why OpenQA hasn't caught this problem.

Well, yes, that's what all my comments above mean. It also means the bug isn't particularly critical; it just makes debugging kernel problems harder.

Comment 14 Mikhail 2025-02-19 21:48:18 UTC
Anyway, it's a regression. A user can remove all non-debug kernels, and after upgrading to the next Fedora release the system would no longer boot.

Comment 15 Adam Williamson 2025-02-23 19:30:39 UTC
-4 in https://pagure.io/fedora-qa/blocker-review/issue/1745, marking as rejected blocker. The FE vote is still open.

Comment 16 Kamil Páral 2025-02-24 18:52:48 UTC
Discussed on 2025-02-24 in a blocker review meeting [1]:

!agreed 2330681 - AcceptedBetaFE - We would like to fix debug kernels ASAP, and we don't ship them on any medium, so this should be a safe freeze exception to grant.

[1] https://meetbot.fedoraproject.org/blocker-review_matrix_fedoraproject-org/2025-02-24/f42-blocker-review.2025-02-24-17.01.log.html

Comment 17 Adam Williamson 2025-03-02 17:46:09 UTC
*** Bug 2334643 has been marked as a duplicate of this bug. ***

Comment 18 Adam Williamson 2025-03-02 17:48:31 UTC
Useful comment from the other bug, from Jason Montleon:

From serial I collected some output:
```
[    9.670579] BPF: [145778] ENUM ee 
[    9.672260] BPF: size=4 vlen=53
[    9.673775] BPF:  
[    9.675183] BPF: Invalid name
[    9.676689] BPF: 
[    9.678155] failed to validate module [fuse] BTF: -22
[    9.901438] BPF: [145778] ENUM ee 
[    9.903195] BPF: size=4 vlen=53
[    9.904717] BPF:  
[    9.906061] BPF: Invalid name
[    9.907546] BPF: 
[    9.908922] failed to validate module [fuse] BTF: -22
[    9.994502] BPF: 	 type_id=350 bits_offset=64
[    9.996322] BPF:  
[    9.997616] BPF: Invalid name
[    9.999123] BPF: 
[   10.000428] failed to validate module [scsi_dh_alua] BTF: -22
[   10.065557] BPF: [145788] FUNC  
[   10.067110] BPF: type_id=199
[   10.068530] BPF:  
[   10.069743] BPF: Invalid name
[   10.071136] BPF: 
[   10.072428] failed to validate module [scsi_dh_emc] BTF: -22
[   10.143174] BPF: 	 type_id=18 bits_offset=296
[   10.144713] BPF:  
[   10.145914] BPF: Invalid name
[   10.147255] BPF: 
[   10.148445] failed to validate module [scsi_dh_rdac] BTF: -22
[   10.242935] systemd[1]: systemd-modules-load.service: Main process exited, code=exited, status=1/FAILURE
[   10.261194] systemd[1]: systemd-modules-load.service: Failed with result 'exit-code'.
[   10.279406] systemd[1]: Failed to start systemd-modules-load.service - Load Kernel Modules.
[FAILED] Failed to start systemd-modules-load.service - Load Kernel Modules.
See 'systemctl status systemd-modules-load.service' for details.
```
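
For anyone triaging a similar log, here is a minimal Python sketch that lists which modules failed BTF validation and how often. It assumes the message format shown above ("failed to validate module [name] BTF: code"); the function names are made up for illustration. The code -22 corresponds to -EINVAL.

```python
import errno
import re
from collections import Counter

# Matches lines such as: failed to validate module [fuse] BTF: -22
FAIL_RE = re.compile(
    r"failed to validate module \[(?P<module>[^\]]+)\] BTF: (?P<code>-?\d+)"
)


def btf_failures(log_text):
    """Return a list of (module, errno_name) for each validation failure line."""
    results = []
    for line in log_text.splitlines():
        match = FAIL_RE.search(line)
        if match:
            code = int(match.group("code"))
            # The kernel reports negative errno values; -22 maps to EINVAL.
            name = errno.errorcode.get(-code, "unknown")
            results.append((match.group("module"), name))
    return results


def failures_by_module(log_text):
    """Count validation failures per module name."""
    return dict(Counter(module for module, _ in btf_failures(log_text)))
```

Run on the output above, `failures_by_module` reports two failures for fuse and one each for scsi_dh_alua, scsi_dh_emc and scsi_dh_rdac, all with EINVAL.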

Comment 19 Zbigniew Jędrzejewski-Szmek 2025-03-06 20:21:55 UTC
What is the libbpf version? There were some bugs that were only fixed in 1.5.0 and later backported to 1.4.7.

Comment 20 Justin M. Forbes 2025-03-07 17:58:49 UTC
Want to give https://koji.fedoraproject.org/koji/taskinfo?taskID=129948696 a try?

Comment 21 Jason Montleon 2025-03-07 21:06:59 UTC
Created attachment 2079293 [details]
6.14.0-0.rc5.20250307git00a7d39898c8.47.fc43.x86_64+debug journal

This boots, so it is much better. I do still see two cases of "Invalid offset", but I can't tell which module(s) might be causing them. I have uploaded the journal in case someone else can pick it out.
```
Mar 07 15:54:41 fedora kernel: BPF:          type_id=3067 offset=0 size=1
Mar 07 15:54:41 fedora kernel: BPF:  
Mar 07 15:54:41 fedora kernel: BPF: Invalid offset
Mar 07 15:54:41 fedora kernel: BPF: 
```
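
One way to look for the responsible module in the journal is to pair each "BPF: Invalid ..." line with a "failed to validate module" line shortly after it. The Python sketch below assumes the module line follows within a few lines, as it does in the earlier boot logs on this bug; the name `reasons_with_modules` and the `lookahead` parameter are made up for illustration.

```python
import re

# Matches the reason line, e.g. "BPF: Invalid name" or "BPF: Invalid offset".
REASON_RE = re.compile(r"BPF: (Invalid \w+)\s*$")
# Matches the module line, e.g. "failed to validate module [fuse] BTF: -22".
MODULE_RE = re.compile(r"failed to validate module \[([^\]]+)\]")


def reasons_with_modules(log_text, lookahead=3):
    """Pair each BPF reason with the module named in the next few lines, or None."""
    lines = log_text.splitlines()
    pairs = []
    for index, line in enumerate(lines):
        reason = REASON_RE.search(line)
        if not reason:
            continue
        module = None
        # Only look a short distance ahead so unrelated failures are not paired.
        for follow in lines[index + 1 : index + 1 + lookahead]:
            found = MODULE_RE.search(follow)
            if found:
                module = found.group(1)
                break
        pairs.append((reason.group(1), module))
    return pairs
```

For the excerpt above this yields a pair with module None, since no "failed to validate module" line follows the "Invalid offset" message, so the module cannot be read from this snippet alone.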

Comment 22 Chris Murphy 2025-03-11 16:46:34 UTC
Tested as fixed in kernel-debug-6.14.0-0.rc6.49.fc42.x86_64; reported as fixed in 6.14.0-0.rc5.2a520073e74f.47.

Comment 23 Adam Williamson 2025-03-11 16:58:27 UTC
Re-opening for F42 tracking.

Comment 24 Fedora Update System 2025-03-11 16:58:54 UTC
FEDORA-2025-1b8a020e07 (kernel-6.14.0-0.rc6.49.fc42 and kernel-headers-6.14.0-0.rc6.49.fc42) has been submitted as an update to Fedora 42.
https://bodhi.fedoraproject.org/updates/FEDORA-2025-1b8a020e07

Comment 25 Fedora Update System 2025-03-12 01:44:34 UTC
FEDORA-2025-1b8a020e07 has been pushed to the Fedora 42 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2025-1b8a020e07`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2025-1b8a020e07

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 26 Lukas Ruzicka 2025-03-12 11:37:06 UTC
The latest update of the debug kernel boots normally.

Comment 27 Fedora Update System 2025-03-15 00:43:43 UTC
FEDORA-2025-1b8a020e07 (kernel-6.14.0-0.rc6.49.fc42 and kernel-headers-6.14.0-0.rc6.49.fc42) has been pushed to the Fedora 42 stable repository.
If the problem still persists, please make a note of it in this bug report.

