Bug 2361849 - GRUB updates do not update modules in /boot, causing crashes due to version mismatches
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kiwi
Version: 41
Hardware: aarch64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Neal Gompa
QA Contact: Fedora Extras Quality Assurance
 
Reported: 2025-04-23 12:27 UTC by Hector Martin
Modified: 2025-05-03 17:49 UTC (History)
CC List: 11 users

Fixed In Version: kiwi-10.2.18-1.fc42 kiwi-10.2.18-1.fc41 kiwi-10.2.18-1.el10_1 kiwi-10.2.18-1.fc40 kiwi-10.2.18-1.el10_0 kiwi-10.2.18-1.el9
Last Closed: 2025-04-29 20:40:09 UTC

Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | OSInside kiwi issues 2790 | 0 | None | open | kiwi leaves orphanes grub modules in /boot/grub2 | 2025-04-27 16:43:25 UTC
Github | OSInside kiwi pull 2791 | 0 | None | open | Drop copying GRUB2 modules to /boot with Secure Boot UEFI images | 2025-04-28 00:02:12 UTC

Description Hector Martin 2025-04-23 12:27:52 UTC
GRUB package updates fail to update the modules in `/boot/grub2/arm64-efi` (at least on arm64/Fedora Asahi Remix), even though they do update the core image in /boot/efi/EFI/fedora/grubaa64.efi. This silently works most of the time, until an incompatible combination of core image and module set ends up installed and GRUB blows up at boot.

This happens on at least F40 and F41. All of my Fedora Asahi Remix systems (and everyone else I asked) had ancient modules in /boot/grub2/arm64-efi, so it seems this has always been broken. Haven't tested F42 yet.

After an update on two F40 systems, GRUB crashed on boot like this:

"Synchronous Abort" handler, esr 0x96000005, far 0x0
elr: ffffffff68634760 lr : ffffffff68634ce0 (reloc)
elr: 000001073c8bd760 lr : 000001073c8bdce0
x0 : 0000000000000000 x1 : 000001073dc93250
x2 : 000001073dc83854 x3 : 0000000000000000
x4 : 000001073c8bdcc0 x5 : 000001073c8b3000
x6 : 000001073c8c4000 x7 : 000001073dc91ec8
x8 : 000001073c904869 x9 : 0000000000001000
x10: 0000000000000ff8 x11: 0000000001392288
x12: 000001073dc94f80 x13: 00000107d4289000
x14: 000001073d703a20 x15: 0000000000000000
x16: 00000107d42bdadc x17: 0000000000000000
x18: 0000000000000011 x19: 000001073e518000
x20: 0000000000000000 x21: 00000107d02f57a0
x22: 000001073e52c038 x23: 00000107d02f57a0
x24: 00000107d0246570 x25: 000001073e52c038
x26: 000001073e52c100 x27: 000001073e52c108
x28: 000001073e52c110 x29: 00000107d02459a0

Code: f9001bff 12800020 b90047e0 f9400fe0 (f9400000) 
UEFI image [0x000001073e7a6000:0x000001073e7bdfff] '/\EFI\BOOT\fbaa64.efi'
UEFI image [0x000001073dc72000:0x000001073e072fff] '/\EFI\fedora\grubaa64.efi'

Updating the modules fixed it.

Reproducible: Always

Steps to Reproduce:
1. Install Fedora Asahi Remix
2. Upgrade grub
Actual Results:
/boot/grub2/arm64-efi still has its install-time contents and is never updated

Expected Results:
/boot/grub2/arm64-efi is updated

Additional Information:
Manual workaround to fix a broken system (from some kind of rescue boot): rsync -av --delete /usr/lib/grub/arm64-efi/ /boot/grub2/arm64-efi/
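
To check whether a system is affected before it fails, the two directories from the workaround can be compared. The following is a minimal Python sketch, not part of any Fedora tooling: the function name `find_stale_modules` is made up for illustration, the paths are the arm64 ones from this report, and the code only reads files.

```python
import filecmp
from pathlib import Path


def find_stale_modules(packaged_dir, boot_dir):
    """Compare the packaged GRUB modules against the copy under /boot.

    Returns (differing, orphaned): names of files whose contents differ
    from the packaged version, and names that exist only in boot_dir.
    """
    packaged = Path(packaged_dir)
    boot = Path(boot_dir)
    differing, orphaned = [], []
    for entry in sorted(boot.iterdir()):
        if not entry.is_file():
            continue
        reference = packaged / entry.name
        if not reference.is_file():
            orphaned.append(entry.name)
        elif not filecmp.cmp(reference, entry, shallow=False):
            # shallow=False forces a byte-by-byte comparison.
            differing.append(entry.name)
    return differing, orphaned


if __name__ == "__main__":
    # Paths taken from this report; adjust for other architectures.
    packaged_dir = Path("/usr/lib/grub/arm64-efi")
    boot_dir = Path("/boot/grub2/arm64-efi")
    if packaged_dir.is_dir() and boot_dir.is_dir():
        differing, orphaned = find_stale_modules(packaged_dir, boot_dir)
        print(f"{len(differing)} differing, {len(orphaned)} orphaned files")
    else:
        print("module directories not found; nothing to compare")
```

Any non-zero count suggests the copy under /boot no longer matches the installed modules package.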

Comment 1 Neal Gompa 2025-04-23 13:20:53 UTC
This is probably an issue for all architectures, but it's particularly bad for AArch64 systems. I can see this on Fedora KDE AArch64 on a Raspberry Pi 400 too.

Comment 2 Marta Lewandowska 2025-04-23 14:57:19 UTC
Hector, which version of GRUB are we talking about? You're installing the released version and then updating? Do you install -modules and then update them?
thanks.

Comment 3 Janne Grunau 2025-04-23 19:28:36 UTC
From a semi-affected system (modules in /boot/grub2/arm64-efi with timestamps from the initial installation (2023-08-22), but no boot failures):
> 2023-08-22T15:33:52+0000 SUBDEBUG Installed: grub2-efi-aa64-1:2.06-95.fc38.aarch64
> 2023-08-22T15:35:50+0000 SUBDEBUG Installed: grub2-efi-aa64-modules-1:2.06-95.fc38.noarch
> 2023-10-11T21:10:29+0000 SUBDEBUG Upgraded: grub2-efi-aa64-1:2.06-95.fc38.aarch64
> 2023-10-11T21:10:29+0000 SUBDEBUG Upgraded: grub2-efi-aa64-modules-1:2.06-95.fc38.noarch
> 2023-11-20T19:48:41+0000 SUBDEBUG Upgraded: grub2-efi-aa64-1:2.06-102.fc38.aarch64
> 2023-11-20T19:48:47+0000 SUBDEBUG Upgraded: grub2-efi-aa64-modules-1:2.06-102.fc38.noarch
> 2024-03-24T20:38:08+0000 SUBDEBUG Upgraded: grub2-efi-aa64-1:2.06-110.fc39.aarch64
> 2024-03-24T20:38:08+0000 SUBDEBUG Upgraded: grub2-efi-aa64-modules-1:2.06-110.fc39.noarch
> 2024-04-29T17:21:32+0200 SUBDEBUG Upgraded: grub2-efi-aa64-1:2.06-118.fc39.aarch64
> 2024-04-29T17:21:32+0200 SUBDEBUG Upgraded: grub2-efi-aa64-modules-1:2.06-118.fc39.noarch
> 2024-05-05T13:40:48+0200 SUBDEBUG Upgraded: grub2-efi-aa64-1:2.06-120.fc39.aarch64
> 2024-05-05T13:41:32+0200 SUBDEBUG Upgraded: grub2-efi-aa64-modules-1:2.06-120.fc39.noarch
> 2024-06-01T15:07:44+0200 SUBDEBUG Upgraded: grub2-efi-aa64-1:2.06-121.fc40.aarch64
> 2024-06-01T15:07:45+0200 SUBDEBUG Upgraded: grub2-efi-aa64-modules-1:2.06-121.fc40.noarch

The modules in /boot/grub2/arm64-efi/ are dated 2023-04-12, which is probably the date of the files in grub2-efi-aa64-modules-1:2.06-95.fc38.noarch
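
The package history can be pulled out of such a log mechanically and compared with the file dates in /boot. This is a rough Python sketch that assumes the exact line format quoted in this comment (timestamp, `SUBDEBUG`, action, package); the names `LOG_LINE` and `package_events` are hypothetical.

```python
import re
from datetime import datetime

# Matches lines like the ones quoted in this comment; a leading "> " quote
# marker is tolerated. Only the two actions seen in the excerpt are handled.
LOG_LINE = re.compile(
    r"^(?:>\s*)?(?P<when>\S+) SUBDEBUG "
    r"(?P<action>Installed|Upgraded): (?P<nevra>\S+)$"
)


def package_events(lines, name_prefix):
    """Return (timestamp, action, nevra) tuples for one package, in log order."""
    events = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match["nevra"].startswith(name_prefix):
            when = datetime.strptime(match["when"], "%Y-%m-%dT%H:%M:%S%z")
            events.append((when, match["action"], match["nevra"]))
    return events


sample = [
    "> 2023-08-22T15:35:50+0000 SUBDEBUG Installed: grub2-efi-aa64-modules-1:2.06-95.fc38.noarch",
    "> 2024-06-01T15:07:44+0200 SUBDEBUG Upgraded: grub2-efi-aa64-1:2.06-121.fc40.aarch64",
    "> 2024-06-01T15:07:45+0200 SUBDEBUG Upgraded: grub2-efi-aa64-modules-1:2.06-121.fc40.noarch",
]
events = package_events(sample, "grub2-efi-aa64-modules-")
print(events[-1][2])  # grub2-efi-aa64-modules-1:2.06-121.fc40.noarch
```

If the newest event is much later than the modification dates under /boot/grub2/arm64-efi/, the modules there were not refreshed by the package upgrade.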

Comment 4 Janne Grunau 2025-04-23 19:31:09 UTC
From the same system:
> 2024-12-17T13:55:18+01:00 pk-offline-update[1305]: package updating        grub2-efi-aa64-1:2.12-15.fc41.aarch64 (updates)
> 2024-12-17T13:55:23+01:00 pk-offline-update[1305]: package updating        grub2-efi-aa64-modules-1:2.12-15.fc41.noarch (updates)
> 2024-12-17T13:55:29+01:00 pk-offline-update[1305]: package cleanup        grub2-efi-aa64-1:2.06-123.fc40.aarch64 (installed)
> 2024-12-17T13:55:32+01:00 pk-offline-update[1305]: package cleanup        grub2-efi-aa64-modules-1:2.06-123.fc40.noarch (installed)
> 2025-03-14T21:55:16+01:00 pk-offline-update[1323]: package updating        grub2-efi-aa64-1:2.12-20.fc41.aarch64 (updates)
> 2025-03-14T21:55:21+01:00 pk-offline-update[1323]: package updating        grub2-efi-aa64-modules-1:2.12-20.fc41.noarch (updates)
> 2025-03-14T21:55:30+01:00 pk-offline-update[1323]: package cleanup        grub2-efi-aa64-1:2.12-15.fc41.aarch64 (installed)
> 2025-03-14T21:55:33+01:00 pk-offline-update[1323]: package cleanup        grub2-efi-aa64-modules-1:2.12-15.fc41.noarch (installed)

Comment 5 Hector Martin 2025-04-24 00:27:59 UTC
As Janne mentioned, this affects all versions of GRUB going back to at least what was current in F38. Simply doing normal system upgrades never updates the modules.

Comment 6 Marta Lewandowska 2025-04-24 08:51:44 UTC
The "right" way to do this is to run grub2-install, which for a while was not permitted on EFI at all because of SB, but works now if you --force. The problem is that this creates a new GRUB image, which is incompatible with SB on systems that care about that. On aarch, you might not care yet, but it's coming.

I'm guessing that the Synchronous Abort that Hector reported, which I can't reproduce on a vanilla f40 aarch VM, is happening because you need additional modules to boot Asahi that are not built into the binary. Is that right? If so, which ones?

Comment 7 Janne Grunau 2025-04-24 21:03:53 UTC
Booting works without grub modules. I removed the modules on an Apple Silicon notebook without connected USB devices and it still boots. That probably explains why we haven't seen many issues so far.

Hector, are the affected systems special in a way noticeable by grub? I can only think of connected USB devices (I guess this should be abstracted away by u-boot's EFI implementation), filesystem types on storage devices, or a special grub configuration.

Why are the modules present at all if they can't be used with secure boot and possibly/likely broken after grub efi image updates?

Comment 8 Hector Martin 2025-04-26 08:47:22 UTC
Nothing special other than some USB drives containing LVM PVs. The volumes themselves are Ceph OSDs so nothing GRUB could interpret, and are not used for booting. In addition, two machines have identical peripherals connected and only one failed, with the same exact installed GRUB package version (the only difference was the version of the ancient modules). I'm also pretty sure I tried disconnecting the drives and the problem persisted. So I don't think it has anything to do with USB devices.

GRUB config is also boring:

GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_DISABLE_RECOVERY=true
GRUB_CMDLINE_LINUX_DEFAULT="rootflags=subvol=root"
GRUB_DISTRIBUTOR="Fedora Linux Asahi Remix"
GRUB_ENABLE_BLSCFG=true
GRUB_GFXMODE=auto
GRUB_TERMINAL=""
GRUB_TIMEOUT=5
GRUB_TIMEOUT_STYLE=menu

The internal NVMe partition layout is also bog standard for Asahi Linux.

There is one partitioning difference: The two machines that failed have a separate /home btrfs subvolume, while the one that survived does not. They also have slightly different /boot partition sizes. I believe this is a change that happened at some point. The machines that broke have an install date of Dec 19 2023 and I believe the image is from the same date, while the one that survived has an install date of May 16 2023 and I believe the image was built on May 9. So in this case, the machine with the *older* GRUB modules survived, while the ones with the *newer* (but still wildly out of date) modules broke.

There *is* a difference in GRUB config. The machine that survived has this:

GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_DISABLE_RECOVERY=true
GRUB_CMDLINE_LINUX_DEFAULT="scsi_mod.use_blk_mq=1 multipath=off"
GRUB_DISTRIBUTOR="Fedora Linux Asahi Remix"
GRUB_ENABLE_BLSCFG=true
GRUB_GFXMODE=auto
GRUB_TERMINAL="console"
GRUB_TIMEOUT=5
GRUB_TIMEOUT_STYLE=hidden

The default for GRUB_TERMINAL is `gfxterm`. So that means the machines that failed were trying to use a graphical terminal, while the one that survived was trying to use the native UEFI I/O. In gfxterm mode, a bunch of extra things happen that wouldn't otherwise, including loading the all_video module, which then loads all video modules as dependencies. I'm guessing not all of those are bundled into the EFI image. So that's quite likely what made it explode.
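
The difference between the two configs can be made explicit with a small parser. This is only a Python sketch: `parse_grub_defaults` and `effective_terminal` are hypothetical names, and the `gfxterm` fallback for an empty or missing GRUB_TERMINAL is taken from the statement above, not verified against grub2-mkconfig here.

```python
import shlex


def parse_grub_defaults(text):
    """Parse simple KEY=value lines like the config excerpts quoted above."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # shlex.split strips shell-style quotes; "" becomes an empty string.
        settings[key.strip()] = " ".join(shlex.split(value))
    return settings


def effective_terminal(settings, default="gfxterm"):
    """Return the configured terminal, or the assumed default if unset/empty."""
    return settings.get("GRUB_TERMINAL") or default


failed = 'GRUB_TERMINAL=""\nGRUB_TIMEOUT_STYLE=menu\n'
survived = 'GRUB_TERMINAL="console"\nGRUB_TIMEOUT_STYLE=hidden\n'
print(effective_terminal(parse_grub_defaults(failed)))    # gfxterm
print(effective_terminal(parse_grub_defaults(survived)))  # console
```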

If the intent is that only built-in modules are (can be) used, then there should not be any modules in /boot at all. Heck, if this is supposed to be a secure-boot-capable image, it shouldn't even *have* the ability to load extra modules from /boot.

Comment 9 Janne Grunau 2025-04-26 11:45:36 UTC
Just switching to gfxterm and the menu timeout style does not reproduce the SError on an M2 MacBook Pro without USB devices. The grub modules are from:
> 2023-07-31T17:11:13+0000 SUBDEBUG Installed: grub2-efi-aa64-modules-1:2.06-95.fc38.noarch

2023-07-31 should be the image date.

"efi_gop" is not explicitly included in the image in https://src.fedoraproject.org/rpms/grub2/blob/rawhide/f/grub.macros but it seems that grub-mkimage includes it via "all_video". The installed grub efi image seems to include it.

I found a way to produce an SError on that system: switch to the grub command line, type "ls (hd0," and press Tab for completion. This happens independently of gfxterm/console. On a system with the modules removed, this prints a list of all partitions on the internal NVMe. Since that list includes file system information, I suspect grub2 could try to load all file system modules for the unknown apfs partitions. I suspect the easiest way to reproduce this on other platforms would be adding an exfat partition (module should be added to the image) or an ntfs partition (no module, like apfs).

That doesn't help determine why the affected systems tried to load a module.

I think we only have two options:
1. the modules must not be present in /boot/grub2
2. the modules in /boot/grub2 must be updated with the grub2 (efi) image

Comment 10 Janne Grunau 2025-04-27 16:00:02 UTC
This is a problem created by kiwi. Kiwi copies the grub2 modules "helpfully" to the boot partition: https://github.com/OSInside/kiwi/blob/eeea9e6405ceb9d0aa346523b0d8232139ab50c8/kiwi/bootloader/config/grub2.py#L559

This was explicitly added in https://github.com/OSInside/kiwi/commit/c6e80a13c95a7f61b0df7334204d2ef0d19a5cbd for EFI secure boot in 2016. That copies the modules explicitly from /usr/lib/grub/arm64-efi/ to /boot/grub2 with rsync.

I'll open a kiwi issue and will forcefully delete the files in the meantime.

This still leaves open the question of how to deal with the existing broken installations, which are ticking time bombs.

Comment 11 Hector Martin 2025-04-27 16:45:32 UTC
Sounds like it's time for a postinstall script in one of the platform metapackages to wipe it?
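
What such a wipe might do can be sketched in Python. This is purely illustrative and not the script that was eventually shipped: the function name `remove_boot_modules` is made up, it defaults to a dry run, and whether deleting every file is safe on a given system is an assumption based on comment 7 (booting still worked with the modules removed).

```python
from pathlib import Path


def remove_boot_modules(boot_module_dir, dry_run=True):
    """Delete regular files in boot_module_dir and return their names.

    With dry_run=True (the default) nothing is deleted; the returned list
    only shows what would be removed. Subdirectories are left alone.
    """
    directory = Path(boot_module_dir)
    if not directory.is_dir():
        return []
    affected = []
    for entry in sorted(directory.iterdir()):
        if entry.is_file():
            if not dry_run:
                entry.unlink()
            affected.append(entry.name)
    return affected
```

Calling `remove_boot_modules("/boot/grub2/arm64-efi")` only lists the candidates; passing `dry_run=False` actually deletes them and needs root.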

Comment 12 Janne Grunau 2025-04-27 21:35:32 UTC
Changing component to kiwi.

Comment 13 Neal Gompa 2025-04-28 00:02:12 UTC
Upstream pull request: https://github.com/OSInside/kiwi/pull/2791

Comment 14 Fedora Update System 2025-04-29 10:35:08 UTC
FEDORA-EPEL-2025-abc2389dd4 (kiwi-10.2.18-1.el9) has been submitted as an update to Fedora EPEL 9.
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2025-abc2389dd4

Comment 15 Fedora Update System 2025-04-29 10:35:58 UTC
FEDORA-2025-b9ae42c8d7 (kiwi-10.2.18-1.fc40) has been submitted as an update to Fedora 40.
https://bodhi.fedoraproject.org/updates/FEDORA-2025-b9ae42c8d7

Comment 16 Fedora Update System 2025-04-29 10:37:00 UTC
FEDORA-EPEL-2025-1516ba47ea (kiwi-10.2.18-1.el10_1) has been submitted as an update to Fedora EPEL 10.1.
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2025-1516ba47ea

Comment 17 Fedora Update System 2025-04-29 10:37:24 UTC
FEDORA-EPEL-2025-a6bd816644 (kiwi-10.2.18-1.el10_0) has been submitted as an update to Fedora EPEL 10.0.
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2025-a6bd816644

Comment 18 Fedora Update System 2025-04-29 10:37:58 UTC
FEDORA-2025-caba97efbd (kiwi-10.2.18-1.fc42) has been submitted as an update to Fedora 42.
https://bodhi.fedoraproject.org/updates/FEDORA-2025-caba97efbd

Comment 19 Fedora Update System 2025-04-29 10:38:57 UTC
FEDORA-2025-7cf125b833 (kiwi-10.2.18-1.fc41) has been submitted as an update to Fedora 41.
https://bodhi.fedoraproject.org/updates/FEDORA-2025-7cf125b833

Comment 20 Fedora Update System 2025-04-29 20:40:09 UTC
FEDORA-2025-caba97efbd (kiwi-10.2.18-1.fc42) has been pushed to the Fedora 42 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 21 Fedora Update System 2025-04-30 01:38:21 UTC
FEDORA-2025-7cf125b833 (kiwi-10.2.18-1.fc41) has been pushed to the Fedora 41 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 22 Fedora Update System 2025-04-30 01:56:54 UTC
FEDORA-EPEL-2025-1516ba47ea (kiwi-10.2.18-1.el10_1) has been pushed to the Fedora EPEL 10.1 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 23 Fedora Update System 2025-04-30 02:01:03 UTC
FEDORA-2025-b9ae42c8d7 (kiwi-10.2.18-1.fc40) has been pushed to the Fedora 40 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 24 Fedora Update System 2025-04-30 02:13:41 UTC
FEDORA-EPEL-2025-a6bd816644 (kiwi-10.2.18-1.el10_0) has been pushed to the Fedora EPEL 10.0 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 25 Fedora Update System 2025-04-30 02:30:00 UTC
FEDORA-EPEL-2025-abc2389dd4 (kiwi-10.2.18-1.el9) has been pushed to the Fedora EPEL 9 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 26 Janne Grunau 2025-05-03 17:49:24 UTC
asahi-platform-metapackage-core-0-23 was released with a %posttrans script to remove stale grub modules from /boot/grub2/arm64-efi/ of existing installs: https://pagure.io/fedora-asahi/asahi-platform-metapackage/c/ffe251c9a0c51d13a61a7aa60a22825d6981e9aa?branch=main

