Bug 2346804 - grub2-2.12-24.fc43 update seems to cause initramfs unpack failure
Summary: grub2-2.12-24.fc43 update seems to cause initramfs unpack failure
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: grub2
Version: rawhide
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Nicolas Frayer
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2025-02-20 13:43 UTC by Dusty Mabe
Modified: 2025-02-28 13:30 UTC (History)
CC List: 5 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2025-02-28 13:30:23 UTC
Type: ---
Embargoed:


Attachments
console.txt (35.90 KB, text/plain)
2025-02-20 13:44 UTC, Dusty Mabe

Description Dusty Mabe 2025-02-20 13:43:35 UTC
The new GRUB update failed the CoreOS tests. None of the created images boot; they all fail while unpacking the initramfs.

```
[    0.759308] Initramfs unpacking failed: ZSTD-compressed data is corrupt
```

Normally I wouldn't start by investigating GRUB when seeing an error message like this, but our tests are designed to test only the update in question, so GRUB was the only software that changed. Sticking with the previous version of GRUB does not yield any failed tests.

You can download an image to experiment with at: https://dustymabe.fedorapeople.org/fedora-coreos-43.20250219.dev.0-qemu.x86_64.qcow2.xz

Remember to decompress it first.
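The image is xz-compressed; `xz -d` (or `unxz`) is the usual way to unpack it. A minimal Python sketch of the same step, assuming only the standard library's `lzma` module, streams the data so a multi-gigabyte disk image never has to fit in memory. The helper name `decompress_xz` is hypothetical.

```python
import lzma
import shutil
from pathlib import Path
from typing import Optional


def decompress_xz(src: Path, dest: Optional[Path] = None) -> Path:
    """Stream-decompress an .xz file, e.g. a downloaded .qcow2.xz image.

    Hypothetical helper: copies in 1 MiB chunks so memory use stays flat.
    """
    if dest is None:
        # Drop only the final ".xz" suffix: foo.qcow2.xz -> foo.qcow2
        dest = src.with_suffix("")
    with lzma.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout, 1024 * 1024)
    return dest
```

Calling `decompress_xz(Path("fedora-coreos-43.20250219.dev.0-qemu.x86_64.qcow2.xz"))` would write `fedora-coreos-43.20250219.dev.0-qemu.x86_64.qcow2` next to the download.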

I'll attach a full console log as well.

Reproducible: Always

Comment 1 Dusty Mabe 2025-02-20 13:44:10 UTC
Created attachment 2077213
console.txt

Comment 2 Marta Lewandowska 2025-02-20 14:02:11 UTC
It looks like it's failing to start systemd. Could you check what its permissions are?

```
[    1.200453] Run /init as init process
[    1.200885] Failed to execute /init (error -26)
[    1.201323] Run /sbin/init as init process
[    1.201696] Run /etc/init as init process
[    1.202098] Run /bin/init as init process
[    1.202507] Starting init: /bin/init exists but couldn't execute it (error -26)
[    1.203142] Run /bin/sh as init process
[    1.203529] Starting init: /bin/sh exists but couldn't execute it (error -26)
[    1.204123] Kernel panic - not syncing: No working init found.  Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance.
...
```
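The kernel logs failures such as `error -26` as negated errno values. A small sketch (the helper name `describe_kernel_error` is hypothetical) decodes them with Python's `errno` module. The numbering comes from the host's C library, so run it on Linux to match the guest. On Linux, 26 is `ETXTBSY` ("Text file busy"), whereas a plain permission problem would normally surface as `EACCES` (13).

```python
import errno
import os


def describe_kernel_error(code: int) -> str:
    """Turn a kernel-style negative return code (e.g. -26) into a readable string.

    Uses the host's errno table, so results are only meaningful on Linux
    when decoding Linux kernel messages.
    """
    num = abs(code)
    name = errno.errorcode.get(num, "UNKNOWN")
    return f"{code} -> {name}: {os.strerror(num)}"
```

For example, `describe_kernel_error(-26)` names `ETXTBSY` on a Linux host.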

Comment 3 Dusty Mabe 2025-02-20 14:41:18 UTC
Hey Marta,

I initially focused on that too, but I think this is just a symptom of not being able to unpack the initramfs (look higher up in the log):

```
[    0.759308] Initramfs unpacking failed: ZSTD-compressed data is corrupt
```

So it seems like it unpacks some of it, but not all? Also, if you run the test repeatedly, it fails in slightly different ways. For example, sometimes init runs but then fails when trying to mount /sysroot (sysroot.mount).

Can we focus on the initramfs unpacking problem first?
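The "look higher up in the log" step can be sketched as a small triage helper: scan the console output, keep only timestamped lines that match known failure messages, and report the earliest one. The pattern list is an assumption built from the messages quoted in this bug, not an exhaustive list, and `first_failure` is a hypothetical name.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Assumed patterns, taken from the messages seen in this bug's console output.
FAILURE_PATTERNS = [
    re.compile(r"Initramfs unpacking failed"),
    re.compile(r"Failed to execute /init"),
    re.compile(r"Kernel panic"),
]

# Kernel console lines look like "[    0.759308] message".
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def first_failure(lines: Iterable[str]) -> Optional[Tuple[float, str]]:
    """Return (timestamp, message) for the earliest matching failure line, or None."""
    hits: List[Tuple[float, str]] = []
    for line in lines:
        m = TIMESTAMP.match(line.strip())
        if not m:
            continue  # skip lines without a kernel timestamp
        ts, msg = float(m.group(1)), m.group(2)
        if any(p.search(msg) for p in FAILURE_PATTERNS):
            hits.append((ts, msg))
    # Tuples compare by timestamp first, so min() picks the earliest hit.
    return min(hits, default=None)
```

Running `first_failure(open("console.txt"))` on a log like the attached one should point at the initramfs unpacking error rather than the later init failures.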

Comment 4 Marta Lewandowska 2025-02-20 16:09:51 UTC
OK, I didn't know that it fails at different points later on. Have you tried rebuilding the initramfs? I downloaded your qcow2 image and will try it later.

Comment 5 Dusty Mabe 2025-02-20 16:14:48 UTC
I haven't tried rebuilding the initramfs on the disk image because the system won't boot, so rebuilding the initramfs in place isn't really an option (or at least it isn't easy).

I can confirm that this isn't an isolated build problem. It failed in CI and also on my local system, so that's two different image builds (two different initramfs builds) in two different environments with the same results.

Also worth emphasizing here: the code that generates the initramfs hasn't changed, just GRUB.
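Since the initramfs generator hasn't changed, one cheap sanity check on a built image is that its initramfs file still contains a zstd frame. zstd frames begin with the magic number `0xFD2FB528`, stored little-endian as the bytes `28 B5 2F FD`. A dracut-built initramfs may begin with an uncompressed early cpio archive (for example, CPU microcode), so the frame is not necessarily at offset 0. This sketch only locates candidate frame starts; actually validating a frame would mean decompressing from that offset with a zstd decoder. The function names are hypothetical.

```python
from pathlib import Path
from typing import List

# zstd frame magic number 0xFD2FB528, as it appears on disk (little-endian).
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def zstd_frame_offsets(data: bytes) -> List[int]:
    """Return every byte offset where the zstd frame magic appears.

    Only candidate frame starts are reported; nothing is validated, and
    stray matches inside compressed data are possible.
    """
    offsets = []
    start = data.find(ZSTD_MAGIC)
    while start != -1:
        offsets.append(start)
        start = data.find(ZSTD_MAGIC, start + 1)
    return offsets


def scan_initramfs(path: Path) -> List[int]:
    """Hypothetical convenience wrapper: scan an initramfs file on disk."""
    return zstd_frame_offsets(path.read_bytes())
```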

Comment 6 Marta Lewandowska 2025-02-24 09:59:52 UTC
Hey Dusty,

There's nothing obvious in the newest GRUB that should cause this. The changes are all CVE fixes for out-of-bounds (OOB) reads and writes: type changes, safe math for addition, subtraction, etc. Leo was worried that one patch, which disables a number of filesystems under lockdown, was the culprit, but I also see your error when booting the image with Secure Boot (SB) disabled, while installing from RPM with SB enabled works.

How do we test this? Is there an easy / straightforward way of creating the images with a different version of GRUB?

Comment 7 Dusty Mabe 2025-02-24 16:39:34 UTC
You can build using `COSA` [1], overriding GRUB with your local development RPM build by placing the RPM files in `overrides/rpm/` [2]. Since we are building against `rawhide`, you'd want to run `cosa init --branch rawhide https://github.com/coreos/fedora-coreos-config`.

If that's too complicated, reach out to me in https://matrix.to/#/#coreos:fedoraproject.org, or I can test development builds for you.

[1] https://github.com/coreos/coreos-assembler/blob/main/docs/building-fcos.md#downloading-the-container
[2] https://github.com/coreos/coreos-assembler/blob/main/docs/working.md#using-overrides
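The override step can be sketched in Python, assuming the COSA working-directory layout from [2], where RPMs placed in `overrides/rpm/` are picked up by the next build. `stage_rpm_overrides` and its arguments are hypothetical placeholders; after staging, rebuild as described in [1] and [2].

```python
import shutil
from pathlib import Path
from typing import List


def stage_rpm_overrides(rpm_dir: Path, cosa_workdir: Path) -> List[Path]:
    """Copy locally built binary RPMs into <cosa_workdir>/overrides/rpm/.

    Hypothetical helper; assumes the layout described in the COSA
    "Using overrides" documentation.
    """
    dest = cosa_workdir / "overrides" / "rpm"
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for rpm in sorted(rpm_dir.glob("*.rpm")):
        # Skip source RPMs; they are not installable packages.
        if rpm.name.endswith(".src.rpm"):
            continue
        target = dest / rpm.name
        shutil.copy2(rpm, target)
        copied.append(target)
    return copied
```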

Comment 8 Marta Lewandowska 2025-02-25 20:41:38 UTC
Thank you for the instructions. A new patch appeared upstream a couple of days ago, https://lists.gnu.org/archive/html/grub-devel/2025-02/msg00115.html, and it appears to fix this issue.
Nicolas did a scratch build for rawhide. It works for me: no kernel panic, and the machine boots. Please try it yourself so we can be sure. If it's OK, it can probably land in rawhide tomorrow: https://koji.fedoraproject.org/koji/taskinfo?taskID=129608711

Comment 9 Dusty Mabe 2025-02-26 03:58:22 UTC
The build does appear to pass all tests except the Secure Boot tests, which I assume is expected because it is a scratch build. ✔️

Comment 10 Marta Lewandowska 2025-02-28 09:58:02 UTC
Yes, that's right.

We noticed you tagged in 2.12-23, which means things are working for you now? You'll get a new build soon :)

Comment 11 Dusty Mabe 2025-02-28 13:30:23 UTC
Fixed in grub2-2.12-25.fc43

