grub2-2.06-117.fc40 landed in the latest Rawhide compose. In openQA testing, it seems to break boot of all BIOS installs where /boot is XFS (which Server installs use by default). This doesn't seem to affect F38 or F39, so I think it's specific to xfsprogs 6.5.0. I suspect the "fix" for #2254370 breaks things with xfsprogs 6.5.0. I'll attach a screenshot showing how the failure looks - there's an error saying /grub2/i386-pc/normal.mod is not found. This should be reproducible just by doing a default install of https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20240119.n.0/compose/Server/x86_64/iso/Fedora-Server-netinst-x86_64-Rawhide-20240119.n.0.iso (as long as the mirror you hit has today's Rawhide).
Proposing as a Beta blocker, as a conditional violation of "The release must be able to host virtual guest instances of the same release" for now (conditions being: BIOS VM, XFS /boot). I haven't tested whether it affects a real box yet (will do that shortly).
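For anyone wanting to check whether an already-installed system matches the two conditions named above (BIOS boot, XFS /boot), here is a rough sketch. It assumes a Linux system where /proc/mounts lists the mounts and where the /sys/firmware/efi directory exists only on UEFI boots; the function names are made up for illustration and are not part of any Fedora tool.

```python
import os


def boot_fstype(mounts_text):
    """Return the filesystem type that holds /boot, given /proc/mounts-style text.

    Falls back to the root filesystem when /boot is not a separate mount.
    """
    best = None  # (mount point length, fstype)
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        if mount_point in ("/boot", "/"):
            # Prefer /boot over /; for equal mount points the later entry wins.
            if best is None or len(mount_point) >= best[0]:
                best = (len(mount_point), fstype)
    return best[1] if best else None


def matches_report_conditions(mounts_text, efi_present):
    """True when the system is BIOS-booted and /boot lives on XFS."""
    return (not efi_present) and boot_fstype(mounts_text) == "xfs"


def check_local_system():
    """Hypothetical helper: apply the check to the running system."""
    with open("/proc/mounts") as f:
        mounts = f.read()
    # Assumption: this directory is only present when booted via UEFI.
    efi_present = os.path.isdir("/sys/firmware/efi")
    return matches_report_conditions(mounts, efi_present)
```

Calling check_local_system() on an installed system returns True only when both conditions hold. It says nothing about which grub2 build is installed, so a True result only means the system is in the configuration this report describes.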
Confirmed that this does also affect my test bare metal system.
(In reply to Adam Williamson from comment #0)
> grub2-2.06-117.fc40 landed in the latest Rawhide compose. In openQA testing,
> it seems to break boot of all BIOS installs where /boot is XFS (which Server
> installs use by default).
>
> This doesn't seem to affect F38 or F39, so I think it's specific to xfsprogs
> 6.5.0.
>
> I suspect the "fix" for #2254370 breaks things with xfsprogs 6.5.0.

I assume you saw my reply on the other bug, but just to copy the TL;DR here. Yes, reverting that patch will cause other XFS parsing bugs unless the original fuzzer patch (https://git.savannah.gnu.org/cgit/grub.git/commit/grub-core/fs/xfs.c?id=ef7850c757fb3dd2462a512cfa0ff19c89fcc0b1) is also reverted. In order to revert that cleanly you probably need to revert the other followup fix (https://git.savannah.gnu.org/cgit/grub.git/commit/grub-core/fs/xfs.c?id=ad7fb8e2e02bb1dd0475ead9919c1c82514d2ef8) as well.

Additionally, F38 and F39 are almost certainly affected as well because the bug doesn't depend on any particular version of xfsprogs.
Well, I tested 39 and it worked.
Hi Adam, I answered Jon in bz#2254370... Nicolas reverted his patch in f38 and f39 because of that bug, as you suspected. Our tests late last week suggested that it should be reverted in rawhide as well, at least until there was another fix, but it looks like that was the wrong approach. Nicolas is out this week. Maybe we can make some progress on this anyway, or maybe we will wait until he's back. Anyway, sorry about this... we'll figure it out. ;)
Can we please put the patch back in Rawhide for now? It is breaking ~55 tests in openQA and substantially reducing our test coverage (plus, of course, affecting anyone who actually tries to do a default Server install of Rawhide on BIOS at present).
Working on it... wish we had more people...
We have untagged the build for now - https://pagure.io/releng/issue/11915 - because this is just dragging on too long and now breaking too much stuff. The openQA server base disk image got regenerated with the bug and that is causing all updates to fail gating; I'm cleaning that up now.
*** Bug 2260057 has been marked as a duplicate of this bug. ***
The patch was added back in https://bodhi.fedoraproject.org/updates/FEDORA-2024-dfbd7bf972 , so we can call this resolved. Thanks.
Very bizarrely, this bug has suddenly started affecting the F39 Server base image used by openQA. Even though xfsprogs and grub2 haven't changed at all in F39 recently. I did a network install from the release-day F39 Server netinst and it wasn't affected, though. The openQA base images are built with virt-install. Not sure what the difference is, or how on earth this happened, but maybe something for anyone else using virt-install to watch out for. I'm trying to work around it using a side build of grub2 for F39 with the patch added back...
Oh dear, my test was wrong as I tested on UEFI. Testing on BIOS, yup, I can recreate this on F39 now: just do a network install from the official F39 Server network install image and the installed system won't boot. I have no idea why this suddenly happened, but I guess we need to reapply the patch on F39 too if others can reproduce.
It's even affecting F38 too. Same deal: do a network install on BIOS from the release day F38 Server netinst image, installed system fails to boot. Theory: the kernel's involved. Kernel 6.7 landed in stable for F38 and F39 within the last few days. Although... that's still weird, because the network install image obviously is using the release day kernel; the kernel from the update repos is present on the installed system but we aren't *using* it...
Okay, no, we don't have to invoke magic kernels, just blame adamw... It looks like this was really just caused by the grub2 updates for F38 and F39 that match the Rawhide update that initially triggered this report:

* https://bodhi.fedoraproject.org/updates/FEDORA-2024-633dc7e183
* https://bodhi.fedoraproject.org/updates/FEDORA-2024-53d986312e

I initially filed negative karma for them as I was worried this would happen, but then I tested them and thought they came out fine. After they sat around for some time with lots of positive karma I pushed them both stable at the start of this week, since they would have been autopushed without my initial negative karma. I kinda forgot I'd done that. But it looks like I must have messed up the initial testing, because it seems like they *are* both subject to this bug, unfortunately.
I've tested F39 Server netinst with the following results (VM, x86 BIOS):

GA iso with -updates repo (grub2-2.06-116): Fails with /grub2/i386-pc/normal.mod is not found
GA iso without -updates: Works fine
GA iso with grub2 < 2.06-113: Works fine
GA iso with grub2 >= 2.06-113: Fails with /grub2/i386-pc/normal.mod is not found

(Apart from grub2 itself, lvm2-2.03.23 and device-mapper-1.02.197 were also pulled in, but by themselves they cause no issues; verified together with grub2 < 2.06-113.)
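The threshold in the results above can be written down as a small check, which might be handy when sorting a list of builds into "failed" and "worked". This is only a sketch of the F39 observations in this comment: the function names are invented for illustration, the 113 cutoff comes solely from the tests listed here, and it does not model any later build that reapplies the XFS patch.

```python
import re

# Assumption: taken only from the F39 test results in this comment.
FIRST_FAILING_RELEASE = 113


def parse_grub2_vr(version_release):
    """Split a string like '2.06-116.fc39' into ('2.06', 116)."""
    match = re.fullmatch(r"([\d.]+)-(\d+)(?:\..*)?", version_release)
    if match is None:
        raise ValueError(f"unrecognised version-release: {version_release!r}")
    return match.group(1), int(match.group(2))


def failed_in_f39_testing(version_release):
    """True when the build falls in the range that failed to boot in these tests."""
    version, release = parse_grub2_vr(version_release)
    if version != "2.06":
        # The observations only cover 2.06 builds; refuse to guess otherwise.
        raise ValueError(f"no test data for grub2 version {version}")
    return release >= FIRST_FAILING_RELEASE
```

For example, failed_in_f39_testing("2.06-116.fc39") returns True and failed_in_f39_testing("2.06-112.fc39") returns False, matching the table above. A plain integer comparison on the release number is used here rather than full RPM version comparison, which is enough for these builds but not a general substitute.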
The exact grub2 commit that causes this in f39 would be https://src.fedoraproject.org/rpms/grub2/c/b601fadc1325642ec0017c3dba5aabe37550bb9a?branch=f39
As the patch was already reapplied to Rawhide, this doesn't block F40, it's an F38/F39 issue at present.
This message is a reminder that Fedora Linux 39 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora Linux 39 on 2024-11-26. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a 'version' of '39'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, change the 'version' to a later Fedora Linux version. Note that the version field may be hidden. Click the "Show advanced fields" button if you do not see it. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora Linux 39 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora Linux, you are encouraged to change the 'version' to a later version prior to this bug being closed.
Looking at the package changelog, pretty sure this should be fixed in 2.06-118 ("fs/xfs: Re-applied the XFS directory extent parsing patch") and/or 2.06-120 ("fs/xfs: Handle non-continuous data blocks in directory extents"). I *think* 118 fixed this but re-introduced the other problem, then 120 fixed both together.