Bug 2152763 - UEFI PXE boot broken with kernels after "Bundle unicode.pf2 with images"
Summary: UEFI PXE boot broken with kernels after "Bundle unicode.pf2 with images"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: grub2
Version: rawhide
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Javier Martinez Canillas
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: openqa
Depends On:
Blocks:
Reported: 2022-12-13 00:48 UTC by Adam Williamson
Modified: 2022-12-31 01:16 UTC
CC List: 5 users

Fixed In Version: grub2-2.06-72.fc38 grub2-2.06-72.fc37 grub2-2.06-59.fc36
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-12-23 01:19:56 UTC
Type: Bug
Embargoed:


Description Adam Williamson 2022-12-13 00:48:58 UTC
The openQA test that covers PXE boot and installation started failing on UEFI after the "Bundle unicode.pf2 with images" change (0ccadff7a2be9fcc858e0a1cb7825beb5d81f8c6) landed. It still passes on BIOS.

The last time it passed was this run:
https://openqa.fedoraproject.org/tests/1609324
The first time it failed was this one:
https://openqa.fedoraproject.org/tests/1611319

I had a bit of trouble pinning down the difference at first, but the failure actually started when this update landed:
https://bodhi.fedoraproject.org/updates/FEDORA-2022-04d670e731
The bootloader used in the test is installed on the "support server", a parallel-running test that provides the server end of the PXE setup (plus a bunch of other things for other tests) and runs on F37. The last time the UEFI test passed, the support server installed grub2-2.06-63.fc37; the first time it failed, it installed grub2-2.06-67.fc37.

I did a couple of scratch builds to narrow down the trigger. If the support server installs a scratch build with my "Go back to installing unicode.pf2" commit reverted, the test still fails. But if it installs a scratch build of commit 558410c2d90bc1c28affccebf15a9c4ce82a109f, "Don't obsolete the tools package with extra/efi" (the commit immediately *before* "Bundle unicode.pf2 with images"), the test passes. So that change definitely looks like the trigger.

The failure mode: the system attempting to boot via PXE shows the boot menu, but when it selects the entry that should boot the installer, some errors appear briefly instead:

----

error: ../../grub-core/kern/fs.c:170:invalid file name 'fedora/vmlinuz'.
error: ../../grub-core/loader/i386/efi/linux.c:258:you need to load the kernel first.

Press any key to continue...

----

and then the boot menu is shown again.

The setup used in the test is based on https://fedoraproject.org/wiki/QA:Testcase_Boot_Methods_Pxeboot . The support server creates a `/var/lib/tftpboot/fedora` directory and an /etc/dnsmasq.conf with this content (not all of it is TFTP-related; some of it serves other purposes):

----

domain=test.openqa.fedoraproject.org
dhcp-range=172.16.2.150,172.16.2.199
dhcp-option=option:router,172.16.2.2
enable-tftp
tftp-root=/var/lib/tftpboot
tftp-secure
dhcp-match=set:efi-x86_64,option:client-arch,7
dhcp-match=set:efi-x86_64,option:client-arch,9
dhcp-match=set:bios,option:client-arch,0
dhcp-match=set:efi-aarch64,option:client-arch,11
dhcp-match=set:ppc64,option:client-arch,12
dhcp-match=set:ppc64,option:client-arch,13
dhcp-boot=tag:efi-x86_64,"shim.efi"
dhcp-boot=tag:bios,"pxelinux.0"
dhcp-boot=tag:efi-aarch64,"grubaa64.efi"
dhcp-boot=tag:ppc64,"boot/grub2/powerpc-ieee1275/core.elf"

----
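For reference, the `dhcp-match` lines tag each client by the architecture it reports in DHCP option 93 (client-arch), and the `dhcp-boot` lines choose a boot file per tag. The Python sketch below is a simplified model of that mapping, written only to illustrate the config above; it is not how dnsmasq implements matching, and `boot_file_for` is a hypothetical helper. The architecture names in the comments are the IANA processor architecture types.

```python
from typing import Optional

# dnsmasq tag for each client architecture (DHCP option 93), mirroring the
# dhcp-match=set:<tag>,option:client-arch,<n> lines in the config above.
ARCH_TO_TAG = {
    0: "bios",          # x86 BIOS
    7: "efi-x86_64",    # EFI BC
    9: "efi-x86_64",    # EFI x86-64
    11: "efi-aarch64",  # ARM 64-bit UEFI
    12: "ppc64",        # PowerPC Open Firmware
    13: "ppc64",        # PowerPC ePAPR
}

# Boot file offered for each tag, mirroring the dhcp-boot=tag:<tag>,... lines.
TAG_TO_BOOTFILE = {
    "efi-x86_64": "shim.efi",
    "bios": "pxelinux.0",
    "efi-aarch64": "grubaa64.efi",
    "ppc64": "boot/grub2/powerpc-ieee1275/core.elf",
}


def boot_file_for(client_arch: int) -> Optional[str]:
    """Return the boot file this config hands to a client, or None if no tag matches."""
    tag = ARCH_TO_TAG.get(client_arch)
    if tag is None:
        return None  # e.g. 6 (EFI IA32) has no dhcp-match line in this config
    return TAG_TO_BOOTFILE[tag]


if __name__ == "__main__":
    for arch in (0, 6, 7, 9, 11, 12, 13):
        print(arch, boot_file_for(arch))
```

An x86_64 UEFI client reporting 7 or 9 gets `shim.efi`, which matches the shim.efi, grubx64.efi, grub.cfg sequence in the dnsmasq logs further down.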

It installs shim-x64 and grub2-efi-x64 into a temporary install root (so the files are available), then copies /boot/efi/EFI/fedora/{shim.efi,grubx64.efi} from that root to /var/lib/tftpboot. It then creates a /var/lib/tftpboot/grub.cfg with this content:

----

function load_video {
  insmod efi_gop
  insmod efi_uga
  insmod ieee1275_fb
  insmod vbe
  insmod vga
  insmod video_bochs
  insmod video_cirrus
}

load_video
set gfxpayload=keep
insmod gzio

menuentry "Install Fedora 64-bit"  --class fedora --class gnu-linux --class gnu --class os {
  linuxefi fedora/vmlinuz ip=dhcp inst.ks=file:///ks.cfg
  initrdefi fedora/initrd.img
}

----

It downloads vmlinuz and initrd.img from the images/pxeboot directory of the compose being tested (e.g. https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20221212.n.0/compose/Everything/x86_64/os/images/pxeboot/ for today's compose) to /var/lib/tftpboot/fedora. It embeds a kickstart in the initramfs (though we never reach the point where that would matter), sets the correct permissions and SELinux contexts on /var/lib/tftpboot, and sets up the firewall. Then it starts dnsmasq.service.

The 'client' end of the test starts up once all that is done and just does a network boot; that's all it's supposed to do.

I can't see exactly what it is about the font change that causes this problem, but that change definitely seems to be the trigger.

Comment 1 Adam Williamson 2022-12-13 00:51:54 UTC
Note: I'm not going to propose this as a release blocker, even though it sort of violates "The installer must be able to use all supported local and remote package and installer sources". The *client* end of that, which is what we're really blocking on, seems fine: PXE boot and install work as long as the *server* end is working okay. This is a bug in the *server* end, which isn't really in the scope of the criteria; in production the server would very likely not be a Fedora machine at all. We just use a Fedora PXE server in our tests because, well, that's what we have handy and know how to configure.

Comment 2 Adam Williamson 2022-12-13 01:47:58 UTC
Here are the relevant log messages (filtered by the IP address of the client) from when this *works OK*:

[adamw@xps13k tmp]$ journalctl --file var/log/journal/de8e9d25a88e4e9dbf575cf63eac7e24/system.journal -u dnsmasq | grep 186
Nov 23 17:47:50 support.test.openqa.fedoraproject.org dnsmasq-dhcp[1340]: DHCPOFFER(ens4) 172.16.2.186 52:54:00:12:01:0d
Nov 23 17:47:50 support.test.openqa.fedoraproject.org dnsmasq-dhcp[1340]: DHCPREQUEST(ens4) 172.16.2.186 52:54:00:12:01:0d
Nov 23 17:47:50 support.test.openqa.fedoraproject.org dnsmasq-dhcp[1340]: DHCPACK(ens4) 172.16.2.186 52:54:00:12:01:0d
Nov 23 17:47:50 support.test.openqa.fedoraproject.org dnsmasq-tftp[1340]: error 8 User aborted the transfer received from 172.16.2.186
Nov 23 17:47:50 support.test.openqa.fedoraproject.org dnsmasq-tftp[1340]: sent /var/lib/tftpboot/shim.efi to 172.16.2.186
Nov 23 17:47:50 support.test.openqa.fedoraproject.org dnsmasq-tftp[1340]: sent /var/lib/tftpboot/shim.efi to 172.16.2.186
Nov 23 17:47:52 support.test.openqa.fedoraproject.org dnsmasq-tftp[1340]: sent /var/lib/tftpboot/grubx64.efi to 172.16.2.186
Nov 23 17:47:52 support.test.openqa.fedoraproject.org dnsmasq-tftp[1340]: file /var/lib/tftpboot/grub.cfg-01-52-54-00-12-01-0d not found for 172.16.2.186
Nov 23 17:47:52 support.test.openqa.fedoraproject.org dnsmasq-tftp[1340]: file /var/lib/tftpboot/grub.cfg-AC1002BA not found for 172.16.2.186
Nov 23 17:47:52 support.test.openqa.fedoraproject.org dnsmasq-tftp[1340]: file /var/lib/tftpboot/grub.cfg-AC1002B not found for 172.16.2.186
Nov 23 17:47:52 support.test.openqa.fedoraproject.org dnsmasq-tftp[1340]: file /var/lib/tftpboot/grub.cfg-AC1002 not found for 172.16.2.186
Nov 23 17:47:52 support.test.openqa.fedoraproject.org dnsmasq-tftp[1340]: file /var/lib/tftpboot/grub.cfg-AC100 not found for 172.16.2.186
Nov 23 17:47:52 support.test.openqa.fedoraproject.org dnsmasq-tftp[1340]: file /var/lib/tftpboot/grub.cfg-AC10 not found for 172.16.2.186
Nov 23 17:47:52 support.test.openqa.fedoraproject.org dnsmasq-tftp[1340]: file /var/lib/tftpboot/grub.cfg-AC1 not found for 172.16.2.186
Nov 23 17:47:52 support.test.openqa.fedoraproject.org dnsmasq-tftp[1340]: file /var/lib/tftpboot/grub.cfg-AC not found for 172.16.2.186
Nov 23 17:47:52 support.test.openqa.fedoraproject.org dnsmasq-tftp[1340]: file /var/lib/tftpboot/grub.cfg-A not found for 172.16.2.186
Nov 23 17:47:52 support.test.openqa.fedoraproject.org dnsmasq-tftp[1340]: sent /var/lib/tftpboot/grub.cfg to 172.16.2.186
Nov 23 17:47:52 support.test.openqa.fedoraproject.org dnsmasq-tftp[1340]: file /var/lib/tftpboot/EFI/fedora/x86_64-efi/command.lst not found for 172.16.2.186
Nov 23 17:47:52 support.test.openqa.fedoraproject.org dnsmasq-tftp[1340]: file /var/lib/tftpboot/EFI/fedora/x86_64-efi/fs.lst not found for 172.16.2.186
Nov 23 17:47:52 support.test.openqa.fedoraproject.org dnsmasq-tftp[1340]: file /var/lib/tftpboot/EFI/fedora/x86_64-efi/crypto.lst not found for 172.16.2.186
Nov 23 17:47:52 support.test.openqa.fedoraproject.org dnsmasq-tftp[1340]: file /var/lib/tftpboot/EFI/fedora/x86_64-efi/terminal.lst not found for 172.16.2.186
Nov 23 17:47:52 support.test.openqa.fedoraproject.org dnsmasq-tftp[1340]: sent /var/lib/tftpboot/grub.cfg to 172.16.2.186
Nov 23 17:47:52 support.test.openqa.fedoraproject.org dnsmasq-tftp[1340]: file /var/lib/tftpboot/EFI/fedora/x86_64-efi/ieee1275_fb.mod not found for 172.16.2.186
Nov 23 17:47:52 support.test.openqa.fedoraproject.org dnsmasq-tftp[1340]: file /var/lib/tftpboot/EFI/fedora/x86_64-efi/vbe.mod not found for 172.16.2.186
Nov 23 17:47:52 support.test.openqa.fedoraproject.org dnsmasq-tftp[1340]: file /var/lib/tftpboot/EFI/fedora/x86_64-efi/vga.mod not found for 172.16.2.186
Nov 23 17:47:59 support.test.openqa.fedoraproject.org dnsmasq-tftp[1340]: sent /var/lib/tftpboot/fedora/vmlinuz to 172.16.2.186
Nov 23 17:48:34 support.test.openqa.fedoraproject.org dnsmasq-tftp[1340]: sent /var/lib/tftpboot/fedora/initrd.img to 172.16.2.186
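The run of `not found` lines is GRUB's network boot looking for a per-client config before falling back to the shared grub.cfg. Going by the names in the log, the order is: a MAC-based name with a `01-` prefix (the ARP hardware type for Ethernet), then the client's IPv4 address as upper-case hex, shortened one digit at a time, then plain `grub.cfg`. The Python sketch below reproduces that order; it is a reconstruction from this log, not GRUB's own code, and `grub_net_config_candidates` is a hypothetical name.

```python
import ipaddress
from typing import List


def grub_net_config_candidates(mac: str, ip: str, name: str = "grub.cfg") -> List[str]:
    """Config file names in the order they appear in the dnsmasq log above."""
    # 1. MAC-based name: "01-" (ARP hardware type for Ethernet) + dashed MAC.
    candidates = [f"{name}-01-{mac.lower().replace(':', '-')}"]
    # 2. IPv4 address as 8 upper-case hex digits, shortened one digit at a time.
    ip_hex = f"{int(ipaddress.IPv4Address(ip)):08X}"
    for length in range(len(ip_hex), 0, -1):
        candidates.append(f"{name}-{ip_hex[:length]}")
    # 3. Fall back to the plain file name.
    candidates.append(name)
    return candidates


if __name__ == "__main__":
    for candidate in grub_net_config_candidates("52:54:00:12:01:0d", "172.16.2.186"):
        print(candidate)
```

For 172.16.2.186 the hex form is AC1002BA, matching the lines above; the failing client at 172.16.2.174 gives AC1002AE, matching the second log.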

and here's when this *fails*:

Dec 07 03:30:17 support.test.openqa.fedoraproject.org dnsmasq-dhcp[1341]: DHCPOFFER(ens4) 172.16.2.174 52:54:00:12:01:01
Dec 07 03:30:17 support.test.openqa.fedoraproject.org dnsmasq-dhcp[1341]: DHCPREQUEST(ens4) 172.16.2.174 52:54:00:12:01:01
Dec 07 03:30:17 support.test.openqa.fedoraproject.org dnsmasq-dhcp[1341]: DHCPACK(ens4) 172.16.2.174 52:54:00:12:01:01
Dec 07 03:30:17 support.test.openqa.fedoraproject.org dnsmasq-tftp[1341]: error 8 User aborted the transfer received from 172.16.2.174
Dec 07 03:30:17 support.test.openqa.fedoraproject.org dnsmasq-tftp[1341]: sent /var/lib/tftpboot/shim.efi to 172.16.2.174
Dec 07 03:30:18 support.test.openqa.fedoraproject.org dnsmasq-tftp[1341]: sent /var/lib/tftpboot/shim.efi to 172.16.2.174
Dec 07 03:30:19 support.test.openqa.fedoraproject.org dnsmasq-tftp[1341]: sent /var/lib/tftpboot/grubx64.efi to 172.16.2.174
Dec 07 03:30:20 support.test.openqa.fedoraproject.org dnsmasq-tftp[1341]: file /var/lib/tftpboot/grub.cfg-01-52-54-00-12-01-01 not found for 172.16.2.174
Dec 07 03:30:20 support.test.openqa.fedoraproject.org dnsmasq-tftp[1341]: file /var/lib/tftpboot/grub.cfg-AC1002AE not found for 172.16.2.174
Dec 07 03:30:20 support.test.openqa.fedoraproject.org dnsmasq-tftp[1341]: file /var/lib/tftpboot/grub.cfg-AC1002A not found for 172.16.2.174
Dec 07 03:30:20 support.test.openqa.fedoraproject.org dnsmasq-tftp[1341]: file /var/lib/tftpboot/grub.cfg-AC1002 not found for 172.16.2.174
Dec 07 03:30:20 support.test.openqa.fedoraproject.org dnsmasq-tftp[1341]: file /var/lib/tftpboot/grub.cfg-AC100 not found for 172.16.2.174
Dec 07 03:30:20 support.test.openqa.fedoraproject.org dnsmasq-tftp[1341]: file /var/lib/tftpboot/grub.cfg-AC10 not found for 172.16.2.174
Dec 07 03:30:20 support.test.openqa.fedoraproject.org dnsmasq-tftp[1341]: file /var/lib/tftpboot/grub.cfg-AC1 not found for 172.16.2.174
Dec 07 03:30:20 support.test.openqa.fedoraproject.org dnsmasq-tftp[1341]: file /var/lib/tftpboot/grub.cfg-AC not found for 172.16.2.174
Dec 07 03:30:20 support.test.openqa.fedoraproject.org dnsmasq-tftp[1341]: file /var/lib/tftpboot/grub.cfg-A not found for 172.16.2.174
Dec 07 03:30:20 support.test.openqa.fedoraproject.org dnsmasq-tftp[1341]: sent /var/lib/tftpboot/grub.cfg to 172.16.2.174
Dec 07 03:30:20 support.test.openqa.fedoraproject.org dnsmasq-tftp[1341]: sent /var/lib/tftpboot/grub.cfg to 172.16.2.174
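Comparing the two logs, the failing run stops after grub.cfg: the client never requests the EFI/fedora/x86_64-efi/*.lst files, fedora/vmlinuz or fedora/initrd.img. That is consistent with the `invalid file name 'fedora/vmlinuz'` error in the description, i.e. GRUB rejecting the path before it ever asks for the kernel over TFTP. The Python sketch below does the comparison mechanically on `sent` lines; the two excerpts are trimmed copies of the log lines above (timestamps and hostnames dropped), and the helper names are hypothetical. Full `journalctl -u dnsmasq` output can be passed in instead.

```python
import re
from typing import List

# Matches dnsmasq lines like "dnsmasq-tftp[1340]: sent /path to 1.2.3.4".
SENT_RE = re.compile(r"dnsmasq-tftp\[\d+\]: sent (\S+) to ")

# Trimmed excerpts of the logs above (timestamp and hostname removed).
GOOD_EXCERPT = """\
dnsmasq-tftp[1340]: sent /var/lib/tftpboot/shim.efi to 172.16.2.186
dnsmasq-tftp[1340]: sent /var/lib/tftpboot/grubx64.efi to 172.16.2.186
dnsmasq-tftp[1340]: sent /var/lib/tftpboot/grub.cfg to 172.16.2.186
dnsmasq-tftp[1340]: sent /var/lib/tftpboot/fedora/vmlinuz to 172.16.2.186
dnsmasq-tftp[1340]: sent /var/lib/tftpboot/fedora/initrd.img to 172.16.2.186
"""

BAD_EXCERPT = """\
dnsmasq-tftp[1341]: sent /var/lib/tftpboot/shim.efi to 172.16.2.174
dnsmasq-tftp[1341]: sent /var/lib/tftpboot/grubx64.efi to 172.16.2.174
dnsmasq-tftp[1341]: sent /var/lib/tftpboot/grub.cfg to 172.16.2.174
dnsmasq-tftp[1341]: sent /var/lib/tftpboot/grub.cfg to 172.16.2.174
"""


def sent_files(log_text: str) -> List[str]:
    """Paths dnsmasq reports as sent, in first-seen order, without repeats."""
    seen: List[str] = []
    for match in SENT_RE.finditer(log_text):
        if match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


def missing_in_failure(good_log: str, bad_log: str) -> List[str]:
    """Files sent in the passing run but never sent in the failing run."""
    sent_when_bad = set(sent_files(bad_log))
    return [path for path in sent_files(good_log) if path not in sent_when_bad]


if __name__ == "__main__":
    print(missing_in_failure(GOOD_EXCERPT, BAD_EXCERPT))
```

Run against the excerpts, it reports only the two files under /var/lib/tftpboot/fedora/.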

Comment 3 Fedora Update System 2022-12-21 22:54:51 UTC
FEDORA-2022-ccfe59f43a has been submitted as an update to Fedora 37. https://bodhi.fedoraproject.org/updates/FEDORA-2022-ccfe59f43a

Comment 4 Fedora Update System 2022-12-21 22:54:52 UTC
FEDORA-2022-290f715c04 has been submitted as an update to Fedora 36. https://bodhi.fedoraproject.org/updates/FEDORA-2022-290f715c04

Comment 5 Fedora Update System 2022-12-22 01:17:21 UTC
FEDORA-2022-290f715c04 has been pushed to the Fedora 36 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2022-290f715c04`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-290f715c04

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 6 Fedora Update System 2022-12-22 01:41:33 UTC
FEDORA-2022-ccfe59f43a has been pushed to the Fedora 37 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2022-ccfe59f43a`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-ccfe59f43a

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 7 Fedora Update System 2022-12-23 01:19:56 UTC
FEDORA-2022-ccfe59f43a has been pushed to the Fedora 37 stable repository.
If the problem still persists, please make a note of it in this bug report.

Comment 8 Adam Williamson 2022-12-23 01:36:14 UTC
Thanks for the fix. I'll confirm it with Rawhide testing tomorrow.

Comment 9 Adam Williamson 2022-12-23 19:17:46 UTC
Yup, the fix does indeed look good. Thanks a lot.

Comment 10 Fedora Update System 2022-12-31 01:16:29 UTC
FEDORA-2022-290f715c04 has been pushed to the Fedora 36 stable repository.
If the problem still persists, please make a note of it in this bug report.

