Bug 2354064

Summary: edk-ovmf-20250221-8.fc42 breaks grub chainloading
Product: [Fedora] Fedora
Reporter: Petr Janda <pjanda>
Component: edk2
Assignee: Paolo Bonzini <pbonzini>
Status: NEW
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: unspecified
Version: 42
CC: berrange, crobinso, kraxel, marehrauer, pbonzini, philmd, virt-maint
Target Milestone: ---
Keywords: Regression
Target Release: ---
Hardware: x86_64
OS: Linux

Description Petr Janda 2025-03-21 14:35:08 UTC
It is not possible to chainload another bootloader from grub2 in a VM that uses the OVMF UEFI firmware. The chainloaded bootloader hangs.





Reproducible: Always

Steps to Reproduce:
This started in bug 2351559, but Windows is not necessary to reproduce it.

1. Install F42 as a hypervisor.
2. Use virt-manager to create a new UEFI VM with the /usr/share/edk2/ovmf/OVMF_CODE_4M.secboot.qcow2 firmware, boot it from the F42 ISO media,
  and install Fedora 42 to the HDD (default installation, no disk encryption).
3. Reboot the VM into the installed system.
4. Edit /boot/grub2/grub.cfg:
cat << EOF >> /boot/grub2/grub.cfg
menuentry 'Chainload fedora' --class windows --class os {
        insmod part_gpt
        insmod fat
        set root='hd0,gpt1'
        chainloader /EFI/fedora/grubx64.efi
}
EOF
5. Reboot.
6. Choose the 'Chainload fedora' entry in grub.

Actual Results:
The system hangs with a black screen and a static cursor in the top-left corner.


Expected Results:  
Another instance of grub is chainloaded and the same menu appears again (because it reads the same config file). It is then possible to choose Fedora and boot the system.


Additional info:
* With /usr/share/edk2/ovmf replaced by files from edk2-ovmf-20241117-5.fc40 it works as expected.

* Similarly, replacing /usr/share/edk2/ovmf with files from edk2-ovmf-20250221-8.fc42 on an F40 hypervisor causes grub to hang.

* This is a somewhat synthetic reproducer; the original issue is a broken dual boot with Windows in a VM (bug 2351559).

* If you add the lines
pager=0
debug=all
to the newly added menuentry, the debug output ends with "loader/efi/chainloader.c:826:chain: booting via entry point",
indicating that the next bootloader has been loaded into memory and should be executed.

Comment 1 Gerd Hoffmann 2025-03-24 08:03:44 UTC
Does using /usr/share/edk2/ovmf/OVMF_CODE_4M.qcow2 work?

The secure boot ovmf builds are more strict in f42.  Some NX bugs
and NULL pointer dereferences are not tolerated any more.  Most
likely there is a page fault logged on the serial console.

See https://fedoraproject.org/wiki/Changes/Edk2Security for details.

Comment 2 Petr Janda 2025-03-24 12:28:12 UTC
Thank you, Gerd.

Yes, with OVMF_CODE_4M.qcow2 it works (and also with SB disabled via the firmware menu).

Yes, a page fault is logged on the serial console (thank you for the hint).

The exception is below:

!!!! X64 Exception Type - 0E(#PF - Page-Fault)  CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000011  I:1 R:0 U:0 W:0 P:1 PK:0 SS:0 SGX:0
RIP  - 000000007BB12000, CS  - 0000000000000038, RFLAGS - 0000000000210206
RAX  - 000000007CC6AD18, RCX - 000000007CC6AD18, RDX - 000000007E9EC018
RBX  - 0000000000000000, RSP - 000000007EF530B8, RBP - 000000007EF53260
RSI  - 000000007BB12000, RDI - 000000007C2E3980
R8   - 000000000000000D, R9  - 00000000000003F8, R10 - 000000007D65C018
R11  - 00000000843627C0, R12 - 0000000000000000, R13 - 0000000000000000
R14  - 000000007CB2B6E8, R15 - 000000007CB2B6F0
DS   - 0000000000000030, ES  - 0000000000000030, FS  - 0000000000000030
GS   - 0000000000000030, SS  - 0000000000000030
CR0  - 0000000080010033, CR2 - 000000007BB12000, CR3 - 000000007EC01000
CR4  - 0000000000000668, CR8 - 0000000000000000
DR0  - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3  - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 000000007E9E1000 0000000000000047, LDTR - 0000000000000000
IDTR - 000000007E2C1018 0000000000000FFF,   TR - 0000000000000000
FXSAVE_STATE - 000000007EF52D10
!!!! Find image based on IP(0x7AA96549) (No PDB)  (ImageBase=000000007A9FD1C0, EntryPoint=000000007A9FE1C0) !!!!
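
Note on reading the dump above: ExceptionData is the x86 page-fault error code. Below is a minimal sketch for decoding it, using the architectural bit positions (bit 0 P, bit 1 W, bit 2 U, bit 3 R, bit 4 I, bit 5 PK, bit 6 SS, bit 15 SGX). The function names decode_pf and is_exec_protection_fault are made up for this illustration and are not part of edk2 or grub.

```shell
# Print each flag of an x86 page-fault error code as NAME:bit.
decode_pf() {
    code=$(( $1 ))
    bit=0
    out=""
    # Flag names for bits 0..6, in bit order.
    for name in P W U R I PK SS; do
        out="${out}${name}:$(( (code >> bit) & 1 )) "
        bit=$(( bit + 1 ))
    done
    # SGX is reported in bit 15.
    echo "${out}SGX:$(( (code >> 15) & 1 ))"
}

# Succeeds when the fault was an instruction fetch (I=1) from a
# page that is present (P=1), i.e. a protection violation on execute.
is_exec_protection_fault() {
    code=$(( $1 ))
    [ $(( code & 0x11 )) -eq $(( 0x11 )) ]
}

decode_pf 0x11   # prints: P:1 W:0 U:0 R:0 I:1 PK:0 SS:0 SGX:0
```

For 0x11 this gives P:1 and I:1, i.e. an instruction fetch from a page that is mapped. Together with CR2 being equal to RIP (000000007BB12000), that is consistent with the NX enforcement described in comment 1, though the sketch only decodes the bits and does not identify which image was at fault.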

Comment 5 marehrauer 2025-04-19 16:06:09 UTC
This problem goes beyond chainloading. After switching to F42 (Silverblue and Kinoite), none of my openSUSE virtual machines will run, and I can no longer install any openSUSE distro using virt-manager (Leap 15, Leap 16, Slowroll, and Tumbleweed all hang). A similar problem occurred with OVMF on openSUSE distros in March, but it affected all other distros running UEFI with secure boot, including Fedora and Windows 11. They issued a fix to OVMF and the problem was resolved. In all cases the symptoms are the same: the system hangs displaying "Loading initial ramdisk". I have worked around the problem by creating a distrobox running F41, installing virt-manager in it, exporting virt-manager, and then layering qemu-kvm in Silverblue 42 and Kinoite 42.

Comment 6 marehrauer 2025-04-22 15:00:02 UTC
Maybe my previous comment was not clear. I received a copy of bug 1240300, which I guess I was supposed to interpret to mean that my previous comment was a duplicate of that bug. However, that bug refers to qemu-ovmf, which is an openSUSE package, and I am well aware of that bug and the fix. What I was trying to say is that the same or a similar problem occurs with edk2-ovmf-20250221-8.fc42. I can NOT run any openSUSE distro as a guest with a Fedora host running edk2-ovmf-20250221-8.fc42. However, if I use a Fedora host running edk2-20241117-5.fc41, all my openSUSE virtual machines work fine.

Comment 7 Gerd Hoffmann 2025-04-23 11:05:45 UTC
(In reply to marehrauer from comment #6)
> I can NOT run any opensuse distro  as a guest
> with a fedora host running edk-ovmf-20250221-8.fc42.  However, if I use a
> fedora host running edk2-20241117-5.fc41 all my opensuse virtual machines
> work fine.

Don't use the secure boot builds; use a build without secure boot instead (i.e. /usr/share/edk2/ovmf/OVMF_CODE.fd or /usr/share/edk2/ovmf/OVMF_CODE_4M.qcow2).  See comment #1 for details.

The failure reproduces (sles15sp6).  Boot runs into NX faults.  I'm wondering how SUSE managed to get their shim.efi (version 15.8) signed with a boot chain which is not NX-clean.  Microsoft has required this since 2022 (see https://techcommunity.microsoft.com/blog/hardwaredevcenter/updated-uefi-signing-requirements/1062916).  They handed out exceptions for Linux for a while, because the kernel changes took a while to land upstream.  But as far as I know this stopped roughly one year ago, after the patches finally landed in the 6.9 kernel.
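
One marker of an NX-compatible PE image is the IMAGE_DLLCHARACTERISTICS_NX_COMPAT flag (0x0100) in the DllCharacteristics field of the PE optional header. A rough sketch for checking that flag on an EFI binary follows. It assumes a well-formed PE file and uses only the standard od utility; read_le and has_nx_compat are hypothetical helper names, and a set flag alone does not prove that the binary is actually NX-clean at run time.

```shell
# Print the unsigned little-endian integer stored in file $1
# at byte offset $2, reading $3 bytes (up to 4).
read_le() {
    od -A n -v -t u1 -j "$2" -N "$3" "$1" | {
        read -r b0 b1 b2 b3
        echo $(( ${b0:-0} + (${b1:-0} << 8) + (${b2:-0} << 16) + (${b3:-0} << 24) ))
    }
}

# Succeeds when the PE file $1 has the NX_COMPAT flag set.
has_nx_compat() {
    # e_lfanew (offset of the "PE\0\0" signature) lives at 0x3C.
    pe_off=$(read_le "$1" 60 4)
    # Signature (4) + COFF header (20) = 24 bytes, then the optional
    # header; DllCharacteristics is at offset 70 in PE32 and PE32+.
    flags=$(read_le "$1" $(( pe_off + 24 + 70 )) 2)
    [ $(( flags & 0x0100 )) -ne 0 ]
}

# Hypothetical usage on a copy of a guest's bootloader:
#   has_nx_compat ./shim.efi && echo "NX_COMPAT set" || echo "NX_COMPAT not set"
```

This only looks at the header flag. It does not check section alignment or whether any section is both writable and executable, and it does not cover the rest of the boot chain.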

Comment 8 marehrauer 2025-04-23 13:04:08 UTC
Thanks for your help Gerd. Avoiding the secure boot builds fixes the problem and I can avoid reverting to f41.
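
To check which firmware image a guest is configured with, the <loader> element of the libvirt domain XML can be inspected. A minimal sketch, assuming the secure boot builds can be recognized by "secboot" in the file name (as in OVMF_CODE_4M.secboot.qcow2 from the description). loader_path and is_secboot_build are hypothetical helper names and the guest name f42-test is made up.

```shell
# Read libvirt domain XML on stdin and print the firmware image path
# from the <loader> element, if there is one.
loader_path() {
    sed -n 's:.*<loader[^>]*>\([^<]*\)</loader>.*:\1:p'
}

# Succeeds when the path looks like a secure boot build.
# Assumption: such builds carry "secboot" in the file name.
is_secboot_build() {
    case "$1" in
        *secboot*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical usage against a guest named "f42-test":
#   path=$(virsh dumpxml f42-test | loader_path)
#   is_secboot_build "$path" && echo "secure boot build in use: $path"
```

If the guest turns out to use a secboot image, switching it to one of the builds named in comment 7 is the workaround confirmed in this thread.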