This is a crazy obscure bug, feel free to keep it low priority for now.

Basically, in some of the RHCOS flows we end up using:

- a rhel7 (openshift 3.11) bare metal host system
- a fedora 36 userspace run as a container (with qemu-kvm and edk2-ovmf notably) in there called
- a rhel8 guest

When we updated coreos-assembler to f36, this new version of edk2-ovmf started triggering kernel panics; for more information see https://github.com/openshift/os/issues/862

Abbreviated trace is:

```
Jun 22 14:51:51.587132 kernel: Call Trace:
Jun 22 14:51:51.587140 kernel: apic_bsp_setup+0x62/0x80
Jun 22 14:51:51.587145 kernel: x86_late_time_init+0x29/0x39
Jun 22 14:51:51.587150 kernel: start_kernel+0x486/0x542
Jun 22 14:51:51.587155 kernel: secondary_startup_64_no_verify+0xc2/0xcb
Jun 22 14:51:51.587160 kernel: ---[ end trace 02abcd4e6b23d7dc ]---
```

Downgrading to f35 edk2-ovmf fixes it.

Now, we know running rhel7 as a host system doesn't make sense; we're going to fix that. I can't reproduce this when the host is rhel8 or current fedora. It also doesn't reproduce when the guest is fedora. Haven't tried rhel9 in this yet.
(Really wish BZ let me edit comments)

> in there called

... https://github.com/coreos/coreos-assembler
Probably not a directly related cause, but please note that QEMU and libvirt and other related virt packages have all explicitly dropped support for RHEL-7. Mostly this impacts userspace package build deps, but also means we no longer do any level of testing, and may assume we have newer host kernel features either at build time or runtime (though offhand I don't recall us doing that yet).
> Abbreviated trace is:
>
> ```
> Jun 22 14:51:51.587041 kernel: WARNING: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:1547 setup_local_APIC+0x269/0x370
> Jun 22 14:51:51.587132 kernel: Call Trace:
> Jun 22 14:51:51.587140 kernel: apic_bsp_setup+0x62/0x80
> Jun 22 14:51:51.587145 kernel: x86_late_time_init+0x29/0x39
> Jun 22 14:51:51.587150 kernel: start_kernel+0x486/0x542
> Jun 22 14:51:51.587155 kernel: secondary_startup_64_no_verify+0xc2/0xcb
> Jun 22 14:51:51.587160 kernel: ---[ end trace 02abcd4e6b23d7dc ]---
> ```

ovmf switched from acpi timer to local apic for time keeping recently, so the local apic is in a different state now at kernel boot time. Given this happens on RHEL-7 kernels only my first guess would be this change triggers a lapic emulation bug in the kernel.
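For anyone checking other host/guest combinations for the same symptom, here is a minimal sketch of a log scanner that pulls the warning site and the call-trace symbols out of guest journal output. It assumes the line layout seen in the trace quoted above (`<month> <day> <time> kernel: <message>`). The names `find_warning`, `extract_frames` and `SAMPLE` are hypothetical and exist only for this illustration; this is not part of coreos-assembler or any existing tool.

```python
import re

# Journal line layout assumed from the quoted trace:
#   "Jun 22 14:51:51.587140 kernel: apic_bsp_setup+0x62/0x80"
LINE_RE = re.compile(r"^\w{3} +\d+ [\d:.]+ kernel: (?P<msg>.*)$")
# A stack frame such as "apic_bsp_setup+0x62/0x80" (optionally prefixed by "? ").
FRAME_RE = re.compile(r"^\??\s*(?P<sym>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")
# A warning header such as
#   "WARNING: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:1547 setup_local_APIC+0x269/0x370"
WARN_RE = re.compile(
    r"WARNING: .*? at (?P<file>\S+):(?P<line>\d+) (?P<sym>[A-Za-z_]\w*)\+"
)

# Sample input copied from the trace in this report.
SAMPLE = """\
Jun 22 14:51:51.587041 kernel: WARNING: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:1547 setup_local_APIC+0x269/0x370
Jun 22 14:51:51.587132 kernel: Call Trace:
Jun 22 14:51:51.587140 kernel: apic_bsp_setup+0x62/0x80
Jun 22 14:51:51.587145 kernel: x86_late_time_init+0x29/0x39
Jun 22 14:51:51.587150 kernel: start_kernel+0x486/0x542
Jun 22 14:51:51.587155 kernel: secondary_startup_64_no_verify+0xc2/0xcb
Jun 22 14:51:51.587160 kernel: ---[ end trace 02abcd4e6b23d7dc ]---
"""


def kernel_messages(log_text):
    """Yield the message part of every line that matches the assumed layout."""
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            yield match.group("msg").strip()


def find_warning(log_text):
    """Return (file, line, symbol) for the first WARNING header, or None."""
    for msg in kernel_messages(log_text):
        match = WARN_RE.search(msg)
        if match:
            return match.group("file"), int(match.group("line")), match.group("sym")
    return None


def extract_frames(log_text):
    """Return the symbols listed between 'Call Trace:' and '---[ end trace'."""
    frames = []
    in_trace = False
    for msg in kernel_messages(log_text):
        if msg.startswith("Call Trace:"):
            in_trace = True
        elif msg.startswith("---[ end trace"):
            in_trace = False
        elif in_trace:
            match = FRAME_RE.match(msg)
            if match:
                frames.append(match.group("sym"))
    return frames


if __name__ == "__main__":
    print(find_warning(SAMPLE))
    print(extract_frames(SAMPLE))
```

Matching on the `setup_local_APIC` symbol rather than the full line keeps the check independent of timestamps and offsets, which will differ between kernel builds.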
This message is a reminder that Fedora Linux 36 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora Linux 36 on 2023-05-16. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a 'version' of '36'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, change the 'version' to a later Fedora Linux version. Note that the version field may be hidden. Click the "Show advanced fields" button if you do not see it. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora Linux 36 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora Linux, you are encouraged to change the 'version' to a later version prior to this bug being closed.
Fedora Linux 36 entered end-of-life (EOL) status on 2023-05-16. Fedora Linux 36 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora Linux please feel free to reopen this bug against that version. Note that the version field may be hidden. Click the "Show advanced fields" button if you do not see the version field. If you are unable to reopen this bug, please file a new report against an active release. Thank you for reporting this bug and we are sorry it could not be fixed.