Fedora aarch64 QEMU emulation slowed down starting from Fedora 33.

Reproducible: Always

Steps to Reproduce:
Compare the Fedora-Cloud 32 and 33 aarch64 images in qemu-system-aarch64 (any version, including 9.1), and run a command in qemu-aarch64-static.

Actual Results:

Fedora 32: Startup finished in 4.518s (kernel) + 9.704s (initrd) + 13.090s (userspace) = 27.313s
Fedora 33: Startup finished in 4.755s (kernel) + 19.132s (initrd) + 17.163s (userspace) = 41.051s

The slowdown is even more significant in qemu-aarch64-static when running 'yum repolist':

F32: real 0m1.455s
F33: real 0m10.371s

Could the "Pointer Authentication & Branch Target Enablement" F33 feature be the cause of the significant slowdown in Fedora 33+ aarch64 QEMU emulation?

https://bugzilla.redhat.com/show_bug.cgi?id=1847148
> Compare Fedora-Cloud 32 and 33 aarch64 images in qemu-system-aarch64 (any
> version including 9.1) and running command in qemu-aarch64-static.

So just to be clear here: running an F-32 VM is fast and an F-33 VM is slow on an F-41/qemu 9.1 host? That is, the slowdown is in the guest? Can you add commands etc. for easier reproduction?
Adding jlinton for PAC/BTI clarification.
What's the exact version of qemu?
Steps to reproduce (qemu-user-static-aarch64 only, because the steps for booting are more complicated):

# uname -a
Linux rawhide 6.11.0-63.fc42.x86_64 #1 SMP PREEMPT_DYNAMIC Sun Sep 15 17:14:12 UTC 2024 x86_64 GNU/Linux
# rpm -q qemu-user-static-aarch64
qemu-user-static-aarch64-9.1.0-2.fc42.x86_64

(the slowdown is also present in all previous QEMU versions)

Download the aarch64 images:
https://archives.fedoraproject.org/pub/archive/fedora/linux/releases/32/Cloud/aarch64/images/Fedora-Cloud-Base-32-1.6.aarch64.raw.xz
https://archives.fedoraproject.org/pub/archive/fedora/linux/releases/33/Cloud/aarch64/images/Fedora-Cloud-Base-33-1.2.aarch64.raw.xz

# xz -kd Fedora-Cloud-Base-32-1.6.aarch64.raw.xz
# losetup /dev/loop0 Fedora-Cloud-Base-32-1.6.aarch64.raw
# kpartx -va /dev/loop0
# mount /dev/mapper/loop0p2 /mnt/
# chroot /mnt
# time yum repolist
repo id                  repo name
fedora                   Fedora 32 - aarch64
fedora-cisco-openh264    Fedora 32 openh264 (From Cisco) - aarch64
fedora-modular           Fedora Modular 32 - aarch64
updates                  Fedora 32 - aarch64 - Updates
updates-modular          Fedora Modular 32 - aarch64 - Updates

real    0m1.677s
user    0m1.561s
sys     0m0.063s
# exit
# umount /mnt
# kpartx -d /dev/loop0
# losetup -d /dev/loop0

# xz -kd Fedora-Cloud-Base-33-1.2.aarch64.raw.xz
# losetup /dev/loop0 Fedora-Cloud-Base-33-1.2.aarch64.raw
# kpartx -va /dev/loop0
# mount /dev/mapper/loop0p2 /mnt/
# chroot /mnt
# time yum repolist
repo id                  repo name
fedora                   Fedora 33 - aarch64
fedora-cisco-openh264    Fedora 33 openh264 (From Cisco) - aarch64
fedora-modular           Fedora Modular 33 - aarch64
updates                  Fedora 33 - aarch64 - Updates
updates-modular          Fedora Modular 33 - aarch64 - Updates

real    0m11.415s
user    0m11.243s
sys     0m0.067s
# exit
# umount /mnt
# kpartx -d /dev/loop0
# losetup -d /dev/loop0
Arm architectural features are usually designed with an eye towards efficient implementation in hardware, which means that when they are emulated in software the overhead can frequently be significant. This is why, for example, the Arm software models have flags that enable or disable emulation of individual features; the PAC algorithm, say, can be changed to one that is friendlier to software emulation.

Obviously, the more features are enabled, the more behavior the software must emulate. Picking a simpler v8.0 CPU target will frequently be considerably faster than picking one with all the architectural features enabled and fully emulated, because the additional PAC computations, the checks of page properties and BTI landing pads, and the other security checks (e.g. MTE!) that are largely transparent in hardware now need additional software validation, which slows the overall emulation.

So, end users who aren't running in a HW-accelerated environment like KVM have to decide whether these security features are worth the emulation overhead, and enable/disable them as needed. I don't believe there is a 2-10x slowdown going from F32 to F33 on hardware, so that should help narrow down the problem.
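As a concrete sketch of "changing the PAC algorithm to something friendlier to software emulation": recent qemu-system-aarch64 builds expose a `pauth-impdef` property on the `max` CPU under TCG (verify with `qemu-system-aarch64 -cpu help` on your build before relying on it):

```shell
# Keep pointer authentication enabled, but swap the architected QARMA cipher
# for QEMU's cheap implementation-defined one; 'pauth=off' would instead
# drop the feature entirely.
QEMU_ARGS="-machine virt -cpu max,pauth-impdef=on -m 2048 -nographic"
echo "qemu-system-aarch64 $QEMU_ARGS"
```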
Since this is binfmt, you have to adjust the CPU/emulation selection via an environment variable: by design, it selects the emulation for maximum compatibility, which is intentional. If you adjust that, the problem here will go away. As this is all by design, I think this bug should be closed as NOTABUG.
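qemu-user (and therefore binfmt_misc launches of qemu-aarch64-static) reads the CPU model from the QEMU_CPU environment variable. A sketch of re-running the chroot test from the reproduction steps with a simpler core (the /mnt mount point is assumed from those steps):

```shell
# Pin the user-mode emulator to a plain v8.0 core with no PAC/BTI to emulate.
export QEMU_CPU=cortex-a57
# Then re-run the slow command, e.g.:
#   chroot /mnt sh -c 'time yum repolist'
echo "QEMU_CPU=$QEMU_CPU"
```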
With QEMU_CPU=cortex-a76, binfmt runs much faster, but when the Fedora-Cloud image is booted in a VM with a cortex-a76 CPU, startup of F33 is two times slower than F32.
The a76 is a v8.2 core; you might try something like a cortex-a57. That said, while the existence of PAC/BTI might add some extra overhead, it's possible something else in the boot sequence is causing the slowdown. With an a57, is the F32/F33 repoquery time closer to the original 1.67s time?
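One way to separate CPU-feature overhead from other boot-sequence changes is to time the same command under several CPU models. A hypothetical A/B harness (the chroot line is commented out because it depends on the image mounted at /mnt in the earlier steps):

```shell
# Try progressively more featureful CPU models and compare wall-clock times.
for cpu in cortex-a57 cortex-a72 cortex-a76 max; do
    echo "== QEMU_CPU=$cpu =="
    # QEMU_CPU=$cpu chroot /mnt sh -c 'time yum repolist'
done
```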
Oh! libvirt isn't going to use an environment variable for CPU selection; that needs to be set up using libvirt-specific methods.
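For a libvirt guest the CPU model goes into the domain XML instead. A sketch, assuming you would run `virsh edit <domain>` and replace the guest's `<cpu>` element (the model string must be one your QEMU build supports):

```shell
# Fragment to paste into the domain XML via 'virsh edit' to force a
# simpler v8.0 CPU model under TCG emulation.
CPU_XML="<cpu mode='custom' match='exact'>
  <model fallback='forbid'>cortex-a57</model>
</cpu>"
printf '%s\n' "$CPU_XML"
```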