Description of problem:

qemu 6.1.0 cannot boot the current kernel using TCG. It hangs just before entering the kernel.

$ LIBGUESTFS_BACKEND_SETTINGS=force_tcg libguestfs-test-tool
...
libguestfs: responding to serial console Device Status Report
\x1b[1;256r\x1b[256;256H\x1b[6n
Google, Inc.
Serial Graphics Adapter 01/29/21
SGABIOS $Id$ (mockbuild@) Fri Jan 29 01:55:59 UTC 2021
Term: 80x24
4 0
SeaBIOS (version 1.14.0-5.fc35)
Machine UUID 225c481d-3798-4724-9761-3d5b76e9df8f
Booting from ROM...
\x1b[2J
    <---- hangs here

The next line would be the first line of output from the kernel (using earlyprintk I think).

Version-Release number of selected component (if applicable):

qemu-6.1.0-4.fc36.x86_64

How reproducible:

100%

Steps to Reproduce:

1. As above. Note that you need to apply this fix to libguestfs:
https://github.com/libguestfs/libguestfs/commit/45de287447bb18d59749fbfc1ec5072413090109
because of bug 1998820, but nothing about this bug is caused by libguestfs; it's caused by qemu, seabios or the kernel.

Additional info:

[edit: see comment 4]
https://people.redhat.com/~rjones/qemu-sanity-check/
I downgraded to qemu-6.0.0-12.fc35.x86_64 which works fine.
Created attachment 1819453 [details] log file from libguestfs-test-tool
Created attachment 1819454 [details] qemu log file
(In reply to Richard W.M. Jones from comment #0)
> If only we had a testing tool that could detect this situation
> automatically. Oh wait, we do!
> https://people.redhat.com/~rjones/qemu-sanity-check/

That *is* being run, and we see it succeed on this QEMU build with vmlinuz-5.14.0-0.rc7.54.fc36.x86_64:

https://osci-jenkins-1.ci.fedoraproject.org/job/fedora-ci/job/dist-git-pipeline/job/master/69097/testReport/(root)/tests/_tests_qemu_sanity_check/

So there's something libguestfs does more thoroughly that qemu-sanity-check isn't detecting.
Using LIBGUESTFS_APPEND=debug to add the kernel debug option changes the messages a tiny bit (it still hangs).

SeaBIOS (version 1.14.0-5.fc35)
Machine UUID 7c57dc66-eccb-4bf5-b82d-0bfa0e8dc05f
Booting from ROM...
early console in setup code
\x1b[2J

So it looks as if it gets into some part of the kernel.
git bisect on upstream qemu.git blames this commit:

commit 213ff024a2f92020290296cb9dc29c2af3d4a221 (HEAD, refs/bisect/bad)
Author: Lara Lazier <laramglazier>
Date:   Wed Jul 21 17:26:50 2021 +0200

    target/i386: Added consistency checks for CR4

    All MBZ bits in CR4 must be zero. (APM2 15.5)
    Added reserved bitmask and added checks in both
    helper_vmrun and helper_write_crN.

    Signed-off-by: Lara Lazier <laramglazier>
    Message-Id: <20210721152651.14683-2-laramglazier>
    Signed-off-by: Paolo Bonzini <pbonzini>

Looking at what's different between libguestfs-test-tool and qemu-sanity-check, I see the CPU model is set to --cpu=max with libguestfs-test-tool.

Using

  $ qemu-sanity-check -v -q /home/berrange/src/virt/qemu/build/qemu-system-x86_64 --cpu=max

gets it to fail in the same way, so this has nothing to do with libguestfs: we simply have a broken QEMU TCG implementation here.

All other named CPU models I've tried appear to work fine too. Only --cpu=max is broken.
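For anyone wanting to repeat the comparison across several CPU models, here is a minimal sketch that only builds the command lines. It assumes qemu-sanity-check accepts `-v`, `-q` and `--cpu=` exactly as in the command above; the helper name `sanity_check_cmd`, the binary path and the model list are illustrative placeholders, not part of any tool.

```python
import shlex


def sanity_check_cmd(qemu_binary, cpu_model):
    """Build the qemu-sanity-check argv for one CPU model (nothing is executed)."""
    return ["qemu-sanity-check", "-v", "-q", qemu_binary, f"--cpu={cpu_model}"]


if __name__ == "__main__":
    # Hypothetical path and model list; substitute your own build and models.
    qemu = "./build/qemu-system-x86_64"
    for model in ["max", "Skylake-Server", "qemu64"]:
        # Print a copy-pasteable shell command for each model.
        print(shlex.join(sanity_check_cmd(qemu, model)))
```

Each printed line can be pasted into a shell, or the argv list can be handed to `subprocess.run` with a timeout so that a hanging guest is reported as a failure.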
(In reply to Daniel Berrangé from comment #6)
> git bisect on upstream qemu.git blames this commit:
>
>
> commit 213ff024a2f92020290296cb9dc29c2af3d4a221 (HEAD, refs/bisect/bad)
> Author: Lara Lazier <laramglazier>
> Date: Wed Jul 21 17:26:50 2021 +0200
>
> target/i386: Added consistency checks for CR4
>
> All MBZ bits in CR4 must be zero. (APM2 15.5)
> Added reserved bitmask and added checks in both
> helper_vmrun and helper_write_crN.
>
> Signed-off-by: Lara Lazier <laramglazier>
> Message-Id: <20210721152651.14683-2-laramglazier>
> Signed-off-by: Paolo Bonzini <pbonzini>
>
>
> looking at what's different with libguestfs-test-tool vs qemu-sanity-check,
> i see the CPU model is set --cpu=max with libguestfs-test-tool.
>
> Using
>
> $ qemu-sanity-check -v -q
> /home/berrange/src/virt/qemu/build/qemu-system-x86_64 --cpu=max
>
> gets it to fail in the same way, so this is nothing todo with libguestfs -
> we have a simple broken QEMU TCG impl here.
>
> All other named CPU models I've tried appear to work fine too. Only
> --cpu=max is broken.

Possibly related to:

commit 5b8978d8042660de35b2c67c62ffeb6b42ff441e
Author: Claudio Fontana <cfontana>
Date:   Fri Jul 23 13:29:21 2021 +0200

    i386: do not call cpudef-only models functions for max, host, base

    Some cpu properties have to be set only for cpu models in builtin_x86_defs,
    registered with x86_register_cpu_model_type, and not for
    cpu models "base", "max", and the subclass "host".

    These properties are the ones set by function x86_cpu_apply_props,
    (also including kvm_default_props, tcg_default_props),
    and the "vendor" property for the KVM and HVF accelerators.

    After recent refactoring of cpu, which also affected these properties,
    they were instead set unconditionally for all x86 cpus.

    This has been detected as a bug with Nested on AMD with cpu "host",
    as svm was not turned on by default, due to the wrongful setting of
    kvm_default_props via x86_cpu_apply_props, which set svm to "off".

    Rectify the bug introduced in commit "i386: split cpu accelerators"
    and document the functions that are builtin_x86_defs-only.
> All other named CPU models I've tried appear to work fine too. Only --cpu=max is broken.

The problem is the 'la57' feature:

 - Fails: --cpu max
 - Works: --cpu max,la57=off
 - Works: --cpu Skylake-Server
 - Fails: --cpu Skylake-Server,la57=on

So that CR4 patch is broken with respect to 5-level paging.
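To illustrate how a reserved-bit check can break 5-level paging, here is a toy model. It is not QEMU's code and not the eventual fix: it models only two CR4 bits (PAE at bit 5 and LA57 at bit 12, their architectural positions), and the function names are made up for the example. The point is that a must-be-zero mask which never treats LA57 as a valid bit rejects the CR4 write a kernel performs to enable 5-level paging, which would be consistent with the failures listed above.

```python
# Architectural CR4 bit positions; every other bit is ignored in this toy model.
CR4_PAE = 1 << 5
CR4_LA57 = 1 << 12   # enables 5-level paging
CR4_WIDTH = 64


def reserved_mask(allowed_bits):
    """Every bit that is not explicitly allowed is treated as must-be-zero."""
    return ~allowed_bits & ((1 << CR4_WIDTH) - 1)


def write_cr4_allowed(new_cr4, mask):
    """A CR4 write is accepted only if it sets no must-be-zero bit."""
    return (new_cr4 & mask) == 0


def feature_aware_mask(cpu_has_la57):
    """LA57 is a valid CR4 bit only when the CPU model advertises la57."""
    allowed = CR4_PAE
    if cpu_has_la57:
        allowed |= CR4_LA57
    return reserved_mask(allowed)


# Hypothetical buggy variant: LA57 is never considered a valid bit.
buggy_mask = reserved_mask(CR4_PAE)

guest_write = CR4_PAE | CR4_LA57   # what a kernel sets to turn on 5-level paging
print(write_cr4_allowed(guest_write, buggy_mask))                 # False
print(write_cr4_allowed(guest_write, feature_aware_mask(True)))   # True
print(write_cr4_allowed(guest_write, feature_aware_mask(False)))  # False
```

With the feature-aware mask the write is still rejected on a CPU model without la57, which is the behaviour a consistency check is meant to keep.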
Dan posted this patch: https://lists.nongnu.org/archive/html/qemu-devel/2021-08/msg05468.html
FEDORA-2021-b2fffa02d2 has been submitted as an update to Fedora 35. https://bodhi.fedoraproject.org/updates/FEDORA-2021-b2fffa02d2
FEDORA-2021-b2fffa02d2 has been pushed to the Fedora 35 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-b2fffa02d2` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-b2fffa02d2 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
Cloned as https://bugzilla.redhat.com/show_bug.cgi?id=2002246 for RHEL 9.0.
FEDORA-2021-b2fffa02d2 has been pushed to the Fedora 35 stable repository. If the problem still persists, please make a note of it in this bug report.