Bug 1999700 - qemu -cpu max + TCG hangs booting Linux because of missing bits in CR4_RESERVED_MASK
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: qemu
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Fedora Virtualization Maintainers
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: TRACKER-bugs-affecting-libguestfs
Reported: 2021-08-31 15:23 UTC by Richard W.M. Jones
Modified: 2021-09-24 20:12 UTC

Fixed In Version: qemu-6.1.0-5.fc35
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-09-24 20:12:44 UTC
Type: Bug
Embargoed:


Attachments
log file from libguestfs-test-tool (12.66 KB, text/plain), 2021-08-31 15:28 UTC, Richard W.M. Jones
qemu log file (4.76 KB, text/plain), 2021-08-31 15:30 UTC, Richard W.M. Jones

Description Richard W.M. Jones 2021-08-31 15:23:50 UTC
Description of problem:

qemu 6.1.0 cannot boot the current kernel using TCG.  It hangs
just before entering the kernel.

$ LIBGUESTFS_BACKEND_SETTINGS=force_tcg libguestfs-test-tool
...
libguestfs: responding to serial console Device Status Report
\x1b[1;256r\x1b[256;256H\x1b[6n
Google, Inc.
Serial Graphics Adapter 01/29/21
SGABIOS $Id$ (mockbuild@) Fri Jan 29 01:55:59 UTC 2021
Term: 80x24
4 0
SeaBIOS (version 1.14.0-5.fc35)
Machine UUID 225c481d-3798-4724-9761-3d5b76e9df8f
Booting from ROM...
\x1b[2J            <---- hangs here

The next line would be the first line of output from the
kernel (using earlyprintk I think).

Version-Release number of selected component (if applicable):

qemu-6.1.0-4.fc36.x86_64

How reproducible:

100%

Steps to Reproduce:
1. As above.

Note that you need to apply this fix to libguestfs:
https://github.com/libguestfs/libguestfs/commit/45de287447bb18d59749fbfc1ec5072413090109
because of bug 1998820.  Nothing about this bug is caused by
libguestfs, though; it is caused by qemu, seabios or the
kernel.

Additional info:

[edit: see comment 4]
https://people.redhat.com/~rjones/qemu-sanity-check/

Comment 1 Richard W.M. Jones 2021-08-31 15:26:34 UTC
I downgraded to qemu-6.0.0-12.fc35.x86_64 which works fine.

Comment 2 Richard W.M. Jones 2021-08-31 15:28:39 UTC
Created attachment 1819453
log file from libguestfs-test-tool

Comment 3 Richard W.M. Jones 2021-08-31 15:30:08 UTC
Created attachment 1819454
qemu log file

Comment 4 Daniel Berrangé 2021-08-31 15:41:16 UTC
(In reply to Richard W.M. Jones from comment #0)
> If only we had a testing tool that could detect this situation
> automatically.  Oh wait, we do!
> https://people.redhat.com/~rjones/qemu-sanity-check/

That *is* being run and we see it succeed on this QEMU build with vmlinuz-5.14.0-0.rc7.54.fc36.x86_64

https://osci-jenkins-1.ci.fedoraproject.org/job/fedora-ci/job/dist-git-pipeline/job/master/69097/testReport/(root)/tests/_tests_qemu_sanity_check/

so there's something libguestfs does more thoroughly that qemu-sanity-check isn't detecting.

Comment 5 Richard W.M. Jones 2021-08-31 15:54:19 UTC
Using LIBGUESTFS_APPEND=debug to add the kernel debug option
changes the messages a tiny bit (it still hangs).

SeaBIOS (version 1.14.0-5.fc35)
Machine UUID 7c57dc66-eccb-4bf5-b82d-0bfa0e8dc05f
Booting from ROM...
early console in setup code
\x1b[2J

So it looks as if it gets into some part of the kernel.

Comment 6 Daniel Berrangé 2021-08-31 16:59:08 UTC
git bisect on upstream qemu.git blames this commit:


commit 213ff024a2f92020290296cb9dc29c2af3d4a221 (HEAD, refs/bisect/bad)
Author: Lara Lazier <laramglazier>
Date:   Wed Jul 21 17:26:50 2021 +0200

    target/i386: Added consistency checks for CR4
    
    All MBZ bits in CR4 must be zero. (APM2 15.5)
    Added reserved bitmask and added checks in both
    helper_vmrun and helper_write_crN.
    
    Signed-off-by: Lara Lazier <laramglazier>
    Message-Id: <20210721152651.14683-2-laramglazier>
    Signed-off-by: Paolo Bonzini <pbonzini>


Looking at what's different between libguestfs-test-tool and qemu-sanity-check, I see that libguestfs-test-tool sets the CPU model with --cpu=max.

Using

$ qemu-sanity-check -v -q /home/berrange/src/virt/qemu/build/qemu-system-x86_64  --cpu=max

gets it to fail in the same way, so this has nothing to do with libguestfs: we simply have a broken QEMU TCG implementation here.

All other named CPU models I've tried appear to work fine. Only --cpu=max is broken.

Comment 7 Philippe Mathieu-Daudé 2021-08-31 17:09:17 UTC
(In reply to Daniel Berrangé from comment #6)
> git bisect on upstream qemu.git blames this commit:
> 
> 
> commit 213ff024a2f92020290296cb9dc29c2af3d4a221 (HEAD, refs/bisect/bad)
> Author: Lara Lazier <laramglazier>
> Date:   Wed Jul 21 17:26:50 2021 +0200
> 
>     target/i386: Added consistency checks for CR4
>     
>     All MBZ bits in CR4 must be zero. (APM2 15.5)
>     Added reserved bitmask and added checks in both
>     helper_vmrun and helper_write_crN.
>     
>     Signed-off-by: Lara Lazier <laramglazier>
>     Message-Id: <20210721152651.14683-2-laramglazier>
>     Signed-off-by: Paolo Bonzini <pbonzini>
> 
> 
> looking at what's different with libguestfs-test-tool vs qemu-sanity-check,
> i see the CPU model is set --cpu=max with libguestfs-test-tool.
> 
> Using
> 
> $ qemu-sanity-check -v -q
> /home/berrange/src/virt/qemu/build/qemu-system-x86_64  --cpu=max
> 
> gets it to fail in the same way, so this is nothing todo with libguestfs -
> we have a simple broken QEMU TCG impl here.
> 
> All other named CPU models I've tried appear to work fine too. Only
> --cpu=max is broken.

Possibly related to:

commit 5b8978d8042660de35b2c67c62ffeb6b42ff441e
Author: Claudio Fontana <cfontana>
Date:   Fri Jul 23 13:29:21 2021 +0200

    i386: do not call cpudef-only models functions for max, host, base
    
    Some cpu properties have to be set only for cpu models in builtin_x86_defs,
    registered with x86_register_cpu_model_type, and not for
    cpu models "base", "max", and the subclass "host".
    
    These properties are the ones set by function x86_cpu_apply_props,
    (also including kvm_default_props, tcg_default_props),
    and the "vendor" property for the KVM and HVF accelerators.
    
    After recent refactoring of cpu, which also affected these properties,
    they were instead set unconditionally for all x86 cpus.
    
    This has been detected as a bug with Nested on AMD with cpu "host",
    as svm was not turned on by default, due to the wrongful setting of
    kvm_default_props via x86_cpu_apply_props, which set svm to "off".
    
    Rectify the bug introduced in commit "i386: split cpu accelerators"
    and document the functions that are builtin_x86_defs-only.

Comment 8 Daniel Berrangé 2021-08-31 17:10:50 UTC
> All other named CPU models I've tried appear to work fine too. Only --cpu=max is broken.

The problem is the 'la57' feature

 - Fails: --cpu max
 - Works: --cpu max,la57=off

 - Works: --cpu Skylake-Server
 - Fails: --cpu Skylake-Server,la57=on

So that CR4 patch is broken with respect to 5-level paging.
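
To illustrate why the la57 toggle matters, here is a minimal C sketch. It is not QEMU's actual code: the toy_* functions and TOY_CR4_BASE_KNOWN are hypothetical names, and the set of "known" bits is deliberately cut down to CR4 bits 0..11 for brevity. Only the bit positions of CR4.PAE (bit 5) and CR4.LA57 (bit 12) are architectural. The sketch assumes that a CR4 write is rejected when it sets any bit in the reserved mask, which is the kind of check the bisected commit describes adding to helper_write_crN. If LA57 is missing from the list of known bits, a guest that enables 5-level paging has its CR4 write rejected even though the CPU model advertises la57.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR4_PAE   (1ULL << 5)    /* physical address extension */
#define CR4_LA57  (1ULL << 12)   /* 5-level paging enable */

/* Hypothetical, simplified set of always-known bits: CR4 bits 0..11. */
#define TOY_CR4_BASE_KNOWN 0x0FFFULL

/* Buggy variant: LA57 is absent from the known set, so it is treated as
 * reserved regardless of what the CPU model advertises. */
uint64_t toy_reserved_buggy(bool cpu_has_la57)
{
    (void)cpu_has_la57;
    return ~TOY_CR4_BASE_KNOWN;
}

/* Fixed variant: LA57 is a known bit, and is reserved only when the
 * CPU model does not advertise the la57 feature. */
uint64_t toy_reserved_fixed(bool cpu_has_la57)
{
    uint64_t reserved = ~(TOY_CR4_BASE_KNOWN | CR4_LA57);
    if (!cpu_has_la57) {
        reserved |= CR4_LA57;
    }
    return reserved;
}

/* A CR4 write is accepted only if it sets no reserved bit. */
bool toy_cr4_write_ok(uint64_t new_cr4, uint64_t reserved)
{
    return (new_cr4 & reserved) == 0;
}

/* Mirrors the works/fails matrix above, under the assumptions stated. */
void toy_demo(void)
{
    uint64_t want = CR4_PAE | CR4_LA57;   /* guest enabling 5-level paging */

    assert(!toy_cr4_write_ok(want, toy_reserved_buggy(true)));    /* la57=on, buggy mask: rejected */
    assert(toy_cr4_write_ok(want, toy_reserved_fixed(true)));     /* la57=on, fixed mask: accepted */
    assert(!toy_cr4_write_ok(want, toy_reserved_fixed(false)));   /* la57=off: LA57 stays reserved */
    assert(toy_cr4_write_ok(CR4_PAE, toy_reserved_buggy(true)));  /* guest not using LA57 is unaffected */
}
```

The last case is consistent with "--cpu max,la57=off" working: a guest that never sets CR4.LA57 does not trip the check.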

Comment 9 Richard W.M. Jones 2021-08-31 18:01:18 UTC
Dan posted this patch:
https://lists.nongnu.org/archive/html/qemu-devel/2021-08/msg05468.html

Comment 10 Fedora Update System 2021-08-31 20:27:07 UTC
FEDORA-2021-b2fffa02d2 has been submitted as an update to Fedora 35. https://bodhi.fedoraproject.org/updates/FEDORA-2021-b2fffa02d2

Comment 11 Fedora Update System 2021-09-01 19:26:21 UTC
FEDORA-2021-b2fffa02d2 has been pushed to the Fedora 35 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-b2fffa02d2`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-b2fffa02d2

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 14 Richard W.M. Jones 2021-09-08 10:47:21 UTC
Cloned as https://bugzilla.redhat.com/show_bug.cgi?id=2002246
for RHEL 9.0.

Comment 15 Fedora Update System 2021-09-10 15:37:00 UTC
FEDORA-2021-b2fffa02d2 has been submitted as an update to Fedora 35. https://bodhi.fedoraproject.org/updates/FEDORA-2021-b2fffa02d2

Comment 16 Fedora Update System 2021-09-10 22:11:11 UTC
FEDORA-2021-b2fffa02d2 has been pushed to the Fedora 35 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-b2fffa02d2`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-b2fffa02d2

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 17 Fedora Update System 2021-09-24 20:12:44 UTC
FEDORA-2021-b2fffa02d2 has been pushed to the Fedora 35 stable repository.
If the problem still persists, please make a note of it in this bug report.

