Bug 2002246

Summary: qemu -cpu max + TCG hangs booting Linux because of missing bits in CR4_RESERVED_MASK
Product: Red Hat Enterprise Linux 9
Component: qemu-kvm
qemu-kvm sub component: CPU Models
Version: 9.0
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Keywords: Triaged
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Type: Bug
Reporter: Richard W.M. Jones <rjones>
Assignee: Daniel Berrangé <berrange>
QA Contact: liunana <nanliu>
CC: berrange, coli, jinzhao, juzhang, mrezanin, nanliu, nilal, virt-maint
Fixed In Version: qemu-kvm-6.2.0-1.el9
Last Closed: 2022-05-17 12:24:17 UTC

Description Richard W.M. Jones 2021-09-08 10:46:46 UTC
This bug was initially created as a copy of Bug #1999700

qemu 6.1.0 cannot boot the current kernel using TCG.  It hangs
just before entering the kernel.

$ LIBGUESTFS_BACKEND_SETTINGS=force_tcg libguestfs-test-tool
...
libguestfs: responding to serial console Device Status Report
\x1b[1;256r\x1b[256;256H\x1b[6n
Google, Inc.
Serial Graphics Adapter 01/29/21
SGABIOS $Id$ (mockbuild@) Fri Jan 29 01:55:59 UTC 2021
Term: 80x24
4 0
SeaBIOS (version 1.14.0-5.fc35)
Machine UUID 225c481d-3798-4724-9761-3d5b76e9df8f
Booting from ROM...
\x1b[2J            <---- hangs here

The next line would be the first line of output from the
kernel (using earlyprintk I think).

This requires the following patch to fix:

https://lists.nongnu.org/archive/html/qemu-devel/2021-09/msg02243.html

This bug is just filed to ensure we don't forget this patch
when we rebase qemu-kvm in RHEL 9.0.

Comment 1 John Ferlan 2021-09-16 13:05:52 UTC
Assigned to Amnon for initial triage per bz process and age of bug created or assigned to virt-maint without triage.

Comment 2 Eduardo Habkost 2021-11-10 22:26:53 UTC
Merged upstream:

commit 69e3895f9d37ca39536775b13ce63e8c291427ba
Author: Daniel P. Berrangé <berrange>
Date:   Tue Aug 31 18:50:33 2021 +0100

    target/i386: add missing bits to CR4_RESERVED_MASK
    
    Booting Fedora kernels with -cpu max hangs very early in boot. Disabling
    the la57 CPUID bit fixes the problem. git bisect traced the regression to
    
      commit 213ff024a2f92020290296cb9dc29c2af3d4a221 (HEAD, refs/bisect/bad)
      Author: Lara Lazier <laramglazier>
      Date:   Wed Jul 21 17:26:50 2021 +0200
    
        target/i386: Added consistency checks for CR4
    
        All MBZ bits in CR4 must be zero. (APM2 15.5)
        Added reserved bitmask and added checks in both
        helper_vmrun and helper_write_crN.
    
        Signed-off-by: Lara Lazier <laramglazier>
        Message-Id: <20210721152651.14683-2-laramglazier>
        Signed-off-by: Paolo Bonzini <pbonzini>
    
    In this commit CR4_RESERVED_MASK is missing CR4_LA57_MASK and
    two others. Adding this lets Fedora kernels boot once again.
    
    Signed-off-by: Daniel P. Berrangé <berrange>
    Tested-by: Richard W.M. Jones <rjones>
    Message-Id: <20210831175033.175584-1-berrange>
    [Removed VMXE/SMXE, matching the commit message. - Paolo]
    Fixes: 213ff024a2 ("target/i386: Added consistency checks for CR4", 2021-07-22)
    Cc: qemu-stable
    Signed-off-by: Paolo Bonzini <pbonzini>
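
To illustrate the failure mode described in the commit message, here is a minimal sketch of a reserved-bit check. It is not QEMU's code: the function names are hypothetical, and it assumes the reserved mask is built as the complement of the bits a guest is allowed to set. Only two CR4 bits are modelled (PAE, bit 5, and LA57, bit 12); the real mask covers many more.

```python
# Illustrative sketch only: names and structure are assumptions,
# not QEMU's actual source.

CR4_PAE = 1 << 5    # physical address extension
CR4_LA57 = 1 << 12  # 57-bit linear addresses (5-level paging)


def reserved_mask(allowed_bits):
    """Return the 64-bit mask of CR4 bits a guest may NOT set."""
    return ~allowed_bits & 0xFFFF_FFFF_FFFF_FFFF


def cr4_write_ok(value, reserved):
    """A CR4 write is accepted only if it touches no reserved bit."""
    return (value & reserved) == 0


# Buggy mask: LA57 is missing from the allowed set, so it counts as reserved.
buggy = reserved_mask(CR4_PAE)
# Fixed mask: LA57 is listed among the allowed bits.
fixed = reserved_mask(CR4_PAE | CR4_LA57)

# Value written by a guest that enables 5-level paging.
guest_value = CR4_PAE | CR4_LA57

print(cr4_write_ok(guest_value, buggy))  # False: the write is rejected
print(cr4_write_ok(guest_value, fixed))  # True: the write is accepted
```

Under these assumptions, a guest that sets LA57 has its CR4 write rejected until the bit is added to the allowed set, which is consistent with the observation that disabling the la57 CPUID bit avoids the hang.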

Comment 5 liunana 2021-12-17 09:08:14 UTC
Hi Daniel,

Could you please tell QE how to test this bug? Do we need any special tools?

Thanks in advance.

Best regards,
Liu Nana

Comment 6 Daniel Berrangé 2021-12-17 13:12:30 UTC
Use the following command line:

# /usr/libexec/qemu-kvm  -kernel /boot/vmlinuz-5.14.0-2.el9.x86_64  -append 'console=ttyS0' -display none -serial stdio -m 1000 -cpu max -accel tcg


replacing the vmlinuz path with whatever kernel version you have available on the host.

If the CR4 bug is present, this hangs and displays no output. Disabling 'la57' works around it:

# /usr/libexec/qemu-kvm  -kernel /boot/vmlinuz-5.14.0-2.el9.x86_64  -append 'console=ttyS0' -display none -serial stdio -m 1000 -cpu max,-la57 -accel tcg

With 'la57' disabled, the guest displays kernel boot messages (until it fails to find a rootfs).

The fixed QEMU gets rid of the hang in the first example.
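
Because the guest never exits on its own in either case (it either hangs silently or stops after failing to find a rootfs), the check can be automated by running the command for a fixed time and looking at what was printed. The following is a rough sketch, not an official test: the helper names, the 60-second budget, redirecting stdin from the null device, and the use of the "Linux version" banner as the success marker are all assumptions.

```python
import subprocess
import tempfile

# Command line from this comment; adjust the vmlinuz path to a kernel on the host.
QEMU_CMD = [
    "/usr/libexec/qemu-kvm",
    "-kernel", "/boot/vmlinuz-5.14.0-2.el9.x86_64",
    "-append", "console=ttyS0",
    "-display", "none",
    "-serial", "stdio",
    "-m", "1000",
    "-cpu", "max",
    "-accel", "tcg",
]


def boot_output(cmd, timeout_s):
    """Run cmd for at most timeout_s seconds and return whatever it printed."""
    with tempfile.TemporaryFile() as log:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,   # assumption: no console input is needed
            stdout=log,
            stderr=subprocess.STDOUT,
        )
        try:
            proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # The guest did not exit by itself; stop it and keep the log.
            proc.kill()
            proc.wait()
        log.seek(0)
        return log.read().decode(errors="replace")


def looks_hung(output):
    """Treat the boot as hung if the kernel banner never appeared."""
    return "Linux version" not in output
```

Calling `looks_hung(boot_output(QEMU_CMD, 60))` would be expected to return True on an affected QEMU and False on a fixed one, assuming 60 seconds is enough for the kernel to print its first line under TCG on the test host.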

Comment 7 Yanan Fu 2021-12-20 12:45:27 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 10 liunana 2021-12-28 05:57:57 UTC
(In reply to Daniel Berrangé from comment #6)

Thank you, Daniel.

I can now reproduce this with qemu-kvm-6.1.0-8.el9.x86_64.

Verified this bug with qemu-img-6.2.0-1.el9.x86_64.

Test Env:
    intel-jacobsville-02.khw2.lab.eng.bos.redhat.com
    kernel-5.14.0-34.el9.x86_64
    qemu-kvm-6.2.0-1.el9.x86_64

Test steps:
- Boot qemu with the following command; qemu does not hang and the kernel boots successfully.

# /usr/libexec/qemu-kvm -kernel /boot/vmlinuz-5.14.0-34.el9.x86_64 -append 'console=ttyS0' -display none -serial stdio -m 1000 -cpu max -accel tcg


[    0.000000] Linux version 5.14.0-34.el9.x86_64 (mockbuild.eng.bos.redhat.com) (gcc (GCC) 11.2.1 20211019 (Red Hat 11.2.1-6), GNU ld version 2.35.2-13.el9) #1 SMP PREEMPT Fri Dec 17 23:04:30 EST 2021
[    0.000000] Command line: console=ttyS0
[    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x200: 'Protection Keys User registers'
[    0.000000] x86/fpu: xstate_offset[3]:  960, xstate_sizes[3]:   64
[    0.000000] x86/fpu: xstate_offset[4]: 1024, xstate_sizes[4]:   64
[    0.000000] x86/fpu: xstate_offset[9]: 2688, xstate_sizes[9]:    8
[    0.000000] x86/fpu: Enabled xstate features 0x21b, context size is 2696 bytes, using 'standard' format.
[    0.000000] signal: max sigframe size: 3632
[    0.000000] BIOS-provided physical RAM map:



Moving this bug to VERIFIED now, thanks.

Comment 13 errata-xmlrpc 2022-05-17 12:24:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (new packages: qemu-kvm), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2307