Bug 2001388

Summary:	Guest kernel panic when booting with qemu-6.0.50 using custom libvirt XML
Product:	Red Hat Enterprise Linux 9	Reporter:	John Ferlan <jferlan>
Component:	qemu-kvm	Assignee:	Virtualization Maintenance <virt-maint>
qemu-kvm sub component:	CPU Models	QA Contact:	liunana <nanliu>
Status:	CLOSED ERRATA	Docs Contact:
Severity:	unspecified
Priority:	unspecified	CC:	ailan, chayang, coli, jinzhao, juzhang, ldoktor, mrezanin, nanliu, pbonzini, virt-maint
Version:	9.0	Keywords:	TestOnly, Triaged
Target Milestone:	rc	Flags:	pm-rhel: mirror+
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	qemu-kvm-6.1.0-1.el9	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:	1965638	Environment:
Last Closed:	2022-05-17 12:24:17 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1965638, 1997408
Bug Blocks:

Description John Ferlan 2021-09-05 23:39:07 UTC

+++ This bug was initially created as a clone of Bug #1965638 +++

Description of problem:
I'm running a perf testing pipeline, and around qemu-6.0 it started failing to boot one scenario with specific libvirt tunings. The same setting works well with the distro qemu-kvm, but it fails with the weekly rebase as well as with self-compiled upstream qemu.

Version-Release number of selected component (if applicable):
* qemu-kvm-6.0.50-16.el8.wrb210526.x86_64
* Upstream qemu from git a38553a5978052f1f4bf1b5cdf59d77049cd6170

How reproducible:
Always

Steps to Reproduce:
1. Create a VM using the attached XML file

Actual results:
Guest kernel panics on boot (very rarely it survives the first boot)

Expected results:
It should boot and survive the test run (uperf).

Additional info:
On the rare occasions it boots, it usually survives the fio or linpack tests, but it never survives the uperf test.

As the guest uses hugepage memory, I'm attaching a full script to reproduce the environment.
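The attached XML and script are not reproduced here. As a hedged aid for comparing reproducer configurations, the sketch below uses Python's `xml.etree` to report whether a libvirt domain XML requests hugepage-backed memory (`<memoryBacking><hugepages/>`) and which `<cpu mode>` it uses. The CPU mode is worth checking because the eventual fix touches the "host" and "max" CPU models, and libvirt's `host-passthrough` mode maps to QEMU's `-cpu host`. The function name `summarize_domain` and the sample XML are hypothetical; this does not claim the attached XML uses either setting.

```python
import xml.etree.ElementTree as ET


def summarize_domain(xml_text: str) -> dict:
    """Report the libvirt domain settings most relevant to this bug.

    Looks for <memoryBacking><hugepages/> (hugepage-backed guest RAM) and
    the <cpu mode='...'> attribute. All other tunings are ignored.
    """
    root = ET.fromstring(xml_text)
    hugepages = root.find("./memoryBacking/hugepages") is not None
    cpu = root.find("./cpu")
    # libvirt treats a <cpu> element without a mode attribute as 'custom';
    # None means the XML has no <cpu> element at all.
    cpu_mode = cpu.get("mode", "custom") if cpu is not None else None
    return {"hugepages": hugepages, "cpu_mode": cpu_mode}


if __name__ == "__main__":
    sample = """
    <domain type='kvm'>
      <name>example</name>
      <memoryBacking><hugepages/></memoryBacking>
      <cpu mode='host-passthrough'/>
    </domain>
    """
    print(summarize_domain(sample))
```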

On the host I can see a couple of KVM "disabled perfctr" messages; I'm not sure whether they are related (the first ones come from virt-customize and similar commands; the virsh create starts at 435.39...):

[   85.804405] cgroup: cgroup: disabling cgroup2 socket matching due to net_prio or net_cls activatin
[   86.823983] kvm [3903]: vcpu0, guest rIP: 0xffffffffafc69da8 disabled perfctr wrmsr: 0xc0010007 data 0xffff
[   91.285515] kvm [3998]: vcpu0, guest rIP: 0xffffffff82069da8 disabled perfctr wrmsr: 0xc0010007 data 0xffff
[  124.154584] kvm [4101]: vcpu0, guest rIP: 0xffffffffb7869da8 disabled perfctr wrmsr: 0xc0010007 data 0xffff
[  228.949021] kvm [4195]: vcpu0, guest rIP: 0xffffffff88669da8 disabled perfctr wrmsr: 0xc0010007 data 0xffff
[  263.621820] kvm [4283]: vcpu0, guest rIP: 0xffffffff8c269da8 disabled perfctr wrmsr: 0xc0010007 data 0xffff
[  435.393321] virbr0: port 2(vnet0) entered blocking state
[  435.398655] virbr0: port 2(vnet0) entered disabled state
[  435.447117] device vnet0 entered promiscuous mode
[  435.458181] virbr0: port 2(vnet0) entered blocking state
[  435.463497] virbr0: port 2(vnet0) entered listening state
[  437.499981] virbr0: port 2(vnet0) entered learning state
[  439.547989] virbr0: port 2(vnet0) entered forwarding state
[  439.553485] virbr0: topology change detected, propagating
[  445.112348] kvm [4480]: vcpu0, guest rIP: 0xffffffffa9c69da8 disabled perfctr wrmsr: 0xc0010007 data 0xffff
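To separate the messages triggered by the `virsh create` run (starting at 435.39) from the earlier ones caused by virt-customize and similar commands, a small filter over the dmesg text can help. The sketch below is a hypothetical helper written against the exact message format shown above; the regex, the function name `perfctr_events`, and the `since` threshold are assumptions, and it says nothing about whether these messages are related to the panic.

```python
import re

# Matches lines such as:
# [   86.823983] kvm [3903]: vcpu0, guest rIP: 0xffffffffafc69da8 disabled perfctr wrmsr: 0xc0010007 data 0xffff
PERFCTR_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+kvm \[(?P<pid>\d+)\]: vcpu(?P<vcpu>\d+), "
    r"guest rIP: (?P<rip>0x[0-9a-f]+) disabled perfctr wrmsr: "
    r"(?P<msr>0x[0-9a-f]+) data (?P<data>0x[0-9a-f]+)$"
)


def perfctr_events(dmesg_lines, since=0.0):
    """Yield (timestamp, pid, msr, data) for KVM 'disabled perfctr wrmsr'
    messages logged at or after `since` seconds of host uptime."""
    for line in dmesg_lines:
        m = PERFCTR_RE.match(line.strip())
        if m and float(m["ts"]) >= since:
            yield float(m["ts"]), int(m["pid"]), int(m["msr"], 16), int(m["data"], 16)
```

Called with `since=435.39` on the log above, it would keep only the message from PID 4480, i.e. the one emitted after the `virsh create`.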

--- Additional comment from Lukas Doktor on 2021-05-28 16:50:37 UTC ---



--- Additional comment from Lukas Doktor on 2021-05-28 16:52:44 UTC ---



--- Additional comment from Lukas Doktor on 2021-05-28 16:55:42 UTC ---

Note: I have also tried booting a similar machine with the latest Fedora kernel installed in the guest:

    http://pastebin.test.redhat.com/965893

and I got a couple of panics:

    http://pastebin.test.redhat.com/967400
    http://pastebin.test.redhat.com/967403

(available for a month)

--- Additional comment from Lukas Doktor on 2021-06-01 09:14:08 UTC ---

Hello guys, I was able to bisect it reliably (2/2) to this commit:

f5cc5a5c168674f84bf061cdb307c2d25fba5448 is the first bad commit
commit f5cc5a5c168674f84bf061cdb307c2d25fba5448
Author: Claudio Fontana <cfontana>
Date:   Mon Mar 22 14:27:40 2021 +0100

    i386: split cpu accelerators from cpu.c, using AccelCPUClass
    
    i386 is the first user of AccelCPUClass, allowing to split
    cpu.c into:
    
    cpu.c            cpuid and common x86 cpu functionality
    host-cpu.c       host x86 cpu functions and "host" cpu type
    kvm/kvm-cpu.c    KVM x86 AccelCPUClass
    hvf/hvf-cpu.c    HVF x86 AccelCPUClass
    tcg/tcg-cpu.c    TCG x86 AccelCPUClass
    
    Signed-off-by: Claudio Fontana <cfontana>
    Reviewed-by: Alex Bennée <alex.bennee>
    Reviewed-by: Richard Henderson <richard.henderson>
    
    [claudio]:
    Rebased on commit b8184135 ("target/i386: allow modifying TCG phys-addr-bits")
    
    Signed-off-by: Claudio Fontana <cfontana>
    Message-Id: <20210322132800.7470-5-cfontana>
    Signed-off-by: Paolo Bonzini <pbonzini>


Bisect log:
# bad: [c8616fc7670b884de5f74d2767aade224c1c5c3a] Merge remote-tracking branch 'remotes/philmd/tags/gitlab-ci-20210527' into staging
# good: [d90f154867ec0ec22fd719164b88716e8fd48672] Merge remote-tracking branch 'remotes/dg-gitlab/tags/ppc-for-6.1-20210504' into staging
git bisect start 'c8616fc7670b884de5f74d2767aade224c1c5c3a' 'd90f154867ec0ec22fd719164b88716e8fd48672'
# bad: [068479e1e1d680ac246f12aaaacf2c5e1a0bd97b] hw/ppc/spapr.c: Extract MMU mode error reporting into a function
git bisect bad 068479e1e1d680ac246f12aaaacf2c5e1a0bd97b
# bad: [052b66e7211af64964e005126eaa3c944b296b0e] pc-bios/s390-ccw: Fix inline assembly for older versions of Clang
git bisect bad 052b66e7211af64964e005126eaa3c944b296b0e
# good: [a5ccdccc97d6e0d75282ede5b866cf694e9602b0] Merge remote-tracking branch 'remotes/kraxel/tags/vga-20210510-pull-request' into staging
git bisect good a5ccdccc97d6e0d75282ede5b866cf694e9602b0
# good: [c30a0757f094c107e491820e3d35224eb68859c7] target/riscv: Fix the RV64H decode comment
git bisect good c30a0757f094c107e491820e3d35224eb68859c7
# bad: [5ecfb76ccc056eb6127e44268e475827ae73b9e0] configure: fix detection of gdbus-codegen
git bisect bad 5ecfb76ccc056eb6127e44268e475827ae73b9e0
# bad: [30493a030ff154fc9ea5f91a848c6ec7a018efa1] i386: split seg_helper into user-only and sysemu parts
git bisect bad 30493a030ff154fc9ea5f91a848c6ec7a018efa1
# bad: [9ea057dc641b150ecbfd45acfe18fe043641a551] accel-cpu: make cpu_realizefn return a bool
git bisect bad 9ea057dc641b150ecbfd45acfe18fe043641a551
# bad: [f5cc5a5c168674f84bf061cdb307c2d25fba5448] i386: split cpu accelerators from cpu.c, using AccelCPUClass
git bisect bad f5cc5a5c168674f84bf061cdb307c2d25fba5448
# good: [0ac2b197430ebf19b5575ea48fe3b76d62110ab9] target/i386: Split out do_fsave, do_frstor, do_fxsave, do_fxrstor
git bisect good 0ac2b197430ebf19b5575ea48fe3b76d62110ab9
# first bad commit: [f5cc5a5c168674f84bf061cdb307c2d25fba5448] i386: split cpu accelerators from cpu.c, using AccelCPUClass
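`git bisect` narrows the range by binary search: each `git bisect good`/`bad` verdict in the log above roughly halves the remaining candidates until only the first bad commit is left. A minimal linear-history sketch of that search is below. It assumes badness is monotonic (every commit after the first bad one is also bad), which a flaky reproducer such as this one does not strictly guarantee, and `first_bad` is a hypothetical name. Real `git bisect` walks the commit graph, including merges, rather than a flat list.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(commits: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the first commit for which is_bad() is True.

    Assumes commits are in history order, commits[0] is known good,
    commits[-1] is known bad, and badness is monotonic. Calls is_bad()
    O(log n) times, never on the two known endpoints.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # first bad commit is at or before mid
        else:
            good = mid  # first bad commit is after mid
    return commits[bad]
```

Lukas's later search for the fix is the same procedure with the predicate inverted (treating "fixed" as "bad"); git can also name the two states explicitly via `git bisect start --term-old=... --term-new=...`.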

--- Additional comment from John Ferlan on 2021-06-07 16:15:53 UTC ---

Assigned to Amnon for initial triage per bz process and age of bug created or assigned to virt-maint without triage.

--- Additional comment from Lukas Doktor on 2021-06-08 08:03:10 UTC ---

Hello guys, I noticed the pipeline seems to be working well now, so I bisected the fix down to:

4db4385a7ab6512e9af08305f5725b26c8a980ee is the first bad commit
commit 4db4385a7ab6512e9af08305f5725b26c8a980ee
Author: Claudio Fontana <cfontana>
Date:   Thu Jun 3 14:30:01 2021 +0200

    i386: run accel_cpu_instance_init as post_init
    
    This fixes host and max cpu initialization, by running the accel cpu
    initialization only after all instance init functions are called for all
    X86 cpu subclasses.
    
    The bug this is fixing is related to the "max" and "host" i386 cpu
    subclasses, which set cpu->max_features, which is then used at cpu
    realization time.
    
    In order to properly split the accel-specific max features code that
    needs to be executed at cpu instance initialization time,
    
    we cannot call the accel cpu initialization at the end of the x86 base
    class initialization, or we will have no way to specialize
    "max features" cpu behavior, overriding the "max" cpu class defaults,
    and checking for the "max features" flag itself.
    
    This patch moves the accel-specific cpu instance initialization to after
    all x86 cpu instance code has been executed, including subclasses,
    
    so that proper initialization of cpu "host" and "max" can be restored.
    
    Fixes: f5cc5a5c ("i386: split cpu accelerators from cpu.c,"...)
    Cc: Eduardo Habkost <ehabkost>
    Cc: Paolo Bonzini <pbonzini>
    Signed-off-by: Claudio Fontana <cfontana>
    Message-Id: <20210603123001.17843-3-cfontana>
    Signed-off-by: Paolo Bonzini <pbonzini>

 target/i386/cpu.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)
bisect run success

I think it makes sense, and hopefully this issue is resolved (we just need to make sure this patch is included after the rebase).
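The ordering problem described in the fix's commit message can be illustrated with a toy model. The Python sketch below is not QEMU code: `X86CPUModel`, `x86_cpu_base_instance_init`, `max_cpu_instance_init`, and `instantiate` are hypothetical stand-ins for the base X86 CPU instance init, the "max" subclass init, and the accelerator init. It only shows why an accelerator step that checks the "max features" flag sees the default value when it runs at the end of the base-class init, but sees the subclass's value when it runs as a post-init hook after every instance init, including subclasses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class X86CPUModel:
    """Toy stand-in for an X86 CPU object (hypothetical, not QEMU code)."""
    max_features: bool = False
    accel_saw_max_features: Optional[bool] = None


def accel_cpu_instance_init(cpu: X86CPUModel) -> None:
    # The accelerator-specific init checks the "max features" flag, so it
    # must run after the subclass has had a chance to set it.
    cpu.accel_saw_max_features = cpu.max_features


def x86_cpu_base_instance_init(cpu: X86CPUModel, call_accel_here: bool) -> None:
    # Base-class init (other setup elided). Before the fix, the accel init
    # was called from here, i.e. before any subclass init had run.
    if call_accel_here:
        accel_cpu_instance_init(cpu)


def max_cpu_instance_init(cpu: X86CPUModel) -> None:
    # Subclass init for the "max" CPU model sets the flag.
    cpu.max_features = True


def instantiate(accel_as_post_init: bool) -> X86CPUModel:
    cpu = X86CPUModel()
    # instance_init hooks run base class first, then the subclass.
    x86_cpu_base_instance_init(cpu, call_accel_here=not accel_as_post_init)
    max_cpu_instance_init(cpu)
    # post_init hooks run only after every instance_init has finished.
    if accel_as_post_init:
        accel_cpu_instance_init(cpu)
    return cpu
```

With `accel_as_post_init=False` (the behavior introduced by f5cc5a5c) the accel step records `False` even for the "max" model; with `True` (the behavior restored by 4db4385a) it records `True`.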

--- Additional comment from Amnon Ilan on 2021-06-22 12:21:33 UTC ---

Setting TestOnly for now.
Mirek, when is the next rebase planned?

--- Additional comment from Amnon Ilan on 2021-07-08 17:18:56 UTC ---

Eduardo, can you have a look?

--- Additional comment from Eduardo Habkost on 2021-07-08 19:39:20 UTC ---

I don't understand what we're supposed to do with this BZ.  The bug is already fixed upstream and it was never present in an official RHEL-8 build or even in a released upstream version.

What's the right state for this BZ?  I don't think it makes sense to keep it open.

--- Additional comment from Lukas Doktor on 2021-07-09 04:27:12 UTC ---

I'm not sure either. Depends whether it requires additional QA coverage to prevent such regression or not.

--- Additional comment from Eduardo Habkost on 2021-07-09 15:12:55 UTC ---

(In reply to Lukas Doktor from comment #10)
> I'm not sure either. Depends whether it requires additional QA coverage to
> prevent such regression or not.

A question for QE and our maintainers: let's assume we want QE to verify this bug after we officially rebase to 6.1 (in 8.6).  What's the right status of this BZ if we want to do that?

--- Additional comment from Eduardo Habkost on 2021-07-14 14:40:09 UTC ---

Setting to POST as documented at https://gitlab.cee.redhat.com/virt/virt-wiki/-/wikis/KVM/DevelopersInfo/PreRebaseProcess

Comment 2 liunana 2021-11-02 14:35:43 UTC

Test PASS with the configuration in the attachments; the guest works well.

Test Env:
    qemu-kvm-6.1.0-1.el9.x86_64
    Host kernel: 5.14.0-10.el9.x86_64
    Guest kernel: 5.13.0-0.rc7.51.el9.x86_64

Moving this bug to VERIFIED now, thanks.

Best regards
Liu Nana
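The verification used qemu-kvm-6.1.0-1.el9, which matches the "Fixed In Version" field. When re-checking another host, a rough way to confirm the installed build is at least that version is sketched below. It is a deliberately simplified comparison (numeric version fields first, then the leading numeric release fields; epochs, dist tags, and other non-numeric parts are ignored) and is not a substitute for rpm's own rpmvercmp ordering. The function names `parse_nvr` and `at_least` are hypothetical.

```python
from typing import Tuple


def _num_tuple(s: str) -> Tuple[int, ...]:
    # Keep only the leading dot-separated numeric fields ("1.el9" -> (1,)).
    out = []
    for part in s.split("."):
        if not part.isdigit():
            break
        out.append(int(part))
    return tuple(out)


def parse_nvr(nvr: str) -> Tuple[str, Tuple[int, ...], Tuple[int, ...]]:
    """Split 'name-version-release[.arch]' into (name, version, release)."""
    name, version, release = nvr.rsplit("-", 2)
    return name, _num_tuple(version), _num_tuple(release)


def at_least(installed: str, fixed: str) -> bool:
    """Simplified check that `installed` is not older than `fixed`."""
    n1, v1, r1 = parse_nvr(installed)
    n2, v2, r2 = parse_nvr(fixed)
    if n1 != n2:
        raise ValueError(f"different packages: {n1} vs {n2}")
    return (v1, r1) >= (v2, r2)
```

For example, `at_least("qemu-kvm-6.1.0-1.el9.x86_64", "qemu-kvm-6.1.0-1.el9")` is true, while the weekly-rebase build from the original report, qemu-kvm-6.0.50-16.el8.wrb210526.x86_64, compares as older.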

Comment 6 errata-xmlrpc 2022-05-17 12:24:17 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (new packages: qemu-kvm), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2307