+++ This bug was initially created as a clone of Bug #1965638 +++

Description of problem:
I'm running a perf testing pipeline, and around qemu-6.0 one scenario with specific libvirt tunings started failing to boot. The same settings work well with the distro qemu-kvm, but fail with the weekly rebase as well as with self-compiled upstream qemu.

Version-Release number of selected component (if applicable):
* qemu-kvm-6.0.50-16.el8.wrb210526.x86_64
* Upstream qemu from git a38553a5978052f1f4bf1b5cdf59d77049cd6170

How reproducible:
Always

Steps to Reproduce:
1. Create a VM using the attached XML file

Actual results:
Guest kernel panics on boot (very rarely it survives the first boot)

Expected results:
It should boot and survive the testing (uperf)

Additional info:
Very rarely it boots; when it does, it usually survives the fio or linpack tests, but it never survives the uperf test. As the guest uses hugepage memory, let me attach a full script to reproduce the environment.

On the host I can see a couple of kvm "disabled perfctr" messages; I'm not sure whether they are related (the first ones come from virt-customize and similar commands, the virsh create starts at 435.39...):

[ 85.804405] cgroup: cgroup: disabling cgroup2 socket matching due to net_prio or net_cls activatin
[ 86.823983] kvm [3903]: vcpu0, guest rIP: 0xffffffffafc69da8 disabled perfctr wrmsr: 0xc0010007 data 0xffff
[ 91.285515] kvm [3998]: vcpu0, guest rIP: 0xffffffff82069da8 disabled perfctr wrmsr: 0xc0010007 data 0xffff
[ 124.154584] kvm [4101]: vcpu0, guest rIP: 0xffffffffb7869da8 disabled perfctr wrmsr: 0xc0010007 data 0xffff
[ 228.949021] kvm [4195]: vcpu0, guest rIP: 0xffffffff88669da8 disabled perfctr wrmsr: 0xc0010007 data 0xffff
[ 263.621820] kvm [4283]: vcpu0, guest rIP: 0xffffffff8c269da8 disabled perfctr wrmsr: 0xc0010007 data 0xffff
[ 435.393321] virbr0: port 2(vnet0) entered blocking state
[ 435.398655] virbr0: port 2(vnet0) entered disabled state
[ 435.447117] device vnet0 entered promiscuous mode
[ 435.458181] virbr0: port 2(vnet0) entered blocking state
[ 435.463497] virbr0: port 2(vnet0) entered listening state
[ 437.499981] virbr0: port 2(vnet0) entered learning state
[ 439.547989] virbr0: port 2(vnet0) entered forwarding state
[ 439.553485] virbr0: topology change detected, propagating
[ 445.112348] kvm [4480]: vcpu0, guest rIP: 0xffffffffa9c69da8 disabled perfctr wrmsr: 0xc0010007 data 0xffff

--- Additional comment from Lukas Doktor on 2021-05-28 16:50:37 UTC ---

--- Additional comment from Lukas Doktor on 2021-05-28 16:52:44 UTC ---

--- Additional comment from Lukas Doktor on 2021-05-28 16:55:42 UTC ---

Note I have tried booting a similar machine with the latest Fedora kernel installed on the guest:

http://pastebin.test.redhat.com/965893

And I have a couple of panics here:

http://pastebin.test.redhat.com/967400
http://pastebin.test.redhat.com/967403

(available for a month)

--- Additional comment from Lukas Doktor on 2021-06-01 09:14:08 UTC ---

Hello guys,

I managed to bisect it reliably (2/2) to the commit:

f5cc5a5c168674f84bf061cdb307c2d25fba5448 is the first bad commit
commit f5cc5a5c168674f84bf061cdb307c2d25fba5448
Author: Claudio Fontana <cfontana>
Date:   Mon Mar 22 14:27:40 2021 +0100

    i386: split cpu accelerators from cpu.c, using AccelCPUClass

    i386 is the first user of AccelCPUClass, allowing to split
    cpu.c into:

    cpu.c            cpuid and common x86 cpu functionality
    host-cpu.c       host x86 cpu functions and "host" cpu type
    kvm/kvm-cpu.c    KVM x86 AccelCPUClass
    hvf/hvf-cpu.c    HVF x86 AccelCPUClass
    tcg/tcg-cpu.c    TCG x86 AccelCPUClass

    Signed-off-by: Claudio Fontana <cfontana>
    Reviewed-by: Alex Bennée <alex.bennee>
    Reviewed-by: Richard Henderson <richard.henderson>

    [claudio]:
    Rebased on commit b8184135 ("target/i386: allow modifying TCG phys-addr-bits")

    Signed-off-by: Claudio Fontana <cfontana>
    Message-Id: <20210322132800.7470-5-cfontana>
    Signed-off-by: Paolo Bonzini <pbonzini>

Bisect log:

# bad: [c8616fc7670b884de5f74d2767aade224c1c5c3a] Merge remote-tracking branch 'remotes/philmd/tags/gitlab-ci-20210527' into staging
# good: [d90f154867ec0ec22fd719164b88716e8fd48672] Merge remote-tracking branch 'remotes/dg-gitlab/tags/ppc-for-6.1-20210504' into staging
git bisect start 'c8616fc7670b884de5f74d2767aade224c1c5c3a' 'd90f154867ec0ec22fd719164b88716e8fd48672'
# bad: [068479e1e1d680ac246f12aaaacf2c5e1a0bd97b] hw/ppc/spapr.c: Extract MMU mode error reporting into a function
git bisect bad 068479e1e1d680ac246f12aaaacf2c5e1a0bd97b
# bad: [052b66e7211af64964e005126eaa3c944b296b0e] pc-bios/s390-ccw: Fix inline assembly for older versions of Clang
git bisect bad 052b66e7211af64964e005126eaa3c944b296b0e
# good: [a5ccdccc97d6e0d75282ede5b866cf694e9602b0] Merge remote-tracking branch 'remotes/kraxel/tags/vga-20210510-pull-request' into staging
git bisect good a5ccdccc97d6e0d75282ede5b866cf694e9602b0
# good: [c30a0757f094c107e491820e3d35224eb68859c7] target/riscv: Fix the RV64H decode comment
git bisect good c30a0757f094c107e491820e3d35224eb68859c7
# bad: [5ecfb76ccc056eb6127e44268e475827ae73b9e0] configure: fix detection of gdbus-codegen
git bisect bad 5ecfb76ccc056eb6127e44268e475827ae73b9e0
# bad: [30493a030ff154fc9ea5f91a848c6ec7a018efa1] i386: split seg_helper into user-only and sysemu parts
git bisect bad 30493a030ff154fc9ea5f91a848c6ec7a018efa1
# bad: [9ea057dc641b150ecbfd45acfe18fe043641a551] accel-cpu: make cpu_realizefn return a bool
git bisect bad 9ea057dc641b150ecbfd45acfe18fe043641a551
# bad: [f5cc5a5c168674f84bf061cdb307c2d25fba5448] i386: split cpu accelerators from cpu.c, using AccelCPUClass
git bisect bad f5cc5a5c168674f84bf061cdb307c2d25fba5448
# good: [0ac2b197430ebf19b5575ea48fe3b76d62110ab9] target/i386: Split out do_fsave, do_frstor, do_fxsave, do_fxrstor
git bisect good 0ac2b197430ebf19b5575ea48fe3b76d62110ab9
# first bad commit: [f5cc5a5c168674f84bf061cdb307c2d25fba5448] i386: split cpu accelerators from cpu.c, using AccelCPUClass

--- Additional comment from John Ferlan on 2021-06-07 16:15:53 UTC ---

Assigned to Amnon for initial triage per bz process and age of bug created or assigned to virt-maint without triage.

--- Additional comment from Lukas Doktor on 2021-06-08 08:03:10 UTC ---

Hello guys,

I noticed the pipeline seems to be working well now, so I bisected the fix (a reverse bisect, hence the "first bad commit" wording in the output) to:

4db4385a7ab6512e9af08305f5725b26c8a980ee is the first bad commit
commit 4db4385a7ab6512e9af08305f5725b26c8a980ee
Author: Claudio Fontana <cfontana>
Date:   Thu Jun 3 14:30:01 2021 +0200

    i386: run accel_cpu_instance_init as post_init

    This fixes host and max cpu initialization, by running the accel cpu
    initialization only after all instance init functions are called for all
    X86 cpu subclasses.

    The bug this is fixing is related to the "max" and "host" i386 cpu
    subclasses, which set cpu->max_features, which is then used at cpu
    realization time.

    In order to properly split the accel-specific max features code that
    needs to be executed at cpu instance initialization time,

    we cannot call the accel cpu initialization at the end of the x86 base
    class initialization, or we will have no way to specialize
    "max features" cpu behavior, overriding the "max" cpu class defaults,
    and checking for the "max features" flag itself.

    This patch moves the accel-specific cpu instance initialization to after
    all x86 cpu instance code has been executed, including subclasses,

    so that proper initialization of cpu "host" and "max" can be restored.

    Fixes: f5cc5a5c ("i386: split cpu accelerators from cpu.c,"...)
    Cc: Eduardo Habkost <ehabkost>
    Cc: Paolo Bonzini <pbonzini>
    Signed-off-by: Claudio Fontana <cfontana>
    Message-Id: <20210603123001.17843-3-cfontana>
    Signed-off-by: Paolo Bonzini <pbonzini>

 target/i386/cpu.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)
bisect run success

I think it makes sense and hopefully this issue is resolved (we just need to make sure this patch is included after the rebase).

--- Additional comment from Amnon Ilan on 2021-06-22 12:21:33 UTC ---

Setting TestOnly for now.

Mirek, when is the next rebase planned?

--- Additional comment from Amnon Ilan on 2021-07-08 17:18:56 UTC ---

Eduardo, can you have a look?

--- Additional comment from Eduardo Habkost on 2021-07-08 19:39:20 UTC ---

I don't understand what we're supposed to do with this BZ. The bug is already fixed upstream and it was never present in an official RHEL-8 build or even in a released upstream version.

What's the right state for this BZ? I don't think it makes sense to keep it open.

--- Additional comment from Lukas Doktor on 2021-07-09 04:27:12 UTC ---

I'm not sure either. It depends on whether it requires additional QA coverage to prevent such a regression or not.

--- Additional comment from Eduardo Habkost on 2021-07-09 15:12:55 UTC ---

(In reply to Lukas Doktor from comment #10)
> I'm not sure either. Depends whether it requires additional QA coverage to
> prevent such regression or not.

A question for QE and our maintainers: let's assume we want QE to verify this bug after we officially rebase to 6.1 (in 8.6). What's the right status of this BZ if we want to do that?

--- Additional comment from Eduardo Habkost on 2021-07-14 14:40:09 UTC ---

Setting to POST as documented at https://gitlab.cee.redhat.com/virt/virt-wiki/-/wikis/KVM/DevelopersInfo/PreRebaseProcess
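
For reference, the ordering problem described in the commit message of 4db4385a can be illustrated with a toy model. This is only a sketch under stated assumptions, not QEMU's actual QOM code: `accel_cpu_instance_init` and `max_features` are names taken from the commit message, while `base_instance_init`, `max_instance_init`, `create_max_cpu` and the `accel_saw_max_features` field are hypothetical helpers invented for the illustration. It assumes the base class init runs before the subclass init, and that a post_init hook runs after both.

```python
# Toy model of instance-init ordering; NOT QEMU's real object model.

def accel_cpu_instance_init(cpu):
    # Hypothetical accel hook: record whether the "max features" flag
    # was already visible when the accelerator-specific init ran.
    cpu["accel_saw_max_features"] = cpu["max_features"]


def base_instance_init(cpu, call_accel_here):
    # Base x86 CPU init: the flag starts out cleared.
    cpu["max_features"] = False
    if call_accel_here:
        # Behaviour modelled after the regression: accel init is called
        # at the end of the base class init, before any subclass init.
        accel_cpu_instance_init(cpu)


def max_instance_init(cpu):
    # "max"/"host" subclass init: sets the flag after the base init ran.
    cpu["max_features"] = True


def create_max_cpu(accel_as_post_init):
    cpu = {}
    base_instance_init(cpu, call_accel_here=not accel_as_post_init)
    max_instance_init(cpu)
    if accel_as_post_init:
        # Behaviour modelled after the fix: accel init runs only once
        # all instance init functions, including subclasses, have run.
        accel_cpu_instance_init(cpu)
    return cpu


if __name__ == "__main__":
    print("accel init in base init:", create_max_cpu(False))
    print("accel init as post_init:", create_max_cpu(True))
```

In this model the accel hook only sees `max_features` set when it runs as a post-init step, which matches the reasoning given in the commit message for moving the call.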
Test PASS with the configuration in the attachments; the guest works well.

Test Env:
qemu-kvm-6.1.0-1.el9.x86_64
5.14.0-10.el9.x86_64
Guest: 5.13.0-0.rc7.51.el9.x86_64

Moving this bug to verified now, thanks.

Best regards
Liu Nana
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (new packages: qemu-kvm), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:2307