Bug 1738741

Summary: L2 guest hits kernel panic during L1->L1 live migration on PML-enabled Intel host
Product: Red Hat Enterprise Linux 8 Reporter: Li Xiaohui <xiaohli>
Component: kernel Assignee: Paolo Bonzini <pbonzini>
kernel sub component: KVM QA Contact: Qinghua Cheng <qcheng>
Status: CLOSED ERRATA Docs Contact: Parth Shah <pashah>
Severity: unspecified    
Priority: high CC: bdas, chayang, coli, ebarrera, hhei, hhuang, jinzhao, juzhang, knoel, leidwang, pashah, pbonzini, rbalakri, ribarry, virt-maint, vkuznets, xfu, xiaohli, ymankad
Version: 8.1 Keywords: Reopened, TestBlocker, TestOnly
Target Milestone: rc   
Target Release: 8.2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1745449 Environment:
Last Closed: 2020-04-28 16:23:27 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1749495    
Bug Blocks: 1558351, 1745449, 1746622    
Attachments:
vmcore-dmesg.txt (flags: none)

Description Li Xiaohui 2019-08-08 03:30:05 UTC
Description of problem:
The L2 guest hits a kernel panic during L1->L1 local migration.


Version-Release number of selected component (if applicable):
host info: kernel-4.18.0-128.el8.x86_64 & qemu-img-2.12.0-83.module+el8.1.0+3852+0ba8aef0.x86_64


How reproducible:
3/3


Steps to Reproduce:
1. Ensure nested virtualization and EPT are enabled on the Intel host.
2. Boot a guest (L1) with "-cpu host" on the host.
3. In the L1 guest, boot an L2 guest with "-cpu $Module_name".
4. With the L2 guest running in L1, perform an L1->L1 local live migration.
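Step 1 can be checked on the host before booting anything. Below is a minimal Python sketch, assuming the kvm_intel module is loaded and exposes its parameters under the standard sysfs directory /sys/module/kvm_intel/parameters (the same directory holds the pml parameter that comment 5 later asks to disable). The helper names are hypothetical. The parser accepts both Y/N and 1/0 because the format of a parameter depends on how the module declares it; treat that as an assumption rather than a guarantee.

```python
from pathlib import Path

# Standard sysfs location of kvm_intel module parameters; it exists only
# while the kvm_intel module is loaded.
PARAM_DIR = Path("/sys/module/kvm_intel/parameters")


def parse_bool_param(raw: str) -> bool:
    """Interpret a bool (Y/N) or int (1/0) module parameter read from sysfs."""
    value = raw.strip().upper()
    if value in ("Y", "1"):
        return True
    if value in ("N", "0"):
        return False
    raise ValueError(f"unexpected parameter value: {raw!r}")


def read_param(name: str, param_dir: Path = PARAM_DIR) -> bool:
    """Read one kvm_intel parameter file and return it as a bool."""
    return parse_bool_param((param_dir / name).read_text())


def check_nested_prereqs(param_dir: Path = PARAM_DIR) -> dict:
    """Return {parameter: enabled} for the settings step 1 requires."""
    return {name: read_param(name, param_dir) for name in ("nested", "ept")}


if __name__ == "__main__":
    try:
        status = check_nested_prereqs()
    except (OSError, ValueError) as err:
        print(f"could not read kvm_intel parameters: {err}")
    else:
        for name, enabled in status.items():
            print(f"{name}: {'enabled' if enabled else 'DISABLED'}")
```

If either parameter reports DISABLED, the L1 guest cannot run the L2 guest with hardware-assisted nesting, so the reproduction in steps 2-4 would not exercise the code path described here.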


Actual results:
After migration, the L2 guest hits a kernel panic inside L1 (L1 itself keeps working); see the attached vmcore-dmesg.txt.


Expected results:
Both the L1 and L2 guests keep working after migration.


Additional info:

Comment 1 Li Xiaohui 2019-08-08 03:30:31 UTC
Created attachment 1601677
vmcore-dmesg.txt

Comment 5 Paolo Bonzini 2019-09-23 22:45:34 UTC
xiaohli, can you test with kvm_intel.pml=0 (module parameter)?

Rick, if the test works, it shouldn't be a difficult patch.

Comment 6 Li Xiaohui 2019-09-24 04:51:21 UTC
(In reply to Paolo Bonzini from comment #5)
> xiaohli, can you test with kvm_intel.pml=0 (module parameter)?
>
You're right. With kvm_intel.pml=0 set on the host, L1->L1 migration (on a single host or between two hosts) finishes successfully, L1 and L2 both keep working, and no kernel panic occurs.

> Rick, if the test works, it shouldn't be a difficult patch.

Comment 8 Paolo Bonzini 2019-09-27 12:46:12 UTC
*** Bug 1745449 has been marked as a duplicate of this bug. ***

Comment 9 Rick Barry 2019-10-16 14:20:06 UTC
The patches for this have been included in the KVM rebase for 8.2.

Closing this as duplicate of bug 1749495 and requesting 8.1 z-stream. Vitaly can provide info on the patches needed.

QA can you provide qa_ack that's needed for the z-stream approval?

*** This bug has been marked as a duplicate of bug 1749495 ***

Comment 10 Vitaly Kuznetsov 2019-10-16 15:46:35 UTC
(In reply to Rick Barry from comment #9)

> Vitaly can provide info on the patches needed.
> 

The 8.2 rebase patches are not merged yet, so these commit hashes come from my local git tree:

6685949407c0 selftests: kvm: add test for dirty logging inside nested guests
549f1da01d37 KVM: x86: fix nested guest live migration with PML
a95566678f33 KVM: x86: assign two bits to track SPTE kinds

Comment 11 Karen Noel 2019-10-17 01:50:48 UTC
(In reply to Rick Barry from comment #9)
> The patches for this have been included in the KVM rebase for 8.2.
> 
> Closing this as duplicate of bug 1749495 and requesting 8.1 z-stream. Vitaly
> can provide info on the patches needed.
> 
> QA can you provide qa_ack that's needed for the z-stream approval?
> 
> *** This bug has been marked as a duplicate of bug 1749495 ***

Use Depends On and the TestOnly keyword instead of closing. This way we can track the fix and the testing. Thanks.

Comment 12 Rick Barry 2019-11-13 14:00:45 UTC
Moving this to ON_QA since dependent bug 1749495 is also ON_QA.

Comment 13 Paolo Bonzini 2019-12-13 13:36:12 UTC
*** Bug 1076294 has been marked as a duplicate of this bug. ***

Comment 15 Paolo Bonzini 2019-12-17 12:11:32 UTC
This bug has been fixed now, but actually it was both discovered and fixed during 8.2 development; it was never something supported in 8.1.  So I think it does not need any documentation.

Comment 17 Qinghua Cheng 2020-01-02 08:55:12 UTC
Hi Paolo,

When I tried to verify this bug, I got qemu-kvm core dump on host. 

QEMU 4.2.0 monitor - type 'help' for more information
(qemu) qemu-kvm: error: failed to set MSR 0x48e to 0xfff9fffe04006172
qemu-kvm: /builddir/build/BUILD/qemu-4.2.0/target/i386/kvm.c:2947: kvm_put_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
./test_linux2_8_2_incoming.sh: line 21: 25581 Aborted                 (core dumped) /usr/libexec/qemu-kvm -name 'l1-rhel8' -machine q35 -nodefaults -device VGA,bus=pcie.0,addr=0x2 -device qemu-xhci,id=usb1,bus=pcie.0,addr=0x3 -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/root/rhel820-64-virtio-scsi.qcow2 -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,serial=SYSTEM_DISK0,bus=pcie.0,addr=0x4 -device virtio-net-pci,mac=9a:ff:43:39:cd:8b,id=idO4Skpc,netdev=idpmLptO,bus=pcie.0,addr=0x5 -netdev tap,id=idpmLptO,vhost=on -m 8192 -smp 12,maxcpus=12,cores=6,threads=1,sockets=2 -cpu host -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 -vnc :2 -rtc base=localtime,clock=host,driftfix=slew -boot order=cdn,once=d,menu=off,strict=off -enable-kvm -monitor stdio -qmp tcp:0:6667,server,nowait -incoming tcp:0:5555


My host environment:

kernel: 4.18.0-167.el8.x86_64
qemu-kvm: qemu-kvm-4.2.0-4.module+el8.2.0+5220+e82621dc.x86_64

This issue happens with pml set to both Y and N on my host.

L1 and L2 guests are both RHEL 8.2

Is this a separate issue? If I need to open a new bug to track it, please let me know.

Thanks!
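The rejected write above targets MSR 0x48e, which the Intel SDM names IA32_VMX_TRUE_PROCBASED_CTLS. When triaging such failures it can help to split the value into its two documented halves. Below is a minimal Python sketch, assuming the VMX capability-MSR layout described in Appendix A of the Intel SDM; the helper names are hypothetical. It only decodes the value and does not explain why the write was refused, which is what the separate BZ mentioned in comment 21 tracks.

```python
# Hypothetical helpers for decoding a VMX capability MSR such as
# IA32_VMX_TRUE_PROCBASED_CTLS (0x48e); layout per Intel SDM Appendix A.


def split_vmx_ctls(value):
    """Return (must_be_one, may_be_one) 32-bit masks from a 64-bit MSR value.

    Bits 31:0  = allowed 0-settings: a 1 in bit X means control X must be 1.
    Bits 63:32 = allowed 1-settings: a 0 in bit 32+X means control X must be 0.
    """
    must_be_one = value & 0xFFFFFFFF
    may_be_one = (value >> 32) & 0xFFFFFFFF
    return must_be_one, may_be_one


def set_bits(mask):
    """List the indices of the bits set in a 32-bit mask."""
    return [i for i in range(32) if (mask >> i) & 1]


def describe_vmx_ctls(value):
    """Summarize which controls are forced to 1 and which are forced to 0."""
    must_be_one, may_be_one = split_vmx_ctls(value)
    must_be_zero = ~may_be_one & 0xFFFFFFFF
    return {
        "must_be_one": set_bits(must_be_one),
        "must_be_zero": set_bits(must_be_zero),
        # A well-formed value never requires a control to be both 1 and 0.
        "self_consistent": (must_be_one & must_be_zero) == 0,
    }


if __name__ == "__main__":
    # Value from the "failed to set MSR 0x48e" error in this comment.
    print(describe_vmx_ctls(0xFFF9FFFE04006172))
    # -> {'must_be_one': [1, 4, 5, 6, 8, 13, 14, 26],
    #     'must_be_zero': [0, 17, 18], 'self_consistent': True}
```

The decoded value is internally consistent. That suggests the host KVM refused it for a reason other than a malformed encoding, for example a mismatch with what the host itself can support. This is an inference, not a finding from this bug.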

Comment 18 Vitaly Kuznetsov 2020-01-04 15:50:57 UTC
(In reply to Qinghua Cheng from comment #17)
> Hi Paolo,
> 
> When I tried to verify this bug, I got qemu-kvm core dump on host. 
> 
> QEMU 4.2.0 monitor - type 'help' for more information
> (qemu) qemu-kvm: error: failed to set MSR 0x48e to 0xfff9fffe04006172
> qemu-kvm: /builddir/build/BUILD/qemu-4.2.0/target/i386/kvm.c:2947:
> kvm_put_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.

This looks a lot like https://bugzilla.redhat.com/show_bug.cgi?id=1786288

Comment 20 Vitaly Kuznetsov 2020-01-10 17:34:38 UTC
(In reply to Vitaly Kuznetsov from comment #18)
> (In reply to Qinghua Cheng from comment #17)
> > Hi Paolo,
> > 
> > When I tried to verify this bug, I got qemu-kvm core dump on host. 
> > 
> > QEMU 4.2.0 monitor - type 'help' for more information
> > (qemu) qemu-kvm: error: failed to set MSR 0x48e to 0xfff9fffe04006172
> > qemu-kvm: /builddir/build/BUILD/qemu-4.2.0/target/i386/kvm.c:2947:
> > kvm_put_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
> 
> This looks a lot like https://bugzilla.redhat.com/show_bug.cgi?id=1786288

Actually no, BZ#1786288 is about the 'hv_evmcs' flag, and you don't seem to have it on
your command line. Could you please try to verify this bug with an older QEMU
build (4.1) and file a new BZ for the https://bugzilla.redhat.com/show_bug.cgi?id=1738741#c17
issue? Please include your /proc/cpuinfo.

Comment 21 Qinghua Cheng 2020-01-13 08:33:49 UTC
This bug is verified on kernel 4.18.0-168.el8.x86_64 with QEMU emulator version 4.1.0 (qemu-kvm-4.1.0-14.module+el8.2.0+4677+51176c2e). After migration, the L1 and L2 guests work well.

The issue in comment 17 is tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1790308.

Comment 23 errata-xmlrpc 2020-04-28 16:23:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1769