Bug 1738741
Summary: | L2 guest hit kernel panic when do L1->L1 live migration on PML-enabled intel host | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Li Xiaohui <xiaohli> |
Component: | kernel | Assignee: | Paolo Bonzini <pbonzini> |
kernel sub component: | KVM | QA Contact: | Qinghua Cheng <qcheng> |
Status: | CLOSED ERRATA | Docs Contact: | Parth Shah <pashah> |
Severity: | unspecified | | |
Priority: | high | CC: | bdas, chayang, coli, ebarrera, hhei, hhuang, jinzhao, juzhang, knoel, leidwang, pashah, pbonzini, rbalakri, ribarry, virt-maint, vkuznets, xfu, xiaohli, ymankad |
Version: | 8.1 | Keywords: | Reopened, TestBlocker, TestOnly |
Target Milestone: | rc | | |
Target Release: | 8.2 | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | No Doc Update |
Doc Text: | | Story Points: | --- |
Clone Of: | | | |
: | 1745449 (view as bug list) | Environment: | |
Last Closed: | 2020-04-28 16:23:27 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | 1749495 | | |
Bug Blocks: | 1558351, 1745449, 1746622 | | |
Attachments: | | | |

Description
Li Xiaohui
2019-08-08 03:30:05 UTC
Created attachment 1601677 [details]
vmcore-dmesg.txt
xiaohli, can you test with kvm_intel.pml=0 (module parameter)?

Rick, if the test works, it shouldn't be a difficult patch.

(In reply to Paolo Bonzini from comment #5)
> xiaohli, can you test with kvm_intel.pml=0 (module parameter)?
>

You're right. With kvm_intel.pml=0 set on the host, L1->L1 migration on one host (or between two hosts) finishes successfully; L1 and L2 both work well and no kernel panic happens.

> Rick, if the test works, it shouldn't be a difficult patch.

*** Bug 1745449 has been marked as a duplicate of this bug. ***

The patches for this have been included in the KVM rebase for 8.2.

Closing this as duplicate of bug 1749495 and requesting 8.1 z-stream. Vitaly can provide info on the patches needed.

QA can you provide qa_ack that's needed for the z-stream approval?

*** This bug has been marked as a duplicate of bug 1749495 ***

(In reply to Rick Barry from comment #9)
> Vitaly can provide info on the patches needed.
>

8.2 rebase patches are not merged yet so commit hashes come from my local git:

6685949407c0 selftests: kvm: add test for dirty logging inside nested guests
549f1da01d37 KVM: x86: fix nested guest live migration with PML
a95566678f33 KVM: x86: assign two bits to track SPTE kinds

(In reply to Rick Barry from comment #9)
> The patches for this have been included in the KVM rebase for 8.2.
>
> Closing this as duplicate of bug 1749495 and requesting 8.1 z-stream. Vitaly
> can provide info on the patches needed.
>
> QA can you provide qa_ack that's needed for the z-stream approval?
>
> *** This bug has been marked as a duplicate of bug 1749495 ***

Use "Depends On" and TestOnly instead of closing. This way we can track the fix and testing. Thanks.

Moving this to ON_QA since dependent bug 1749495 is also ON_QA.

*** Bug 1076294 has been marked as a duplicate of this bug. ***

This bug has been fixed now, but actually it was both discovered and fixed during 8.2 development; it was never something supported in 8.1. So I think it does not need any documentation.

Hi Paolo,

When I tried to verify this bug, I got a qemu-kvm core dump on the host.

QEMU 4.2.0 monitor - type 'help' for more information
(qemu) qemu-kvm: error: failed to set MSR 0x48e to 0xfff9fffe04006172
qemu-kvm: /builddir/build/BUILD/qemu-4.2.0/target/i386/kvm.c:2947: kvm_put_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
./test_linux2_8_2_incoming.sh: line 21: 25581 Aborted (core dumped) /usr/libexec/qemu-kvm -name 'l1-rhel8' -machine q35 -nodefaults -device VGA,bus=pcie.0,addr=0x2 -device qemu-xhci,id=usb1,bus=pcie.0,addr=0x3 -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/root/rhel820-64-virtio-scsi.qcow2 -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,serial=SYSTEM_DISK0,bus=pcie.0,addr=0x4 -device virtio-net-pci,mac=9a:ff:43:39:cd:8b,id=idO4Skpc,netdev=idpmLptO,bus=pcie.0,addr=0x5 -netdev tap,id=idpmLptO,vhost=on -m 8192 -smp 12,maxcpus=12,cores=6,threads=1,sockets=2 -cpu host -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 -vnc :2 -rtc base=localtime,clock=host,driftfix=slew -boot order=cdn,once=d,menu=off,strict=off -enable-kvm -monitor stdio -qmp tcp:0:6667,server,nowait -incoming tcp:0:5555

My host env:
kernel: 4.18.0-167.el8.x86_64
qemu-kvm: qemu-kvm-4.2.0-4.module+el8.2.0+5220+e82621dc.x86_64

This issue happens with pml set to either Y or N on my host.

L1 and L2 guests are both RHEL 8.2.

Is this another issue? If I need to open a new bug to track this, please let me know.

Thanks!

(In reply to Qinghua Cheng from comment #17)
> Hi Paolo,
>
> When I tried to verify this bug, I got a qemu-kvm core dump on the host.
>
> QEMU 4.2.0 monitor - type 'help' for more information
> (qemu) qemu-kvm: error: failed to set MSR 0x48e to 0xfff9fffe04006172
> qemu-kvm: /builddir/build/BUILD/qemu-4.2.0/target/i386/kvm.c:2947:
> kvm_put_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.

This looks a lot like https://bugzilla.redhat.com/show_bug.cgi?id=1786288

(In reply to Vitaly Kuznetsov from comment #18)
> (In reply to Qinghua Cheng from comment #17)
> > Hi Paolo,
> >
> > When I tried to verify this bug, I got a qemu-kvm core dump on the host.
> >
> > QEMU 4.2.0 monitor - type 'help' for more information
> > (qemu) qemu-kvm: error: failed to set MSR 0x48e to 0xfff9fffe04006172
> > qemu-kvm: /builddir/build/BUILD/qemu-4.2.0/target/i386/kvm.c:2947:
> > kvm_put_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
>
> This looks a lot like https://bugzilla.redhat.com/show_bug.cgi?id=1786288

Actually no, BZ#1786288 is about the 'hv_evmcs' flag and you don't seem to have it on your command line. Could you please try to verify this bug with an older QEMU build (4.1) and file a new BZ for the https://bugzilla.redhat.com/show_bug.cgi?id=1738741#c17 issue? Please provide your /proc/cpuinfo.

This bug is verified on 4.18.0-168.el8.x86_64, QEMU emulator version 4.1.0 (qemu-kvm-4.1.0-14.module+el8.2.0+4677+51176c2e).

After migration, L1 and L2 guests work well.

The issue in comment 17 is tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1790308.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1769
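
Note for anyone re-running the kvm_intel.pml=0 test from comment #5: before migrating, it can help to confirm which PML setting the host is actually using. The following is a minimal sketch, not something taken from this report. It assumes the parameter is exposed read-only at the usual sysfs location for module parameters and reports Y or N (the values mentioned in comment 17); the helper name `pml_state` is hypothetical.

```python
from pathlib import Path

# Assumed location: the usual sysfs path for kvm_intel module parameters.
PML_PARAM = Path("/sys/module/kvm_intel/parameters/pml")


def pml_state(param_path: Path = PML_PARAM) -> str:
    """Return 'enabled', 'disabled', or 'unavailable' for the PML parameter.

    'unavailable' covers a missing file (kvm_intel not loaded, non-Intel
    host) as well as any value other than the expected Y/N or 1/0.
    """
    try:
        raw = param_path.read_text().strip()
    except OSError:
        return "unavailable"
    if raw in ("Y", "1"):
        return "enabled"
    if raw in ("N", "0"):
        return "disabled"
    return "unavailable"


if __name__ == "__main__":
    print(f"kvm_intel PML: {pml_state()}")
```

If this prints "enabled" on a host without the fix, the workaround from comment #5 would be to reload the module with pml=0 (with no guests running) and check again.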