Bug 1420679
Summary: | Guest reboot after migration from RHEL7.2.z -> RHEL7.4 | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | huiqingding <huding> |
Component: | qemu-kvm-rhev | Assignee: | Dr. David Alan Gilbert <dgilbert> |
Status: | CLOSED ERRATA | QA Contact: | huiqingding <huding> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 7.4 | CC: | chayang, dgilbert, huding, juzhang, knoel, michen, mrezanin, qzhang, virt-maint, xianwang, zhengtli |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | qemu-kvm-rhev-2.9.0-1.el7 | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2017-08-01 23:44:45 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1387372 | ||
Bug Blocks: | 1376765 |
Description
huiqingding
2017-02-09 10:01:03 UTC
Confirmed. Happens on Intel as well with a 7.3 guest.

Easy to reproduce with:

/usr/libexec/qemu-kvm -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off,vmport=off -cpu IvyBridge -m 4096 -no-hpet -drive file=/home/vms/7.3-fromimage.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -chardev stdio,mux=on,id=mon -mon chardev=mon,mode=readline --device isa-serial,chardev=mon

With kvm tracing on the destination I see:

<series of sane looking stuff, page faults, all normal, some console IO>
CPU 0/KVM-11014 [014] .... 6629872.405344: kvm_exit: reason EPT_VIOLATION rip 0xffffffff81060eb6 info 181 0
CPU 0/KVM-11014 [014] .... 6629872.405345: kvm_page_fault: address 13fc09010 error_code 181
CPU 0/KVM-11014 [014] .... 6629872.405347: kvm_inj_exception: #PF (0x0)
CPU 0/KVM-11014 [014] d... 6629872.405347: kvm_entry: vcpu 0
CPU 0/KVM-11014 [014] .... 6629872.405350: kvm_exit: reason EXTERNAL_INTERRUPT rip 0xffffffff8168df90 info 0 800000fd
CPU 0/KVM-11014 [014] .... 6629872.405352: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x30000
CPU 0/KVM-11014 [014] d... 6629872.405371: kvm_entry: vcpu 0
CPU 0/KVM-11014 [014] .... 6629872.405373: kvm_exit: reason EPT_VIOLATION rip 0x8000 info 184 0
CPU 0/KVM-11014 [014] .... 6629872.405373: kvm_page_fault: address 38000 error_code 184
CPU 0/KVM-11014 [014] d... 6629872.405380: kvm_entry: vcpu 0
<then more page faults from 8000..3f000 with codes of 184 or 183>
then
CPU 0/KVM-11014 [014] .... 6629872.405486: kvm_exit: reason EXCEPTION_NMI rip 0xfe04 info 0 80000306
CPU 0/KVM-11014 [014] .... 6629872.405495: kvm_emulate_insn: 30000:fe04:ff ff (real)
CPU 0/KVM-11014 [014] .... 6629872.405496: kvm_inj_exception: #UD (0x0)
CPU 0/KVM-11014 [014] d... 6629872.405497: kvm_entry: vcpu 0
CPU 0/KVM-11014 [014] .... 6629872.405499: kvm_exit: reason TRIPLE_FAULT rip 0xfe04 info 0 0
CPU 0/KVM-11014 [014] .... 6629872.405501: kvm_userspace_exit: reason KVM_EXIT_SHUTDOWN (8)
CPU 0/KVM-11014 [014] .... 6629872.405502: kvm_fpu: unload

I think this is SMM/SMI related but am not sure yet. I think it ends up in SMM - for reasons I don't understand - then runs through an area of junk before finally hitting an undefined instruction and triple faulting. I've not tracked down what that initial 'external interrupt' is - it doesn't seem to match the vector of any registered device on the guest.

Still looking like SMM/SMI. 7.2 ends up setting CPU_INTERRUPT_SMI - but ignores it. When we read the migration stream we end up with that set, which causes the SMI entry. What I've not figured out yet is why 7.3 doesn't do the entry - it has the SMI entry code.

OK, the reason 7.3 worked is that it had a bug where SMIs didn't get delivered; that was fixed in 7.4 by:
68c6efe07a4729b54947658df4fceed84f3d0fef

Posted downstream:
x86: Work around SMI breakages

Please try this with lots of machine types and also with EFI firmware on q35.

Taken a different tack and posted upstream; we'll need to wait for it to swing back around.

Fixed upstream in 2.9, commit fc3a1fd74fac0e3233060aaaf923fe8ec104b48f

(In reply to Dr. David Alan Gilbert from comment #7)
> Posted downstream:
> x86: Work around SMI breakages
>
> Please try this with lots of machine types and also with EFI firmware on q35.

Verified this bug using:

rhel7.2.z host:
kernel-3.10.0-327.53.1.el7.x86_64
qemu-kvm-rhev-2.3.0-31.el7_2.25.x86_64

rhel7.4 host:
kernel-3.10.0-663.el7.x86_64
qemu-kvm-rhev-2.9.0-2.el7.x86_64

Tested migration rhel7.2.z<->rhel7.4 with machine types "-M rhel7.2.0/rhel7.1.0/rhel7.0.0/rhel6.6.0/rhel6.5.0" and two guests: rhel7.2.z and win2012r2. The result is pass: migration finishes normally and the guest does not reboot after migration.

Of the q35 machine types, qemu-kvm-rhev-2.9.0-2.el7.x86_64 supports only pc-q35-rhel7.4.0 and pc-q35-rhel7.3.0.
Tested rhel7.3.z<->rhel7.4 with EFI firmware on q35 with a rhel7.3.z guest: migration finishes normally and the guest does not reboot.

Based on comment #15, set this bug to be verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392
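
For anyone re-checking a destination-side kvm trace like the one quoted earlier in this report, here is a rough sketch of a script that flags the failure signature seen there: a kvm_enter_smm event followed later by a kvm_exit with reason TRIPLE_FAULT. It is not part of the fix. The names parse_events and smm_then_triple_fault are made up for illustration, and the regular expression assumes only the line layout visible in the pasted trace, so other kernels or trace formats may need adjusting.

```python
import re

# Line layout assumed from the trace quoted in this report, e.g.
# "CPU 0/KVM-11014 [014] .... 6629872.405352: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x30000"
EVENT_RE = re.compile(r"(?P<ts>\d+\.\d+): (?P<event>kvm_\w+): (?P<detail>.*)$")


def parse_events(lines):
    """Return (timestamp, event, detail) string tuples for lines that look like kvm trace events."""
    events = []
    for line in lines:
        match = EVENT_RE.search(line.strip())
        if match:
            events.append((match.group("ts"), match.group("event"), match.group("detail")))
    return events


def smm_then_triple_fault(events):
    """Return (smm_ts, fault_ts) for the first SMM entry later followed by a TRIPLE_FAULT exit, else None."""
    smm_ts = None
    for ts, event, detail in events:
        if event == "kvm_enter_smm" and smm_ts is None:
            smm_ts = ts
        elif event == "kvm_exit" and "TRIPLE_FAULT" in detail and smm_ts is not None:
            return smm_ts, ts
    return None


# A few lines copied from the trace in this report; the placeholder line is skipped by the parser.
SAMPLE = [
    "CPU 0/KVM-11014 [014] .... 6629872.405350: kvm_exit: reason EXTERNAL_INTERRUPT rip 0xffffffff8168df90 info 0 800000fd",
    "CPU 0/KVM-11014 [014] .... 6629872.405352: kvm_enter_smm: vcpu 0: entering SMM, smbase 0x30000",
    "<then more page faults from 8000..3f000 with codes of 184 or 183>",
    "CPU 0/KVM-11014 [014] .... 6629872.405499: kvm_exit: reason TRIPLE_FAULT rip 0xfe04 info 0 0",
]

if __name__ == "__main__":
    print(smm_then_triple_fault(parse_events(SAMPLE)))
```

A result of None on a trace from a fixed destination host would be consistent with the verification results above, though it only shows that this particular signature is absent.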