Bug 1377920
| Summary: | Guest fails reboot and causes kernel-panic | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Pei Zhang <pezhang> |
| Component: | qemu-kvm-rhev | Assignee: | Dr. David Alan Gilbert <dgilbert> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 7.3 | CC: | chayang, coli, crose, dgilbert, hhuang, juzhang, michen, mrezanin, ngu, pbonzini, pezhang, prarit, qzhang, virt-maint, xfu, yduan |
| Target Milestone: | rc | Keywords: | Regression |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | qemu-kvm-rhev-2.6.0-27.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-11-07 21:36:10 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |

Description
Pei Zhang
2016-09-21 03:06:15 UTC
Additional info(continued):
1. (fix typo in above) qemu-kvm-rhev-2.6.0-26.el7.x86_64 doesn't work
2. Also test with rhel kernel(non-rt), hit same issue

Host:
3.10.0-509.el7.x86_64
qemu-kvm-rhev-2.6.0-26.el7.x86_64

Guest:
3.10.0-509.el7.x86_64

console info
[ 10.365787] smpboot: do_boot_cpu failed(-1) to wakeup CPU#1
[ 20.637017] smpboot: do_boot_cpu failed(-1) to wakeup CPU#2
[ 30.907909] smpboot: do_boot_cpu failed(-1) to wakeup CPU#3
[ 42.063551] smpboot: do_boot_cpu failed(-1) to wakeup CPU#1
[ 52.318500] smpboot: do_boot_cpu failed(-1) to wakeup CPU#2
[ 62.576511] smpboot: do_boot_cpu failed(-1) to wakeup CPU#3

Additional info(continued)
3. rhel6.8GA hit this regression issue, windows doesn't.
rhel6.8 GA doesn't work
windows 7 work

OK, I can reproduce this bug - only happens on my rhel/Xeon hosts, not on my fedora/laptop even with the same qemu.

If I add back the kvm_put_apic in kvm_arch_put_registers it seems to work again; so I'm guessing that means that something somewhere is changing the APIC state after the point we reset and before that point and we're losing the change.

Note if I turn on apic=verbose debug ignore_loglevel in the guest, I see it's stalled at 'calibrating APIC timer...'

If I make the code in kvm_arch_put_registers:
  kvm_put_apic
like it was before it apparently works (not checked the migrate) but if I make it:
  kvm_get_apic
  kvm_put_apic
it fails just like the put isn't there; so again it sounds like some inconsistent state that the put is restoring; so I do:
  x86_cpu_dump_local_apic_state
  kvm_get_apic
  x86_cpu_dump_local_apic_state
  kvm_put_apic
and expect to see the state that's different that's being lost without the put or with the get/put pair - nothing, no differences in the dump. Hmm.

I add to the dump the dump of some other fields it hasn't got and the only ones that I've found different are initial_count_load_time and count_shift; however, if I preserve those around the get, it still fails - suggesting it's something else. Hmm.

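The dump-and-compare step described in the comment above (take a state dump, call the get, take a second dump, look for fields that changed) can be sketched in isolation. This is a standalone illustration, not QEMU code: the field names `initial_count_load_time` and `count_shift` are the ones mentioned in the comment, while the `diff_state` helper, the `id` field, and every value below are hypothetical.

```python
def diff_state(before, after):
    """Return {field: (before_value, after_value)} for every field that differs.

    Fields present in only one snapshot are reported with None on the other side.
    """
    changed = {}
    for field in sorted(set(before) | set(after)):
        old = before.get(field)
        new = after.get(field)
        if old != new:
            changed[field] = (old, new)
    return changed


# Hypothetical snapshots standing in for two dumps taken around a "get" call.
# All values are made up for illustration only.
before_get = {"id": 0, "count_shift": 0, "initial_count_load_time": 0}
after_get = {"id": 0, "count_shift": 1, "initial_count_load_time": 123456}

for field, (old, new) in diff_state(before_get, after_get).items():
    print(f"{field}: {old} -> {new}")
```

An empty result from such a comparison only shows that the dumped fields match; as the comment notes, state that the dump does not cover can still differ.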
Debugging with Paolo; this works with a host -490 kernel, breaks with a -491 ... but it's still a QEMU bug.

Fix included in qemu-kvm-rhev-2.6.0-27.el7

==Verification==

Versions:
Host:
3.10.0-510.el7.x86_64
qemu-kvm-rhev-2.6.0-27.el7.x86_64

Guest: Same as Comment14.

Steps: Same as Comment14.

Results show:

| Guest | Basic(boot/reboot/shutdown) | Migration |
|---|---|---|
| RHEL7.3 | PASS | PASS |
| RHEL6.8GA | PASS | PASS |
| RHEL7.2GA | PASS | PASS |
| Windows7 | PASS | PASS |
| Windows8 | PASS | PASS |
| Windows8.1 | PASS | PASS |
| Windows10 | PASS | \ |

So this bug has been fixed well, thank you.

==Verification(update)==

Results show:

| Guest | Basic(boot/reboot/shutdown) | Migration |
|---|---|---|
| ... | | |
| Windows10 | PASS | PASS |

Note: In this testing, win10 migration works. I will update this testing in bug[1].

[1]Bug 1357765 - windows10 guest hangs after migration

Set this bug as 'VERIFIED' as Comment 18 and Comment 19.

*** Bug 1357765 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2673.html