Bug 1568407
| Summary: | Guest is sometimes left paused on the source host if the source libvirtd is killed during live migration, due to QEMU image locking | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Jiri Denemark <jdenemar> |
| Component: | libvirt | Assignee: | Jiri Denemark <jdenemar> |
| Status: | CLOSED ERRATA | QA Contact: | Fangge Jin <fjin> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | ||
| Version: | 7.5 | CC: | chayang, coli, dgilbert, dyuan, famz, fjin, hhuang, jdenemar, jinzhao, juzhang, knoel, kwolf, lmen, michen, ngu, pingl, qzhang, virt-bugs, virt-maint, xianwang, xuzhang, yafu, yuhuang |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | libvirt-4.5.0-1.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1560854 | Environment: | |
| Last Closed: | 2018-10-30 09:53:26 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1560854 | ||
| Bug Blocks: | |||
Description
Jiri Denemark
2018-04-17 12:40:15 UTC
We'll need to enable a new migration capability (late-block-activate) to request delayed activation of block devices.

Patch sent upstream for review: https://www.redhat.com/archives/libvir-list/2018-June/msg00228.html

Fixed upstream by:
commit 4370ac84f86e8f2d71a1d1e1cc65f7238359c36e
Refs: v4.4.0-89-g4370ac84f8
Author: Jiri Denemark <jdenemar>
AuthorDate: Tue Apr 17 14:46:29 2018 +0200
Commit: Jiri Denemark <jdenemar>
CommitDate: Tue Jun 5 09:39:24 2018 +0200

qemu: Fix domain resume after failed migration

Libvirt relies on being able to kill the destination domain and resume
the source one during migration until we called "cont" on the
destination. Unfortunately, QEMU automatically activates block devices
at the end of migration even when it's called with -S. This wasn't a big
issue in the past since the guest is not running and thus no data are
written to the block devices. However, when QEMU introduced its internal
block device locks, we can no longer resume the source domain once the
destination domain already activated the block devices (and thus
acquired all locks) unless the destination domain is killed first.

Since it's impossible to synchronize the destination and the source
libvirt daemons after a failed migration, QEMU introduced a new
migration capability called "late-block-activate" which ensures QEMU
won't activate block devices until it gets "cont". The only thing we
need to do is to enable this capability whenever QEMU supports it.

https://bugzilla.redhat.com/show_bug.cgi?id=1568407

QEMU commit implementing the capability: v2.12.0-952-g0f073f44df

Signed-off-by: Jiri Denemark <jdenemar>
Reviewed-by: Ján Tomko <jtomko>
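The commit's rule, "enable this capability whenever QEMU supports it", can be illustrated at the QMP level. The sketch below is a hypothetical Python illustration, not libvirt's actual C implementation. It assumes the destination QEMU's reply to `query-migrate-capabilities` has already been parsed into a dict, and it builds the `migrate-set-capabilities` command only when `late-block-activate` is listed. The helper names `supports_late_block_activate` and `build_set_capabilities` are made up for this example.

```python
import json

# Capability name added by QEMU commit v2.12.0-952-g0f073f44df.
LATE_BLOCK_ACTIVATE = "late-block-activate"


def supports_late_block_activate(query_reply: dict) -> bool:
    """Hypothetical helper: True if a parsed query-migrate-capabilities
    reply ({"return": [{"capability": ..., "state": ...}, ...]})
    lists late-block-activate."""
    return any(
        entry.get("capability") == LATE_BLOCK_ACTIVATE
        for entry in query_reply.get("return", [])
    )


def build_set_capabilities(query_reply: dict):
    """Hypothetical helper: build a migrate-set-capabilities command that
    enables late-block-activate on the destination QEMU, or return None if
    that QEMU does not know the capability (it keeps the old behaviour of
    activating block devices at the end of migration)."""
    if not supports_late_block_activate(query_reply):
        return None
    return {
        "execute": "migrate-set-capabilities",
        "arguments": {
            "capabilities": [
                {"capability": LATE_BLOCK_ACTIVATE, "state": True}
            ]
        },
    }


if __name__ == "__main__":
    # Example replies; the capability lists are illustrative only.
    newer_qemu = {"return": [{"capability": "xbzrle", "state": False},
                             {"capability": "late-block-activate", "state": False}]}
    older_qemu = {"return": [{"capability": "xbzrle", "state": False}]}
    print(json.dumps(build_set_capabilities(newer_qemu)))
    print(build_set_capabilities(older_qemu))
```

With the capability enabled, the destination activates block devices (and takes the image locks) only when it receives "cont". Until then, the source can still be resumed after a failed migration without first killing the destination. Silently skipping the command on older QEMU mirrors the "whenever QEMU supports it" wording. Such QEMU versions simply keep the previous behaviour.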
Verified with libvirt-4.5.0-4.el7.x86_64 and qemu-kvm-rhev-2.12.0-8.el7.x86_64. The steps are the same as in https://bugzilla.redhat.com/show_bug.cgi?id=1560854#c10, except that the breakpoint was changed to qemu/qemu_migration.c:4695.

Also ran a regression test for cross-version migration; migration succeeds for 7.6<->7.5 and 7.6<->7.4.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3113