Bug 1354341
Summary: | guest hangs after cancelling a migration and migrating again | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | mazhang <mazhang> |
Component: | qemu-kvm-rhev | Assignee: | Thomas Huth <thuth> |
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 7.3 | CC: | amit.shah, dgibson, dgilbert, knoel, michen, qzhang, thuth, virt-maint |
Target Milestone: | rc | Keywords: | Regression |
Target Release: | --- | ||
Hardware: | ppc64le | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | qemu-kvm-rhev-2.6.0-16.el7 | Doc Type: | If docs needed, set a value |
Last Closed: | 2016-11-07 21:22:44 UTC | Type: | Bug |
Description
mazhang
2016-07-11 07:21:46 UTC
After downgrading to qemu-kvm-rhev-2.3.0-31.el7.ppc64le and re-testing, the problem does not occur, so this bug is a regression.

I think the problem might be that close_htab_fd() is currently not called on migrate_cancel, so the spapr->htab_fd file descriptor stays valid. During the next migration attempt, the htab is migrated using the old file descriptor, so the destination likely misses the beginning of the htab. I think we should make sure that htab_fd is closed when a migration fails or is cancelled. Something like this seems to work:

```diff
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1614,10 +1613,18 @@ static int htab_load(QEMUFile *f, void *opaque, int version_id)
     return 0;
 }
 
+static void htab_cleanup(void *opaque)
+{
+    sPAPRMachineState *spapr = opaque;
+
+    close_htab_fd(spapr);
+}
+
 static SaveVMHandlers savevm_htab_handlers = {
     .save_live_setup = htab_save_setup,
     .save_live_iterate = htab_save_iterate,
     .save_live_complete_precopy = htab_save_complete,
+    .cleanup = htab_cleanup,
     .load_state = htab_load,
 };
```

Not sure yet whether this is really the right solution, though...

I've now sent the patch with the htab_cleanup() fix upstream: http://news.gmane.org/find-root.php?message_id=1469092894-12801-1-git-send-email-thuth@redhat.com

Fix included in qemu-kvm-rhev-2.6.0-16.el7.

Tested this bug on qemu-kvm-rhev-2.6.0-17.el7.ppc64le three times, following the steps in comment #0; the problem no longer occurs.

Host: 3.10.0-481.el7.ppc64le, qemu-kvm-rhev-2.6.0-17.el7.ppc64le

Guest: 3.10.0-481.el7.ppc64

So this bug has been fixed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2673.html