Bug 1425003
| Summary: | virsh save doesn't work after canceled postcopy migration | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Milan Zamazal <mzamazal> |
| Component: | libvirt | Assignee: | Jiri Denemark <jdenemar> |
| Status: | CLOSED ERRATA | QA Contact: | zhe peng <zpeng> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.3 | CC: | dyuan, jdenemar, rbalakri, xuzhang, zpeng |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | libvirt-3.2.0-4.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-08-01 17:21:45 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Description
Milan Zamazal
2017-02-20 10:49:20 UTC
Patches sent upstream for review: https://www.redhat.com/archives/libvir-list/2017-April/msg00219.html

Fixed upstream by

commit 8be3ccd047e17c4998c669da2a63c3956e1f5225
Refs: v3.2.0-77-g8be3ccd04
Author:     Jiri Denemark <jdenemar>
AuthorDate: Wed Apr 5 13:05:25 2017 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Fri Apr 7 13:43:37 2017 +0200

    qemu: Properly reset all migration capabilities

    So far only QEMU_MONITOR_MIGRATION_CAPS_POSTCOPY was reset, but only
    in a single code path leaving post-copy enabled in quite a few cases.

    https://bugzilla.redhat.com/show_bug.cgi?id=1425003

    Signed-off-by: Jiri Denemark <jdenemar>

I can still reproduce this with build:
libvirt-3.2.0-3.el7.x86_64
qemu-kvm-rhev-2.9.0-1.el7.x86_64

step:
1. Canceled post-copy migration by client (Ctrl+C)
# virsh migrate rhel7 qemu+ssh://$targethost/system --postcopy --live --verbose
Migration: [ 67 %]^Cerror: operation aborted: migration job: canceled by client

# virsh save rhel7 /tmp/rhel7.save
error: Failed to save domain rhel7 to /tmp/rhel7.save
error: operation failed: domain save job: unexpectedly failed

# cat /var/log/libvirt/qemu/rhel7.log
2017-04-25 08:17:37.082+0000: initiating migration
RP: Received invalid message 0x0000 length 0x0000
RP: Received invalid message 0x0000 length 0x0000

Created attachment 1273799
libvirtd.log

Can you check if it works in the following scenarios?
1. Start a fresh domain and run "virsh save".
2. Start a fresh domain, start a migration (without --postcopy), cancel the migration, and run "virsh save".

And could you also test with older qemu-kvm-rhev packages (such as 2.8.0-*)?

I analyzed the logs and it seems libvirt does not properly reset the postcopy capability once migration is canceled, which would mean there is a bug in the patches which were supposed to fix this issue. Feel free to confirm it by responding to the questions in comment 7.

Scenario 1: start a fresh domain and save
# virsh save rhel7 /tmp/rhel7.save
Domain rhel7 saved to /tmp/rhel7.save

Scenario 2: without postcopy, the domain can be saved.

I also tested with qemu-kvm-rhev-2.8.0-5.el7.x86_64:
Scenario 1: the guest can be saved without error.
Scenario 2: behavior is the same as with qemu-kvm-rhev-2.9.

The additional patch sent upstream for review: https://www.redhat.com/archives/libvir-list/2017-April/msg01323.html

BTW, it should work even without this patch for migrations started with the --p2p option.

Fixed upstream by

commit eeb2feb9fbb66ea9026edc6451018fb3b94ffa58
Refs: v3.2.0-273-geeb2feb9f
Author:     Jiri Denemark <jdenemar>
AuthorDate: Wed Apr 26 21:46:28 2017 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Thu Apr 27 13:55:46 2017 +0200

    qemu: Properly reset non-p2p migration

    While peer-to-peer migration enters the Confirm phase even if the
    Perform phase fails, the client which initiated a non-p2p migration
    will never call virDomainMigrateConfirm* API if the Perform phase
    failed. Thus we need to explicitly reset migration before reporting
    a failure from the Perform phase API.

    https://bugzilla.redhat.com/show_bug.cgi?id=1425003

    Signed-off-by: Jiri Denemark <jdenemar>

Verified with build:
libvirt-3.2.0-4.el7.x86_64
qemu-kvm-rhev-2.8.0-5.el7.x86_64

step:
1. Canceled post-copy migration by client (Ctrl+C)
# virsh migrate rhel7 qemu+ssh://$targethost/system --postcopy --live --verbose
Migration: [ 67 %]^Cerror: operation aborted: migration job: canceled by client

# virsh save rhel7 /tmp/rhel7.save
Domain rhel7 saved to /tmp/rhel7.save

# virsh restore /tmp/rhel7.save
Domain restored from /tmp/rhel7.save

Do the migration again:
# virsh migrate rhel7 qemu+ssh://$targethost/system --postcopy --live --verbose
Migration: [100 %]

2. Do p2p migration with and without postcopy; the guest can be saved in all cases.
# virsh migrate rhel7 qemu+ssh://$targethost/system --p2p --postcopy --live --verbose
Migration: [ 75 %]^Cerror: operation aborted: migration job: canceled by client

# virsh save rhel7 /tmp/rhel7.save
Domain rhel7 saved to /tmp/rhel7.save

3. Cancel a migration started with --postcopy-after-precopy
# virsh migrate rhel7 qemu+ssh://$targethost/system --postcopy --postcopy-after-precopy --live --verbose
Migration: [ 80 %]^Cerror: operation aborted: migration job: canceled by client

[root@ibm-x3250m6-04 ~]# virsh save rhel7 /tmp/rhel7.save
Domain rhel7 saved to /tmp/rhel7.save

Move to verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1846
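
The control flow described in the "Properly reset non-p2p migration" commit message can be illustrated with a small toy model. This is only a sketch and not libvirt code: `ToyDomain`, `migrate`, and the `reset_on_perform_failure` flag are hypothetical names invented for the illustration, and the assumption that a leftover post-copy capability is what makes the later save fail is taken from the log analysis in this report rather than from the QEMU sources.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDomain:
    """Hypothetical stand-in for a running domain; not a libvirt type."""
    caps: set = field(default_factory=set)  # migration capabilities currently enabled

    def reset_migration(self) -> None:
        # Models "reset all migration capabilities", not just post-copy.
        self.caps.clear()

    def save(self) -> str:
        # Assumption for this sketch: a stale post-copy capability breaks save.
        if "postcopy" in self.caps:
            raise RuntimeError("domain save job: unexpectedly failed")
        return "saved"


def migrate(dom, *, postcopy, p2p, cancel, reset_on_perform_failure):
    """Toy Perform/Confirm flow; returns True if the migration completed."""
    if postcopy:
        dom.caps.add("postcopy")
    perform_ok = not cancel  # a client cancel makes the Perform phase fail

    if p2p or perform_ok:
        # p2p enters the Confirm phase even when Perform failed, and a
        # successful non-p2p client also calls Confirm; cleanup happens there.
        dom.reset_migration()
        return perform_ok

    # Non-p2p with a failed Perform: the client never calls Confirm, so the
    # reset has to happen here, before the failure is reported.
    if reset_on_perform_failure:
        dom.reset_migration()
    return False


if __name__ == "__main__":
    for fixed in (False, True):
        dom = ToyDomain()
        migrate(dom, postcopy=True, p2p=False, cancel=True,
                reset_on_perform_failure=fixed)
        try:
            print("fixed" if fixed else "unfixed", "->", dom.save())
        except RuntimeError as err:
            print("fixed" if fixed else "unfixed", "->", err)
```

In this model the p2p path never needs the extra reset, which matches the remark above that migrations started with --p2p should work even without the additional patch.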