Bug 1425003 - virsh save doesn't work after canceled postcopy migration
Summary: virsh save doesn't work after canceled postcopy migration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Jiri Denemark
QA Contact: zhe peng
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-02-20 10:49 UTC by Milan Zamazal
Modified: 2017-08-02 00:01 UTC

Fixed In Version: libvirt-3.2.0-4.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-01 17:21:45 UTC
Target Upstream Version:
Embargoed:


Attachments
libvirtd.log (1.21 MB, text/plain), 2017-04-25 03:22 UTC, zhe peng


Links
System: Red Hat Product Errata
ID: RHEA-2017:1846
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: libvirt bug fix and enhancement update
Last Updated: 2017-08-01 18:02:50 UTC

Description Milan Zamazal 2017-02-20 10:49:20 UTC
Description of problem:

When `virsh migrate' is called with --postcopy and the migration is canceled, then `virsh save' doesn't work.

Version-Release number of selected component (if applicable):

libvirt-2.0.0-10.el7_3.4.x86_64

How reproducible:

100%

Steps to Reproduce:

1. Start a VM.
2. Start to migrate it with a post-copy flag.
3. Cancel the migration before it completes.
4. Try to save the VM.
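
The steps above can be sketched as a small shell script. This is only an illustration under assumptions: the domain name, destination URI, save path, and the 5-second delay are placeholders, `run` and `reproduce` are hypothetical helper names, and cancelling with `virsh domjobabort` stands in for pressing Ctrl+C in the migrating client.

```shell
#!/bin/sh
# Reproduction sketch. DOMAIN, DEST_URI and SAVE_FILE are placeholders;
# with DRY_RUN=1 the commands are only printed, not executed.
DOMAIN=${DOMAIN:-dummy}
DEST_URI=${DEST_URI:-qemu+ssh://target.example.com/system}
SAVE_FILE=${SAVE_FILE:-/tmp/dummy.save}

# Hypothetical helper: print the command in dry-run mode, run it otherwise.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce() {
    # 1. Start a VM.
    run virsh start "$DOMAIN"
    # 2. Start a live migration with post-copy enabled, in the background.
    run virsh migrate "$DOMAIN" "$DEST_URI" --live --postcopy --verbose &
    # Give the migration a moment to get going (arbitrary delay).
    run sleep 5
    # 3. Cancel the migration job before it completes.
    run virsh domjobabort "$DOMAIN"
    wait
    # 4. Try to save the VM; on affected builds this is the step that fails.
    run virsh save "$DOMAIN" "$SAVE_FILE"
}
```

Setting `DRY_RUN=1` and calling `reproduce` prints the command sequence without touching any domain; calling it with `DRY_RUN` unset on a test host runs the steps for real.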

Actual results:

You receive an error like

  error: Failed to save domain dummy to /tmp/xxx
  error: operation failed: domain save job: unexpectedly failed

The libvirt logs contain an error like

  2017-02-20T10:36:38.761085Z qemu-kvm: socket_writev_buffer: Got err=32 for (32768/18446744073709551615)
  Unable to open return-path for postcopy

Expected results:

The VM is saved.

Additional info:

The bug is similar to https://bugzilla.redhat.com/1374718; it differs only in that the migration is canceled.

Comment 1 Jiri Denemark 2017-04-05 13:10:59 UTC
Patches sent upstream for review: https://www.redhat.com/archives/libvir-list/2017-April/msg00219.html

Comment 2 Jiri Denemark 2017-04-07 13:36:15 UTC
Fixed upstream by

commit 8be3ccd047e17c4998c669da2a63c3956e1f5225
Refs: v3.2.0-77-g8be3ccd04
Author:     Jiri Denemark <jdenemar>
AuthorDate: Wed Apr 5 13:05:25 2017 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Fri Apr 7 13:43:37 2017 +0200

    qemu: Properly reset all migration capabilities

    So far only QEMU_MONITOR_MIGRATION_CAPS_POSTCOPY was reset, but only in
    a single code path leaving post-copy enabled in quite a few cases.

    https://bugzilla.redhat.com/show_bug.cgi?id=1425003

    Signed-off-by: Jiri Denemark <jdenemar>

Comment 5 zhe peng 2017-04-25 03:21:26 UTC
I can still reproduce this with build:
libvirt-3.2.0-3.el7.x86_64
qemu-kvm-rhev-2.9.0-1.el7.x86_64

Steps:
 1. Cancel the post-copy migration from the client (Ctrl+C):
# virsh migrate rhel7 qemu+ssh://$targethost/system --postcopy --live --verbose
Migration: [ 67 %]^Cerror: operation aborted: migration job: canceled by client

# virsh save rhel7 /tmp/rhel7.save 
error: Failed to save domain rhel7 to /tmp/rhel7.save
error: operation failed: domain save job: unexpectedly failed

# cat /var/log/libvirt/qemu/rhel7.log
2017-04-25 08:17:37.082+0000: initiating migration
RP: Received invalid message 0x0000 length 0x0000
RP: Received invalid message 0x0000 length 0x0000

Comment 6 zhe peng 2017-04-25 03:22:22 UTC
Created attachment 1273799
libvirtd.log

Comment 7 Jiri Denemark 2017-04-25 08:17:41 UTC
Can you check if it works in the following scenarios?

1. start a fresh domain and run "virsh save"
2. start a fresh domain, start a migration (without --postcopy), cancel the migration, and run "virsh save"

And could you also test with older qemu-kvm-rhev packages (such as 2.8.0-*)?

Comment 8 Jiri Denemark 2017-04-25 08:58:51 UTC
I analyzed the logs and it seems libvirt does not properly reset the postcopy capability once the migration is canceled, which would mean there is a bug in the patches that were supposed to fix this issue.

Feel free to confirm it by responding to the questions in comment 7.
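
One way to check for a leftover capability by hand is to ask QEMU directly after the canceled migration. The following is a minimal sketch, assuming QEMU's HMP command `info migrate_capabilities` prints one `name: on` or `name: off` line per capability; `postcopy_enabled` is a hypothetical helper name, and `virsh qemu-monitor-command` is a debugging interface that is not meant for production use.

```shell
# postcopy_enabled reads "info migrate_capabilities" output on stdin and
# succeeds (exit status 0) only if postcopy-ram is reported as "on".
# Assumption: each capability appears on its own "name: on|off" line.
postcopy_enabled() {
    grep -Eq '^postcopy-ram:[[:space:]]*on[[:space:]]*$'
}

# Hypothetical usage against a running domain (needs libvirt and QEMU):
#   virsh qemu-monitor-command rhel7 --hmp 'info migrate_capabilities' \
#       | postcopy_enabled && echo "postcopy-ram is still enabled"
```

If the helper still succeeds after the migration has been canceled, the capability was not reset, which would match the analysis above.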

Comment 9 zhe peng 2017-04-25 09:01:34 UTC
Scenario 1:
Start a fresh domain and save it:
# virsh save rhel7 /tmp/rhel7.save

Domain rhel7 saved to /tmp/rhel7.save

Scenario 2:
Without --postcopy, the domain can be saved after the canceled migration.

I also tested with qemu-kvm-rhev-2.8.0-5.el7.x86_64:
Scenario 1: the guest can be saved without error.
Scenario 2: the behavior is the same as with qemu-kvm-rhev-2.9.

Comment 10 Jiri Denemark 2017-04-26 20:00:27 UTC
The additional patch sent upstream for review: https://www.redhat.com/archives/libvir-list/2017-April/msg01323.html

BTW, it should work even without this patch for migrations started with the --p2p option.

Comment 11 Jiri Denemark 2017-04-27 12:04:19 UTC
Fixed upstream by

commit eeb2feb9fbb66ea9026edc6451018fb3b94ffa58
Refs: v3.2.0-273-geeb2feb9f
Author:     Jiri Denemark <jdenemar>
AuthorDate: Wed Apr 26 21:46:28 2017 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Thu Apr 27 13:55:46 2017 +0200

    qemu: Properly reset non-p2p migration

    While peer-to-peer migration enters the Confirm phase even if the
    Perform phase fails, the client which initiated a non-p2p migration will
    never call virDomainMigrateConfirm* API if the Perform phase failed.
    Thus we need to explicitly reset migration before reporting a failure
    from the Perform phase API.

    https://bugzilla.redhat.com/show_bug.cgi?id=1425003

    Signed-off-by: Jiri Denemark <jdenemar>

Comment 13 zhe peng 2017-05-04 06:48:14 UTC
Verified with builds:
libvirt-3.2.0-4.el7.x86_64
qemu-kvm-rhev-2.8.0-5.el7.x86_64

Steps:
 1. Cancel the post-copy migration from the client (Ctrl+C):
# virsh migrate rhel7 qemu+ssh://$targethost/system --postcopy --live --verbose
Migration: [ 67 %]^Cerror: operation aborted: migration job: canceled by client

# virsh save rhel7 /tmp/rhel7.save 

Domain rhel7 saved to /tmp/rhel7.save

# virsh restore /tmp/rhel7.save
Domain restored from /tmp/rhel7.save

Run the migration again:
# virsh migrate rhel7 qemu+ssh://$targethost/system --postcopy --live --verbose
Migration: [100 %]

 2. Do a p2p migration with and without postcopy and cancel it; the guest can be saved in all cases.
# virsh migrate rhel7 qemu+ssh://$targethost/system --p2p --postcopy --live --verbose
Migration: [ 75 %]^Cerror: operation aborted: migration job: canceled by client

# virsh save rhel7 /tmp/rhel7.save

Domain rhel7 saved to /tmp/rhel7.save

 3. With --postcopy-after-precopy:
# virsh migrate rhel7 qemu+ssh://$targethost/system --postcopy --postcopy-after-precopy --live --verbose
Migration: [ 80 %]^Cerror: operation aborted: migration job: canceled by client

[root@ibm-x3250m6-04 ~]# virsh save rhel7 /tmp/rhel7.save

Domain rhel7 saved to /tmp/rhel7.save

Moving to VERIFIED.

Comment 14 errata-xmlrpc 2017-08-01 17:21:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1846

