Bug 2111948 - Postcopy-recover failed if vm I/O error occurred during postcopy-paused status
Summary: Postcopy-recover failed if vm I/O error occurred during postcopy-paused status
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: libvirt
Version: 9.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Jiri Denemark
QA Contact: Fangge Jin
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-07-28 13:53 UTC by Fangge Jin
Modified: 2023-05-09 08:09 UTC (History)

Fixed In Version: libvirt-9.0.0-1.el9
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-05-09 07:26:34 UTC
Type: Bug
Target Upstream Version: 9.0.0
Embargoed:


Attachments
libvirt and qemu log (377.93 KB, application/x-bzip), 2022-07-28 13:53 UTC, Fangge Jin


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | RHELPLAN-129460 | 0 | None | None | None | 2022-07-28 14:10:52 UTC
Red Hat Product Errata | RHBA-2023:2171 | 0 | None | None | None | 2023-05-09 07:27:25 UTC

Description Fangge Jin 2022-07-28 13:53:42 UTC
Created attachment 1899963
libvirt and qemu log

Description of problem:
If an I/O error occurs in the VM while post-copy migration is paused, a later attempt to recover the post-copy migration fails with:
error: Requested operation is not valid: migration of domain uefi-5 is not in post-copy phase


Version-Release number of selected component (if applicable):
libvirt-8.5.0-2.el9.x86_64
qemu-kvm-7.0.0-9.el9.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Start vm

2. Migrate vm to target host, and switch to postcopy
#  virsh migrate uefi-5 qemu+tcp://******/system --live  --postcopy --undefinesource --persistent --bandwidth 3 --postcopy-bandwidth 3   --migrateuri tcp://******:49153 --p2p
# virsh migrate-postcopy uefi-5

3. Abort migration:
# virsh domjobabort uefi-5 --postcopy

4. Trigger an I/O error in the VM.
For example: change the ownership of the VM's disk image to root:root, then do disk I/O in the VM.

[target host]# virsh domstate uefi-5 --reason
paused (I/O error)

[src host]# virsh domstate uefi-5 --reason
paused (post-copy failed)

5. Try to recover postcopy migration
#  virsh migrate uefi-5 qemu+tcp://******/system --live  --postcopy --undefinesource --persistent --bandwidth 3 --postcopy-bandwidth 3   --migrateuri tcp://******:49153 --p2p --postcopy-resume
error: Requested operation is not valid: migration of domain uefi-5 is not in post-copy phase


Actual results:
In step 5, post-copy recovery fails.

Expected results:
Step 5 succeeds.
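
The state check in step 4 can be scripted. The following is a minimal sketch, not part of the original report: it parses the "state (reason)" line printed by `virsh domstate --reason`. The helper names `parse_domstate` and `domstate_with_reason` are hypothetical, and the sketch assumes `virsh` is on the PATH of the host where it runs.

```python
import re
import subprocess
from typing import Tuple

# Matches output such as "paused (I/O error)" or "paused (post-copy failed)".
_DOMSTATE_RE = re.compile(r"^(?P<state>[^()]+?)\s*\((?P<reason>.*)\)\s*$")


def parse_domstate(output: str) -> Tuple[str, str]:
    """Split 'state (reason)' text into a (state, reason) tuple."""
    match = _DOMSTATE_RE.match(output.strip())
    if match is None:
        raise ValueError(f"unexpected domstate output: {output!r}")
    return match.group("state"), match.group("reason")


def domstate_with_reason(domain: str) -> Tuple[str, str]:
    """Run 'virsh domstate DOMAIN --reason' locally and parse its output."""
    result = subprocess.run(
        ["virsh", "domstate", domain, "--reason"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_domstate(result.stdout)
```

With the outputs from step 4, `parse_domstate("paused (I/O error)")` yields the target-side state and reason, and `parse_domstate("paused (post-copy failed)")` yields the source-side pair.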

Additional info:

Comment 1 Jiri Denemark 2022-12-15 14:38:46 UTC
Patches sent upstream for review: https://listman.redhat.com/archives/libvir-list/2022-December/236393.html

Comment 2 Jiri Denemark 2023-01-06 16:01:24 UTC
Fixed upstream by

commit b92cba67c67551139e5421d97a66620e836a0523
Refs: v8.10.0-202-gb92cba67c6
Author:     Jiri Denemark <jdenemar>
AuthorDate: Wed Dec 7 14:46:25 2022 +0100
Commit:     Jiri Denemark <jdenemar>
CommitDate: Fri Jan 6 16:17:38 2023 +0100

    conf: Drop virDomainJobOperation parameter from virDomainObjIsPostcopy

    The parameter was only used to select which states correspond to an
    active or failed post-copy migration. But these states are either
    applicable to both operations or the check would just paper over a code
    bug in case of an impossible combination of state and operation. By
    dropping the check we can make the code simpler and also reuse existing
    virDomainObjIsFailedPostcopy function and only check for active
    post-copy states.

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Michal Privoznik <mprivozn>

commit 49a57540638aa0898432ace1e016a77006d272af
Refs: v8.10.0-203-g49a5754063
Author:     Jiri Denemark <jdenemar>
AuthorDate: Tue Dec 13 16:43:53 2022 +0100
Commit:     Jiri Denemark <jdenemar>
CommitDate: Fri Jan 6 16:17:38 2023 +0100

    conf: Add job parameter to virDomainObjIsFailedPostcopy

    Unused for now, but this will change soon.

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Michal Privoznik <mprivozn>

commit 7050dad5f92010720cc8e8b7d5c37eaad7696c5e
Refs: v8.10.0-204-g7050dad5f9
Author:     Jiri Denemark <jdenemar>
AuthorDate: Thu Dec 15 14:12:43 2022 +0100
Commit:     Jiri Denemark <jdenemar>
CommitDate: Fri Jan 6 16:17:38 2023 +0100

    qemu: Remember failed post-copy migration in job

    When post-copy migration fails, the domain stays running on the
    destination with a VIR_DOMAIN_RUNNING_POSTCOPY_FAILED reason. Both the
    state and the reason can later be rewritten in case the domain gets
    paused for other reasons (such as an I/O error). Thus we need a separate
    place to remember the post-copy migration failed to be able to resume
    the migration.

    https://bugzilla.redhat.com/show_bug.cgi?id=2111948

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Michal Privoznik <mprivozn>
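
The idea behind the last commit can be illustrated with a toy model. This is only a sketch in Python, not libvirt code: the `Domain` class, the `Reason` enum, and the `job_postcopy_failed` field are hypothetical names chosen for illustration. It shows why a check based on the domain reason alone stops working once an I/O error overwrites that reason, while a flag remembered separately survives.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Reason(Enum):
    """Simplified stand-in for libvirt's domain state reasons."""
    POSTCOPY = auto()
    POSTCOPY_FAILED = auto()
    IO_ERROR = auto()


@dataclass
class Domain:
    reason: Reason = Reason.POSTCOPY
    # Hypothetical field: models the failure being remembered in the job
    # rather than only in the domain state/reason.
    job_postcopy_failed: bool = False

    def postcopy_fails(self) -> None:
        self.reason = Reason.POSTCOPY_FAILED
        self.job_postcopy_failed = True

    def io_error(self) -> None:
        # An unrelated pause rewrites the reason, losing the failure marker.
        self.reason = Reason.IO_ERROR

    def can_resume_by_reason(self) -> bool:
        """Check that relies only on the current reason."""
        return self.reason is Reason.POSTCOPY_FAILED

    def can_resume_by_job(self) -> bool:
        """Check that relies on the separately remembered failure."""
        return self.job_postcopy_failed


dom = Domain()
dom.postcopy_fails()
dom.io_error()
print(dom.can_resume_by_reason(), dom.can_resume_by_job())
```

In this model the reason-based check rejects the resume request after the I/O error, which mirrors the "not in post-copy phase" error in the description, whereas the job-based check still allows it.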

Comment 3 Fangge Jin 2023-01-17 03:59:55 UTC
Pre-verified with libvirt-9.0.0-1.el9.x86_64

Steps:
1. Start vm

2. Migrate vm to target host and switch to postcopy
# virsh migrate vm1 qemu+tcp://{target_ip}/system --live --postcopy --undefinesource --persistent --p2p --bandwidth 3 --postcopy-bandwidth 3 --migrateuri tcp://{target_ip}:49153

3. Abort postcopy migration
# virsh domjobabort vm1 --postcopy

4. Trigger an I/O error in the VM.
For example: change the ownership of the VM's disk image to root:root, then do disk I/O in the VM.

Event output:
2023-01-17 03:43:03.879+0000: event 'io-error' for domain 'vm1': /nfs/RHEL-9.1-x86_64-latest-ovmf.qcow2.2 (virtio-disk0) report
2023-01-17 03:43:03.879+0000: event 'io-error-reason' for domain 'vm1': /nfs/RHEL-9.1-x86_64-latest-ovmf.qcow2.2 (virtio-disk0) report due to 

5. Check domain state with reason:
# virsh domstate vm1 --reason
running (post-copy failed)

6. Resume post-copy migration; it succeeds:
# virsh migrate vm1 qemu+tcp://{target_ip}/system --live --postcopy --undefinesource --persistent --p2p --bandwidth 3 --postcopy-bandwidth 3 --migrateuri tcp://{target_ip}:49153 --postcopy-resume

Event output on target host:
2023-01-17 03:45:43.940+0000: event 'lifecycle' for domain 'vm1': Resumed Post-copy
2023-01-17 03:46:15.658+0000: event 'lifecycle' for domain 'vm1': Resumed Migrated
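
A verification run like this could be checked automatically by scanning the event output. The sketch below is an assumption-laden illustration rather than part of the test procedure: it assumes event lines have exactly the textual form shown above, and the helper names `parse_event` and `resumed_then_migrated` are hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple

# Matches lines such as:
# 2023-01-17 03:45:43.940+0000: event 'lifecycle' for domain 'vm1': Resumed Post-copy
_EVENT_RE = re.compile(
    r"^(?P<ts>\S+ \S+): event '(?P<event>[^']+)' "
    r"for domain '(?P<domain>[^']+)': (?P<detail>.*)$"
)


def parse_event(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (event, domain, detail) for an event line, or None if it does not match."""
    match = _EVENT_RE.match(line.strip())
    if match is None:
        return None
    return match.group("event"), match.group("domain"), match.group("detail").strip()


def resumed_then_migrated(lines: Iterable[str], domain: str) -> bool:
    """True if 'Resumed Post-copy' is later followed by 'Resumed Migrated' for the domain."""
    details = [
        parsed[2]
        for parsed in map(parse_event, lines)
        if parsed and parsed[0] == "lifecycle" and parsed[1] == domain
    ]
    try:
        start = details.index("Resumed Post-copy")
    except ValueError:
        return False
    return "Resumed Migrated" in details[start + 1:]
```

Feeding the two target-host lines above to `resumed_then_migrated(lines, "vm1")` confirms that the resumed migration ran to completion.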

Comment 7 Fangge Jin 2023-02-10 06:03:19 UTC
Verified with libvirt-9.0.0-4.el9.x86_64

Comment 9 errata-xmlrpc 2023-05-09 07:26:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (libvirt bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2171

