Bug 1791458 - VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY event is emitted for incoming migration
Summary: VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY event is emitted for incoming migration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 8.2
Assignee: Jiri Denemark
QA Contact: yafu
URL:
Whiteboard:
Depends On:
Blocks: 1774230 1791886
Reported: 2020-01-15 21:11 UTC by Jiri Denemark
Modified: 2020-06-08 07:38 UTC (History)

Fixed In Version: libvirt-6.0.0-2.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1774230
Clones: 1791886
Environment:
Last Closed: 2020-05-05 09:55:51 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Links
System                   ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata   RHBA-2020:2017  0        None      None    None     2020-05-05 09:57:21 UTC

Description Jiri Denemark 2020-01-15 21:11:54 UTC
+++ This bug was initially created as a clone of Bug #1774230 +++

Description of problem:

During post-copy migration, libvirtd on the destination host unexpectedly emits
a VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY lifecycle event just before resuming the
migration in post-copy mode.

Version-Release number of selected component (if applicable):

Any libvirt version since RHEL 7.7 (bug 1647365):
libvirt-4.5.0-23.el7_7.5
libvirt-4.5.0-31.el7
libvirt-4.5.0-35.2.el8
libvirt-5.6.0-10.el8
libvirt-6.0.0-1.el8

How reproducible:

100%

Steps to Reproduce:

1. start a new domain on a source host
2. make the domain dirty memory (e.g., by running stress command):
    stress --vm 2 --vm-bytes 512M
3. start watching for lifecycle events on a destination host:
    virsh event --event lifecycle --loop --timestamp
4. migrate the domain from the source host to the destination host:
    virsh migrate --p2p --live --postcopy --postcopy-after-precopy $DOM $DEST_URI
5. check the lifecycle events reported on the destination during migration

Actual results:

2020-01-15 14:20:26.689+0000: event 'lifecycle' for domain nest: Started Migrated
2020-01-15 14:21:01.837+0000: event 'lifecycle' for domain nest: Suspended Post-copy
2020-01-15 14:21:03.266+0000: event 'lifecycle' for domain nest: Resumed Post-copy
2020-01-15 14:21:32.060+0000: event 'lifecycle' for domain nest: Resumed Migrated

Expected results:

2020-01-15 14:28:53.803+0000: event 'lifecycle' for domain nest: Started Migrated
2020-01-15 14:28:56.156+0000: event 'lifecycle' for domain nest: Resumed Post-copy
2020-01-15 14:28:56.258+0000: event 'lifecycle' for domain nest: Resumed Migrated

In other words, no "Suspended Post-copy" event should be reported.
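
The check in step 5 can be automated. The following is only a minimal sketch,
assuming the output of the virsh event command from step 3 was captured as
text on the destination; the helper names (parse_lifecycle,
unexpected_suspends) and the two sample constants are made up for
illustration and are not part of libvirt or virsh. The sample data is copied
from the actual and expected results above.

```python
import re

# One line of `virsh event --event lifecycle --loop --timestamp` output, e.g.
# 2020-01-15 14:21:01.837+0000: event 'lifecycle' for domain nest: Suspended Post-copy
LINE_RE = re.compile(
    r"^(?P<ts>\S+ \S+): event 'lifecycle' for domain (?P<dom>.+?): "
    r"(?P<event>\S+) (?P<detail>.+)$"
)


def parse_lifecycle(output):
    """Return (domain, event, detail) tuples for every lifecycle line."""
    events = []
    for line in output.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            events.append((m["dom"], m["event"], m["detail"]))
    return events


def unexpected_suspends(output):
    """Return the 'Suspended Post-copy' events found in destination output."""
    return [e for e in parse_lifecycle(output)
            if e[1] == "Suspended" and e[2] == "Post-copy"]


# Sample data copied from "Actual results" and "Expected results".
ACTUAL = """\
2020-01-15 14:20:26.689+0000: event 'lifecycle' for domain nest: Started Migrated
2020-01-15 14:21:01.837+0000: event 'lifecycle' for domain nest: Suspended Post-copy
2020-01-15 14:21:03.266+0000: event 'lifecycle' for domain nest: Resumed Post-copy
2020-01-15 14:21:32.060+0000: event 'lifecycle' for domain nest: Resumed Migrated
"""

EXPECTED = """\
2020-01-15 14:28:53.803+0000: event 'lifecycle' for domain nest: Started Migrated
2020-01-15 14:28:56.156+0000: event 'lifecycle' for domain nest: Resumed Post-copy
2020-01-15 14:28:56.258+0000: event 'lifecycle' for domain nest: Resumed Migrated
"""

if __name__ == "__main__":
    for name, text in (("actual", ACTUAL), ("expected", EXPECTED)):
        bad = unexpected_suspends(text)
        print(name, "FAIL" if bad else "OK", bad)
```

A non-empty list from unexpected_suspends() on destination output means the
bug is still present.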

Additional info:

This issue was nicely analyzed in the original bug 1774230:

--- Additional comment from Benny Zlotnik on 2020-01-15 09:02:25 UTC ---

Hi Jiri,

After investigating this bug and discussing the proposed patch with Milan,
one thing remains unclear. It seems that in post-copy migration both source and
destination receive VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY, and it probably
has something to do with the change from [1], as I see the following logs on
the destination:

2020-01-15 08:17:54.802+0000: 17327: debug : qemuProcessHandleMigrationStatus:1647 :
    Migration of domain 0x7fae6801c310 vmski changed state to post-copy-active
2020-01-15 08:17:54.802+0000: 17327: debug : qemuProcessHandleMigrationStatus:1663 :
    Correcting paused state reason for domain vmski to post-copy <--- I assume this emits the VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY event
2020-01-15 08:17:55.045+0000: 17327: debug : qemuProcessHandleResume:719 :
    Transitioned guest vmski into running state, reason 'post-copy', event detail 3

Is this the correct behaviour? Should the destination receive VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY as well?

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1647365

--- Additional comment from Jiri Denemark on 2020-01-15 10:06:46 UTC ---

Your investigation seems to be correct. The domain is started as paused on the
destination with "migration" reason. Once migration switches to post-copy, the
code in qemuProcessHandleMigrationStatus will update the reason to "post-copy"
and emit a "suspended" event just a moment before the domain is resumed, which
should only happen on the source.

Comment 1 Jiri Denemark 2020-01-16 13:18:53 UTC
Patch sent upstream for review: https://www.redhat.com/archives/libvir-list/2020-January/msg00732.html

Comment 2 Jiri Denemark 2020-01-16 14:40:28 UTC
This is now fixed upstream by

commit bd04d63ad97c21b6955710e6473a502f49816a3c
Refs: v6.0.0-23-gbd04d63ad9
Author:     Jiri Denemark <jdenemar>
AuthorDate: Wed Jan 15 15:24:55 2020 +0100
Commit:     Jiri Denemark <jdenemar>
CommitDate: Thu Jan 16 15:12:19 2020 +0100

    qemu: Don't emit SUSPENDED_POSTCOPY event on destination

    When pause-before-switchover QEMU capability is enabled, we get STOP
    event before MIGRATION event with postcopy-active state. To properly
    handle post-copy migration and emit correct events commit
    v4.10.0-rc1-4-geca9d21e6c added a hack to
    qemuProcessHandleMigrationStatus which translates the paused state
    reason to VIR_DOMAIN_PAUSED_POSTCOPY and emits
    VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY event when migration state changes
    to post-copy.

    However, the code was effective on both sides of migration resulting in
    a confusing VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY event on the destination
    host, where entering post-copy mode is already properly advertised by
    VIR_DOMAIN_EVENT_RESUMED_POSTCOPY event.

    https://bugzilla.redhat.com/show_bug.cgi?id=1791458

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Ján Tomko <jtomko>
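
To illustrate the behaviour described in the commit message, here is a toy
model in Python. It is not the libvirt code: the Domain class, the Side enum,
the handler names and the "fixed" flag are all hypothetical, and the model
assumes the fix amounts to applying the paused-reason correction only on the
outgoing (source) side, which is what the commit message suggests.

```python
from dataclasses import dataclass, field
from enum import Enum


class Side(Enum):
    SOURCE = "outgoing"
    DESTINATION = "incoming"


@dataclass
class Domain:
    # Both sides are modelled as paused with reason "migration" at the moment
    # the migration state changes to post-copy: the source after the STOP
    # event, the destination because it was started paused.
    side: Side
    state: str = "paused"
    reason: str = "migration"
    events: list = field(default_factory=list)


def on_migration_postcopy(dom, fixed):
    """Model of the reason correction done when migration enters post-copy."""
    if fixed and dom.side is not Side.SOURCE:
        return  # assumed fix: skip the correction for incoming migration
    if dom.state == "paused" and dom.reason == "migration":
        dom.reason = "post-copy"
        dom.events.append("Suspended Post-copy")


def on_resume(dom):
    """Model of the destination domain being resumed in post-copy mode."""
    dom.state = "running"
    dom.reason = "post-copy"
    dom.events.append("Resumed Post-copy")


def destination_events(fixed):
    """Events the destination emits while the migration enters post-copy."""
    dom = Domain(Side.DESTINATION)
    on_migration_postcopy(dom, fixed)
    on_resume(dom)
    return dom.events


if __name__ == "__main__":
    print("before fix:", destination_events(fixed=False))
    print("after fix: ", destination_events(fixed=True))
```

In this model the source still gets its "Suspended Post-copy" event with the
guard in place, while the destination reports only "Resumed Post-copy".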

Comment 5 yafu 2020-02-11 03:53:42 UTC
Verified with libvirt-6.0.0-4.module+el8.2.0+5642+838f3513.x86_64.

Test steps are the same as in https://bugzilla.redhat.com/show_bug.cgi?id=1791886#c10 and https://bugzilla.redhat.com/show_bug.cgi?id=1791886#c11.

Comment 7 errata-xmlrpc 2020-05-05 09:55:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2017

