+++ This bug was initially created as a clone of Bug #1774230 +++

Description of problem:
During post-copy migration libvirtd on the destination host unexpectedly emits VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY lifecycle event just before resuming the migration in post-copy mode.

Version-Release number of selected component (if applicable):
Any libvirt version since RHEL 7.7 (bug 1647365):
libvirt-4.5.0-23.el7_7.5
libvirt-4.5.0-31.el7
libvirt-4.5.0-35.2.el8
libvirt-5.6.0-10.el8
libvirt-6.0.0-1.el8

How reproducible:
100%

Steps to Reproduce:
1. start a new domain on a source host
2. make the domain dirty memory (e.g., by running stress command):
    stress --vm 2 --vm-bytes 512M
3. start watching for lifecycle events on a destination host:
    virsh event --event lifecycle --loop --timestamp
4. migrate the domain from the source host to the destination host:
    virsh migrate --p2p --live --postcopy --postcopy-after-precopy $DOM $DEST_URI
5. check the lifecycle events reported on the destination during migration

Actual results:
2020-01-15 14:20:26.689+0000: event 'lifecycle' for domain nest: Started Migrated
2020-01-15 14:21:01.837+0000: event 'lifecycle' for domain nest: Suspended Post-copy
2020-01-15 14:21:03.266+0000: event 'lifecycle' for domain nest: Resumed Post-copy
2020-01-15 14:21:32.060+0000: event 'lifecycle' for domain nest: Resumed Migrated

Expected results:
2020-01-15 14:28:53.803+0000: event 'lifecycle' for domain nest: Started Migrated
2020-01-15 14:28:56.156+0000: event 'lifecycle' for domain nest: Resumed Post-copy
2020-01-15 14:28:56.258+0000: event 'lifecycle' for domain nest: Resumed Migrated

In other words, no "Suspended Post-copy" event should be reported.

Additional info:
This issue was nicely analyzed in the original bug 1774230:

--- Additional comment from Benny Zlotnik on 2020-01-15 09:02:25 UTC ---

Hi Jiri,

After investigating this bug and discussing the proposed patch with Milan, there is something unclear.
It seems that in post-copy migration both source and destination receive VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY, and it probably has something to do with the change from [1], as I see the following logs on the destination:

2020-01-15 08:17:54.802+0000: 17327: debug : qemuProcessHandleMigrationStatus:1647 : Migration of domain 0x7fae6801c310 vmski changed state to post-copy-active
2020-01-15 08:17:54.802+0000: 17327: debug : qemuProcessHandleMigrationStatus:1663 : Correcting paused state reason for domain vmski to post-copy <--- I assume this emits the VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY event
2020-01-15 08:17:55.045+0000: 17327: debug : qemuProcessHandleResume:719 : Transitioned guest vmski into running state, reason 'post-copy', event detail 3

Is this the correct behaviour? Should the destination receive VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY as well?

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1647365

--- Additional comment from Jiri Denemark on 2020-01-15 10:06:46 UTC ---

Your investigation seems to be correct. The domain is started as paused on the destination with "migration" reason. Once migration switches to post-copy, the code in qemuProcessHandleMigrationStatus will update the reason to "post-copy" and emit a "suspended" event just a moment before the domain is resumed. That event should only be emitted on the source.
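For anyone re-running the reproducer, the destination output of "virsh event --event lifecycle --loop --timestamp" can be checked mechanically. The following is only a minimal sketch: the function names and the regular expression are hypothetical helpers written for this report, and they assume the line format shown under "Actual results" and "Expected results" in the description.

```python
import re

# Assumed line format, copied from the report, e.g.:
# 2020-01-15 14:21:01.837+0000: event 'lifecycle' for domain nest: Suspended Post-copy
LINE_RE = re.compile(
    r"^(?P<ts>\S+ \S+): event 'lifecycle' for domain (?P<dom>\S+): "
    r"(?P<event>\S+) (?P<detail>.+)$"
)


def parse_events(text):
    """Return (event, detail) pairs for every lifecycle line found in text."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            events.append((match.group("event"), match.group("detail")))
    return events


def has_unexpected_suspend(text):
    """True if a destination log contains a 'Suspended Post-copy' event."""
    return ("Suspended", "Post-copy") in parse_events(text)


# Destination output quoted in this report.
ACTUAL = """\
2020-01-15 14:20:26.689+0000: event 'lifecycle' for domain nest: Started Migrated
2020-01-15 14:21:01.837+0000: event 'lifecycle' for domain nest: Suspended Post-copy
2020-01-15 14:21:03.266+0000: event 'lifecycle' for domain nest: Resumed Post-copy
2020-01-15 14:21:32.060+0000: event 'lifecycle' for domain nest: Resumed Migrated
"""

EXPECTED = """\
2020-01-15 14:28:53.803+0000: event 'lifecycle' for domain nest: Started Migrated
2020-01-15 14:28:56.156+0000: event 'lifecycle' for domain nest: Resumed Post-copy
2020-01-15 14:28:56.258+0000: event 'lifecycle' for domain nest: Resumed Migrated
"""

if __name__ == "__main__":
    print("actual log affected:  ", has_unexpected_suspend(ACTUAL))
    print("expected log affected:", has_unexpected_suspend(EXPECTED))
```

The check only makes sense for output captured on the destination host; on the source a "Suspended Post-copy" event is legitimate.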
Patch sent upstream for review: https://www.redhat.com/archives/libvir-list/2020-January/msg00732.html
This is now fixed upstream by

commit bd04d63ad97c21b6955710e6473a502f49816a3c
Refs: v6.0.0-23-gbd04d63ad9
Author:     Jiri Denemark <jdenemar>
AuthorDate: Wed Jan 15 15:24:55 2020 +0100
Commit:     Jiri Denemark <jdenemar>
CommitDate: Thu Jan 16 15:12:19 2020 +0100

    qemu: Don't emit SUSPENDED_POSTCOPY event on destination

    When pause-before-switchover QEMU capability is enabled, we get STOP
    event before MIGRATION event with postcopy-active state. To properly
    handle post-copy migration and emit correct events commit
    v4.10.0-rc1-4-geca9d21e6c added a hack to
    qemuProcessHandleMigrationStatus which translates the paused state
    reason to VIR_DOMAIN_PAUSED_POSTCOPY and emits
    VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY event when migration state changes
    to post-copy.

    However, the code was effective on both sides of migration resulting in
    a confusing VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY event on the destination
    host, where entering post-copy mode is already properly advertised by
    VIR_DOMAIN_EVENT_RESUMED_POSTCOPY event.

    https://bugzilla.redhat.com/show_bug.cgi?id=1791458

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Ján Tomko <jtomko>
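The behaviour change described in the commit message can be illustrated with a small toy model. This is not the libvirt code: the Domain class, the Job enum, the handler functions and the only_on_source switch are all hypothetical names invented for this sketch, which only assumes what the commit message states, namely that the paused-reason fix-up used to run on both sides and should run on the source only.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Job(Enum):
    MIGRATION_OUT = auto()  # domain is being migrated away (source host)
    MIGRATION_IN = auto()   # domain is being received (destination host)


@dataclass
class Domain:
    job: Job
    state: str = "paused"       # both sides are paused at switchover
    reason: str = "migration"
    events: list = field(default_factory=list)


def handle_migration_status(dom, status, only_on_source=True):
    """Toy version of the paused-reason fix-up described in the commit.

    only_on_source=False models the behaviour before the fix.
    """
    if status != "postcopy-active":
        return
    if only_on_source and dom.job is not Job.MIGRATION_OUT:
        return  # destination: entering post-copy is reported by the resume
    if dom.state == "paused" and dom.reason == "migration":
        dom.reason = "post-copy"
        dom.events.append("SUSPENDED_POSTCOPY")


def handle_resume(dom):
    """Toy version of the destination resuming the guest in post-copy."""
    dom.state = "running"
    dom.reason = "post-copy"
    dom.events.append("RESUMED_POSTCOPY")


def destination_events(only_on_source):
    dom = Domain(job=Job.MIGRATION_IN)
    handle_migration_status(dom, "postcopy-active", only_on_source)
    handle_resume(dom)
    return dom.events


if __name__ == "__main__":
    print("before fix:", destination_events(only_on_source=False))
    print("after fix: ", destination_events(only_on_source=True))
```

In this model the source side is unaffected by the switch: a Domain with Job.MIGRATION_OUT still gets SUSPENDED_POSTCOPY either way, which matches the intent stated in the commit message.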
Verified with libvirt-6.0.0-4.module+el8.2.0+5642+838f3513.x86_64. The test steps are the same as in https://bugzilla.redhat.com/show_bug.cgi?id=1791886#c10 and https://bugzilla.redhat.com/show_bug.cgi?id=1791886#c11.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2017