Bug 1791886 - VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY event is emitted for incoming migration
Summary: VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY event is emitted for incoming migration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.7
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: high
Target Milestone: rc
Target Release: 7.8
Assignee: Jiri Denemark
QA Contact: yafu
URL:
Whiteboard:
Depends On: 1791458
Blocks: 1774230
Reported: 2020-01-16 16:33 UTC by Jiri Denemark
Modified: 2020-04-29 09:38 UTC (History)
CC List: 18 users

Fixed In Version: libvirt-4.5.0-32.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1791458
Environment:
Last Closed: 2020-03-31 19:59:02 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:1094 0 None None None 2020-03-31 19:59:44 UTC

Description Jiri Denemark 2020-01-16 16:33:28 UTC
+++ This bug was initially created as a clone of Bug #1791458 +++

+++ This bug was initially created as a clone of Bug #1774230 +++

Description of problem:

During post-copy migration, libvirtd on the destination host unexpectedly emits
a VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY lifecycle event just before resuming the
migration in post-copy mode.

Version-Release number of selected component (if applicable):

Any libvirt version since RHEL 7.7 (bug 1647365):
libvirt-4.5.0-23.el7_7.5
libvirt-4.5.0-31.el7
libvirt-4.5.0-35.2.el8
libvirt-5.6.0-10.el8
libvirt-6.0.0-1.el8

How reproducible:

100%

Steps to Reproduce:

1. start a new domain on a source host
2. make the domain dirty memory (e.g., by running stress command):
    stress --vm 2 --vm-bytes 512M
3. start watching for lifecycle events on a destination host:
    virsh event --event lifecycle --loop --timestamp
4. migrate the domain from the source host to the destination host:
    virsh migrate --p2p --live --postcopy --postcopy-after-precopy $DOM $DEST_URI
5. check the lifecycle events reported on the destination during migration

Actual results:

2020-01-15 14:20:26.689+0000: event 'lifecycle' for domain nest: Started Migrated
2020-01-15 14:21:01.837+0000: event 'lifecycle' for domain nest: Suspended Post-copy
2020-01-15 14:21:03.266+0000: event 'lifecycle' for domain nest: Resumed Post-copy
2020-01-15 14:21:32.060+0000: event 'lifecycle' for domain nest: Resumed Migrated

Expected results:

2020-01-15 14:28:53.803+0000: event 'lifecycle' for domain nest: Started Migrated
2020-01-15 14:28:56.156+0000: event 'lifecycle' for domain nest: Resumed Post-copy
2020-01-15 14:28:56.258+0000: event 'lifecycle' for domain nest: Resumed Migrated

In other words, no "Suspended Post-copy" event should be reported.
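The comparison above can be automated. The following is a minimal sketch, assuming the event lines have the format printed by `virsh event --event lifecycle --loop --timestamp` as shown in this report; the helper names `parse_lifecycle` and `unexpected_on_destination` are hypothetical and not part of libvirt or virsh.

```python
import re

# Matches lifecycle lines as printed by `virsh event`, with or without
# the leading timestamp.
LIFECYCLE_RE = re.compile(
    r"event 'lifecycle' for domain (?P<domain>\S+): "
    r"(?P<event>\S+) (?P<detail>.+?)\s*$"
)


def parse_lifecycle(lines):
    """Return (domain, event, detail) tuples for every lifecycle line."""
    parsed = []
    for line in lines:
        m = LIFECYCLE_RE.search(line)
        if m:
            parsed.append((m.group("domain"), m.group("event"), m.group("detail")))
    return parsed


def unexpected_on_destination(lines):
    """Return the events that should not show up on the destination host."""
    return [e for e in parse_lifecycle(lines) if e[1:] == ("Suspended", "Post-copy")]


# Destination output copied from "Actual results" and "Expected results".
actual = [
    "2020-01-15 14:20:26.689+0000: event 'lifecycle' for domain nest: Started Migrated",
    "2020-01-15 14:21:01.837+0000: event 'lifecycle' for domain nest: Suspended Post-copy",
    "2020-01-15 14:21:03.266+0000: event 'lifecycle' for domain nest: Resumed Post-copy",
    "2020-01-15 14:21:32.060+0000: event 'lifecycle' for domain nest: Resumed Migrated",
]
expected = [
    "2020-01-15 14:28:53.803+0000: event 'lifecycle' for domain nest: Started Migrated",
    "2020-01-15 14:28:56.156+0000: event 'lifecycle' for domain nest: Resumed Post-copy",
    "2020-01-15 14:28:56.258+0000: event 'lifecycle' for domain nest: Resumed Migrated",
]

print(unexpected_on_destination(actual))
print(unexpected_on_destination(expected))
```

An empty list for the destination output means the bug is not present in that run.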

Additional info:

This issue was nicely analyzed in the original bug 1774230:

--- Additional comment from Benny Zlotnik on 2020-01-15 09:02:25 UTC ---

Hi Jiri,

After investigating this bug and discussing the proposed patch with Milan,
something is still unclear. It seems that in post-copy migration both source and
destination receive VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY, and it probably
has something to do with the change from [1], as I see the following logs on
the destination:

2020-01-15 08:17:54.802+0000: 17327: debug : qemuProcessHandleMigrationStatus:1647 :
    Migration of domain 0x7fae6801c310 vmski changed state to post-copy-active
2020-01-15 08:17:54.802+0000: 17327: debug : qemuProcessHandleMigrationStatus:1663 :
    Correcting paused state reason for domain vmski to post-copy <--- I assume this emits the VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY event
2020-01-15 08:17:55.045+0000: 17327: debug : qemuProcessHandleResume:719 :
    Transitioned guest vmski into running state, reason 'post-copy', event detail 3

Is this the correct behaviour? Should the destination receive VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY as well?

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1647365

--- Additional comment from Jiri Denemark on 2020-01-15 10:06:46 UTC ---

Your investigation seems to be correct. The domain is started as paused on the
destination with "migration" reason. Once migration switches to post-copy, the
code in qemuProcessHandleMigrationStatus will update the reason to "post-copy"
and emit a "suspended" event just a moment before the domain is resumed. This
update and event should only happen on the source.

--- Additional comment from Jiri Denemark on 2020-01-16 13:18:53 UTC ---

Patch sent upstream for review: https://www.redhat.com/archives/libvir-list/2020-January/msg00732.html

--- Additional comment from Jiri Denemark on 2020-01-16 14:40:28 UTC ---

This is now fixed upstream by

commit bd04d63ad97c21b6955710e6473a502f49816a3c
Refs: v6.0.0-23-gbd04d63ad9
Author:     Jiri Denemark <jdenemar>
AuthorDate: Wed Jan 15 15:24:55 2020 +0100
Commit:     Jiri Denemark <jdenemar>
CommitDate: Thu Jan 16 15:12:19 2020 +0100

    qemu: Don't emit SUSPENDED_POSTCOPY event on destination

    When pause-before-switchover QEMU capability is enabled, we get STOP
    event before MIGRATION event with postcopy-active state. To properly
    handle post-copy migration and emit correct events commit
    v4.10.0-rc1-4-geca9d21e6c added a hack to
    qemuProcessHandleMigrationStatus which translates the paused state
    reason to VIR_DOMAIN_PAUSED_POSTCOPY and emits
    VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY event when migration state changes
    to post-copy.

    However, the code was effective on both sides of migration resulting in
    a confusing VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY event on the destination
    host, where entering post-copy mode is already properly advertised by
    VIR_DOMAIN_EVENT_RESUMED_POSTCOPY event.

    https://bugzilla.redhat.com/show_bug.cgi?id=1791458

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Ján Tomko <jtomko>
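
The behaviour change described in the commit message can be illustrated with a toy model. This is only a sketch of the logic as described in this report, not libvirt's actual C code: the `Role` enum, the `on_migration_status` function, the `fixed` flag, and the string values used for states and reasons are all assumptions made for illustration.

```python
from enum import Enum


class Role(Enum):
    SOURCE = "outgoing"
    DESTINATION = "incoming"


def on_migration_status(role, paused_reason, status, fixed=True):
    """Toy model of the paused-reason correction on a migration status change.

    Returns (new_paused_reason, emitted_events).
    """
    events = []
    if status == "postcopy-active" and paused_reason == "migration":
        if fixed and role is not Role.SOURCE:
            # With the fix, the destination skips the correction; its later
            # resume is what announces the switch to post-copy.
            return paused_reason, events
        paused_reason = "post-copy"
        events.append("VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY")
    return paused_reason, events


# Before the fix, both sides emitted the event.
print(on_migration_status(Role.DESTINATION, "migration", "postcopy-active", fixed=False))
# After the fix, only the source does.
print(on_migration_status(Role.DESTINATION, "migration", "postcopy-active"))
print(on_migration_status(Role.SOURCE, "migration", "postcopy-active"))
```

The point of the sketch is that the correction is keyed on the direction of the migration, so the destination no longer reports a suspension it never went through from the user's point of view.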

Comment 9 yafu 2020-02-05 09:32:48 UTC
Reproduced with libvirt-4.5.0-30.el7.x86_64.

Test steps:
1. Monitor domain events on both source and target hosts:
# virsh event --all --loop 

2. Start guest and launch stress in guest:
# stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M 

3. Do live migration with --postcopy and switch to postcopy:
Terminal 1: # virsh migrate yafu qemu+ssh://X.X.X.X/system --live --verbose --postcopy

Terminal 2: # virsh migrate-postcopy yafu

4. Check the events on the source host:
event 'migration-iteration' for domain yafu: iteration: '1'
event 'migration-iteration' for domain yafu: iteration: '2'
event 'lifecycle' for domain yafu: Suspended Migrated
event 'lifecycle' for domain yafu: Suspended Post-copy
event 'migration-iteration' for domain yafu: iteration: '3' 
event 'lifecycle' for domain yafu: Stopped Migrated
event 'job-completed' for domain yafu:
	operation: 5
	time_elapsed: 8369
	time_elapsed_net: 8365
	downtime: 167
	downtime_net: 163
	setup_time: 26
	data_total: 2208120832
	data_processed: 913950968
	data_remaining: 0
	memory_total: 2208120832
	memory_processed: 913950968
	memory_remaining: 0
	memory_bps: 114331992
	memory_constant: 400031
	memory_normal: 221820
	memory_normal_bytes: 908574720
	memory_dirty_rate: 0
	memory_iteration: 3
	memory_page_size: 4096
	disk_total: 0
	disk_processed: 0
	disk_remaining: 0

5. Check the events on the target host:
event 'lifecycle' for domain yafu: Started Migrated
***event 'lifecycle' for domain yafu: Suspended Post-copy  ***
event 'lifecycle' for domain yafu: Resumed Post-copy
event 'lifecycle' for domain yafu: Resumed Migrated

6. Check the libvirtd log on the target host:
# cat /var/log/libvirt/libvirtd.log | grep -i post-copy
2020-02-05 08:07:57.153+0000: 11275: debug : qemuProcessHandleMigrationStatus:1553 : Correcting paused state reason for domain yafu to post-copy
2020-02-05 08:07:57.415+0000: 11279: debug : qemuMigrationAnyCompleted:1524 : Migration switched to post-copy
2020-02-05 08:07:57.434+0000: 11275: debug : qemuProcessHandleResume:708 : Transitioned guest yafu into running state, reason 'post-copy', event detail 3

Comment 10 yafu 2020-02-05 10:55:08 UTC
Verified with libvirt-4.5.0-32.el7.x86_64.

Test steps:
1. Monitor domain events on both source and target hosts:
# virsh event --all --loop 

2. Start guest and launch stress in guest:
# stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M 

3. Do live migration with --postcopy and switch to postcopy:
Terminal 1: # virsh migrate yafu qemu+ssh://X.X.X.X/system --live --verbose --postcopy

Terminal 2: # virsh migrate-postcopy yafu


4. Check the events on the source host:
event 'lifecycle' for domain yafu: Suspended Migrated
event 'lifecycle' for domain yafu: Suspended Post-copy
event 'migration-iteration' for domain yafu: iteration: '9'
event 'lifecycle' for domain yafu: Stopped Migrated
event 'job-completed' for domain yafu:
	operation: 5
	time_elapsed: 8792
	time_elapsed_net: 8791
	downtime: 126
	downtime_net: 125
	setup_time: 25
	data_total: 2208120832
	data_processed: 2594877328
	data_remaining: 0
	memory_total: 2208120832
	memory_processed: 2594877328
	memory_remaining: 0
	memory_bps: 113848113
	memory_constant: 803092
	memory_normal: 630518
	memory_normal_bytes: 2582601728
	memory_dirty_rate: 0
	memory_iteration: 9
	memory_page_size: 4096
	disk_total: 0
	disk_processed: 0
	disk_remaining: 0

5. Check the events on the target host; there is no VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY event:
# virsh event --all --loop
event 'lifecycle' for domain yafu: Started Migrated
event 'lifecycle' for domain yafu: Resumed Post-copy
event 'lifecycle' for domain yafu: Resumed Migrated

6. Check libvirtd.log on the target host:
# cat /var/log/libvirt/libvirtd.log | grep -i post-copy
2020-02-05 10:08:15.258+0000: 16346: debug : qemuMigrationAnyCompleted:1524 : Migration switched to post-copy
2020-02-05 10:08:15.274+0000: 16343: debug : qemuProcessHandleResume:708 : Transitioned guest yafu into running state, reason 'post-copy', event detail 3
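
The log check in step 6 can also be scripted. The sketch below assumes the debug message wording seen in this report ("Correcting paused state reason for domain ... to post-copy"); the function name `domains_with_correction` is hypothetical, and the message text may differ in other libvirt versions.

```python
import re

# Debug message logged when the paused reason is rewritten to post-copy.
CORRECTION_RE = re.compile(
    r"qemuProcessHandleMigrationStatus:\d+\s*:\s*"
    r"Correcting paused state reason for domain (\S+) to post-copy"
)


def domains_with_correction(log_lines):
    """Return the domain names for which the paused reason was corrected."""
    found = []
    for line in log_lines:
        m = CORRECTION_RE.search(line)
        if m:
            found.append(m.group(1))
    return found


# Target-host log lines copied from comment 9 (reproduced) and comment 10 (verified).
reproduced = [
    "2020-02-05 08:07:57.153+0000: 11275: debug : qemuProcessHandleMigrationStatus:1553 : Correcting paused state reason for domain yafu to post-copy",
    "2020-02-05 08:07:57.415+0000: 11279: debug : qemuMigrationAnyCompleted:1524 : Migration switched to post-copy",
    "2020-02-05 08:07:57.434+0000: 11275: debug : qemuProcessHandleResume:708 : Transitioned guest yafu into running state, reason 'post-copy', event detail 3",
]
verified = [
    "2020-02-05 10:08:15.258+0000: 16346: debug : qemuMigrationAnyCompleted:1524 : Migration switched to post-copy",
    "2020-02-05 10:08:15.274+0000: 16343: debug : qemuProcessHandleResume:708 : Transitioned guest yafu into running state, reason 'post-copy', event detail 3",
]

print(domains_with_correction(reproduced))
print(domains_with_correction(verified))
```

On a fixed target host the result should be empty, matching the log excerpt above.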

Comment 11 yafu 2020-02-05 14:10:49 UTC
Also checked the domjobinfo output after the migration completed:
# virsh domjobinfo yafu --completed
Job type:         Completed   
Operation:        Outgoing migration
Time elapsed:     8198         ms
Time elapsed w/o network: 8196         ms
Data processed:   849.538 MiB
Data remaining:   0.000 B
Data total:       2.056 GiB
Memory processed: 849.538 MiB
Memory remaining: 0.000 B
Memory total:     2.056 GiB
Memory bandwidth: 108.890 MiB/s
Dirty rate:       0            pages/s
Page size:        4096         bytes
Iteration:        2           
Constant pages:   391877      
Normal pages:     216198      
Normal data:      844.523 MiB
Total downtime:   196          ms
Downtime w/o network: 194          ms
Setup time:       31           ms
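
As a quick sanity check of these statistics, the output can be parsed and the "Normal data" figure cross-checked against "Normal pages" times "Page size". This is a minimal sketch that assumes the `Field: value` layout shown above; `parse_domjobinfo` is a hypothetical helper, and only a subset of the fields is reproduced in the sample.

```python
def parse_domjobinfo(text):
    """Parse `virsh domjobinfo` output into {field: value} with tidy spacing."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            info[key.strip()] = " ".join(value.split())
    return info


# Subset of the output from this comment.
sample = """\
Job type:         Completed
Operation:        Outgoing migration
Page size:        4096         bytes
Normal pages:     216198
Normal data:      844.523 MiB
Total downtime:   196          ms
"""

info = parse_domjobinfo(sample)
pages = int(info["Normal pages"])
page_size = int(info["Page size"].split()[0])
normal_mib = pages * page_size / 2**20

print(f"{normal_mib:.3f} MiB")  # 844.523 MiB, matching "Normal data"
```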

Comment 13 errata-xmlrpc 2020-03-31 19:59:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1094

