Bug 1212077
| Summary: | Add support for QMP event notification when migration finishes | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Jiri Denemark <jdenemar> |
| Component: | libvirt | Assignee: | Jiri Denemark <jdenemar> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | low | | |
| Version: | 7.0 | CC: | berrange, dgilbert, dyuan, ehabkost, fjin, gcosta, hhuang, jdenemar, juzhang, lcapitulino, llim, mprivozn, mzhan, pdwyer, quintela, rbalakri, tburke, virt-bugs, virt-maint, yanyang, zhwang, zpeng |
| Target Milestone: | rc | Keywords: | FutureFeature |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | libvirt-1.2.17-3.el7 | Doc Type: | Enhancement |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 580006 | Environment: | |
| Last Closed: | 2015-11-19 06:28:54 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 580006 | | |
| Bug Blocks: | | | |
Description
Jiri Denemark
2015-04-15 13:59:35 UTC
*** Bug 1212032 has been marked as a duplicate of this bug. ***

Patches sent upstream for review:
https://www.redhat.com/archives/libvir-list/2015-June/msg00064.html

The third version of the series is now ACKed upstream:
https://www.redhat.com/archives/libvir-list/2015-June/msg00451.html

I will push the patches once the corresponding QEMU part is applied upstream.

Implemented upstream by v1.2.17-66-ga60ee61..v1.2.17-70-g3409f5b:
commit a60ee613c499c255faf7c36e7cd8582b5a79ca57
Author: Jiri Denemark <jdenemar>
Date: Thu May 28 13:35:52 2015 +0200
qemu_monitor: Wire up MIGRATION event
Thanks to Juan's work, QEMU finally emits an event whenever the migration
state changes.
Signed-off-by: Jiri Denemark <jdenemar>
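For reference, the raw QMP notification this patch wires up looks roughly like the following on the monitor socket (the timestamp is illustrative; the status field follows QEMU's MigrationStatus values such as setup, active, completed, failed, and cancelled):

{"timestamp": {"seconds": 1440146815, "microseconds": 903522}, "event": "MIGRATION", "data": {"status": "completed"}}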
commit 3df4d2a45aaafd0fe2191c648c83a609726638f7
Author: Jiri Denemark <jdenemar>
Date: Tue Jul 7 18:17:25 2015 +0200
qemu: Enable migration events on QMP monitor
Even if QEMU supports migration events, it doesn't send them by default.
We have to enable them by calling migrate-set-capabilities. Let's enable
migration events every time we can and clear QEMU_CAPS_MIGRATION_EVENT in
case migrate-set-capabilities does not support events.
Signed-off-by: Jiri Denemark <jdenemar>
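For illustration only, the same capability can be flipped by hand through libvirt's QMP passthrough (domain name r71 is taken from the verification transcripts below; running this manually on a libvirt-managed domain is purely a debugging aid, as libvirt issues the equivalent command itself when it connects to the monitor):

# virsh qemu-monitor-command r71 '{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [{"capability": "events", "state": true}]}}'
{"return":{}}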
commit 66c95964a5706c223f4857e55396318c857abb6f
Author: Jiri Denemark <jdenemar>
Date: Tue May 26 13:42:06 2015 +0200
qemuDomainGetJobStatsInternal: Support migration events
When QEMU supports migration events, the qemuDomainJobInfo structure will
no longer be updated with migration statistics. We have to enter a job
and explicitly ask QEMU every time virDomainGetJob{Info,Stats} is
called.
Signed-off-by: Jiri Denemark <jdenemar>
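The underlying QMP command here is query-migrate; a hand-run sketch of it (output trimmed and illustrative, as the exact field set varies by QEMU version):

# virsh qemu-monitor-command r71 '{"execute": "query-migrate"}'
{"return": {"status": "active", "ram": {"total": 1184880640, "remaining": 801501184, "transferred": 98654208}}}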
commit 6d2edb6a42d0d41971fcf53c16cdc785d36c3ecc
Author: Jiri Denemark <jdenemar>
Date: Fri May 29 08:37:59 2015 +0200
qemu: Update migration state according to MIGRATION event
We don't need to call query-migrate every 50 ms when we get the current
migration state via the MIGRATION event.
Signed-off-by: Jiri Denemark <jdenemar>
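One way to watch this event stream directly, instead of relying on the old polling, is virsh's event passthrough (the printed format shown here is approximate):

# virsh qemu-monitor-event r71 --event MIGRATION --loop
event MIGRATION at 1440146811.522103 for domain r71: {"status": "active"}
event MIGRATION at 1440146815.903522 for domain r71: {"status": "completed"}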
commit 3409f5bc4e0ea870982f4eee3bb6f97fe9f9f883
Author: Jiri Denemark <jdenemar>
Date: Fri May 29 08:38:44 2015 +0200
qemu: Wait for migration events on domain condition
Since we already support the MIGRATION event, we just need to make sure
the domain condition is signalled whenever a p2p connection drops or the
domain is paused due to an I/O error, and we can avoid waking up every
50 ms to check whether something happened.
Signed-off-by: Jiri Denemark <jdenemar>
I'm afraid this introduced a small regression. Fortunately, I have a patch for that:
https://www.redhat.com/archives/libvir-list/2015-July/msg00503.html

Moving back to ASSIGNED.

Pushed upstream:
commit 45cc2fca5c97d7e6d6e00389c6f459b5660d0f1f
Author: Michal Privoznik <mprivozn>
AuthorDate: Mon Jul 13 14:15:03 2015 +0200
Commit: Michal Privoznik <mprivozn>
CommitDate: Mon Jul 13 15:07:12 2015 +0200
qemuProcessHandleMigrationStatus: Update migration status more frequently
After Jirka's migration patches, libvirt listens for migration
events from qemu instead of actively polling the monitor. There is,
however, a little regression (introduced in 6d2edb6a42d0d41). The
problem is that the current status of the migration job is updated in
qemuProcessHandleMigrationStatus if and only if a migration job was
started. But eventually any asynchronous job may result in
migration. Therefore, since such a job is not strictly a
migration job, the internal state was not updated and later checks failed:
virsh # save fedora22 /tmp/fedora22_ble.save
error: Failed to save domain fedora22 to /tmp/fedora22_ble.save
error: operation failed: domain save job: is not active
Signed-off-by: Michal Privoznik <mprivozn>
v1.2.17-90-g45cc2fc
To POST:
http://post-office.corp.redhat.com/archives/rhvirt-patches/2015-July/msg00590.html
Reproduced comment#9 on build:
libvirt-1.2.17-2.el7
qemu-kvm-rhev-2.3.0-18.el7.x86_64

# virsh save r71 /tmp/r71.save
error: Failed to save domain r71 to /tmp/r71.save
error: operation failed: domain save job: is not active

Verified it on build:
libvirt-1.2.17-5.el7
qemu-kvm-rhev-2.3.0-18.el7.x86_64

# virsh save r71 /tmp/r71.save
Domain r71 saved to /tmp/r71.save

Reproduced this bug on build:
libvirt-1.2.8-16.el7.x86_64
qemu-kvm-rhev-2.2.0-9.el7.x86_64

Got this log 20 times per second during migration:
2015-08-21 03:26:17.583+0000: 9231: debug : qemuMonitorGetMigrationStatus:2271 : mon=0x7fbe2c272b00

Verified this bug on build:
libvirt-1.2.17-5.el7
qemu-kvm-rhev-2.3.0-18.el7.x86_64

Scenario 1 (migration succeeds without any error):

On source host:
# virsh migrate r71 qemu+ssh://10.66.6.6/system --live

# tailf /var/log/libvirtd_debug.log | grep -E 'qemuProcessHandleMigrationStatus|qemuMonitorGetMigrationStatus'
2015-08-21 08:46:31.522+0000: 32688: debug : qemuProcessHandleMigrationStatus:1535 : Migration of domain 0x7fcf2026e4c0 r71 changed state to setup
2015-08-21 08:46:31.524+0000: 32688: debug : qemuProcessHandleMigrationStatus:1535 : Migration of domain 0x7fcf2026e4c0 r71 changed state to active
2015-08-21 08:46:34.829+0000: 32696: debug : qemuMonitorGetMigrationStatus:2132 : mon:0x7fcf60020230 vm:0x7fcf2026e4c0 json:1 fd:26
2015-08-21 08:46:55.903+0000: 32688: debug : qemuProcessHandleMigrationStatus:1535 : Migration of domain 0x7fcf2026e4c0 r71 changed state to completed
2015-08-21 08:46:55.903+0000: 32694: debug : qemuMonitorGetMigrationStatus:2132 : mon:0x7fcf60020230 vm:0x7fcf2026e4c0 json:1 fd:26

# virsh event r71 --loop --all
event 'lifecycle' for domain r71: Suspended Paused
event 'lifecycle' for domain r71: Stopped Migrated

Get domjobinfo during migration:
# virsh domjobinfo r71
Job type: Unbounded
Time elapsed: 5310 ms
Data processed: 94.084 MiB
Data remaining: 764.305 MiB
Data total: 1.103 GiB
Memory processed: 94.084 MiB
Memory remaining: 764.305 MiB
Memory total: 1.103 GiB
Memory bandwidth: 20.001 MiB/s
Constant pages: 69495
Normal pages: 23886
Normal data: 93.305 MiB
Expected downtime: 300 ms
Setup time: 3 ms

On target host:
# virsh event r71 --loop --all
event 'agent-lifecycle' for domain r71: state: 'disconnected' reason: 'domain started'
event 'lifecycle' for domain r71: Started Migrated
event 'lifecycle' for domain r71: Resumed Migrated
event 'agent-lifecycle' for domain r71: state: 'connected' reason: 'channel event'

# tailf /var/log/libvirtd_debug.log | grep -E 'qemuProcessHandleMigrationStatus|qemuMonitorGetMigrationStatus'
2015-08-21 08:46:31.497+0000: 5107: debug : qemuProcessHandleMigrationStatus:1532 : Migration of domain 0x7f596401ea50 r71 changed state to active
2015-08-21 08:46:55.927+0000: 5107: debug : qemuProcessHandleMigrationStatus:1532 : Migration of domain 0x7f596401ea50 r71 changed state to completed
2015-08-21 08:46:55.927+0000: 5107: debug : qemuProcessHandleMigrationStatus:1536 : got MIGRATION event without a migration job

Scenario 2 (p2p disconnect during migration):

On source host:
Migrate via p2p and execute "iptables -A INPUT -s 10.66.6.6 -j DROP" during migration; after 30 s the migration fails:
# virsh migrate r71 qemu+ssh://10.66.6.6/system --live --p2p
error: operation failed: Lost connection to destination host

# tailf /var/log/libvirtd_debug.log | grep -E 'qemuProcessHandleMigrationStatus|qemuMonitorGetMigrationStatus'
2015-08-21 08:58:12.042+0000: 32688: debug : qemuProcessHandleMigrationStatus:1535 : Migration of domain 0x7fcf2026e4c0 r71 changed state to setup
2015-08-21 08:58:12.045+0000: 32688: debug : qemuProcessHandleMigrationStatus:1535 : Migration of domain 0x7fcf2026e4c0 r71 changed state to active
2015-08-21 08:58:42.034+0000: 32688: debug : qemuProcessHandleMigrationStatus:1535 : Migration of domain 0x7fcf2026e4c0 r71 changed state to cancelling
2015-08-21 08:58:42.035+0000: 32688: debug : qemuProcessHandleMigrationStatus:1535 : Migration of domain 0x7fcf2026e4c0 r71 changed state to cancelled

# virsh event r71 --loop --all

On target host:
# virsh event r71 --loop --all
event 'agent-lifecycle' for domain r71: state: 'disconnected' reason: 'domain started'
event 'lifecycle' for domain r71: Started Migrated

# tailf /var/log/libvirtd_debug.log | grep -E 'qemuProcessHandleMigrationStatus|qemuMonitorGetMigrationStatus'
2015-08-21 08:58:12.018+0000: 5107: debug : qemuProcessHandleMigrationStatus:1532 : Migration of domain 0x7f596401ea50 r71 changed state to active

Scenario 3 (I/O error during migration):

a) Without the --abort-on-error option, migration can succeed even if an I/O error occurs during migration.

On source host:
# virsh migrate r71 qemu+ssh://10.66.6.6/system --live

# virsh event r71 --loop --all
event 'io-error' for domain r71: /var/lib/libvirt/images/mypool/disk1.img (virtio-disk1) pause
event 'io-error-reason' for domain r71: /var/lib/libvirt/images/mypool/disk1.img (virtio-disk1) pause due to enospc
event 'lifecycle' for domain r71: Suspended I/O Error
event 'io-error' for domain r71: /var/lib/libvirt/images/mypool/disk1.img (virtio-disk1) pause
event 'io-error-reason' for domain r71: /var/lib/libvirt/images/mypool/disk1.img (virtio-disk1) pause due to enospc
event 'lifecycle' for domain r71: Stopped Migrated

# tailf /var/log/libvirtd_debug.log | grep -E 'qemuProcessHandleMigrationStatus|qemuMonitorGetMigrationStatus'
2015-08-21 09:52:47.957+0000: 32688: debug : qemuProcessHandleMigrationStatus:1535 : Migration of domain 0x7fcf2026e4c0 r71 changed state to setup
2015-08-21 09:52:47.958+0000: 32688: debug : qemuProcessHandleMigrationStatus:1535 : Migration of domain 0x7fcf2026e4c0 r71 changed state to active
2015-08-21 09:55:31.130+0000: 32688: debug : qemuProcessHandleMigrationStatus:1535 : Migration of domain 0x7fcf2026e4c0 r71 changed state to completed
2015-08-21 09:55:31.131+0000: 32694: debug : qemuMonitorGetMigrationStatus:2132 : mon:0x7fcf60010bd0 vm:0x7fcf2026e4c0 json:1 fd:26

On target host:
# virsh event r71 --loop --all
event 'agent-lifecycle' for domain r71: state: 'disconnected' reason: 'domain started'
event 'lifecycle' for domain r71: Started Migrated
event 'lifecycle' for domain r71: Resumed Migrated
event 'agent-lifecycle' for domain r71: state: 'connected' reason: 'channel event'

# tailf /var/log/libvirtd_debug.log | grep -E 'qemuProcessHandleMigrationStatus|qemuMonitorGetMigrationStatus'
2015-08-21 09:52:47.930+0000: 5107: debug : qemuProcessHandleMigrationStatus:1532 : Migration of domain 0x7f596401ea50 r71 changed state to active
2015-08-21 09:55:31.134+0000: 5107: debug : qemuProcessHandleMigrationStatus:1532 : Migration of domain 0x7f596401ea50 r71 changed state to completed
2015-08-21 09:55:31.134+0000: 5107: debug : qemuProcessHandleMigrationStatus:1536 : got MIGRATION event without a migration job

b) With the --abort-on-error option, migration is cancelled if an I/O error occurs during migration.

On source host:
# virsh migrate r71 qemu+ssh://10.66.6.6/system --live --abort-on-error
error: internal error: early end of file from monitor: possible problem:
2015-08-21T10:50:17.782271Z qemu-kvm: error while loading state section id 2(ram)
2015-08-21T10:50:17.782445Z qemu-kvm: load of migration failed: Input/output error

# virsh event r71 --loop --all
event 'io-error' for domain r71: /var/lib/libvirt/images/mypool/disk1.img (virtio-disk1) pause
event 'io-error-reason' for domain r71: /var/lib/libvirt/images/mypool/disk1.img (virtio-disk1) pause due to enospc
event 'lifecycle' for domain r71: Suspended I/O Error
...
event 'lifecycle' for domain r71: Resumed Unpaused
event 'lifecycle' for domain r71: Resumed Migrated
...
event 'lifecycle' for domain r71: Suspended I/O Error
...

# tailf /var/log/libvirtd_debug.log | grep -E 'qemuProcessHandleMigrationStatus|qemuMonitorGetMigrationStatus'
2015-08-21 10:50:15.676+0000: 14267: debug : qemuProcessHandleMigrationStatus:1535 : Migration of domain 0x7f4d30229550 r71 changed state to setup
2015-08-21 10:50:15.680+0000: 14267: debug : qemuProcessHandleMigrationStatus:1535 : Migration of domain 0x7f4d30229550 r71 changed state to active
2015-08-21 10:50:17.871+0000: 14267: debug : qemuProcessHandleMigrationStatus:1535 : Migration of domain 0x7f4d30229550 r71 changed state to cancelling
2015-08-21 10:50:17.871+0000: 14267: debug : qemuProcessHandleMigrationStatus:1535 : Migration of domain 0x7f4d30229550 r71 changed state to cancelled

On target host:
# tailf /var/log/libvirtd_debug.log | grep -E 'qemuProcessHandleMigrationStatus|qemuMonitorGetMigrationStatus'
2015-08-21 10:50:15.647+0000: 5107: debug : qemuProcessHandleMigrationStatus:1532 : Migration of domain 0x7f596401ea50 r71 changed state to active
2015-08-21 10:50:17.782+0000: 5107: debug : qemuProcessHandleMigrationStatus:1532 : Migration of domain 0x7f596401ea50 r71 changed state to failed

# virsh event r71 --loop --all
event 'agent-lifecycle' for domain r71: state: 'disconnected' reason: 'domain started'
event 'lifecycle' for domain r71: Started Migrated
event 'lifecycle' for domain r71: Stopped Failed

Is there anything I missed in my verification steps?

And a small question: why is there always this log after migration finishes on the target host?
2015-08-21 09:55:31.134+0000: 5107: debug : qemuProcessHandleMigrationStatus:1536 : got MIGRATION event without a migration job

It would be nice to also check that migration works as expected with the qemu-kvm package (without the "-rhev" suffix), with which libvirt should still be using polling for migration status.

Verified with the qemu-kvm package, versions:
libvirt-1.2.17-6.el7.x86_64
qemu-kvm-1.5.3-101.el7.x86_64

Steps:
1. # virsh migrate rhel7d1 qemu+ssh://10.66.6.6/system --live --verbose
Migration: [100 %]

Got this log 20 times per second during migration, which indicates libvirt polls for migration status 20 times per second:
debug : qemuMonitorGetMigrationStatus:2132 : mon:0x7f98b800fe90 vm:0x7f98742254f0 json:1 fd:26

2. Get domjobinfo during migration:
# virsh domjobinfo rhel7d1
Job type: Unbounded
Time elapsed: 4594 ms
Data processed: 61.614 MiB
Data remaining: 709.559 MiB
Data total: 1.102 GiB
Memory processed: 61.614 MiB
Memory remaining: 709.559 MiB
Memory total: 1.102 GiB
Memory bandwidth: 15.029 MiB/s
Constant pages: 91761
Normal pages: 15541
Normal data: 60.707 MiB
Expected downtime: 30 ms
Setup time: 2 ms

So the verification with the qemu-kvm package passed.
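A note on reproducing the log captures above: they assume libvirtd debug logging is routed to /var/log/libvirtd_debug.log. A minimal sketch of the relevant /etc/libvirt/libvirtd.conf settings (the filter value is an assumption here; tune it as needed and restart libvirtd afterwards):

log_filters="1:qemu"
log_outputs="1:file:/var/log/libvirtd_debug.log"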
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2202.html