Created attachment 1515044 [details]
Description of problem:
Regression in Migration of VM that starts in pause mode: took 11 hours.
Version-Release number of selected component (if applicable):
Engine Version: 4.2.8-0.1.el7ev
OS Version: RHEL - 7.6 - 4.el7
LIBVIRT Version: libvirt-4.5.0-10.el7_6.3
VDSM Version: vdsm-4.20.43-1.el7
Steps to Reproduce:
1. Create VM and set start in pause mode
2. Run VM
3. Migrate VM
Migration stuck and eventually finished after 11 hours.
Migration completed fine.
The attached vdsm logs are from a different time period, they do not include the migration.
The migration started at 2018-12-16 17:13:44,196+02 and ended at 2018-12-17 05:13:56,950+02.
But the vdsm logs on the source host start at 2018-12-17 13:01:11,258+02 and on the destination host at 2018-12-17 14:01:01,646+02.
Re-targeting to 4.3.1 since it is missing a patch, an acked blocker flag, or both
Closing, since this has not been observed in months
Created attachment 1551164 [details]
Source logs right timeframe
Created attachment 1551165 [details]
Destination logs right timeframe
Andrej, Milan, ideas?
I was able to log into the enviroment to check (and get the right vdsm logs). The VM is up, and the vdsm logs from the destination host fire the appropriate JSONRPC event, but it's never received by the engine, so the VM is stuck in migration.
It is actually up. Kicking vdsm to refresh VMs unstuck the engine, but it looks racey
Polina, how did you reproduce manually?
I reproduced it manually . The VM was stuck in migration a long period
actually here is the event note:
Migration completed (VM: golden_env_mixed_virtio_1_0, Source: host_mixed_3, Destination: host_mixed_2, Duration: 6 hours 24 minutes 19 seconds, Total: 6 hours 24 minutes 27 seconds, Actual downtime: 5ms)
and also, I'm reproducing it just now in the same env.
Event ID 62 : Migration started (VM: golden_env_mixed_virtio_1_0, Source: host_mixed_3, Destination: host_mixed_2, User: admin@internal-authz).
migration is stuck.
I can reproduce the bug. Vdsm doesn't detect end of the migration on the destination because it waits for a resume event from libvirt on the destination, which doesn't come in case the VM is paused (this may be a libvirt change and the cause of the regression). That means Vdsm keeps reporting Migration Destination status to Engine.
A fix may be a bit delicate, since libvirt apparently doesn't send a specific migration-finished event in such a case. It sends more general events and we should be cautious not to confuse end of migration with something else.
Created attachment 1623746 [details]
migration of paused VM
Hi Ryan, it is still reproducible on ovirt-engine-22.214.171.124-0.1.el7.noarch
I attach the logs migration_of_paused_vm.tar.gz where you could see migration took 12 hr for golden_env_mixed_virtio_5, then succeeded.
Sometimes for such configuration migration fails and the VM is down.
2019-10-08 21:38:11,005+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-1) [92795e81-bcdf-4bd7-849f-18ee7ae5ccdc] EVENT_ID: VM_MIGRATION_START(62), Migration started (VM: golden_env_mixed_virtio_5, Source: host_mixed_1, Destination: host_mixed_3, User: admin@internal-authz).
2019-10-08 21:38:16,019+03 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-10)  Handing over VM '838d7e75-125b-40e7-8c54-c99efd71f5b5'(golden_env_mixed_virtio_5) to Host '9fc15ca8-8494-4cb4-bc56-22ad80e341f4'. Setting VM to status 'MigratingTo'
2019-10-09 09:38:12,497+03 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-4)  VM '838d7e75-125b-40e7-8c54-c99efd71f5b5'(golden_env_mixed_virtio_5) moved from 'MigratingTo' --> 'Paused'
verified on 4.4.http://bob-dr.lab.eng.brq.redhat.com/builds/4.4/rhv-4.4.0-23
by running virt/rhevmtests.compute.virt.reg_vms.vm_power_actions_test.TestSuspendVM.test_migrate_suspend_vm
I 4.3 the bug still happens
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (Important: RHV Manager (ovirt-engine) 4.4 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.