The guest drive mapping introduced a significant delay into the VM.getStats call, since it tries to update the mapping whenever it detects a change, which is likely to happen on lifecycle changes. In the OST case the whole call took 1.2s, and in the meantime the migration had finished. The getStats() call is not written with a possible state change in mind, so if the state moves from anything to Down in the middle of it, it returns a Down state without exitCode and exitReason, which confuses the engine. Since ~4.1 the engine uses the exitReason code to differentiate the various flavors of Down, and in this case the result is a misleading "VM powered off by admin" message.

We need to fix:
1. VM.getStats() to handle VM state changes in the middle of the call (see the sketch below).
2. The guest drive mapping updates to cleanly handle situations when the VM is either not ready yet or already gone.

See http://lists.ovirt.org/pipermail/devel/2017-December/032282.html
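A minimal sketch of the idea behind fix (1), not VDSM's actual code; the Vm class and collect_runtime_stats helper are hypothetical stand-ins. The point is to re-check the VM state after the slow work and always attach the exit details when reporting Down:

    # Sketch only: illustrates handling a state change in the middle of
    # a getStats-style call. Vm and collect_runtime_stats are hypothetical.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Vm:
        status: str                       # e.g. 'Up', 'MigrationSource', 'Down'
        exit_code: Optional[int] = None
        exit_reason: Optional[int] = None
        exit_message: str = ''

    def collect_runtime_stats(vm: Vm) -> dict:
        # Stand-in for the slow part of VM.getStats (e.g. refreshing the
        # guest drive mapping); per this report it took ~1.2s, long enough
        # for a migration to finish underneath it.
        return {'cpuUser': 0.0, 'memUsage': 0}

    def get_stats(vm: Vm) -> dict:
        stats = collect_runtime_stats(vm)

        # Re-check the state after the slow work: if the VM went Down in
        # the meantime, report Down *with* its exit details so the engine
        # can tell the flavors of Down apart instead of defaulting to
        # "VM powered off by admin".
        if vm.status == 'Down':
            return {
                'status': 'Down',
                'exitCode': vm.exit_code,
                'exitReason': vm.exit_reason,
                'exitMessage': vm.exit_message,
            }

        stats['status'] = vm.status
        return stats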
A workaround is to not run ovirt-guest-agent in the guest during VM migration.
Verified with:
Engine Version: 4.2.1.2-0.1.el7
Host:
  OS Version: RHEL - 7.4 - 18.el7
  Kernel Version: 3.10.0 - 693.17.1.el7.x86_64
  KVM Version: 2.9.0 - 16.el7_4.14
  LIBVIRT Version: libvirt-3.2.0-14.el7_4.7
  VDSM Version: vdsm-4.20.14-1.el7ev

Steps:
1. Create 12 VMs and start them.
2. Set the migration bandwidth to 5 Mbps (minimum migration time of 1 min 50 sec).
3. Migrate all VMs and monitor VM status (see the SDK sketch below).

Results:
All VMs migrated successfully; the status reported in the UI was correct for all VMs.
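A hedged sketch of step 3 using the oVirt Python SDK (ovirtsdk4), not part of the original verification; the engine URL, credentials, and VM name pattern are placeholders for your environment. It triggers a migration of each VM and polls the reported status, failing if a VM is ever wrongly reported as Down:

    # Sketch only: migrate VMs and watch their reported status.
    import time
    import ovirtsdk4 as sdk
    import ovirtsdk4.types as types

    connection = sdk.Connection(
        url='https://engine.example.com/ovirt-engine/api',  # placeholder
        username='admin@internal',
        password='secret',
        insecure=True,  # use ca_file=... in a real setup
    )
    try:
        vms_service = connection.system_service().vms_service()
        for vm in vms_service.list(search='name=migration-test-*'):
            vm_service = vms_service.vm_service(vm.id)
            vm_service.migrate()  # engine picks the destination host

            # The status should go Up -> Migrating -> Up, never Down.
            seen_migrating = False
            while True:
                status = vm_service.get().status
                if status == types.VmStatus.DOWN:
                    raise RuntimeError('%s wrongly reported as Down' % vm.name)
                if status == types.VmStatus.MIGRATING:
                    seen_migrating = True
                elif status == types.VmStatus.UP and seen_migrating:
                    break
                time.sleep(5)
    finally:
        connection.close()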
This bug is included in the oVirt 4.2.1 release, published on Feb 12th 2018. Since the problem described in this bug report should be resolved in oVirt 4.2.1, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.