Bug 1527416 - Wrong state returned in VM getStats when actual state changes in the middle
Summary: Wrong state returned in VM getStats when actual state changes in the middle
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.20.15
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ovirt-4.2.1
Target Release: ---
Assignee: Milan Zamazal
QA Contact: Israel Pinto
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-12-19 11:41 UTC by Michal Skrivanek
Modified: 2018-02-12 11:54 UTC (History)
CC List: 2 users

Fixed In Version: vdsm v4.20.14
Clone Of:
Environment:
Last Closed: 2018-02-12 11:54:16 UTC
oVirt Team: Virt
Embargoed:
rule-engine: ovirt-4.2+
ykaul: blocker+




Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 85583 0 master MERGED virt: Don't update disk mapping when VM is not Up 2018-01-03 09:01:31 UTC
oVirt gerrit 85638 0 master MERGED virt: Fix swapped variable value in Vm._update_guest_disk_mapping 2018-01-02 14:16:34 UTC
oVirt gerrit 85669 0 master MERGED virt: Ensure exit info presence in reported Down status 2018-01-04 15:34:14 UTC

Description Michal Skrivanek 2017-12-19 11:41:11 UTC
The guest drive mapping introduced a significant delay into the VM.getStats call, since it tries to update the mapping whenever it detects a change, which is likely to happen on lifecycle changes. In the OST case it took 1.2 s to finish the whole call, and in the meantime the migration had finished. The getStats() call is not written with a possible state change in mind, so if the state moves from anything to Down in the middle of it, it returns a Down state without exitCode and exitReason, which confuses the engine. We started to use the exitReason code to differentiate the various flavors of Down in the engine in ~4.1, and in this case it results in the misleading “VM powered off by admin” case.
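
A minimal, purely illustrative sketch of the race; the FakeVm class, the field names and the 1.2 s sleep are assumptions for demonstration and do not match the real vdsm code:

import time


class FakeVm:
    def __init__(self):
        self.status = 'Up'       # flipped to 'Down' asynchronously by an event
        self.exit_code = None    # only filled in when the VM actually goes Down
        self.exit_reason = None

    def _update_guest_disk_mapping(self):
        # Stands in for the slow guest drive mapping refresh (~1.2 s in OST).
        import time
        time.sleep(1.2)

    def get_stats(self):
        stats = {}
        if self.status == 'Down':
            # Exit info is attached only if the VM was already Down when
            # the call started.
            stats['exitCode'] = self.exit_code
            stats['exitReason'] = self.exit_reason
        self._update_guest_disk_mapping()
        # If the migration finished during the refresh, the status read here
        # is 'Down', yet exitCode/exitReason were never added above, which is
        # exactly what confuses the engine.
        stats['status'] = self.status
        return stats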

We need to fix VM.getStats() to handle VM state changes in the middle of the call.
We need to fix the guest drive mapping updates to cleanly handle situations where the VM is either not ready yet or already gone; see the sketch below.
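
A hypothetical sketch of both fixes, under the same illustrative FakeVm assumptions as above; the actual patches (gerrit 85583 and 85669) differ in detail:

def get_stats_fixed(vm):
    # Snapshot the status once at the start of the call so a transition
    # happening mid-call cannot produce a mixed result.
    status = vm.status
    stats = {'status': status}
    if status == 'Down':
        # A Down status is always reported together with its exit info.
        stats['exitCode'] = vm.exit_code
        stats['exitReason'] = vm.exit_reason
        return stats
    if status == 'Up':
        # The guest drive mapping is refreshed only while the VM is Up,
        # never while it is still coming up or already being torn down.
        vm._update_guest_disk_mapping()
    return stats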

See http://lists.ovirt.org/pipermail/devel/2017-December/032282.html

Comment 1 Michal Skrivanek 2017-12-19 11:42:47 UTC
A workaround should be to not run ovirt-guest-agent in the guest during VM migration.

Comment 2 Israel Pinto 2018-01-25 08:43:59 UTC
Verified with:
Engine Version: 4.2.1.2-0.1.el7
Host:
OS Version: RHEL - 7.4 - 18.el7
Kernel Version: 3.10.0-693.17.1.el7.x86_64
KVM Version: 2.9.0-16.el7_4.14
LIBVIRT Version: libvirt-3.2.0-14.el7_4.7
VDSM Version: vdsm-4.20.14-1.el7ev

Steps:
1. Create 12 VMs and start them.
2. Set migration bandwidth to 5 Mbps (minimum migration time of 1 min 50 sec).
3. Migrate all VMs and monitor VM status.
Results:
All VMs migrated successfully; the status reported in the UI was correct for all VMs.

Comment 3 Sandro Bonazzola 2018-02-12 11:54:16 UTC
This bug is included in the oVirt 4.2.1 release, published on February 12th 2018.

Since the problem described in this bug report should be resolved in the
oVirt 4.2.1 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

