Bug 1527416 - Wrong state returned in VM getStats when actual state changes in the middle
Summary: Wrong state returned in VM getStats when actual state changes in the middle
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.20.15
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ovirt-4.2.1
Target Release: ---
Assignee: Milan Zamazal
QA Contact: Israel Pinto
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-12-19 11:41 UTC by Michal Skrivanek
Modified: 2018-02-12 11:54 UTC (History)
CC List: 2 users

Fixed In Version: vdsm v4.20.14
Clone Of:
Environment:
Last Closed: 2018-02-12 11:54:16 UTC
oVirt Team: Virt
Embargoed:
rule-engine: ovirt-4.2+
ykaul: blocker+




Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 85583 0 master MERGED virt: Don't update disk mapping when VM is not Up 2018-01-03 09:01:31 UTC
oVirt gerrit 85638 0 master MERGED virt: Fix swapped variable value in Vm._update_guest_disk_mapping 2018-01-02 14:16:34 UTC
oVirt gerrit 85669 0 master MERGED virt: Ensure exit info presence in reported Down status 2018-01-04 15:34:14 UTC

Description Michal Skrivanek 2017-12-19 11:41:11 UTC
The guest drive mapping introduced a significant delay into the VM.getStats call, since it tries to update the mapping whenever it detects a change, which is likely to happen on lifecycle changes. In the OST case it took 1.2 s to finish the whole call, and in the meantime the migration had finished. The getStats() call is not written with a possible state change in mind, so if the state moves from anything to Down in the middle of it, it returns a Down state without exitCode and exitReason, which confuses the engine. We started to use the exitReason code to differentiate the various flavors of Down in the engine in ~4.1, and in this case it results in the misleading “VM powered off by admin” case.
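
A minimal, purely illustrative sketch of the race; the FakeVm class, the field names and the 1.2 s sleep are assumptions for demonstration and do not match the real vdsm code:

import time


class FakeVm:
    def __init__(self):
        self.status = 'Up'       # flipped to 'Down' asynchronously by an event
        self.exit_code = None    # only filled in when the VM actually goes Down
        self.exit_reason = None

    def _update_guest_disk_mapping(self):
        # Stands in for the slow guest drive mapping refresh (~1.2 s in OST).
        import time
        time.sleep(1.2)

    def get_stats(self):
        stats = {}
        if self.status == 'Down':
            # Exit info is attached only if the VM was already Down when
            # the call started.
            stats['exitCode'] = self.exit_code
            stats['exitReason'] = self.exit_reason
        self._update_guest_disk_mapping()
        # If the migration finished during the refresh, the status read here
        # is 'Down', yet exitCode/exitReason were never added above, which is
        # exactly what confuses the engine.
        stats['status'] = self.status
        return stats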

We need to fix VM.getStats() to handle VM state changes in the middle of the call.
We need to fix the guest drive mapping updates to cleanly handle situations where the VM is either not ready yet or already gone; see the sketch below.
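
A hypothetical sketch of both fixes, under the same illustrative FakeVm assumptions as above; the actual patches (gerrit 85583 and 85669) differ in detail:

def get_stats_fixed(vm):
    # Snapshot the status once at the start of the call so a transition
    # happening mid-call cannot produce a mixed result.
    status = vm.status
    stats = {'status': status}
    if status == 'Down':
        # A Down status is always reported together with its exit info.
        stats['exitCode'] = vm.exit_code
        stats['exitReason'] = vm.exit_reason
        return stats
    if status == 'Up':
        # The guest drive mapping is refreshed only while the VM is Up,
        # never while it is still coming up or already being torn down.
        vm._update_guest_disk_mapping()
    return stats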

See http://lists.ovirt.org/pipermail/devel/2017-December/032282.html

Comment 1 Michal Skrivanek 2017-12-19 11:42:47 UTC
A workaround should be to not run ovirt-guest-agent in the guest during VM migration.

Comment 2 Israel Pinto 2018-01-25 08:43:59 UTC
Verified with:
Engine Version: 4.2.1.2-0.1.el7
Host:
OS Version: RHEL - 7.4 - 18.el7
Kernel Version: 3.10.0-693.17.1.el7.x86_64
KVM Version: 2.9.0-16.el7_4.14
LIBVIRT Version: libvirt-3.2.0-14.el7_4.7
VDSM Version: vdsm-4.20.14-1.el7ev

Steps:
1. Create 12 VMs and start them.
2. Set migration bandwidth to 5 Mbps (minimum migration time of 1 min 50 sec).
3. Migrate all VMs and monitor VM status.
Results:
All VMs migrated successfully; the status reported in the UI was correct for all VMs.

Comment 3 Sandro Bonazzola 2018-02-12 11:54:16 UTC
This bug is included in the oVirt 4.2.1 release, published on February 12th 2018.

Since the problem described in this bug report should be resolved in the
oVirt 4.2.1 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

