Bug 1196327

Summary:	[performance] bad getVMList output creates unnecessary calls from Engine
Product:	[Retired] oVirt	Reporter:	Francesco Romani <fromani>
Component:	vdsm	Assignee:	Francesco Romani <fromani>
Status:	CLOSED CURRENTRELEASE	QA Contact:	Eldad Marciano <emarcian>
Severity:	unspecified	Docs Contact:
Priority:	high
Version:	3.6	CC:	bazulay, bugs, danken, ecohen, fromani, gklein, lsurette, mgoldboi, ofrenkel, rbalakri, rgolan, sbonazzo, yeylon
Target Milestone:	---
Target Release:	3.5.2
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:	virt
Fixed In Version:	vdsm-4.16.13-1.el6ev	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:
Clones:	1196735 1202360 (view as bug list)		Environment:
Last Closed:	2015-04-29 06:19:59 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	Virt	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1186161, 1193058, 1196735, 1198248, 1198680, 1202360, 1203305

Description Francesco Romani 2015-02-25 17:20:52 UTC

Description of problem:
Somewhere during the transition to JSON-RPC, the short output of getVMList changed to be a simple list of UUIDs.
Unfortunately, this is not what the monitoring code in Engine expects.

From a VERY high-level view, monitoring does the following:
- short cycle: monitoring calls getVMList to fetch each VM's UUID *and* its status.
if the status changed since the last poll, something interesting happened and *then*
monitoring calls getVmStats(UUID) to learn what happened

- long cycle: monitoring just calls getAllVmStats to get all the information
about VMs.

Currently, short cycle is 3 seconds, long cycle is 15 seconds.
The whole point of this approach is to minimize the traffic and the VDSM load,
while keeping Engine able to respond quickly.
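
The short-cycle check described above can be sketched roughly as follows. This is only an illustration, not Engine's actual code: the function name is hypothetical, and the "vmId" and "status" keys are assumed field names for the entries returned by getVMList.

```python
# Hypothetical sketch of the short-cycle check; not Engine's actual code.
# The "vmId" and "status" keys are assumed field names.

def changed_vms(vm_list, last_status):
    """Return the UUIDs whose status differs from the previous poll.

    vm_list: list of {"vmId": ..., "status": ...} dicts (assumed shape).
    last_status: dict mapping UUID -> status seen at the previous poll;
                 it is updated in place.
    """
    changed = []
    for entry in vm_list:
        vm_id, status = entry["vmId"], entry["status"]
        if last_status.get(vm_id) != status:
            changed.append(vm_id)  # only these would need a getVmStats call
        last_status[vm_id] = status
    return changed
```

With this shape, a VM whose status is unchanged between two polls yields no per-VM call at all, which is the steady-state behaviour the design aims for.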

But if VDSM returns just the UUID, then Engine cannot know the status and enters recovery mode, so it calls getVmStats for each VM.

This is practically equivalent to calling getAllVmStats() every 3s (and _also_ every 15s), which is very wasteful.
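
To put rough numbers on the overhead, here is a back-of-the-envelope sketch. It assumes a steady state (no VM changes status), counts the short and long cycles independently, and uses the 3s/15s intervals quoted above; the function name and the returned dict layout are made up for illustration.

```python
# Rough call-count estimate; assumes steady state and independent cycles.

def calls_in_window(n_vms, status_in_list, window_s=60, short_s=3, long_s=15):
    """Estimate how many calls Engine issues to VDSM in window_s seconds.

    status_in_list: True if getVMList reports each VM's status,
                    False if it returns bare UUIDs only (the bug).
    """
    short_polls = window_s // short_s
    long_polls = window_s // long_s
    # Without the status, every short poll triggers one getVmStats per VM.
    per_vm_calls = 0 if status_in_list else short_polls * n_vms
    return {
        "getVMList": short_polls,
        "getVmStats": per_vm_calls,
        "getAllVmStats": long_polls,
    }
```

Under these assumptions, a host with 10 running VMs sees 200 extra getVmStats calls per minute when the status is missing, and none when it is present.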

It is important to point out that this affects *only* performance: the stats
are reported correctly, so there is no functional impact.

Version-Release number of selected component (if applicable):
found in VDSM master 380713b80d124d1a19749085f477e7658468bf07, but most likely
introduced earlier

How reproducible:
100% with JSON-RPC protocol

Steps to Reproduce:
1. Configure Engine to use JSON-RPC (default)
2. Run a VM
3. Snoop the traffic between VDSM and Engine and see VM.getStats() being called too often

Actual results:
In steady state, VM.getStats() gets called after each short cycle, for each running VM

Expected results:
In steady state, VM.getStats() is never called

Additional info:
To be verified: an Engine patch may be needed. A VDSM-only fix may not be enough.

Comment 1 Roy Golan 2015-02-26 10:05:10 UTC

related to Bug 1196040 ?

Comment 2 Francesco Romani 2015-02-26 10:07:24 UTC

(In reply to Roy Golan from comment #1)
> related to Bug 1196040 ?

The only link so far is that I discovered this issue while investigating in that area, but so far 1196040 seems just noise in the logs.

Comment 7 Gil Klein 2015-03-16 11:46:01 UTC

Re-opening due to a possible incompatibility issue with MOM, based on:

https://bugzilla.redhat.com/show_bug.cgi?id=1196735#c20

If no code change is needed, feel free to move this BZ back to ON_QA

Comment 8 Eldad Marciano 2015-04-22 10:06:48 UTC

Tested on top of vt14.3, both engine and vdsm.

1 host, 1 vm
duration 600 sec.

profiler analysis results:

0 - getVmStats

Comment 9 Eyal Edri 2015-04-29 06:19:59 UTC

ovirt 3.5.2 was GA'd. closing current release.