Bug 1196327 - [performance] bad getVMList output creates unnecessary calls from Engine
Summary: [performance] bad getVMList output creates unnecessary calls from Engine
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: oVirt
Classification: Retired
Component: vdsm
Version: 3.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: ---
Target Release: 3.5.2
Assignee: Francesco Romani
QA Contact: Eldad Marciano
URL:
Whiteboard: virt
Depends On:
Blocks: oVirt_3.5.2_tracker 1193058 1196735 1198248 1198680 1202360 1203305
Reported: 2015-02-25 17:20 UTC by Francesco Romani
Modified: 2016-02-10 19:49 UTC

Fixed In Version: vdsm-4.16.13-1.el6ev
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1196735 1202360
Environment:
Last Closed: 2015-04-29 06:19:59 UTC
oVirt Team: Virt
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
oVirt gerrit | 38172 | 0 | master | MERGED | json-rpc: fix the Host.getVMList return value | Never
oVirt gerrit | 38271 | 0 | master | MERGED | json-rpc: fix the Host.getVMList return value | Never
oVirt gerrit | 38304 | 0 | ovirt-engine-3.5.2 | MERGED | json-rpc: fix the Host.getVMList return value | Never
oVirt gerrit | 38321 | 0 | ovirt-engine-3.5 | MERGED | json-rpc: fix the Host.getVMList return value | Never
oVirt gerrit | 38322 | 0 | ovirt-3.5 | MERGED | json-rpc: fix the Host.getVMList return value | Never
oVirt gerrit | 38432 | 0 | master | MERGED | Revert "json-rpc: fix the Host.getVMList return value" | Never
oVirt gerrit | 38433 | 0 | ovirt-3.5 | MERGED | Revert "json-rpc: fix the Host.getVMList return value" | Never
oVirt gerrit | 38679 | 0 | master | MERGED | API: onlyUUID should affect only short status | Never
oVirt gerrit | 38805 | 0 | master | MERGED | API: getVMList: compatibity with internal clients | Never
oVirt gerrit | 38867 | 0 | ovirt-3.5 | MERGED | API: onlyUUID should affect only short status | Never
oVirt gerrit | 38868 | 0 | ovirt-3.5 | MERGED | API: getVMList: compatibity with internal clients | Never

Description Francesco Romani 2015-02-25 17:20:52 UTC
Description of problem:
Somewhere during the transition to JSON-RPC, the short output of getVMList changed to be a simple list of UUIDs.
Unfortunately, this is not what the monitoring code in Engine expects.

From a VERY high level, monitoring does the following:
- short cycle: monitoring calls getVMList to fetch each VM's UUID *and* its status.
  If the status changed since the last poll, something interesting happened, and *then*
  monitoring calls getVmStats(UUID) to learn what happened.

- long cycle: monitoring just calls getAllVmStats to get all the information
  about the VMs.

Currently, short cycle is 3 seconds, long cycle is 15 seconds.
The whole point of this approach is to minimize the traffic and the VDSM load,
while keeping Engine able to respond quickly.
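
The short-cycle logic described above can be sketched roughly as follows. This is only an illustration in Python, not Engine's actual code: the function name short_cycle, the entry shape ({'vmId': ..., 'status': ...}) and the get_vm_stats callback are assumptions made for the example.

```python
def short_cycle(vm_list, last_status, get_vm_stats):
    """Run one short monitoring cycle (illustrative sketch only).

    vm_list      -- entries as returned by getVMList; assumed here to be either
                    {'vmId': ..., 'status': ...} dicts or bare UUID strings
    last_status  -- dict mapping UUID -> status seen at the previous poll
    get_vm_stats -- hypothetical callback standing in for getVmStats(UUID)

    Returns the list of UUIDs for which full stats had to be fetched.
    """
    fetched = []
    for entry in vm_list:
        if isinstance(entry, dict):
            vm_id, status = entry['vmId'], entry.get('status')
        else:
            # Bare UUID: the status is unknown to the caller.
            vm_id, status = entry, None

        # Fetch full stats when the status is unknown or has changed.
        if status is None or last_status.get(vm_id) != status:
            get_vm_stats(vm_id)
            fetched.append(vm_id)

        if status is not None:
            last_status[vm_id] = status
    return fetched


if __name__ == '__main__':
    seen = {}
    calls = []
    # First poll: the status is new, so stats are fetched once.
    short_cycle([{'vmId': 'vm-1', 'status': 'Up'}], seen, calls.append)
    # Steady state: same status, no extra call.
    short_cycle([{'vmId': 'vm-1', 'status': 'Up'}], seen, calls.append)
    # UUID-only output: a call is made on every cycle.
    short_cycle(['vm-1'], seen, calls.append)
    print(len(calls))  # 2
```

With UUID-and-status entries the per-VM call happens only when something changes; with UUID-only entries it happens on every short cycle.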

But if VDSM returns just the UUID, Engine cannot know the status and enters recovery mode, so it calls getVmStats for each VM.

This is practically equivalent to calling getAllVmStats() every 3s (and _also_ every 15s), which is very wasteful.
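
As a rough back-of-the-envelope illustration of the overhead, the number of extra per-VM stats calls can be estimated from the short cycle length. The helper name and the sample figures (a 600-second window, 1 and 50 VMs) are assumptions chosen for the example, and the estimate ignores any interaction between short and long cycles.

```python
def per_vm_stats_calls(duration_s, short_cycle_s, n_vms):
    """Estimate the extra per-VM stats calls made over duration_s seconds
    when every short cycle triggers one call per running VM."""
    return (duration_s // short_cycle_s) * n_vms


if __name__ == '__main__':
    # 3-second short cycle, as stated in this report.
    print(per_vm_stats_calls(600, 3, 1))   # 200
    print(per_vm_stats_calls(600, 3, 50))  # 10000
```

In steady state with the expected getVMList output, the same estimate would be zero.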

It is important to point out that this affects *only* performance: the stats
are reported correctly, so there is no functional impact.


Version-Release number of selected component (if applicable):
found in VDSM master 380713b80d124d1a19749085f477e7658468bf07, but most likely
introduced earlier

How reproducible:
100% with JSON-RPC protocol

Steps to Reproduce:
1. Configure Engine to use JSON-RPC (default)
2. Run a VM
3. Snoop the traffic between VDSM and Engine; see VM.getStats() being called too often

Actual results:
In steady state, VM.getStats() gets called after each short cycle, for each running VM

Expected results:
In steady state, VM.getStats() is never called

Additional info:
To be verified: an Engine patch may be needed. A VDSM-only fix may not be enough.

Comment 1 Roy Golan 2015-02-26 10:05:10 UTC
related to Bug 1196040 ?

Comment 2 Francesco Romani 2015-02-26 10:07:24 UTC
(In reply to Roy Golan from comment #1)
> related to Bug 1196040 ?

The only link so far is that I discovered this issue while investigating in that area, but so far 1196040 seems just noise in the logs.

Comment 7 Gil Klein 2015-03-16 11:46:01 UTC
Re-opening due to a possible incompatibility issue with MOM, based on:

https://bugzilla.redhat.com/show_bug.cgi?id=1196735#c20

If no code change is needed, feel free to move this BZ back to ON_QA

Comment 8 Eldad Marciano 2015-04-22 10:06:48 UTC
Tested on top of vt14.3, both engine and vdsm.

1 host, 1 vm
duration 600 sec.

profiler analysis results:

0 - getVmStats

Comment 9 Eyal Edri 2015-04-29 06:19:59 UTC
ovirt 3.5.2 was GA'd. closing current release.

