Description Francesco Romani
2015-02-25 17:20:52 UTC
Description of problem:
Somewhere during the transition to JSON-RPC, the short output of getVMList changed to be a simple list of UUIDs.
Unfortunately, this is not what the monitoring code in Engine expects.
From a VERY high-level view, monitoring does the following:
- short cycle: monitoring calls getVMList to fetch each VM's UUID *and* its status.
  If the status changed since the last poll, something interesting happened, and *then*
  monitoring calls getVmStats(UUID) to learn what happened.
- long cycle: monitoring just calls getAllVmStats to get all the information
  about the VMs.
Currently, short cycle is 3 seconds, long cycle is 15 seconds.
The whole point of this approach is to minimize the traffic and the VDSM load,
while keeping Engine able to respond quickly.
But if VDSM returns just the UUID, Engine cannot know the status and enters recovery mode, calling getVmStats for each VM.
This is practically equivalent to calling getAllVmStats() every 3s (and _also_ every 15s), which is very wasteful.
It is important to point out that this affects *only* performance: the stats
are reported correctly, so there is no functional impact.
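A rough sketch of the short cycle described above, to show why the UUID-only list is costly. This is not Engine or VDSM code: FakeHost, short_cycle and count_calls are hypothetical names, and the "vmId"/"status" keys are assumed for illustration. The fallback to a per-VM stats call when the status is missing models the recovery behaviour described in this report.

```python
from typing import Dict, List, Union

# A short-list entry is either a bare UUID or a dict with UUID and status.
ShortEntry = Union[str, Dict[str, str]]


class FakeHost:
    """Hypothetical in-memory stand-in for a VDSM host (not the real API)."""

    def __init__(self, vms: Dict[str, str], with_status: bool) -> None:
        self.vms = dict(vms)            # uuid -> status
        self.with_status = with_status  # True: list carries status too
        self.get_stats_calls = 0        # how many per-VM stats calls were made

    def get_vm_list(self) -> List[ShortEntry]:
        if self.with_status:
            return [{"vmId": u, "status": s} for u, s in self.vms.items()]
        return list(self.vms)           # UUID-only output

    def get_vm_stats(self, uuid: str) -> Dict[str, str]:
        self.get_stats_calls += 1
        return {"vmId": uuid, "status": self.vms[uuid]}


def short_cycle(host: FakeHost, last_status: Dict[str, str]) -> None:
    """One short poll: fetch per-VM stats only if the status changed or is unknown."""
    for entry in host.get_vm_list():
        if isinstance(entry, dict):
            uuid, status = entry["vmId"], entry.get("status")
        else:
            uuid, status = entry, None  # UUID-only entry: status unknown
        if status is None or last_status.get(uuid) != status:
            stats = host.get_vm_stats(uuid)
            last_status[uuid] = stats["status"]


def count_calls(with_status: bool, cycles: int = 5) -> int:
    """Run a few short cycles on two steady VMs and count per-VM stats calls."""
    host = FakeHost({"vm-1": "Up", "vm-2": "Up"}, with_status)
    last_status: Dict[str, str] = {}
    for _ in range(cycles):
        short_cycle(host, last_status)
    return host.get_stats_calls
```

With two VMs that never change state, the list-with-status variant makes per-VM calls only on the first poll in this model, while the UUID-only variant makes one per VM on every short cycle.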
Version-Release number of selected component (if applicable):
found in VDSM master 380713b80d124d1a19749085f477e7658468bf07, but most likely
introduced earlier
How reproducible:
100% with JSON-RPC protocol
Steps to Reproduce:
1. Configure Engine to use JSON-RPC (default)
2. Run a VM
3. Snoop the traffic between VDSM and Engine and observe VM.getStats() being called too often
Actual results:
In steady state, VM.getStats() gets called after each short cycle, for each running VM
Expected results:
In steady state, VM.getStats() is never called
Additional info:
To be verified: an Engine patch may be needed; a VDSM-only fix may not be enough.
(In reply to Roy Golan from comment #1)
> related to Bug 1196040 ?
The only link so far is that I discovered this issue while investigating that area, but so far 1196040 seems to be just noise in the logs.