Description of problem:
Somewhere during the transition to JSON-RPC, the short output of getVMList changed to a simple list of UUIDs. Unfortunately, this is not what the monitoring code in Engine expects.

From a VERY high level, monitoring does the following:
- short cycle: monitoring calls getVMList to fetch each VM's UUID *and* its status. If the status changed since the last poll, something interesting happened, and *then* monitoring calls getVmStats(UUID) to learn what happened.
- long cycle: monitoring just calls getAllVmStats to get all the information about the VMs.

Currently, the short cycle is 3 seconds and the long cycle is 15 seconds. The whole point of this approach is to minimize the traffic and the VDSM load, while keeping Engine able to respond quickly.

But if VDSM returns just the UUID, Engine cannot know the status, so it enters recovery mode and calls getVmStats for each VM. This is practically equivalent to calling getAllVmStats() every 3s (and _also_ every 15s), which is very wasteful.

It is important to point out that this affects *only* performance; the stats are reported correctly, so there is no functional impact.

Version-Release number of selected component (if applicable):
Found in VDSM master 380713b80d124d1a19749085f477e7658468bf07, but most likely introduced earlier.

How reproducible:
100% with the JSON-RPC protocol

Steps to Reproduce:
1. Configure Engine to use JSON-RPC (default)
2. Run a VM
3. Snoop the traffic between VDSM and Engine; see VM.getStats() being called too often

Actual results:
In steady state, VM.getStats() is called after each short cycle, for each running VM

Expected results:
In steady state, VM.getStats() is never called

Additional info:
To be verified: an Engine patch may be needed. A VDSM-only fix may not be enough.
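To make the short-cycle behaviour described above concrete, here is a minimal Python sketch. It is only an illustration, not Engine's actual monitoring code: the function name `short_cycle`, the `get_vm_stats` callback, and the `vmId`/`status` field names are assumptions made for this example.

```python
from typing import Callable, Dict, List, Union

# A VM list entry is either a bare UUID string (the output seen with JSON-RPC)
# or a dict carrying UUID and status (what monitoring expects).
# The "vmId" and "status" keys are assumed names for this sketch.
VmEntry = Union[str, Dict[str, str]]


def short_cycle(
    vm_list: List[VmEntry],
    last_status: Dict[str, str],
    get_vm_stats: Callable[[str], object],
) -> List[str]:
    """Run one short cycle; return the UUIDs that needed a per-VM stats call."""
    polled = []
    for entry in vm_list:
        if isinstance(entry, dict):
            uuid, status = entry["vmId"], entry.get("status")
        else:
            # Bare UUID: the status is unknown to the caller.
            uuid, status = entry, None

        # Fetch full stats when the status changed, or when it is unknown
        # (the "recovery" path, which fires on every cycle for bare UUIDs).
        if status is None or last_status.get(uuid) != status:
            get_vm_stats(uuid)
            polled.append(uuid)

        if status is not None:
            last_status[uuid] = status
    return polled
```

With entries such as `{"vmId": "vm-1", "status": "Up"}` and an unchanged status, the sketch makes no per-VM call. With a bare `"vm-1"` it makes one call per VM on every short cycle, which matches the traffic pattern reported here.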
related to Bug 1196040 ?
(In reply to Roy Golan from comment #1)
> related to Bug 1196040 ?

The only link is that I discovered this issue while investigating in that area, but so far 1196040 seems to be just noise in the logs.
Re-opening due to a possible incompatibility issue with MOM, based on:
https://bugzilla.redhat.com/show_bug.cgi?id=1196735#c20

If no code change is needed, feel free to move this BZ back to ON_QA.
Tested on top of vt14.3, both Engine and VDSM.
1 host, 1 VM, duration 600 sec.

Profiler analysis results:
0 - getVmStats
oVirt 3.5.2 was GA'd. Closing current release.