Bug 1139217

Summary: [RFE] [scale] improve resource usage during sampling
Product: [oVirt] vdsm Reporter: Francesco Romani <fromani>
Component: RFEs Assignee: Francesco Romani <fromani>
Status: CLOSED CURRENTRELEASE QA Contact: Eldad Marciano <emarcian>
Severity: medium Docs Contact:
Priority: high    
Version: --- CC: bazulay, bugs, fromani, iheim, mgoldboi, michal.skrivanek, mkalinin, rbalakri, s.kieske, yeylon, ykaul
Target Milestone: ovirt-3.6.0-rc Keywords: FutureFeature
Target Release: 4.17.0 Flags: rule-engine: ovirt-3.6.0+
ylavi: planning_ack+
rule-engine: devel_ack+
rule-engine: testing_ack+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ovirt-3.6.0-alpha1.2 Doc Type: Enhancement
Doc Text:
VM monitoring in Vdsm was rewritten to use less host CPU
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-03-11 07:18:11 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Virt RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1181653    
Bug Blocks:    

Description Francesco Romani 2014-09-08 12:20:33 UTC
Description of problem:
During normal usage, VDSM monitors the VMs to gather statistics and report them to Engine. To do so, it must use as few host resources as possible,
in order to leave them for the VMs.

This is a generic tracker bug for improvements in this area.

Comment 1 Francesco Romani 2015-01-09 13:37:41 UTC
After long discussion, many failed attempts, and a lot of tinkering, patches have been posted.

Comment 2 Francesco Romani 2015-01-13 14:55:52 UTC
The new libvirt bulk stats API is an improvement, even more so in the long term.
But the biggest source of load is the disk usage threshold check.

This alone drives up the frequency of polling to very high rates.

Once we get events that notify us when a disk usage threshold is exceeded, we can reduce the polling frequency to sane values, which greatly reduces the load on the system and improves resource usage.
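
To illustrate the kind of check discussed in this comment, here is a minimal sketch of a disk usage threshold test run over one bulk stats sample. It is not Vdsm's actual code. The function name, the 0.5 watermark, and the sample data are hypothetical; the dict keys assume the flat "block.*" layout that libvirt bulk stats report (for example via libvirt-python's conn.getAllDomainStats(libvirt.VIR_DOMAIN_STATS_BLOCK)), with each domain object already mapped to a plain VM identifier by the caller.

```python
def disks_over_watermark(bulk_stats, watermark=0.5):
    """Return (vm_id, disk_name) pairs whose allocation exceeds the watermark.

    bulk_stats is an iterable of (vm_id, stats) pairs. Each stats dict is
    assumed to follow the flat layout of the libvirt bulk stats block group:
    "block.count", "block.0.name", "block.0.allocation", "block.0.physical".
    The watermark is a hypothetical fraction of the physical size.
    """
    exceeded = []
    for vm_id, stats in bulk_stats:
        for idx in range(stats.get("block.count", 0)):
            prefix = "block.%d." % idx
            physical = stats.get(prefix + "physical", 0)
            allocation = stats.get(prefix + "allocation", 0)
            # Skip drives that report no size (for example an empty CD-ROM).
            if physical <= 0:
                continue
            if allocation > watermark * physical:
                exceeded.append((vm_id, stats.get(prefix + "name", "?")))
    return exceeded


GiB = 2 ** 30

# Made-up sample: what a single bulk call for two VMs might look like.
sample = [
    ("vm-a", {
        "block.count": 2,
        "block.0.name": "vda",
        "block.0.allocation": 6 * GiB,
        "block.0.physical": 10 * GiB,
        "block.1.name": "hdc",  # no size keys: skipped by the check
    }),
    ("vm-b", {
        "block.count": 1,
        "block.0.name": "vda",
        "block.0.allocation": 1 * GiB,
        "block.0.physical": 10 * GiB,
    }),
]

print(disks_over_watermark(sample))  # [('vm-a', 'vda')]
```

With polling, a check like this has to run often enough to catch a fast-growing disk, which is what drives the sampling rate up. With threshold events, the hypervisor would report the crossing itself, and the same sample could be collected far less frequently.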

Comment 3 Michal Skrivanek 2015-03-26 10:16:29 UTC
Changing to RFE; the improvement is very significant in cases with a high number of VMs per host.
Estimated improvements are on the order of 2-4 times less CPU usage.

Comment 4 Francesco Romani 2015-05-19 06:38:12 UTC
VDSM patches all merged for 4.17.0 (oVirt 3.6.0)

MOM needs to be updated (work in progress on that front)

Moving to MODIFIED

Comment 5 Red Hat Bugzilla Rules Engine 2015-10-18 08:34:56 UTC
Bug tickets that are moved to testing must have target release set to make sure tester knows what to test. Please set the correct target release before moving to ON_QA.