Bug 647571

Summary: AS-5 Plugin performance issues when monitoring several EAP5 instances
Product: [Other] RHQ Project
Reporter: Jay Shaughnessy <jshaughn>
Component: Plugins
Assignee: Jay Shaughnessy <jshaughn>
Status: CLOSED CURRENTRELEASE
QA Contact: Mike Foley <mfoley>
Severity: high
Priority: medium    
Version: 3.0.0   
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: All   
Doc Type: Bug Fix
Last Closed: 2012-02-10 15:19:43 UTC
Bug Blocks: 625146    

Description Jay Shaughnessy 2010-10-28 19:45:51 UTC
Description of problem:

See the case notes for details. In short, when monitoring several EAP5.0 instances, the AS-5 plugin can generate significant CPU utilization, which shows up as high, potentially constant, CPU utilization in the rhq-agent process.

How reproducible:

Fairly reproducible. Keep adding EAP5 instances and the problem grows fairly linearly.  Of course, given system limitations there will always be a bound on the number of EAP instances (or nearly any resource, really) that can be managed, but the agent should scale reasonably.  Currently, EAP5 scales much worse than EAP4.

Even when performing well, the RHQ agent is expected to generate CPU utilization spikes when collecting availability and/or metric information for a large resource population and/or a large set of metrics.  But it should not generate constant CPU "thrashing" when presented with a reasonable "scale".

Comment 1 Corey Welton 2011-02-01 21:25:57 UTC
removing jon-241 tracker, per RT

Comment 3 Jay Shaughnessy 2011-05-18 20:15:21 UTC
f8f501532d2658d814048ecf3879685ddfdff984
Committer: Jay Shaughnessy <jshaughn>  2010-10-28 15:15:58

Perf Work on the AS-5 Plugin to try and reduce avail and metric gathering
times. In general trying to reduce interaction with Profile Service.
    
1) Cache ManagedComponent in ManagedComponentComponent

This is the base class for resources managed by a profile service
ManagedComponent.  It seems that for "runtime" properties getting the prop
value returns the live value, so we don't have to re-fetch the
ManagedComponent object from the ManagementView.  This greatly helps avail
checking since RunState is a runtime value.  It also helps metric collection
if all of the requested metrics are runtime values.
    
2) Cache whether metrics are runtime or not runtime

Instead of figuring out every time whether requested metrics are runtime
properties, cache this info to save time. This is a static cache so all
instances of the type can share the info.
    
3) Remove use of ManagedComponentUtils.getManagedComponent()

The PS API added ManagementView.getComponent(name, type), which is more
efficient.
    
4) Avoid load() when checking avail for ManagedDeployment components (EAR/WAR)

Although it is necessary to re-fetch the ManagedDeployment to get an
updated deployment state, a load of the management view is not necessary.
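
Items 1 and 2 above can be sketched roughly as follows. This is an illustration only, not the code from the commit: CacheSketch, Component, View, Resource, and CountingView are hypothetical stand-ins rather than the real Profile Service or RHQ plugin types, and the run state is modelled as a plain String instead of the real RunState type. The assumption, taken from item 1, is that a held component returns live values for runtime properties, so it can be fetched once and reused.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-ins only; none of these are the real Profile Service or RHQ types. */
final class CacheSketch {

    /** Stand-in for a managed component whose runtime properties return live values. */
    interface Component {
        String runState();                          // live value on every call (assumption from item 1)
        boolean isRuntimeProperty(String metric);   // assumed to be costly to work out
    }

    /** Stand-in for the management view; every lookup is assumed to be expensive. */
    interface View {
        Component getComponent(String name, String type);
    }

    /** A monitored resource that fetches its component once and reuses it. */
    static final class Resource {
        // Item 2: static cache shared by all instances, keyed by "type:metric".
        private static final Map<String, Boolean> RUNTIME_BY_METRIC = new ConcurrentHashMap<>();

        private final View view;
        private final String name;
        private final String type;
        private Component cached;   // Item 1: fetched lazily, then reused

        Resource(View view, String name, String type) {
            this.view = view;
            this.name = name;
            this.type = type;
        }

        /** Avail check: reads the live run state from the cached component. */
        boolean isUp() {
            return "RUNNING".equals(component().runState());
        }

        /** Classifies a metric once per resource type, then answers from the shared cache. */
        boolean isRuntimeMetric(String metric) {
            return RUNTIME_BY_METRIC.computeIfAbsent(
                    type + ":" + metric, key -> component().isRuntimeProperty(metric));
        }

        private synchronized Component component() {
            if (cached == null) {
                cached = view.getComponent(name, type);
            }
            return cached;
        }
    }

    /** Test double that counts how often the expensive calls are made. */
    static final class CountingView implements View {
        int lookups;
        int classifications;
        String state = "RUNNING";

        @Override
        public Component getComponent(String name, String type) {
            lookups++;
            return new Component() {
                @Override
                public String runState() {
                    return state;   // reads the current value, so changes are seen without a re-fetch
                }

                @Override
                public boolean isRuntimeProperty(String metric) {
                    classifications++;
                    return metric.startsWith("runtime.");   // arbitrary rule for the sketch
                }
            };
        }
    }
}
```

With a CountingView, repeated isUp() calls on one Resource should cost a single lookup, and a second Resource of the same type should reuse the metric classification without asking again. The static map trades a small amount of memory for skipping the per-collection classification; keying on type plus metric name is a choice made for this sketch.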

Comment 4 Charles Crouch 2011-09-27 02:06:08 UTC
Dropping priority on this as we're approaching the problem from a different angle, namely being more selective in the metrics we collect and the rate at which we collect them: https://bugzilla.redhat.com/show_bug.cgi?id=741331

Comment 5 Charles Crouch 2011-09-30 17:48:49 UTC
removing superfluous trackers

Comment 6 Jay Shaughnessy 2012-02-10 15:19:43 UTC
Closing this; the work is continued in bug 788638.