Bug 813917

Summary: High Agent CPU utilization after enabling certain Metric Collection Templates
Product: [Other] RHQ Project Reporter: Jay Shaughnessy <jshaughn>
Component: Monitoring Assignee: Jay Shaughnessy <jshaughn>
Status: CLOSED CURRENTRELEASE QA Contact: Mike Foley <mfoley>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 3.0.1 CC: dvanbale, hrupp, jlivings, jshaughn, loleary
Target Milestone: ---   
Target Release: JON 3.0.2   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 811696 Environment:
Last Closed: 2013-09-04 08:51:38 UTC Type: Bug
Bug Depends On: 811696    
Bug Blocks: 812968, 818029    

Comment 1 Jay Shaughnessy 2012-04-18 18:58:51 UTC
release/jon3.0.x commit 67c43335379d1313fe83b53b25f79d7671ad8ef5

 The problem is actually independent of plugin or resource type.  It is a
 general problem with enable/disable of metrics at the template level.
 - The database is not compromised
 - It occurs only when applying the changes to existing inventory
 - It affects only running agents that are updated as a result of the
   changes.
 - The server code leaks bad collection interval values, used to indicate
   various enable/disable scenarios, to the agent update.
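A minimal sketch of the idea behind the fix, not the actual server code: sentinel interval values are resolved on the server before the agent update is built, so the agent only ever receives a real interval plus an explicit enabled flag. All class, method and constant names are hypothetical. Which sentinel means "enable" and which means "disable" is an assumption made for illustration; the report only establishes that 0 and -1 must never reach the agent.

```java
// Sketch only: every name here is hypothetical, not taken from the RHQ source.
final class ScheduleUpdateSketch {

    // Assumed sentinel meanings; the report only says 0 and -1 must never reach the agent.
    static final long ENABLE_KEEP_INTERVAL = -1L;
    static final long DISABLE_KEEP_INTERVAL = 0L;

    /** The payload sent to an agent: an explicit flag plus a real interval. */
    static final class AgentSchedule {
        final int scheduleId;
        final boolean enabled;
        final long intervalMillis;

        AgentSchedule(int scheduleId, boolean enabled, long intervalMillis) {
            this.scheduleId = scheduleId;
            this.enabled = enabled;
            this.intervalMillis = intervalMillis;
        }
    }

    /**
     * Translates a template-level request into what the agent should receive.
     * Sentinels are consumed here and replaced by the interval already stored
     * for the schedule, so they never appear in the agent update.
     */
    static AgentSchedule resolve(int scheduleId, long requestedInterval, long storedIntervalMillis) {
        if (storedIntervalMillis <= 0) {
            throw new IllegalStateException("stored interval must be positive: " + storedIntervalMillis);
        }
        if (requestedInterval == ENABLE_KEEP_INTERVAL) {
            return new AgentSchedule(scheduleId, true, storedIntervalMillis);
        }
        if (requestedInterval == DISABLE_KEEP_INTERVAL) {
            return new AgentSchedule(scheduleId, false, storedIntervalMillis);
        }
        if (requestedInterval < 0) {
            throw new IllegalArgumentException("unknown sentinel: " + requestedInterval);
        }
        // A positive value is a genuine new interval and implies the metric is enabled.
        return new AgentSchedule(scheduleId, true, requestedInterval);
    }
}
```

With this shape, enabling a disabled metric without changing its interval yields the stored interval in the agent payload rather than the sentinel itself.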

 This has been corrected.  The changes are limited to the server jar.
 There are additions to the MeasurementScheduleRemote and Local, and
 certain methods have been deprecated.  Existing CLI scripts should be
 ok, and I've added better validation of intervals being set in that
 way. However, scripts that use the deprecated methods should move
 to the new methods after the next upgrade.
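For illustration, interval validation at the API boundary might look roughly like the following. This is a sketch under assumptions, not the code in the commit: the class and method names are hypothetical, and the 30-second floor is an assumed value, not one stated in this report.

```java
// Sketch only: hypothetical names; the real validation lives in the server jar.
final class IntervalValidatorSketch {

    // Assumed minimum for illustration; substitute whatever floor the server enforces.
    static final long ASSUMED_MIN_INTERVAL_MILLIS = 30_000L;

    /**
     * Rejects intervals a caller (for example a CLI script) should never set,
     * including 0 and -1, instead of passing them on to the agent.
     * Returns the interval unchanged when it is acceptable.
     */
    static long validate(long intervalMillis) {
        if (intervalMillis < ASSUMED_MIN_INTERVAL_MILLIS) {
            throw new IllegalArgumentException(
                "collection interval must be at least " + ASSUMED_MIN_INTERVAL_MILLIS
                    + " ms, got " + intervalMillis);
        }
        return intervalMillis;
    }
}
```

Failing fast here means a bad value from a script surfaces as an error to the caller rather than as a bad schedule on a running agent.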

 Agents do not need to be updated, but agents suffering from this problem
 should be re-synced or restarted --purge.


Test Notes:
To reproduce, I used the AS4 WAR type and, via the GUI, enabled a metric that is disabled out of the box, without changing the interval.

Again, the type is not relevant; there is a general issue with template metric manipulation.  Various enable/disable and interval updates should be tried from
the template level.  For good measure, group and resource level should be
sanity checked, although the code path is different.

The Agent prompt command:

  > schedules <resource-id> 

is very useful for looking at the intervals defined for a resource.  You
should never see 0 or -1 in this list.

Comment 2 Jay Shaughnessy 2012-04-18 21:04:58 UTC
Updating commit with a few more changes:

commit 5b1cd8fd1fd3dad13a9c9c2a5cd1788aee9624bf

    Fix issues due to code changes for [Bug 81391].


This does not add files to the patch file-set, but it does update some
of the files.

Comment 3 Jay Shaughnessy 2012-04-18 21:18:22 UTC
Actually, that's not true, it adds WebservicesManagerBean to the file-set.

Comment 4 Jay Shaughnessy 2012-04-19 03:30:33 UTC
Patch note: The changes to MeasurementDataGwtImpl need to be applied to
coregui.war.  So, the patch should update or replace coregui.war.

Comment 6 Heiko W. Rupp 2013-09-04 08:51:38 UTC
Closing as this is for a very old release.