Bug 812968
Summary: | High Agent CPU utilization after enabling certain Metric Collection Templates | ||
---|---|---|---|
Product: | [Other] RHQ Project | Reporter: | Charles Crouch <ccrouch> |
Component: | Agent | Assignee: | Jay Shaughnessy <jshaughn> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Mike Foley <mfoley> |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | ||
Version: | 3.0.0 | CC: | ahovsepy, dvanbale, hbrock, hrupp, maurizio.antillon |
Target Milestone: | --- | ||
Target Release: | RHQ 4.4.0 | ||
Hardware: | All | ||
OS: | All | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | 811696 | Environment: | |
Last Closed: | 2013-08-31 09:55:53 UTC | Type: | Bug |
Bug Depends On: | 811696, 813917 | ||
Bug Blocks: | 782579 |
Description
Charles Crouch
2012-04-16 17:01:12 UTC
Relevant master commits:
commit 3cdb62feb318ec26bac53b87bf32c90915b088f6
commit 98ab742c79b8f878b160269aa7e4607136f6a0dc
commit 80208e99755069d9b004ab2eb3bf8095cdb6b35a

The problem is actually independent of plugin or resource type. It is a general problem with enabling/disabling metrics at the template level.
- The database is not compromised.
- It occurs only when applying the changes to existing inventory.
- It affects only running agents that are updated as a result of the changes.
- The server code leaked bad collection interval values, used internally to indicate various enable/disable scenarios, into the agent update. This has been corrected.

The changes are limited to the server jar. There are additions to MeasurementScheduleRemote and Local, and certain methods have been deprecated. Existing CLI scripts should be OK, and I've added better validation of intervals set that way. However, scripts using the deprecated methods should move to the new methods after the next upgrade.

Agents do not need to be updated. But agents suffering from this problem should be re-synced, or restarted --purge.

Test Notes:
To reproduce I used the AS4 WAR type and, via the GUI, enabled a metric that is disabled out of the box, without changing its interval. Again, the type is not relevant; this is a general issue with template metric manipulation. Various enable/disable and interval updates should be tried at the template level. For good measure, the group and resource levels should be sanity checked, although the code path is different.

The Agent prompt command:
> schedules <resource-id>
is very useful for looking at the intervals defined for a resource. You should never see 0 or -1 in this list.

Countermeasure: Add TCMS Testcase per 4/24/2011 dev/support call.
https://engineering.redhat.com/trac/jon/ticket/116

The bug is verified.
High CPU usage is no longer created or sustained by metric collection changes. However, after a restart or a fresh start, the agent uses about 90% CPU for ~10-20 seconds and then settles down to roughly 0.4-1.3%.

An additional task has been created for later investigation and performance testing:
https://engineering.redhat.com/trac/jon/ticket/289
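
For anyone scripting the check from the test notes (no 0 or -1 among a resource's schedule intervals), here is a minimal sketch of that rule in Java. It is not taken from the RHQ code base: the class `ScheduleIntervalCheck`, its methods, and the map of metric name to interval in milliseconds are all hypothetical, and the only rule assumed is that a usable interval must be strictly positive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of RHQ.
final class ScheduleIntervalCheck {
    private ScheduleIntervalCheck() {}

    // Assumption: a collection interval is usable only if strictly positive.
    // Values such as 0 or -1 are treated as leaked sentinel values.
    static boolean isValidInterval(long intervalMillis) {
        return intervalMillis > 0;
    }

    // Returns the metric names whose interval is missing or non-positive,
    // in the iteration order of the supplied map.
    static List<String> findBadSchedules(Map<String, Long> intervalsByMetric) {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, Long> entry : intervalsByMetric.entrySet()) {
            Long interval = entry.getValue();
            if (interval == null || !isValidInterval(interval)) {
                bad.add(entry.getKey());
            }
        }
        return bad;
    }
}
```

An empty result from `findBadSchedules` corresponds to the expectation stated above; any name it returns points at a schedule worth re-syncing.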