Bug 811696

Summary: High Agent CPU utilization after enabling certain Metric Collection Templates
Product: [JBoss] JBoss Operations Network
Reporter: David van Balen <dvanbale>
Component: Agent
Assignee: Jay Shaughnessy <jshaughn>
Status: CLOSED CURRENTRELEASE
QA Contact: Mike Foley <mfoley>
Severity: unspecified
Priority: unspecified
Version: JON 3.0.0
CC: ahovsepy, dvanbale, jlivings, jshaughn
Target Milestone: ER01
Target Release: JON 3.2.0
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Type: Bug
Bug Blocks: 812968, 813917    
Attachments:
  inventory.xml from lenovo
  RHEL VM inventory file

Description David van Balen 2012-04-11 18:39:31 UTC
Description of problem: If certain Metric Collection Templates are enabled for Tomcat Web Application (WAR) -- e.g. "Currently Active Sessions", "Processing Errors" or "Requests served" -- the RHQ agent starts consuming very high CPU soon after. Running top on the agent's VM shows overall "Cpu(s)" at 51.7%us, and the RHQ Agent process itself sits at 102-103% CPU (even higher with more than one of these metric templates enabled).

This appears to apply only to an agent that is monitoring a Tomcat/EWS server; an agent on a different server with no Tomcat/EWS instances didn't show the same problem. It also doesn't seem to apply to the "Processing Errors per Minute" and "Requests served per Minute" templates, which I have enabled with a 20-minute collection interval without any problems.


Version-Release number of selected component (if applicable): JON 3.0.0.GA, JON Agent 4.2.0.JON300.GA, RHEL 5.5


How reproducible: Always


Steps to Reproduce:
1. Install/start JON server and an agent.
2. Install/start Tomcat/EWS (Tomcat 6)
3. Import Agent and Tomcat/EWS into JON server inventory
4. In the JON server UI, navigate to Administration->Metric Collection Templates->Tomcat Server->Tomcat Virtual Host->Tomcat Web Application (WAR)
5. Enable one of the problem metric templates (e.g. "Currently Active Sessions", "Processing Errors" or "Requests served"). The collection interval can be 10, 20 or 40 minutes; the result is the same.
6. Navigate to the agent in the JON server UI and restart it.
7. Run top on the agent's server/VM (see the example after these steps for watching just the agent process).
8. Within a minute or two, the agent's CPU utilization should increase substantially.
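
For steps 7 and 8, one quick way to watch just the agent process rather than the whole box -- the pgrep pattern below is only a guess, so match whatever the agent's java process shows in ps:

  # Find the agent's java PID (pattern is illustrative; adjust for your install)
  pgrep -f rhq-agent
  # Watch only that process in top
  top -p $(pgrep -f rhq-agent | head -n 1)
  # Or take a one-shot sample with ps
  ps -o pid,pcpu,etime,args -p $(pgrep -f rhq-agent | head -n 1)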
  
Actual results: RHQ Agent creates very high CPU load


Expected results: RHQ Agent should continue to create reasonable CPU load


Additional info: /proc/cpuinfo on the agent's VM reports two CPUs of type Intel Xeon X5690 @ 3.47GHz.
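
For reference, the CPU details above come from /proc/cpuinfo; something like the following shows the count and model:

  # Number of logical CPUs
  grep -c '^processor' /proc/cpuinfo
  # CPU model string (first match)
  grep -m 1 'model name' /proc/cpuinfo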

Comment 1 David van Balen 2012-04-13 23:24:53 UTC
I see the same results on a lenovo laptop with a quad-core i7 (/proc/cpuinfo shows four of the following: Intel(R) Core(TM) i7 CPU       M 620  @ 2.67GHz).

Comment 2 Charles Crouch 2012-04-16 16:39:27 UTC
Hi David
Thanks for the bug report. Can you supply some more info:

- Full version of EWS being monitored
- Java version running the JON agent and Java version running EWS
- Can you attach a copy of the inventory.xml from underneath the JON agent install?
- How long does the high CPU load last?

Comment 3 David van Balen 2012-04-16 19:05:40 UTC
Where in the agent's directory structure should the inventory.xml file be located? I ran a find on both machines and nothing came up.

As for the other questions:

On RHEL6 VM:

EWS 1.0.2

java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

On lenovo laptop running Fedora 15:

EWS 1.0.2-RHEL6-i386

java version "1.6.0_22"
OpenJDK Runtime Environment (IcedTea6 1.10.6) (fedora-63.1.10.6.fc15-i386)
OpenJDK Server VM (build 20.0-b11, mixed mode)


The CPU load seems to continue indefinitely, as long as those EWS/Tomcat metrics are enabled. Even after disabling them, I had to restart the agent service for CPU usage to subside (note that restarting the service involved invoking rhq-agent-wrapper.sh on the server/laptop; simply restarting the agent through the JON server UI didn't work).
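
For reference, the command-line restart was roughly along these lines (the install path is just a placeholder):

  # From the agent install's bin directory (path is illustrative)
  cd /path/to/rhq-agent/bin
  ./rhq-agent-wrapper.sh stop
  ./rhq-agent-wrapper.sh start
  # Check that it came back up
  ./rhq-agent-wrapper.sh status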

Comment 4 David van Balen 2012-04-16 19:45:53 UTC
Created attachment 577812 [details]
inventory.xml from lenovo

Comment 5 David van Balen 2012-04-16 19:46:40 UTC
Sorry, I forgot you have to generate the inventory.xml file. I uploaded the file for the lenovo; I'll upload it for the VM in a bit.
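
In case it helps anyone else, I believe the file is generated from the agent prompt with the inventory command; the exact option names may differ by agent version (check "help inventory"), but roughly:

  > inventory --xml --export=/tmp/inventory.xml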

Comment 6 David van Balen 2012-04-16 20:02:38 UTC
Created attachment 577819 [details]
RHEL VM inventory file

Comment 9 Jay Shaughnessy 2012-04-19 03:29:02 UTC

This has been fixed upstream. See bug 812968.

Comment 10 Larry O'Leary 2013-09-06 14:31:42 UTC
As this is MODIFIED or ON_QA, setting milestone to ER1.

Comment 11 Armine Hovsepyan 2013-09-19 16:00:23 UTC
Bug was verified upstream; re-tested with JON 3.2 ER1 -- no regression (CPU usage is high only during start/restart, then it drops back to ~1.2%).