Red Hat Bugzilla – Bug 811696
High Agent CPU utilization after enabling certain Metric Collection Templates
Last modified: 2014-01-02 15:38:59 EST
Description of problem: If certain Metric Collection Templates are enabled for Tomcat Web Application (WAR), e.g. "Currently Active Sessions", "Processing Errors" or "Requests served", the RHQ agent starts showing very high CPU utilization soon afterwards. Running top on the agent's VM shows "Cpu(s)" at 51.7%us, and the RHQ Agent process has a CPU value of 102-103% (even higher with more than one metric template enabled).
This appears to apply only to an agent that is monitoring a Tomcat/EWS server. An agent on a different server with no Tomcat/EWS instances didn't seem to have the same problem. It also doesn't seem to apply to the "Processing Errors per Minute" and "Requests served per Minute" templates, which I have enabled with a collection interval of 20 minutes without any apparent problems.
Version-Release number of selected component (if applicable): JON 3.0.0.GA, JON Agent 4.2.0.JON300.GA, RHEL 5.5
How reproducible: Always
Steps to Reproduce:
1. Install/start JON server and an agent.
2. Install/start Tomcat/EWS (Tomcat 6)
3. Import Agent and Tomcat/EWS into JON server inventory
4. In the JON server UI, navigate to Administration->Metric Collection Templates->Tomcat Server->Tomcat Virtual Host->Tomcat Web Application (WAR)
5. Enable one of the problem metric templates (e.g. "Currently Active Sessions", "Processing Errors" or "Requests served"). The collection interval can be 10, 20 or 40 minutes; the result should be the same in each case.
6. Navigate to the agent in the JON server UI and restart it.
7. Run top on the agent's server/VM
8. Within a minute or two, the agent's CPU utilization should increase substantially.
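For anyone who wants numbers rather than eyeballing top in steps 7 and 8, here is a minimal Python sketch that samples a process's CPU usage from /proc/<pid>/stat on Linux. It is not part of RHQ or JON; the function names are made up for illustration, and the agent's PID has to be looked up separately (e.g. with ps).

```python
import os
import time


def parse_cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from the contents of /proc/<pid>/stat."""
    # The command name (field 2) is wrapped in parentheses and may contain
    # spaces, so split only the part after the last ')'.
    fields = stat_line[stat_line.rindex(")") + 2:].split()
    # fields[0] is field 3 (state); utime is field 14 and stime is field 15.
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before, ticks_after, elapsed_seconds, ticks_per_second):
    """CPU usage relative to a single CPU, so it can exceed 100 on multi-core hosts."""
    return 100.0 * (ticks_after - ticks_before) / ticks_per_second / elapsed_seconds


def sample(pid, interval=5.0):
    """Measure the CPU usage of `pid` over `interval` seconds (Linux only)."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    with open(f"/proc/{pid}/stat") as f:
        before = parse_cpu_ticks(f.read())
    start = time.monotonic()
    time.sleep(interval)
    with open(f"/proc/{pid}/stat") as f:
        after = parse_cpu_ticks(f.read())
    return cpu_percent(before, after, time.monotonic() - start, ticks_per_second)
```

Calling sample(agent_pid) repeatedly before and after enabling a template should show the jump described in step 8, if the problem reproduces.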
Actual results: RHQ Agent creates very high CPU load
Expected results: RHQ Agent should continue to create reasonable CPU load
Additional info: /proc/cpuinfo on the agent's VM reports two CPUs of type Intel Xeon X5690 @ 3.47GHz.
I see the same results on a Lenovo laptop with a quad-core i7 (/proc/cpuinfo shows four of the following: Intel(R) Core(TM) i7 CPU M 620 @ 2.67GHz).
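A note on reading the top figures: assuming top is running in its default mode, the per-process %CPU value is relative to a single CPU, while the "Cpu(s)" summary line is averaged over all CPUs. On the two-CPU VM, an agent process at 102-103% therefore corresponds to roughly 51% overall, which is consistent with the 51.7%us reported. A small sketch of that arithmetic (the function name is just for illustration):

```python
def system_wide_percent(process_percent, cpu_count):
    """Convert a per-process %CPU (relative to one CPU) to a share of all CPUs."""
    return process_percent / cpu_count


# Figures from this report: agent process at 102-103% on a VM with two CPUs.
low = system_wide_percent(102.0, 2)
high = system_wide_percent(103.0, 2)
print(f"agent accounts for about {low:.1f}-{high:.1f}% of total CPU")
```

In other words, the agent alone appears to account for nearly all of the user-space CPU time shown in the summary line.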
Thanks for the bug report. Can you supply some more info:
-Full version of EWS being monitored
-Java version running JON agent and Java version running EWS
-Can you attach a copy of the inventory.xml from underneath the JON agent install
-How long does the high CPU load last?
Where in the agent's directory structure should the inventory.xml file be located? I ran a find in both locations and nothing came up.
As for the other questions:
On RHEL6 VM:
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
On lenovo laptop running Fedora 15:
java version "1.6.0_22"
OpenJDK Runtime Environment (IcedTea6 1.10.6) (fedora-22.214.171.124.fc15-i386)
OpenJDK Server VM (build 20.0-b11, mixed mode)
CPU load seems to continue indefinitely, as long as those EWS/Tomcat metrics are enabled. Even after disabling them, I had to restart the agent service for CPU usage to subside. Note that restarting the service meant invoking rhq-agent-wrapper.sh on the server/laptop; simply restarting the agent through the JON server UI didn't work.
Created attachment 577812
inventory.xml from lenovo
Sorry, forgot you have to generate the inventory.xml file. I uploaded the file for the lenovo. I'll upload it for the VM in a bit.
Created attachment 577819
RHEL VM inventory file
This has been fixed upstream. See bug 812968.
As this is MODIFIED or ON_QA, setting milestone to ER1.
The bug was verified upstream and re-tested with JON 3.2 ER1: no regression. CPU usage is high only during agent start/restart and then drops back to ~1.2%.