+++ This bug was initially created as a clone of Bug #1129910 +++

Description of problem:
This will eventually deplete PermGen and cause OOMEs.

Version-Release number of selected component (if applicable):
4.13.0-SNAPSHOT

How reproducible:
always

Steps to Reproduce:
1. Inventory the RHQ Agent resource
2. Run its "restartPluginContainer" operation repeatedly

Actual results:
PermGen usage increases until it is depleted.

Expected results:
No leaks and stable PermGen usage.

Additional info:

--- Additional comment from Lukas Krejci on 2014-08-15 03:07:56 EDT ---
Pull request opened: https://github.com/rhq-project/rhq/pull/109
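One way to watch the growth described above from inside the agent JVM is to sample the class-metadata memory pool through the standard java.lang.management API. This is a minimal sketch, not an RHQ class; the name PermGenProbe is hypothetical, and it assumes a HotSpot JVM, where the pool is named "... Perm Gen" on Java 7 and earlier and "Metaspace" on Java 8 and later. It reads only the JVM it runs in.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

/**
 * Hypothetical helper (not part of RHQ): reports how many bytes the
 * class-metadata pool of the current HotSpot JVM is using. The pool is
 * "PS Perm Gen", "CMS Perm Gen", etc. on Java 7 and "Metaspace" on Java 8+.
 */
final class PermGenProbe {

    private PermGenProbe() {}

    /** Returns used bytes in the class-metadata pool, or -1 if none is found. */
    static long classMetadataUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            if (name.contains("Perm Gen") || name.equals("Metaspace")) {
                MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
                return usage == null ? -1L : usage.getUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) throws InterruptedException {
        // Sample once per minute. System.gc() is only a hint to the JVM.
        for (int i = 0; i < 3; i++) {
            System.gc();
            System.out.println("class metadata used: " + classMetadataUsedBytes() + " bytes");
            Thread.sleep(60000L);
        }
    }
}
```

Because unreachable classes are typically unloaded only by a collection that performs class unloading, a single high reading says little; the signal to look for is a post-GC value that keeps rising from one plugin container restart to the next.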
Merged into release/jon3.3.x:

commit 2511a112f3c1c0f11496172b1381bba32fa65cd4
Author: Lukas Krejci <lkrejci>
Date: Thu Aug 14 02:01:51 2014 +0200

(cherry picked from commit 5d8bd5cbcb1cbde0adcc5b0e7035a05db7150389)
Signed-off-by: Thomas Segismont <tsegismo>
Moving to ON_QA as available for testing with the following brew build: https://brewweb.devel.redhat.com//buildinfo?buildID=381194
The restartPluginContainer operation was scheduled to run every minute. After 24 hours PermGen is still stable -> verified.

Version: 3.3.0.ER02
Build Number: 4fbb183:7da54e2
Moving to ASSIGNED. While working on BZ 1077943 I found out that the class loader leak is fixed only for the remote agent. The agent installed together with the JON server still seems to leak: after approx. 15 plugin container restarts, used PermGen memory gradually reaches 166 MB, and then the agent starts throwing OOMEs:

2014-09-16 10:33:50,174 ERROR [RHQ Agent Ping Thread-1] (enterprise.communications.command.client.ClientCommandSenderTask)- {ClientCommandSenderTask.send-failed}Failed to send command [Command: type=[remotepojo]; cmd-in-response=[false]; config=[{rhq.agent-name=jbe-jon330er02-srv.bc.jonqe.lab.eng.bos.redhat.com, rhq.externalizable-strategy=AGENT, rhq.security-token=bKLQV6TXWrPgV9fxQMrgrmKNbZ0K0lDpmPf83YiVGBbWpjlmRBiEkXjjkCDfYWK24gk=, rhq.send-throttle=true}]; params=[{invocation=NameBasedInvocation[ping], targetInterfaceName=org.rhq.core.clientapi.server.core.CoreServerService}]]. Cause: java.lang.Exception:java.lang.OutOfMemoryError: PermGen space -> java.lang.OutOfMemoryError:PermGen space. Cause: java.lang.Exception: java.lang.OutOfMemoryError: PermGen space

See the attached screenshots comparing the PermGen trend for the remote agent and the server's agent.

Version: 3.3.0.ER02
Build Number: 4fbb183:7da54e2
Created attachment 938074: PermGen on the server's agent
Created attachment 938075: PermGen on the remote agent
We do not support running the agent embedded inside the JON server. It can be co-located on the same host as the JON server, but it is still a standalone process, so the "local" and "remote" cases you describe are equivalent: the local agent communicates with the server over HTTP, just as the remote one does.

Please double-check that your agent is the latest version and that its plugins are updated to the latest versions. I would also need to see the full list of inventoried resources to hunt for additional leak suspects. It would be a tremendous help if you could take full heap dumps before each plugin container restart and share them with me.
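Besides running `jmap -dump:live,format=b,file=<path> <pid>` against the agent process, a heap dump can be triggered from inside a HotSpot JVM through the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean. This is a minimal sketch; the class name HeapDumper and the file naming scheme are hypothetical. Passing `live = true` writes only reachable objects, which is what matters when checking whether an old plugin class loader survives a restart.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: writes an .hprof heap dump of the current HotSpot JVM. */
final class HeapDumper {

    private HeapDumper() {}

    /**
     * Dumps the heap to {@code file}. With {@code liveOnly == true} only
     * reachable objects are written. The target file must not exist yet,
     * and newer JDKs require the ".hprof" suffix.
     */
    static void dump(File file, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(file.getAbsolutePath(), liveOnly);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: run once before each restart, numbering the files.
        int restart = args.length > 0 ? Integer.parseInt(args[0]) : 0;
        File out = new File("agent-before-restart-" + restart + ".hprof");
        dump(out, true);
        System.out.println("wrote " + out.getAbsolutePath() + " (" + out.length() + " bytes)");
    }
}
```

Comparing consecutive dumps for plugin class loader instances that should already have been discarded is a direct way to spot which one is being retained, and by what.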
Created attachment 938523: Inventoried resources
Based on the heap dumps provided by Jan, there seems to be yet another leak, originating in the database plugin. The inventory screenshot shows that a Postgres server is inventoried, so that is probably our leak suspect. I reported the leak in BZ 1143048 for RHQ and BZ 1143050 for JON. I suggest re-running the tests with no database server inventoried.
The leak mentioned above originates in the Postgres JDBC driver, which does not clean up after itself on unload: it leaves a thread running that holds on to the driver's classes, and through them their class loader, indefinitely. I provided a fix for it to the Postgres JDBC project: https://github.com/pgjdbc/pgjdbc/pull/188.
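The general mechanism and the general remedy can be illustrated with a small, self-contained Java sketch. It is not the pgjdbc code or the change in the pull request above; the names LoaderLeakDemo, Worker and threadsPinning are hypothetical. To keep it runnable in one file, the sketch models the pinning through the thread's context class loader, which a live thread references strongly; in a real plugin the thread's Runnable class is typically loaded by the plugin class loader itself, which pins the loader the same way. The remedy is to stop the thread when the code is unloaded.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch (hypothetical names): a live thread pins a class loader until stopped. */
final class LoaderLeakDemo {

    private LoaderLeakDemo() {}

    /** Lists live threads whose context class loader is {@code loader}. */
    static List<Thread> threadsPinning(ClassLoader loader) {
        List<Thread> result = new ArrayList<Thread>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && t.getContextClassLoader() == loader) {
                result.add(t);
            }
        }
        return result;
    }

    /** A background worker that offers an explicit shutdown for unload time. */
    static final class Worker {
        private final Thread thread;

        Worker(ClassLoader owner) {
            thread = new Thread(new Runnable() {
                public void run() {
                    try {
                        while (!Thread.currentThread().isInterrupted()) {
                            Thread.sleep(100L); // placeholder for periodic work
                        }
                    } catch (InterruptedException e) {
                        // interrupted by shutdown(): exit the loop and let the thread die
                    }
                }
            }, "demo-background-worker");
            thread.setDaemon(true);
            thread.setContextClassLoader(owner); // the live thread now references the loader
        }

        void start() {
            thread.start();
        }

        /** Call on unload: stop the thread so it no longer pins the loader. */
        void shutdown() throws InterruptedException {
            thread.interrupt();
            thread.join(5000L);
        }
    }

    public static void main(String[] args) throws Exception {
        URLClassLoader pluginLoader = new URLClassLoader(new URL[0]);
        Worker worker = new Worker(pluginLoader);
        worker.start();
        System.out.println("pinning before shutdown: " + threadsPinning(pluginLoader).size());
        worker.shutdown();
        System.out.println("pinning after shutdown: " + threadsPinning(pluginLoader).size());
        pluginLoader.close();
    }
}
```

Listing threads whose context class loader is the plugin loader, as threadsPinning does, is a quick check after a restart, but it will not catch threads that pin the loader only through their own class; a heap dump will.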
Verified on
Version: 3.3.0.ER03
Build Number: 4aefe39:44e33a4

Scenario:
1- JON agent is running on a machine with no services monitored via JON
2- Register and import the agent
3- Schedule the agent operation "Restart Plugin Container" every 2 minutes

Result:
After 6 hours, which corresponds to ~180 plugin container restarts, PermGen is still stable.

This verifies the fix for the leaks in the rhq-agent plugin. Other leaks are covered by BZ 1145561.