Bug 1134902 - Restarting the agent or plugin container through the rhq-agent plugin leaks classloaders
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: JBoss Operations Network
Classification: JBoss
Component: Agent, Plugin -- Other
Version: JON 3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ER03
Target Release: JON 3.3.0
Assignee: Lukas Krejci
QA Contact: Filip Brychta
URL:
Whiteboard:
Depends On: 1129910
Blocks: 1145561
 
Reported: 2014-08-28 12:26 UTC by Lukas Krejci
Modified: 2014-12-11 14:00 UTC

Fixed In Version:
Clone Of: 1129910
Environment:
Last Closed: 2014-12-11 14:00:26 UTC
Type: Bug
Embargoed:


Attachments
Perm gen on server's agent (126.67 KB, application/octet-stream)
2014-09-16 15:04 UTC, Jan Bednarik
Perm gen on remote agent (123.83 KB, application/octet-stream)
2014-09-16 15:05 UTC, Jan Bednarik
Inventoried resources (198.37 KB, application/octet-stream)
2014-09-17 15:32 UTC, Jan Bednarik

Description Lukas Krejci 2014-08-28 12:26:18 UTC
+++ This bug was initially created as a clone of Bug #1129910 +++

Description of problem:
Restarting the agent or its plugin container through the rhq-agent plugin leaks classloaders. This will eventually deplete permgen and cause OOMEs.


Version-Release number of selected component (if applicable):
4.13.0-SNAPSHOT

How reproducible:
always

Steps to Reproduce:
1. Inventory the RHQ Agent resource
2. Run its "restartPluginContainer" operation repeatedly

Actual results:
The permgen usage will increase until it is depleted

Expected results:
No leaks and stable permgen

Additional info:

--- Additional comment from Lukas Krejci on 2014-08-15 03:07:56 EDT ---

Pull request opened: https://github.com/rhq-project/rhq/pull/109

Comment 1 Thomas Segismont 2014-08-29 12:39:22 UTC
Merge into release/jon3.3.x
commit 2511a112f3c1c0f11496172b1381bba32fa65cd4
Author: Lukas Krejci <lkrejci>
Date:   Thu Aug 14 02:01:51 2014 +0200
    
    (cherry picked from commit 5d8bd5cbcb1cbde0adcc5b0e7035a05db7150389)
    Signed-off-by: Thomas Segismont <tsegismo>

Comment 2 Simeon Pinder 2014-09-03 20:31:53 UTC
Moving to ON_QA as available for test with the following brew build:
https://brewweb.devel.redhat.com//buildinfo?buildID=381194

Comment 3 Filip Brychta 2014-09-11 08:56:20 UTC
The restartPluginContainer operation was scheduled to run every minute. After 24 hours, permgen is still stable -> verified

Version: 3.3.0.ER02
Build Number: 4fbb183:7da54e2
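
A stability check like the one above can also be sampled from inside the JVM. The following is a minimal sketch, not part of the RHQ code base: the class name `ClassMetadataProbe` is hypothetical, and it assumes a HotSpot JVM, where the relevant memory pool is named "... Perm Gen" on Java 7 and earlier and "Metaspace" on Java 8 and later.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: reports how much class-metadata memory is in use.
final class ClassMetadataProbe {

    /**
     * Sums the used bytes of every memory pool whose name mentions
     * "Perm Gen" (Java 7 and earlier) or "Metaspace" (Java 8 and later).
     * Returns -1 if the JVM exposes no such pool.
     */
    static long usedClassMetadataBytes() {
        long total = 0;
        boolean found = false;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            if (name.contains("Perm Gen") || name.contains("Metaspace")) {
                MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
                if (usage != null) {
                    total += usage.getUsed();
                    found = true;
                }
            }
        }
        return found ? total : -1;
    }

    public static void main(String[] args) {
        System.out.println("Class metadata used (bytes): " + usedClassMetadataBytes());
    }
}
```

Sampling this value after each plugin container restart and comparing the readings gives the same trend the attached graphs show: a leak appears as a value that keeps climbing across restarts.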

Comment 4 Jan Bednarik 2014-09-16 15:03:28 UTC
Moving to ASSIGNED.

While working on BZ 1077943, I found that the class loader leak is gone only in the case of a remote agent. In the case of the agent installed with the JON server, it still seems to leak. After approx. 15 plugin container restarts, the used Perm Gen memory gradually reaches 166 MB. Then the agent starts to throw OOMEs:

2014-09-16 10:33:50,174 ERROR [RHQ Agent Ping Thread-1] (enterprise.communications.command.client.ClientCommandSenderTask)- {ClientCommandSenderTask.send-failed}Failed to send command [Command: type=[remotepojo]; cmd-in-response=[false]; config=[{rhq.agent-name=jbe-jon330er02-srv.bc.jonqe.lab.eng.bos.redhat.com, rhq.externalizable-strategy=AGENT, rhq.security-token=bKLQV6TXWrPgV9fxQMrgrmKNbZ0K0lDpmPf83YiVGBbWpjlmRBiEkXjjkCDfYWK24gk=, rhq.send-throttle=true}]; params=[{invocation=NameBasedInvocation[ping], targetInterfaceName=org.rhq.core.clientapi.server.core.CoreServerService}]]. Cause: java.lang.Exception:java.lang.OutOfMemoryError: PermGen space -> java.lang.OutOfMemoryError:PermGen space. Cause: java.lang.Exception: java.lang.OutOfMemoryError: PermGen space

See attached screenshots comparing the Perm Gen trend for remote and server's agent.

Version: 3.3.0.ER02
Build Number: 4fbb183:7da54e2

Comment 5 Jan Bednarik 2014-09-16 15:04:27 UTC
Created attachment 938074
Perm gen on server's agent

Comment 6 Jan Bednarik 2014-09-16 15:05:08 UTC
Created attachment 938075
Perm gen on remote agent

Comment 7 Lukas Krejci 2014-09-16 19:40:45 UTC
We do not support running the agent embedded inside the JON server. It can be co-located on the same host as the JON server, but it is still a standalone process, so the "local" and "remote" cases you describe are equivalent. Even the local agent communicates with the server over an HTTP connection, as does the remote one.

Please double-check that your agent is the latest version and that its plugins are updated to the latest versions. I'd also need to see the full list of inventoried resources to try and hunt for additional leak suspects.

It would be of tremendous help if you were able to run full heap dumps before each plugin container restart and share those with me somehow.
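
One way to take such dumps programmatically is sketched below. It is only an illustration under stated assumptions: the class name `HeapDumpSketch` and the file naming are hypothetical, the code must run inside the JVM being dumped, and it relies on the HotSpot-specific `HotSpotDiagnosticMXBean`. For an agent that is already running, `jmap -dump:live,format=b,file=<file> <pid>` from the JDK produces the same kind of file from outside the process.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: writes a heap dump of the current JVM to a labeled file.
final class HeapDumpSketch {

    /**
     * Dumps the heap of the JVM this code runs in to
     * {@code <directory>/agent-<label>.hprof} and returns the file path.
     */
    static Path dumpHeap(Path directory, String label) throws IOException {
        Path target = directory.resolve("agent-" + label + ".hprof");
        // dumpHeap refuses to overwrite an existing file, so clear the way first.
        Files.deleteIfExists(target);
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // live=true dumps only reachable objects, which keeps leaked loaders visible
        // while dropping garbage that would otherwise clutter the comparison.
        diagnostics.dumpHeap(target.toString(), true);
        return target;
    }
}
```

Calling `dumpHeap(dir, "restart-1")`, `dumpHeap(dir, "restart-2")`, and so on before each restart yields a series of dumps that can be compared for class loaders that survive from one dump to the next.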

Comment 9 Jan Bednarik 2014-09-17 15:32:01 UTC
Created attachment 938523
Inventoried resources

Comment 10 Lukas Krejci 2014-09-17 19:54:05 UTC
Based on the heapdumps provided by Jan, it seems there's yet another leak originating in the database plugin. I see in the inventory screenshot provided that a postgres server is inventoried so that should probably be our leak suspect.

I reported the leak in BZ 1143048 for RHQ and BZ 1143050 for JON.

I suggest re-running the tests with no database server inventoried.

Comment 12 Lukas Krejci 2014-09-18 17:53:55 UTC
The above-mentioned leak originates in the Postgres JDBC driver, which doesn't clean up after itself on unload and leaves a thread running that holds on to the classes indefinitely.

I provided a fix for it to the postgres JDBC project: https://github.com/pgjdbc/pgjdbc/pull/188.
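
To illustrate how a leftover thread can keep a class loader alive, here is a minimal sketch that lists live threads still referencing a given loader. It is not the pgjdbc fix and not RHQ code: the class name `LingeringThreadSketch` is hypothetical, and it checks only the thread's context class loader, which is just one of several ways a thread can pin a loader (the class of its `Runnable` is another).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: finds live threads that still reference a class loader.
final class LingeringThreadSketch {

    /**
     * Returns every live thread whose context class loader is the given loader.
     * While such a thread runs, the loader and all classes it defined stay reachable.
     */
    static List<Thread> threadsPinning(ClassLoader loader) {
        List<Thread> pinning = new ArrayList<>();
        // getAllStackTraces() returns a snapshot keyed by the live threads.
        for (Thread thread : Thread.getAllStackTraces().keySet()) {
            if (thread.isAlive() && thread.getContextClassLoader() == loader) {
                pinning.add(thread);
            }
        }
        return pinning;
    }
}
```

Running such a check against a plugin's class loader right after the plugin is unloaded would list the threads that must be stopped before the loader can be collected.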

Comment 14 Filip Brychta 2014-09-23 09:42:43 UTC
Verified on
Version: 3.3.0.ER03
Build Number: 4aefe39:44e33a4

Scenario:
1- JON agent is running on a machine without any services monitored via JON
2- register and import the agent
3- schedule agent operation "Restart Plugin Container" every 2 minutes

Result:
After 6 hours, which corresponds to ~180 plugin container restarts, the perm gen is still stable


This verifies the fix for the leaks in the rhq-agent plugin. There are other leaks, which are covered by BZ 1145561.

