Bug 615377 - restarting plugin container causes classloaders to leak and will eventually cause permgen to run out
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: RHQ Project
Classification: Other
Component: Plugin Container
Version: 1.3.1
Hardware: All
OS: All
Priority: low
Severity: medium
Target Milestone: ---
Assignee: John Mazzitelli
QA Contact: Corey Welton
URL:
Whiteboard:
Depends On:
Blocks: 725852 jon30-bugs 648954
Reported: 2010-07-16 16:14 UTC by Ian Springer
Modified: 2018-12-04 14:20 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-05-24 01:07:38 UTC
Embargoed:


Attachments
classloaders-1-plugin-10-restarts.txt (14.79 KB, text/plain)
2010-07-16 16:31 UTC, John Mazzitelli


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1077943 0 unspecified CLOSED [AS7] plugin's connection cleaner leaks classloader 2021-02-22 00:41:40 UTC

Internal Links: 1077943

Description Ian Springer 2010-07-16 16:14:42 UTC
After the plugin container (PC) is restarted, the RootPluginClassLoader and all its descendant classloaders from the previous PC instance are not cleaned up. This causes permgen to leak and, if the PC is restarted enough times, the accumulated leaked classloaders will eventually cause a permgen OutOfMemoryError. The classloaders consume significantly more space if one or more JBossAS servers are being managed by the PC, because the JBossAS plugins define many resource types and have many dependencies, so many classes get loaded into the corresponding plugin classloaders.

The underlying cause is the following bug in the JDK, which prevents classloaders that are no longer used from being cleaned up:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5041014

The bug has a comment that says this:

"Please see bug report 4167874. A new method URLClassLoader.close() is being added in jdk 7. It should be integrated in the next few weeks." 

so it sounds unlikely we will ever see a fix for this in JDK6.
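
For reference, a minimal sketch of what the JDK 7 URLClassLoader.close() method mentioned in that comment looks like in use. The class name CloseableLoaderSketch and the temp directory are made up for illustration; this is not code from the RHQ agent.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of RHQ.
class CloseableLoaderSketch {

    /** Opens a loader over the given directory and closes it again; returns the number of URLs it served. */
    static int openAndClose(Path dir) throws IOException {
        URL[] urls = { dir.toUri().toURL() };
        // URLClassLoader implements Closeable since Java 7, so try-with-resources calls close().
        try (URLClassLoader loader = new URLClassLoader(urls, null)) {
            // A real plugin container would load plugin classes through the loader here.
            return loader.getURLs().length;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("plugin-sketch");
        System.out.println("URLs served before close: " + openAndClose(dir));
        Files.delete(dir);
    }
}
```

Note that close() only releases the jar files and other resources the loader has opened. The loader and its classes still cannot be collected while anything else holds a strong reference to them.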

There are lots of blogs on this topic - here are just a couple, one written by Mazz:

http://management-platform.blogspot.com/2009/01/classloaders-keeping-jar-files-open.html 
http://my.opera.com/karmazilla/blog/2007/03/13/good-riddance-permgen-outofmemoryerror

Comment 1 Ian Springer 2010-07-16 16:15:50 UTC
The following article gives a nice tutorial on how to analyze permgen out-of-memory-errors / leaks using Eclipse Memory Analyzer (MAT) to analyze a heap dump file:

http://dev.eclipse.org/blogs/memoryanalyzer/2008/05/17/the-unknown-generation-perm/

Note, if you notice a bunch of sun.reflect.DelegatingClassLoaders while analyzing the heap dump, this is what they are:

Hotspot JVM generates classes on-the-fly to speed up Java reflective calls (Constructor.newInstance, Method.invoke etc.). We want to find out how many such classes were generated. Piece of implementation detail before proceeding: All reflection speed-up classes are loaded by classloaders of type sun.reflect.DelegatingClassLoader. Each such loader only loads a single class.

So they are probably not the culprits for the permgen leak.

Comment 2 John Mazzitelli 2010-07-16 16:31:22 UTC
Created attachment 432433
classloaders-1-plugin-10-restarts.txt

Taking hprof profiler dumps of the VM, I do see many tens (if not over a hundred) of sun.reflect.DelegatingClassLoader instances. As ips said, that's probably not the source of the problem - for each one of those, there is only a single class definition loaded inside it.

I ran a test - I disabled all agent plugins but the platform plugin. I then ran "plugins update" 10 times, took an hprof dump, and examined it via Eclipse MAT. Looking at the classloaders, I see 132 of those sun.reflect.DelegatingClassLoaders. I also see a single RootPluginClassLoader and a single PluginClassLoader - which is to be expected with a single plugin deployed. See attached "classloaders-1-plugin-10-restarts.txt".

So this tells me the core plugin container is ok and is cleaning up after itself - restarting the PC 10 times shows that I still have one plugin classloader.

I'm going to look at what happens with other plugins - I did some test runs with that and I've seen bunches of EMS classloaders - we might not be cleaning those up or EMS itself isn't cleaning itself up. I'm going to see if there is something we can do to help clean up the EMS classloaders (if indeed that is a source of leakage).

Comment 3 John Mazzitelli 2010-07-22 18:08:33 UTC
I noticed the EMS child-first classloader is referenced by javax.security.auth.login.Configuration

See this sun bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6727821

Comment 4 John Mazzitelli 2010-07-22 18:27:54 UTC
This might involve the JBossConfiguration object that is in EMS. Gonna investigate this further, but I sense this is the area where the "bad things" might be caused.

Comment 5 John Mazzitelli 2010-07-23 02:59:36 UTC
After the agent is fully shut down (PC is shut down, comm is down, just the agent input prompt thread is running), if I invoke this:

javax.security.auth.login.Configuration.getConfiguration()

I see EMS classes in here (stored here by the jboss plugin and/or the EMS library) - that object's data members are:

configuration=org.mc4j.ems.impl.jmx.connection.support.providers.jaas.JBossConfiguration@32d463e5
contextClassLoader=org.mc4j.ems.connection.support.classloader.ChildFirstClassloader@77a82f1

Remember, the entire PC is down, all of our classloaders should be freed/unused. However, we store references to these EMS classes in a JRE javax static location. This is why I think we are leaking.  Just a theory for now, but we definitely need to clean out these references regardless.
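
The theory above can be illustrated in isolation. The following is a simplified sketch, not agent code: StaticPinSketch and jvmWideStatic are hypothetical names, and the static field holds the loader directly, whereas in the agent the static holds an object whose class was defined by the loader (which pins the loader just the same).

```java
import java.lang.ref.WeakReference;
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical demo class, not part of RHQ or EMS.
class StaticPinSketch {

    // Stands in for a JVM-wide static such as the one behind Configuration.getConfiguration().
    static Object jvmWideStatic;

    /** Returns true if the loader is still strongly reachable after its owner dropped it. */
    static boolean loaderSurvivesWhilePinned() {
        URLClassLoader pluginLoader = new URLClassLoader(new URL[0], null);
        WeakReference<ClassLoader> watcher = new WeakReference<>(pluginLoader);

        jvmWideStatic = pluginLoader; // the leak: a static that outlives the plugin container
        pluginLoader = null;          // the container lets go of its own reference
        System.gc();                  // a GC cannot clear the weak reference while the static exists

        return watcher.get() != null;
    }

    public static void main(String[] args) {
        System.out.println("pinned by static: " + loaderSurvivesWhilePinned());
        jvmWideStatic = null; // only after this can the loader become collectable
    }
}
```

As long as the static is set, the weak reference is never cleared, no matter how many collections run. That matches the observation that the EMS classloader remains reachable after the whole PC is down.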

Comment 6 John Mazzitelli 2010-07-23 12:41:57 UTC
I added this to the code. Still seeing perm gen grow, but I verified in my debugger that the rhq/ems references in that javax Configuration static are no longer there. Still searching for other areas where we are similarly leaking. Notice I also do a LogFactory.releaseAll for good measure here:

@@ -116,2 +118,7 @@ public class PluginContainer implements ContainerService {
     private PluginContainer() {
+        // for why we need to do this, see http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6727821 
+        try {
+            Configuration.getConfiguration();
+        } catch (Throwable t) {
+        }
     }
@@ -338,2 +345,11 @@ public class PluginContainer implements ContainerService {
         Introspector.flushCaches();
+
+        // for why we need to do this, see http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6727821 
+        try {
+            Configuration.setConfiguration(null);
+        } catch (Throwable t) {
+        }
+
+        LogFactory.releaseAll();
+
         System.gc();
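
The JAAS part of the patch above can be tried outside the agent with a sketch like the following. PluginConfiguration is a hypothetical stand-in for a plugin-supplied class such as JBossConfiguration, and JaasConfigCleanupSketch is a made-up name. Only the Configuration.setConfiguration and Configuration.getConfiguration calls come from the patch.

```java
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

// Hypothetical demo class, not part of RHQ or EMS.
class JaasConfigCleanupSketch {

    /** Hypothetical stand-in for a Configuration class defined by a plugin classloader. */
    static final class PluginConfiguration extends Configuration {
        @Override
        public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
            return null; // no login modules configured for any application name
        }
    }

    /** Installs a plugin Configuration and reports whether the JVM-wide static now holds it. */
    static boolean installAndCheck() {
        Configuration pluginConfig = new PluginConfiguration();
        Configuration.setConfiguration(pluginConfig);
        return Configuration.getConfiguration() == pluginConfig;
    }

    /** Mirrors the shutdown step in the patch: drop the JVM-wide reference. */
    static void clear() {
        try {
            Configuration.setConfiguration(null);
        } catch (SecurityException e) {
            // Possible only when a security manager denies the required AuthPermission.
            System.err.println("could not clear JAAS configuration: " + e);
        }
    }

    public static void main(String[] args) {
        System.out.println("static holds plugin config: " + installAndCheck());
        clear();
    }
}
```

After clear(), the next call to Configuration.getConfiguration() makes the JRE try to load its default configuration again, which can throw if no login configuration is available. That is presumably why the patch wraps the constructor call in a try/catch.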

Comment 7 John Mazzitelli 2010-07-30 21:34:33 UTC
Another place to fix is inside EMS - with this change, I see perm gen usage become much more stable and more things getting freed up.

Index: src/ems-impl/org/mc4j/ems/impl/jmx/connection/DConnection.java
===================================================================
--- src/ems-impl/org/mc4j/ems/impl/jmx/connection/DConnection.java	(revision 616)
+++ src/ems-impl/org/mc4j/ems/impl/jmx/connection/DConnection.java	(working copy)
@@ -102,6 +102,7 @@
 
 //        tracker.stopTracker();
         connectionProvider.disconnect();
+        LogFactory.release(connectionProvider.getClass().getClassLoader());
     }

Comment 8 John Mazzitelli 2010-07-31 15:19:01 UTC
Getting further along. I will be committing a new EMS change (1.2.13) that will provide a public API (ClassLoaderFactory.clearCaches) so I can clear its caches of jar files, temp files and, most importantly, classloaders. I ran a test and that seems to help even further.

However, I do notice some leakage if I just restart the PC (via "plugins update" for example). I see URLClassLoader instances grow unbounded, among other things. However, if I shut down the full agent core internals (via "shutdown", which also kills the comm layer and the agent management MBean), most of those leaked instances free up (though not entirely). So there are still some things we can do to further fix this. At least with my current fixes, the agent is able to have a much more stable perm gen.

Comment 9 John Mazzitelli 2010-08-02 18:56:33 UTC
I checked in some minor tweaks to PluginContainer. I don't think it changes much in the way of issues with perm gen, but maybe. In addition, we may want to consider adding these VM options to the agent - I'm reading that they may help:

-XX:+UseConcMarkSweepGC
-XX:+CMSPermGenSweepingEnabled
-XX:+CMSClassUnloadingEnabled

Comment 10 John Mazzitelli 2010-08-03 14:12:11 UTC
I'm gonna close out this BZ. Most of the perm gen leaks are fixed. There still seems to be some minor perm gen leakage that occurs when restarting the internals (using either "shutdown/start", "plugins update" or "pc stop/pc start" - which all restart the PC; the first one also shuts down the rest of the core agent internals). Those minor leaks notwithstanding, I think the bulk of the problems with perm gen are fixed.

Here's the master sha commits that were involved:

plugin container change:

4ae53d12b5b30aafe5362ad5435b0fbe548962b6
69c6da3af5ef3a12988836d5a11b5b09e459075c
b13693f2a55c24422be11d38bf23361ed1db3950

jmx-plugin change:

774ed6788f6d14ec7b14f169b2181fbcf20e5302

jboss-as plugin change:

724682f8e03a5f804774559a97a70298dc402a43

There is nothing to test really - this is all code changes. You'd actually test by hooking up JProfiler to the agent and confirm that perm gen is more stable. You'd have to do lots of "plugins update" commands after monitoring one or more JBossAS instances (the RHQ Server is a valid one to test with).
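
If a profiler is not at hand, a rough alternative is to sample the standard java.lang.management beans from inside the VM before and after a series of restarts. This is only a sketch under assumptions: ClassSpaceMonitorSketch is a made-up name, it is not part of the agent, and the memory pool names it matches ("Perm Gen" on older HotSpot VMs, "Metaspace" on Java 8 and later) are JVM-specific.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical demo class, not part of RHQ.
class ClassSpaceMonitorSketch {

    /** One-line snapshot of how many classes are loaded now and how many were ever unloaded. */
    static String classCounts() {
        ClassLoadingMXBean cl = ManagementFactory.getClassLoadingMXBean();
        return "loaded=" + cl.getLoadedClassCount()
                + " totalLoaded=" + cl.getTotalLoadedClassCount()
                + " unloaded=" + cl.getUnloadedClassCount();
    }

    /** Prints current usage of the pools that hold class metadata, if the JVM exposes any. */
    static void printClassSpacePools() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            if (name.contains("Perm Gen") || name.contains("Metaspace")) {
                MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
                if (usage != null) {
                    System.out.println(name + " used=" + usage.getUsed() + " bytes");
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(classCounts());
        printClassSpacePools();
    }
}
```

A loaded-class count that keeps climbing across repeated "plugins update" runs while the unloaded count stays flat would point to classloaders still being pinned.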

Comment 11 John Mazzitelli 2010-08-03 14:49:06 UTC
moving to verified.

Comment 12 Corey Welton 2010-08-12 16:44:10 UTC
Mass-closure of verified bugs against JON.

Comment 13 Charles Crouch 2010-11-02 16:55:44 UTC
This wasn't fixed in JON2.4

Comment 14 Charles Crouch 2010-11-02 16:56:36 UTC
Moving back to Mazz's last state of verified

Comment 17 Corey Welton 2011-05-24 01:07:38 UTC
Bookkeeping - closing bug - fixed in recent release.

