Red Hat Bugzilla – Bug 790908
[as5 plugin] consumes too much memory (heap) for managed components
Last modified: 2013-05-13 17:36:09 EDT
+++ This bug was initially created as a clone of Bug #788638 +++
Description of problem:
Some of the resources from the AS 5 plugin can wind up consuming way too much memory, which can lead to OutOfMemoryErrors on the agent. We first noticed this with EJB2Component. In my local testing environment using some simple, dummy EJBs, I have observed EJB2Components consuming 260 KB of heap space, and that is per component (i.e., per individual EJB). In some real-world deployments, we have seen them consume as much as 1.5 MB.
After doing analysis with some heap dumps we have determined that the problem is not a memory leak.
Some of the managed objects that the AS 5 plugin gets back from the profile service have some very large properties. The managed object for the EJB2Component for example has a managed property named BeanMetaData that is really big. In my local environment, I have seen it use around 155 KB of heap space. Looking at some heap dumps of real world deployments I have seen the BeanMetaData property consume over 300 KB of heap space.
It turns out that we do not need or use all of the properties on the managed objects we get back from the profile service. We do not use the BeanMetaData property; so, we can filter out that property to reduce the footprint of each managed object.
There may be other large properties we do not use that we can filter out to further reduce the memory footprint. Also note that this issue may not be limited to EJB2 resources. We have observed that WebApplicationContextComponent uses even more heap space; however, there will likely be far fewer web components than EJB components in an application.
We are also investigating EJB3Component to determine whether or not it is holding onto any really large managed properties that we do not need/use.
--- Additional comment from firstname.lastname@example.org on 2012-02-08 12:15:46 EST ---
Another big property, relatively speaking, for EJB2Component is one named URL. This is another one we do not use. In my local environment it is about 50 KB. To put it in perspective, if we can filter out the URL and BeanMetaData properties, we would reduce the footprint of the EJB2Component by about 205 KB of the 260 KB observed in my environment. That would be a reduction in heap usage of over 75%.
--- Additional comment from email@example.com on 2012-02-09 10:08:15 EST ---
I need to make a slight correction to my previous comment. I got the property name wrong. The correct property name is EJBModule. URL is a nested property inside of EJBModule. The size of 50 KB is correct though.
--- Additional comment from firstname.lastname@example.org on 2012-02-09 11:11:20 EST ---
We have seen this issue with EJB 3 resources as well. In one heap dump I saw EJBComponents consuming over 9 MB of heap. The deployment property of the ManagedComponent corresponding to the EJB3Component consumes the bulk of that 9 MB.
This issue is not specific to EJB2 or EJB3 resources; however, it is most likely to manifest itself with them because an application will typically contain far more EJB resources than WAR, EAR, data source, and other resource types. The problem is not specific to one or the other of these types because EJB2Component, EJB3Component, and a number of other types inherit from a common base class, ManagedComponentComponent. It is in ManagedComponentComponent that we cache the managed objects that we get back from the profile service.
--- Additional comment from email@example.com on 2012-02-15 12:03:04 EST ---
The fix for this has been pushed to master.
master commit hashes:
There are two changes with these commits. First, ManagedComponentComponent no longer caches the ManagedComponent it gets back from the profile service. ManagedComponentComponent is the base class for most of the service types in the plugin, including all EJB 2/3 resource types; so, this change effectively addresses the high memory usage issues described in previous comments.
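To illustrate the idea of not caching the managed component, here is a minimal sketch. It is not the actual plugin code: FakeManagedComponent, OnDemandComponent, and the Function used as a stand-in for the profile service are all hypothetical names chosen for illustration. The point is only that the resource component keeps a small lookup key and fetches the large object per call, so the large object can be garbage collected afterwards.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for the large object returned by the profile service. */
class FakeManagedComponent {
    final String name;
    final Map<String, Object> properties = new HashMap<>();

    FakeManagedComponent(String name) {
        this.name = name;
    }
}

/**
 * Hypothetical resource component that holds only the component name.
 * The managed component is looked up on every call and never stored in a field,
 * so it does not stay on the heap between calls.
 */
class OnDemandComponent {
    private final String componentName;
    // Assumption: the profile service is modeled as a simple name-to-component lookup.
    private final Function<String, FakeManagedComponent> profileService;

    OnDemandComponent(String componentName,
                      Function<String, FakeManagedComponent> profileService) {
        this.componentName = componentName;
        this.profileService = profileService;
    }

    /** Fetches the managed component, reads one property, and lets the component go. */
    Object getProperty(String propertyName) {
        FakeManagedComponent managedComponent = profileService.apply(componentName);
        return managedComponent.properties.get(propertyName);
    }
}
```

The trade-off in this sketch is one lookup per call instead of one retained object per resource, which is why the availability check needs separate handling.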
Secondly, ManagedComponentComponent.getAvailability has been refactored to avoid the high CPU utilization issues that were addressed in bug 647571. There is now a configurable interval that determines when the data used for availability checks needs to be refreshed, i.e., when the managed component must be reloaded from the profile service. That data, which is the RunState property of the ManagedComponent, is updated any time the ManagedComponent is retrieved from the profile service, such as during metric collections. The interval is reset at the same time. The interval is configurable from the top-level application server resource, which has a new plugin configuration property named serviceAvailabilityRefreshInterval that defaults to 15 minutes.
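The refresh-interval behavior described above could look roughly like the following sketch. It is an illustration under assumptions, not the real getAvailability implementation: AvailabilityCache, recordRunState, isUp, the injected clock, and the use of the string "RUNNING" for the run state are all hypothetical.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical sketch of an availability check backed by a refresh interval. */
class AvailabilityCache {
    // Assumption: stands in for reloading the managed component and reading RunState.
    private final Supplier<String> runStateLoader;
    // Injected clock (milliseconds) so the behavior can be exercised without waiting.
    private final LongSupplier clock;
    private final long refreshIntervalMillis;

    private String lastRunState;
    private long lastRefreshMillis;
    private boolean loaded;

    AvailabilityCache(Supplier<String> runStateLoader, LongSupplier clock,
                      long refreshIntervalMillis) {
        this.runStateLoader = runStateLoader;
        this.clock = clock;
        this.refreshIntervalMillis = refreshIntervalMillis;
    }

    /**
     * Called whenever the managed component was retrieved for another reason,
     * such as a metric collection. Stores the run state and resets the interval.
     */
    void recordRunState(String runState) {
        lastRunState = runState;
        lastRefreshMillis = clock.getAsLong();
        loaded = true;
    }

    /** Reloads the run state only if nothing is stored yet or the interval is exceeded. */
    boolean isUp() {
        long now = clock.getAsLong();
        if (!loaded || now - lastRefreshMillis > refreshIntervalMillis) {
            recordRunState(runStateLoader.get());
        }
        return "RUNNING".equals(lastRunState);
    }
}
```

With a 15 minute interval and metric collections running more often than that, a sketch like this would rarely trigger a reload from the availability check itself, which is the intent of resetting the interval on every retrieval.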
--- Additional comment from firstname.lastname@example.org on 2012-02-15 12:23:51 EST ---
For QE testing, we first want to make sure these changes have not introduced any regressions. We want to verify that metric collections, availability checks, etc. still behave as expected. If QE wants to do any performance testing involving large numbers of EJBs, please see https://fedorahosted.org/pipermail/rhq-devel/2012-February/001481.html and feel free to contact me directly via email or IRC for additional help with getting set up.
QE might also want to test the new plugin configuration property, serviceAvailabilityRefreshInterval, which belongs to the JBossAS Server resource type. The following could be done to test it. For some child types of JBossAS Server (e.g., EJB types, data sources, etc.), disable metric collections. Set serviceAvailabilityRefreshInterval to a really low value, like 1 minute. On your agent, enable debug logging on the class org.rhq.plugins.jbossas5.ManagedComponentComponent. When an availability check is done and the refresh interval has been exceeded, as it should be with an interval of 1 minute, you should see log messages of the form, "The availability refresh interval for [resourceKey: <resource_key>, type: <resource_type>, name: <component_name>] has been exceeded by....Reloading managed component."
The fix has been cherry-picked to the release/jon3.0.x branch.
Moving to ON_QA for testing with JON 3.0.1.GA RC5 or better:
Created attachment 563930
Verification ... results of improvements
Relying on the attached data for verification of the performance aspect.
Verified with JON 3.0.1 RC5.