Bug 788638

Summary: as-5 plugin consumes too much memory for managed components
Product: [Other] RHQ Project Reporter: John Sanda <jsanda>
Component: Plugins Assignee: John Sanda <jsanda>
Status: CLOSED CURRENTRELEASE QA Contact: Mike Foley <mfoley>
Severity: high Docs Contact:
Priority: urgent    
Version: 4.2 CC: ccrouch, hrupp, jlivings, loleary
Target Milestone: ---   
Target Release: RHQ 4.3.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 790908 Environment:
Last Closed: 2013-09-01 10:07:35 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 790908    

Description John Sanda 2012-02-08 16:54:04 UTC
Description of problem:
Some of the resources from the AS 5 plugin can wind up consuming far too much memory, which can lead to OutOfMemoryErrors on the agent. We first noticed this with EJB2Component. In my local testing environment, using some simple dummy EJBs, I have observed EJB2Components consuming 260 KB of heap space each, and that is per component (i.e., per individual EJB). In some real-world deployments, we have seen them consume as much as 1.5 MB.

After analyzing several heap dumps, we have determined that the problem is not a memory leak.
Some of the managed objects that the AS 5 plugin gets back from the profile service have some very large properties. The managed object for the EJB2Component, for example, has a managed property named BeanMetaData that is very large. In my local environment, I have seen it use around 155 KB of heap space. In heap dumps from real-world deployments, I have seen the BeanMetaData property consume over 300 KB of heap space.

It turns out that we do not need or use all of the properties on the managed objects we get back from the profile service. We do not use the BeanMetaData property, so we can filter it out to reduce the footprint of each managed object.
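The filtering idea can be sketched as follows. This is a hypothetical illustration only: a plain Map stands in for the managed object's properties, and the class and method names (PropertyFilter, withoutUnused) are assumptions, not the JBoss managed object API or the code that was eventually committed. Only the property name BeanMetaData comes from this report.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: the map stands in for the properties of a managed object
// returned by the profile service. It is not the real JBoss ManagedObject API.
class PropertyFilter {

    // Properties the plugin never reads; BeanMetaData is the one named in this bug.
    static final Set<String> UNUSED = Set.of("BeanMetaData");

    // Returns a copy without the unused (and potentially very large) entries, so
    // only the smaller copy needs to be retained; the input map is not modified.
    static Map<String, Object> withoutUnused(Map<String, Object> properties) {
        Map<String, Object> kept = new HashMap<>(properties);
        kept.keySet().removeAll(UNUSED);
        return Collections.unmodifiableMap(kept);
    }

    public static void main(String[] args) {
        Map<String, Object> props = new HashMap<>();
        props.put("BeanMetaData", new byte[155 * 1024]); // stand-in for a ~155 KB value
        props.put("RunState", "RUNNING");
        System.out.println(withoutUnused(props).keySet()); // prints [RunState]
    }
}
```

Returning a filtered copy, rather than mutating what the profile service handed back, keeps the sketch free of assumptions about whether that object may be modified.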

There may be other large, unused properties that we can filter out to further reduce the memory footprint. Also note that this issue may not be limited to EJB2 resources. We have observed that WebApplicationContextComponent uses even more heap space; however, an application will likely have far fewer web components than EJB components.

We are also investigating EJB3Component to determine whether or not it is holding onto any really large managed properties that we do not need/use.

Comment 1 John Sanda 2012-02-08 17:15:46 UTC
Another big property, relatively speaking, for EJB2Component is one named URL. We do not use this one either. In my local environment it is about 50 KB. To put it in perspective, if we can filter out the URL and BeanMetaData properties, we would reduce the footprint of the EJB2Component by about 205 KB (in my environment). That would be a reduction in heap usage of over 75%.
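The estimate follows directly from the local measurements reported so far (about 155 KB for BeanMetaData, about 50 KB for URL, and about 260 KB per EJB2Component):

```latex
\begin{aligned}
\text{removed} &= 155\,\text{KB} + 50\,\text{KB} = 205\,\text{KB} \\
\text{reduction} &= \frac{205\,\text{KB}}{260\,\text{KB}} \approx 0.79 \approx 79\%
\end{aligned}
```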

Comment 2 John Sanda 2012-02-09 15:08:15 UTC
I need to make a slight correction regarding my previous comment: I got the property name wrong. The correct property name is EJBModule; URL is a nested property inside EJBModule. The size of 50 KB is correct, though.

Comment 3 John Sanda 2012-02-09 16:11:20 UTC
We have seen this issue with EJB 3 resources as well. In one heap dump I saw EJB3Components consuming over 9 MB of heap. The deployment property of the ManagedComponent corresponding to the EJB3Component accounts for the bulk of that 9 MB.

This issue is not specific to EJB2 or EJB3 resources; however, it is most likely to manifest itself with them because an application will typically consist of many more EJB resources than WAR, EAR, data source, or other resource types. The problem is not specific to any one of these types because EJB2Component, EJB3Component, and a number of other types inherit from a common base class, ManagedComponentComponent. It is in ManagedComponentComponent that we cache the managed objects we get back from the profile service.
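Why the inheritance matters can be shown with a simplified model of the pre-fix design. This is a hypothetical sketch: the class names mirror this report, but the fields, the start/retainedBytes methods, and the byte-array payload standing in for large properties are assumptions made for illustration, not the actual plugin code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the pre-fix design. Class names mirror the
// bug report; fields and methods are assumptions made for illustration.
class CachingSketch {

    // Stand-in for the profile service's ManagedComponent; the payload models
    // large properties such as BeanMetaData or the EJB 3 deployment property.
    static class ManagedComponent {
        final byte[] payload;

        ManagedComponent(int sizeInBytes) {
            payload = new byte[sizeInBytes];
        }
    }

    // Base class: before the fix it holds on to the ManagedComponent it loaded.
    abstract static class ManagedComponentComponent {
        protected ManagedComponent cached;

        void start(ManagedComponent fromProfileService) {
            cached = fromProfileService; // retained for the lifetime of the resource
        }

        long retainedBytes() {
            return cached == null ? 0 : cached.payload.length;
        }
    }

    // The subclasses inherit the cache, so they all share the same footprint problem.
    static class EJB2Component extends ManagedComponentComponent { }
    static class EJB3Component extends ManagedComponentComponent { }

    public static void main(String[] args) {
        List<ManagedComponentComponent> resources = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            ManagedComponentComponent c = (i % 2 == 0) ? new EJB2Component() : new EJB3Component();
            c.start(new ManagedComponent(260 * 1024)); // ~260 KB each, as measured locally
            resources.add(c);
        }
        long total = resources.stream().mapToLong(ManagedComponentComponent::retainedBytes).sum();
        System.out.println(total / 1024 + " KB retained"); // prints 26000 KB retained
    }
}
```

Because the cache lives in the base class, the retained size grows linearly with the number of resources, regardless of which subclass they are.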

Comment 4 John Sanda 2012-02-15 17:03:04 UTC
The fix for this has been pushed to master.

master commit hashes:
088d622f5f695a84f0f60d7ecfe087182d52b7df
16e393eea505f38dac2e29c72d3ebaec6a477fdd
4ad3388b4081b9689699c2d705f9162ab33fc37e
b973af82febbaba656e06209af7922fcb1891df8

There are two changes in these commits. First, ManagedComponentComponent no longer caches the ManagedComponent it gets back from the profile service. ManagedComponentComponent is the base class for most of the service types in the plugin, including all EJB 2/3 resource types, so this change effectively addresses the high memory usage described in previous comments.
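The first change can be sketched as keeping only a way to look up the ManagedComponent instead of the object itself. This is a hypothetical illustration: the Supplier stands in for a profile service lookup, and collectRunState and loads are invented names, not the real RHQ or JBoss API.

```java
import java.util.function.Supplier;

// Hypothetical sketch of the post-fix approach. The Supplier stands in for a
// profile service lookup; it is not the real RHQ or JBoss API.
class OnDemandSketch {

    // Stand-in for the profile service's ManagedComponent.
    static class ManagedComponent {
        final byte[] payload; // models large properties such as BeanMetaData
        final String runState;

        ManagedComponent(int sizeInBytes, String runState) {
            this.payload = new byte[sizeInBytes];
            this.runState = runState;
        }
    }

    static class ManagedComponentComponent {
        // Only the means of looking the component up is kept, never the component itself.
        private final Supplier<ManagedComponent> profileServiceLookup;
        private int loads; // number of round trips to the profile service

        ManagedComponentComponent(Supplier<ManagedComponent> profileServiceLookup) {
            this.profileServiceLookup = profileServiceLookup;
        }

        // Each operation (e.g., a metric collection) fetches a fresh ManagedComponent;
        // the local variable is the only reference, so it becomes garbage on return.
        String collectRunState() {
            ManagedComponent mc = profileServiceLookup.get();
            loads++;
            return mc.runState;
        }

        int loads() {
            return loads;
        }
    }

    public static void main(String[] args) {
        ManagedComponentComponent c =
                new ManagedComponentComponent(() -> new ManagedComponent(260 * 1024, "RUNNING"));
        System.out.println(c.collectRunState()); // prints RUNNING
        c.collectRunState();
        System.out.println(c.loads());           // prints 2
    }
}
```

The trade-off is that every operation now costs a round trip to the profile service, which is why availability checks, which run frequently, needed the separate refresh-interval change.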

Second, ManagedComponentComponent.getAvailability has been refactored to avoid the high CPU utilization issues that were addressed in bug 647571. There is now a configurable interval that determines when the data used for availability checks needs to be refreshed, i.e., when the managed component is reloaded from the profile service. That data, which is the RunState property of the ManagedComponent, is updated any time the ManagedComponent is retrieved from the profile service, such as during metric collections, and the interval is reset at the same time. The interval is configurable from the top-level application server resource through a new plugin configuration property named serviceAvailabilityRefreshInterval, which defaults to 15 minutes.
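The refresh-interval behavior described above can be sketched like this. Only the 15-minute default, the RunState property, and the rule that any retrieval refreshes RunState and resets the interval come from this report; the class layout, the load/getAvailability bodies, the injectable clock, the boolean return type, and the "RUNNING" value are assumptions for illustration.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch of the availability refresh interval; not the actual plugin code.
class AvailabilitySketch {

    // Default taken from the bug report: 15 minutes.
    static final long DEFAULT_REFRESH_INTERVAL_MS = TimeUnit.MINUTES.toMillis(15);

    // Stand-in for the profile service's ManagedComponent; only RunState matters here.
    static class ManagedComponent {
        final String runState;

        ManagedComponent(String runState) {
            this.runState = runState;
        }
    }

    static class ManagedComponentComponent {
        private final Supplier<ManagedComponent> profileService; // hypothetical lookup
        private final LongSupplier clock;                         // current time in ms
        private final long refreshIntervalMs; // models serviceAvailabilityRefreshInterval
        private String runState;              // only this small value is retained
        private long lastRefresh;
        int reloads;                          // counts profile service round trips

        ManagedComponentComponent(Supplier<ManagedComponent> profileService,
                                  LongSupplier clock, long refreshIntervalMs) {
            this.profileService = profileService;
            this.clock = clock;
            this.refreshIntervalMs = refreshIntervalMs;
        }

        // Every retrieval (metric collection, etc.) goes through here, so it
        // refreshes RunState and resets the interval as a side effect.
        ManagedComponent load() {
            ManagedComponent mc = profileService.get();
            runState = mc.runState;
            lastRefresh = clock.getAsLong();
            reloads++;
            return mc;
        }

        // Reloads only when nothing is known yet or the interval has been exceeded.
        boolean getAvailability() {
            if (runState == null || clock.getAsLong() - lastRefresh > refreshIntervalMs) {
                load();
            }
            return "RUNNING".equals(runState);
        }
    }

    public static void main(String[] args) {
        long[] now = {0};
        ManagedComponentComponent c = new ManagedComponentComponent(
                () -> new ManagedComponent("RUNNING"), () -> now[0], DEFAULT_REFRESH_INTERVAL_MS);
        c.getAvailability();                     // first check loads the component
        now[0] += TimeUnit.MINUTES.toMillis(5);
        c.getAvailability();                     // 5 minutes later: no reload
        now[0] += TimeUnit.MINUTES.toMillis(11);
        c.getAvailability();                     // 16 minutes after the load: reload
        System.out.println(c.reloads);           // prints 2
    }
}
```

Injecting the clock keeps the sketch deterministic; with metric collections disabled, only getAvailability triggers reloads, which mirrors the QE test described in the next comment.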

Comment 5 John Sanda 2012-02-15 17:23:51 UTC
For QE testing, we first want to make sure these changes have not introduced any regressions. We want to make sure metric collections, availability checks, etc. still behave as expected. If QE wants to do any performance testing involving large numbers of EJBs, please see https://fedorahosted.org/pipermail/rhq-devel/2012-February/001481.html and feel free to contact me directly via email or IRC for additional help with getting set up.

QE might also want to test the new plugin configuration property, serviceAvailabilityRefreshInterval, which belongs to the JBossAS Server resource type. One way to test it: for some child types of JBossAS Server (e.g., EJB types, data sources, etc.), disable metric collections. Set serviceAvailabilityRefreshInterval to a very low value, like 1 minute. On your agent, enable debug logging for the class org.rhq.plugins.jbossas5.ManagedComponentComponent. When an availability check is done and the refresh interval has been exceeded, as it should be with an interval of 1 minute, you should see log messages of the form, "The availability refresh interval for [resourceKey: <resource_key>, type: <resource_type>, name: <component_name>] has been exceeded by....Reloading managed component."

Comment 6 James Livingston 2012-06-27 04:20:24 UTC
This bug is still marked as ON_QA, but fixes appear to be included in RHQ 4.3 and 4.4. Did the status need to be flipped over, or am I mis-reading the git log and the fixes aren't included?

Comment 7 Larry O'Leary 2012-06-27 14:16:45 UTC
(In reply to comment #6)
> This bug is still marked as ON_QA, but fixes appear to be included in RHQ
> 4.3 and 4.4. Did the status need to be flipped over, or am I mis-reading the
> git log and the fixes aren't included?

The status is correct. This bug has been fixed in RHQ 4.3. It remains ON_QA until it is VERIFIED as fixed. The product version of this bug is Bug 790908 which has been VERIFIED in JON 3.0.1.

Comment 8 John Sanda 2012-06-27 14:26:30 UTC
I think that the status just did not get updated. Moving to VERIFIED.

Comment 9 Larry O'Leary 2012-06-27 14:31:20 UTC
(In reply to comment #8)
> I think that the status just did not get updated. Moving to VERIFIED.

QE hasn't VERIFIED this yet. It should remain ON_QA until QE verifies.

Comment 10 John Sanda 2012-06-27 16:15:37 UTC
I changed to VERIFIED on the basis that bug 790908 is a clone of this bug and has already been verified by QE. Since this fix has already been included in a community release and has been verified by QE on a product release branch, should I expect QE to go back and verify this?

For now I will move back to ON_QA until this gets sorted out.

Comment 11 Heiko W. Rupp 2013-09-01 10:07:35 UTC
Bulk closing of items that are on_qa and in old RHQ releases, which have been out for a long time and where the issue has not been re-opened since.