Bug 790908

Summary: [as5 plugin] consumes too much memory (heap) for managed components
Product: [JBoss] JBoss Operations Network
Reporter: John Sanda <jsanda>
Component: Plugin -- JBoss EAP 5
Assignee: RHQ Project Maintainer <rhq-maint>
Status: CLOSED CURRENTRELEASE
QA Contact: Mike Foley <mfoley>
Severity: high
Docs Contact:
Priority: urgent
Version: JON 3.0.0
CC: hrupp, ian.springer, loleary, spinder
Target Milestone: ---
Target Release: JON 3.0.1
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 788638
Environment:
Last Closed: 2013-05-13 21:36:09 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 788638
Bug Blocks: 765650, 791314, 798024
Attachments:
Description Flags
Verification ... results of improvements none

Description John Sanda 2012-02-15 17:27:40 UTC
+++ This bug was initially created as a clone of Bug #788638 +++

Description of problem:
Some of the resources from the AS 5 plugin can wind up consuming far too much memory, which can lead to OutOfMemoryErrors on the agent. We first noticed this with EJB2Component. In my local testing environment using some simple, dummy EJBs, I have observed EJB2Components consuming 260 KB of heap space each (i.e., per individual EJB). In some real-world deployments, we have seen them consume as much as 1.5 MB.

After doing analysis with some heap dumps we have determined that the problem is not a memory leak. 
Some of the managed objects that the AS 5 plugin gets back from the profile service have some very large properties. The managed object for the EJB2Component, for example, has a managed property named BeanMetaData that is very large. In my local environment, I have seen it use around 155 KB of heap space. Looking at some heap dumps of real-world deployments, I have seen the BeanMetaData property consume over 300 KB of heap space.

It turns out that we do not need or use all of the properties on the managed objects we get back from the profile service. We do not use the BeanMetaData property, so we can filter out that property to reduce the footprint of each managed object.
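
A minimal sketch of the filtering idea described above, assuming the managed properties are available as a name-to-value map. The class and method names are hypothetical and are not part of the AS 5 plugin or the profile service API; the only property name taken from this report is BeanMetaData.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not the plugin's actual code.
final class ManagedPropertyFilter {
    // Property names the plugin does not read, per this report.
    static final Set<String> UNUSED = Set.of("BeanMetaData");

    // Returns a copy of the property map without the unused entries.
    // The input map is left untouched.
    static <V> Map<String, V> withoutUnused(Map<String, V> properties) {
        Map<String, V> kept = new LinkedHashMap<>();
        for (Map.Entry<String, V> entry : properties.entrySet()) {
            if (!UNUSED.contains(entry.getKey())) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }
}
```

Only the filtered copy would be retained by the resource component, so the large value becomes eligible for garbage collection once the original map is released.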

There may be other large properties we do not use that we can filter out to further reduce the memory footprint. Also note that this issue may not be limited to EJB2 resources. We have observed that WebApplicationContextComponent uses even more heap space; however, there will likely be far fewer web components than EJB components in an application.

We are also investigating EJB3Component to determine whether or not it is holding onto any really large managed properties that we do not need/use.

--- Additional comment from jsanda on 2012-02-08 12:15:46 EST ---

Another big property, relatively speaking, for EJB2Component is one named URL. This is another one we do not use. In my local environment it is about 50 KB. To put it in perspective, if we can filter out the URL and BeanMetaData properties, we would reduce the footprint of the EJB2Component by about 205 KB (in my environment). That would be a reduction in heap usage of over 75%.

--- Additional comment from jsanda on 2012-02-09 10:08:15 EST ---

I need to make a slight correction to my previous comment. I got the property name wrong. The correct property name is EJBModule. URL is a nested property inside of EJBModule. The size of 50 KB is correct though.
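
For reference, the estimate can be checked against the locally measured figures quoted in this report (about 155 KB for BeanMetaData, about 50 KB for EJBModule, and about 260 KB per EJB2Component). These are measurements from one test environment, not real-world numbers:

```latex
\begin{align*}
155\,\text{KB} + 50\,\text{KB} &= 205\,\text{KB} \\
\frac{205\,\text{KB}}{260\,\text{KB}} &\approx 0.79
\end{align*}
```

That is a reduction of roughly 79%, consistent with the "over 75%" figure, and it would leave about 55 KB per component in that environment.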

--- Additional comment from jsanda on 2012-02-09 11:11:20 EST ---

We have seen this issue with EJB 3 resources as well. In one heap dump I saw EJBComponents consuming over 9 MB of heap. The deployment property of the ManagedComponent corresponding to the EJB3Component consumes the bulk of that 9 MB.

This issue is not specific to EJB2 or EJB3 resources; however, it is most likely to manifest itself with them because an application will typically consist of a lot more EJB resources as opposed to WAR, EAR, data source, etc. types. The reason the problem is not specific to one or the other of these types is because EJB2Component, EJB3Component, and a number of other types inherit from a common base class - ManagedComponentComponent. It is ManagedComponentComponent where we cache the managed objects that we get back from the profile service.

--- Additional comment from jsanda on 2012-02-15 12:03:04 EST ---

The fix for this has been pushed to master.

master commit hashes:
088d622f5f695a84f0f60d7ecfe087182d52b7df
16e393eea505f38dac2e29c72d3ebaec6a477fdd
4ad3388b4081b9689699c2d705f9162ab33fc37e
b973af82febbaba656e06209af7922fcb1891df8

There are two changes with these commits. First, ManagedComponentComponent no longer caches the ManagedComponent it gets back from the profile service. ManagedComponentComponent is the base class for most of the service types in the plugin, including all EJB 2/3 resource types, so this change effectively addresses the high memory usage issues described in previous comments.

Secondly, ManagedComponentComponent.getAvailability has been refactored to avoid the high CPU utilization issues that were addressed in bug 647571. There is now a configurable interval that determines when the data used for availability checks needs to be refreshed, i.e., when the managed component is reloaded from the profile service. That data, which is the RunState property of the ManagedComponent, is updated any time the ManagedComponent is retrieved from the profile service, such as during metric collections. The interval is reset as well. The interval is configurable from the top-level application server resource, which has a new plugin configuration property named serviceAvailabilityRefreshInterval that defaults to 15 minutes.
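
The behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the commits: the class name AvailabilityCache, the RunState enum values, and the injected loader and clock are all hypothetical stand-ins. The loader represents reloading the managed component from the profile service and reading its RunState. Only the small RunState value and a timestamp are retained, not the managed component itself.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch; names are illustrative, not the plugin's actual code.
final class AvailabilityCache {
    enum RunState { RUNNING, STOPPED, UNKNOWN }

    private final Supplier<RunState> loader; // stands in for a profile service reload
    private final LongSupplier clock;        // current time in millis; injectable for tests
    private final long refreshIntervalMillis;

    private RunState lastRunState = RunState.UNKNOWN;
    private long lastRefreshMillis;
    private boolean loaded;

    AvailabilityCache(Supplier<RunState> loader, LongSupplier clock, long refreshIntervalMillis) {
        this.loader = loader;
        this.clock = clock;
        this.refreshIntervalMillis = refreshIntervalMillis;
    }

    // Called by any path that retrieves the component (e.g. a metric collection).
    // Updates the cached run state and resets the refresh interval.
    synchronized RunState reload() {
        lastRunState = loader.get();
        lastRefreshMillis = clock.getAsLong();
        loaded = true;
        return lastRunState;
    }

    // Availability check: reloads only on first use or once the interval is exceeded.
    synchronized boolean isUp() {
        if (!loaded || clock.getAsLong() - lastRefreshMillis > refreshIntervalMillis) {
            reload();
        }
        return lastRunState == RunState.RUNNING;
    }
}
```

With this shape, frequent availability checks are served from the cached run state, and a reload triggered by another operation postpones the next availability-driven reload.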

--- Additional comment from jsanda on 2012-02-15 12:23:51 EST ---

For QE testing, we first want to make sure these changes have not introduced any regressions. We want to verify that metric collections, availability checks, etc. still behave as expected. If QE wants to do any performance testing involving large numbers of EJBs, please see https://fedorahosted.org/pipermail/rhq-devel/2012-February/001481.html and feel free to contact me directly via email or IRC for additional help with getting set up.

QE might also want to test the new plugin configuration property, serviceAvailabilityRefreshInterval, which belongs to the JBossAS Server resource type. The following could be done to test it. For some child types of JBossAS Server (e.g., EJB types, data sources, etc.), disable metric collections. Set serviceAvailabilityRefreshInterval to a really low value, like 1 minute. On your agent, enable debug logging on the class org.rhq.plugins.jbossas5.ManagedComponentComponent. When an availability check is done and the refresh interval is exceeded, as it should be with an interval of 1 minute, you should see log messages of the form, "The availability refresh interval for [resourceKey: <resource_key>, type: <resource_type>, name: <component_name>] has been exceeded by....Reloading managed component."

Comment 1 John Sanda 2012-02-15 18:30:34 UTC
The fix has been cherry-picked to the release/jon3.0.x branch.

commit hashes:
dbd357d545896e61bdaaebba273dee93703aefb6
d274464f1b3901b97d74838db4fc0fb8db715f13
ec9ea86edbfa45e1fcb739d8788e919557630ebd
097bcd786afe718d5c8316152580956c009ed9b1

Comment 3 Simeon Pinder 2012-02-17 05:29:23 UTC
Moving to ON_QA for testing with JON 3.0.1.GA RC5 or better:
https://brewweb.devel.redhat.com//buildinfo?buildID=199114

Comment 4 Mike Foley 2012-02-17 15:30:00 UTC
Created attachment 563930 [details]
Verification ... results of improvements

Comment 5 Mike Foley 2012-02-17 21:25:16 UTC
relying on the attached data for verification of the performance aspect.

verified JON 3.0.1 RC5.