Bug 721152

Summary: Plug-ins initializing Augeas without explicitly de-initializing is resulting in large memory and resource leak
Product: [Other] RHQ Project
Reporter: Larry O'Leary <loleary>
Component: Plugins
Assignee: Lukas Krejci <lkrejci>
Status: CLOSED CURRENTRELEASE
QA Contact: Mike Foley <mfoley>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 4.2
CC: fdrabek, hrupp, lkrejci, rsoares, sdharane, skondkar
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 721151
Environment: Apache httpd; Apache Plug-in; Augeas libraries available in Agent installation; Linux platform
Last Closed: 2012-02-07 19:28:25 UTC
Type: ---
Bug Depends On: 721150, 721151    
Bug Blocks: 678340, 725459    

Description Larry O'Leary 2011-07-13 21:09:39 UTC
+++ This bug was initially created as a clone of Bug #721151 +++

+++ This bug was initially created as a clone of Bug #721150 +++

Agent JVM is using large amounts of system memory even when Java heap is relatively small. This results in slow system performance and in some instances, the agent's JVM being terminated by the Linux kernel.

This memory usage is outside of the JVM itself and appears to occur in a native library used by the agent's JVM. The library in use in all reported instances of the issue is libaugeas.so. 

The high memory usage is due to a plug-in's construction of net.augeas.Augeas without a corresponding Augeas.close() call when the object is no longer needed. As a result, the underlying native library holds onto the resources that were allocated when the Augeas object was constructed.
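
For illustration, the leak pattern and its fix can be sketched as below. This is a minimal sketch, not the plug-in code: FakeNativeHandle is a hypothetical stand-in for a binding object such as net.augeas.Augeas, and it only counts open handles instead of allocating native memory. DiscoveryScan and its method names are likewise made up for the example.

```java
import java.util.concurrent.atomic.AtomicInteger;

class DiscoveryScan {
    // Leaky variant: the handle is dropped without close(), so whatever it
    // allocated natively stays allocated.
    static String leakyRead(String path) {
        FakeNativeHandle handle = new FakeNativeHandle();
        return handle.get(path);
    }

    // Fixed variant: close() runs even if get() throws.
    static String safeRead(String path) {
        FakeNativeHandle handle = new FakeNativeHandle();
        try {
            return handle.get(path);
        } finally {
            handle.close();
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            leakyRead("/files/etc/httpd/conf/httpd.conf");
        }
        System.out.println("open after leaky scans: " + FakeNativeHandle.OPEN.get());
        for (int i = 0; i < 3; i++) {
            safeRead("/files/etc/httpd/conf/httpd.conf");
        }
        System.out.println("open after safe scans: " + FakeNativeHandle.OPEN.get());
    }
}

// Hypothetical stand-in for an object that owns native resources.
class FakeNativeHandle {
    static final AtomicInteger OPEN = new AtomicInteger();
    private boolean closed;

    FakeNativeHandle() {
        OPEN.incrementAndGet(); // stands in for the native allocation
    }

    String get(String path) {
        if (closed) {
            throw new IllegalStateException("handle already closed");
        }
        return "value-of:" + path;
    }

    void close() {
        if (!closed) {
            closed = true;
            OPEN.decrementAndGet(); // stands in for releasing native memory
        }
    }
}
```

Each leaky call leaves one more handle open, which mirrors how every discovery run without close() adds to the native memory held by the agent process.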

Memory usage varies depending on the number of Apache instances that are on the agent host machine.

This issue occurs even if there are no Apache instances in inventory. This is because the Apache plug-in utilizes the Augeas libraries, if available, when attempting discovery, which by default is executed every 15 minutes.

Furthermore, if there are Apache instances in inventory, the leak occurs at a faster rate because instances of Augeas are created during service scans and configuration syncs.

Comment 1 Larry O'Leary 2011-07-13 21:15:01 UTC
This issue affects the Augeas and Apache plug-ins and potentially others: essentially, anywhere that we create a new instance of net.augeas.Augeas without explicitly calling Augeas.close() once we are done with the Augeas libraries.

As suggested by Mike, we should probably use some type of design pattern to ensure that the Augeas bindings cannot be used directly; instead, everything should go through AugeasProxy or some other delegate, which would help reduce the potential for this kind of leak.
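
One possible shape for such a delegate is sketched below. It is only an assumption about how this could look, not the actual AugeasProxy: GuardedConfigAccess, Tree and Task are hypothetical names, and Tree is a stub that counts open handles rather than wrapping the real binding. Callers never construct or close the tree themselves; they only get it inside a callback, and the delegate always closes it afterwards.

```java
class GuardedAccessDemo {
    public static void main(String[] args) {
        // The caller only ever sees the tree inside the callback.
        String value = GuardedConfigAccess.withTree(
                tree -> tree.get("/files/etc/httpd/conf/httpd.conf"));
        System.out.println(value);
        System.out.println("open handles: " + GuardedConfigAccess.openHandles);
    }
}

final class GuardedConfigAccess {
    // Counts stub handles that are currently open (illustration only).
    static int openHandles = 0;

    // Unit of work that needs a live tree.
    interface Task<T> {
        T run(Tree tree);
    }

    // Stub for the wrapped binding object. The private constructor and
    // private close() keep creation and cleanup inside the delegate.
    static final class Tree {
        private boolean closed;

        private Tree() {
            openHandles++;
        }

        String get(String path) {
            if (closed) {
                throw new IllegalStateException("tree used after close");
            }
            return "value-of:" + path;
        }

        private void close() {
            if (!closed) {
                closed = true;
                openHandles--;
            }
        }
    }

    // Opens a tree, runs the task, and always closes the tree,
    // even when the task throws.
    static <T> T withTree(Task<T> task) {
        Tree tree = new Tree();
        try {
            return task.run(tree);
        } finally {
            tree.close();
        }
    }

    private GuardedConfigAccess() {
    }
}
```

With this shape a plug-in cannot forget the close call, because it never owns the handle; the trade-off is that a Tree reference kept after the callback returns is unusable.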

On that note, any object that holds an Augeas instance as a data field should also override the finalize() method to explicitly close the Augeas instance during garbage collection.
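
A rough sketch of that finalize() safety net follows. ConfigHolder and StubNativeTree are hypothetical names, and the stub only counts open handles. Note that the JVM gives no guarantee about when, or whether, finalize() runs, and that finalize() has been deprecated since Java 9, so the explicit close() stays the primary cleanup path and the finalizer is only a fallback.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class FinalizerSafetyNetDemo {
    public static void main(String[] args) {
        ConfigHolder holder = new ConfigHolder();
        System.out.println(holder.read("/files/etc/httpd/conf/httpd.conf"));
        holder.close(); // explicit close is still the primary cleanup path
        System.out.println("open handles: " + StubNativeTree.OPEN.get());
    }
}

// Hypothetical stand-in for a binding object that owns native resources.
class StubNativeTree {
    static final AtomicInteger OPEN = new AtomicInteger();
    private final AtomicBoolean closed = new AtomicBoolean();

    StubNativeTree() {
        OPEN.incrementAndGet();
    }

    String get(String path) {
        if (closed.get()) {
            throw new IllegalStateException("tree already closed");
        }
        return "value-of:" + path;
    }

    // Idempotent and thread-safe, because the finalizer runs on another thread.
    void close() {
        if (closed.compareAndSet(false, true)) {
            OPEN.decrementAndGet();
        }
    }
}

// An object that keeps the tree as a data field.
class ConfigHolder {
    private final StubNativeTree tree = new StubNativeTree();

    String read(String path) {
        return tree.get(path);
    }

    void close() {
        tree.close();
    }

    // Fallback only: releases the tree if the owner never called close().
    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() throws Throwable {
        try {
            close();
        } finally {
            super.finalize();
        }
    }
}
```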

Comment 2 Larry O'Leary 2011-07-13 21:20:25 UTC
This issue may also have been originally identified in https://bugzilla.redhat.com/show_bug.cgi?id=589674. That bug was not directly linked to Augeas or to the use of Apache, but its description and comments are consistent with this issue. In the 3.0.0 release, the Apache plug-in required native library support to be enabled in order to start the Apache resources and perform server discovery. So, by disabling native support, you are essentially also disabling the use of the Augeas libs, which would result in the issue going away.

Comment 3 Larry O'Leary 2011-07-19 23:43:25 UTC
Although we really need to properly address the use of the Augeas libraries, the Augeas Java Binding should, at a minimum, clean up these resource leaks by using a finalizer that performs the cleanup in the event that the caller fails to do it. I logged the following two Augeas bugs to address this in the Augeas Java Binding:

https://fedorahosted.org/augeas/ticket/212
https://fedorahosted.org/augeas/ticket/213

Comment 4 Charles Crouch 2011-07-25 16:35:23 UTC
With Filip to fix in master

Comment 5 Lukas Krejci 2011-08-04 12:50:48 UTC
commit ba3fcbe5cdea5c384743c0a90e71e73a48707b9f
Author: Lukas Krejci <lkrejci>
Date:   Mon Aug 1 14:11:28 2011 +0200

    BZ 721152 - fixing the augeas memory leak - cherrypicked over from release-3.0.0

Comment 6 Mike Foley 2011-08-04 15:37:27 UTC
while verifying this, encountered https://bugzilla.redhat.com/show_bug.cgi?id=728292

Comment 7 Venkat 2011-09-26 12:34:48 UTC
Lukas, could you please help me by providing steps to verify this BZ?

Comment 8 Lukas Krejci 2011-09-29 15:38:45 UTC
This is a hard one to test manually.

The steps would be:

1) Configure the agent to run configuration checks frequently (say every 5 minutes)
2) Inventory an Apache instance
3) Enable Augeas in the connection settings
4) Let it run overnight
5) No drastic memory increase of the agent process should be visible
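
To make the memory check in the last step less of an eyeball exercise, something like the following could be used to sample the agent's resident memory. This is just a sketch under the assumption of a Linux host, where /proc/<pid>/status contains a VmRSS line in kB; RssSampler and its arguments (pid, number of samples, interval in milliseconds) are made up for the example and are not part of RHQ.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

class RssSampler {
    // Returns the VmRSS value in kB from the lines of /proc/<pid>/status,
    // or -1 if no VmRSS line is present.
    static long parseVmRssKb(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmRSS:")) {
                String[] parts = line.substring("VmRSS:".length()).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    // Reads the resident set size of the given process ("self" also works).
    static long readVmRssKb(String pid) throws IOException {
        List<String> lines = Files.readAllLines(
                Paths.get("/proc", pid, "status"), StandardCharsets.UTF_8);
        return parseVmRssKb(lines);
    }

    public static void main(String[] args) throws Exception {
        String pid = args.length > 0 ? args[0] : "self";
        int samples = args.length > 1 ? Integer.parseInt(args[1]) : 1;
        long intervalMs = args.length > 2 ? Long.parseLong(args[2]) : 60_000L;
        for (int i = 0; i < samples; i++) {
            System.out.println(System.currentTimeMillis() + " VmRSS kB: " + readVmRssKb(pid));
            if (i + 1 < samples) {
                Thread.sleep(intervalMs);
            }
        }
    }
}
```

Run with the agent's pid before and after the overnight run (or with a sample count and interval to log a series); a steadily climbing VmRSS while the Java heap stays flat would point at the native leak.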

Comment 9 Sunil Kondkar 2011-10-10 10:55:49 UTC
Verified on build# 485 (Version: 4.1.0-SNAPSHOT Build Number: e07065d)

Configured the agent to run config checks every 5 minutes. Inventoried an Apache instance and enabled Augeas in the connection settings.

After the overnight run, the agent process in the top command output does not show any drastic memory increase. (Earlier it was around 25 to 26%, and now it is around 26 to 27%.)

Comment 10 Mike Foley 2012-02-07 19:28:25 UTC
changing status of VERIFIED BZs for JON 2.4.2 and JON 3.0 to CLOSED/CURRENTRELEASE