Bug 1212950 - EAP 6 start operation causes agent to run out of memory due to storing console output in an unused buffer
Summary: EAP 6 start operation causes agent to run out of memory due to storing consol...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: JBoss Operations Network
Classification: JBoss
Component: Agent
Version: JON 3.3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ER01
Target Release: JON 3.3.3
Assignee: Libor Zoubek
QA Contact: Filip Brychta
URL:
Whiteboard:
Duplicates: 1212951
Depends On:
Blocks:
Reported: 2015-04-17 18:17 UTC by Larry O'Leary
Modified: 2019-05-20 11:44 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-07-30 16:42:03 UTC
Type: Bug
Embargoed:


Attachments


Links
Red Hat Bugzilla 1212933 (Priority: unspecified, Status: CLOSED): Resource start operation leads to broken managed server. Last updated 2021-02-22 00:41:40 UTC
Red Hat Knowledge Base (Solution) 1409153
Red Hat Product Errata RHSA-2015:1525 (Priority: normal, Status: SHIPPED_LIVE): Moderate: Red Hat JBoss Operations Network 3.3.3 update. Last updated 2015-07-30 20:41:08 UTC

Internal Links: 1212933

Description Larry O'Leary 2015-04-17 18:17:13 UTC
Description of problem:
After invoking an EAP 6 based resource's start operation, the agent's used heap grows continuously until an OutOfMemoryError is thrown, which terminates the thread that captures console output.

Version-Release number of selected component (if applicable):
3.3.1

How reproducible:
Always

Steps to Reproduce:
1.  Install, configure and start JBoss ON 3.3.0 system.
2.  Import RHQ Server resource into inventory.
3.  From RHQ Server resource's Subsystems / Core / logging / Console Handlers / CONSOLE set configuration property _Level_ to `ALL`.
4.  From RHQ Server resource's Subsystems / Core / logging / Loggers set configuration property _Level_ to `ALL` for all child resources.
5.  Invoke the _Restart_ resource operation for the RHQ Server resource.
6.  Generate logging output in the EAP standalone server. This can be done using a command similar to:

        while true; do curl http://localhost:9999; sleep 1; done

Actual results:
Agent heap continues to grow. Eventually the agent thread named /usr/bin/nohup-stdout is terminated and heap usage drops.

Expected results:
Agent heap should not grow.

Additional info:
When an operation uses ProcessExecutor to run a process, the process's output (console) can be captured and used as the operation's result. For the start operation, however, or indeed any operation that launches a long-running process, the operation returns while the process's console output continues to be stored in a buffer.

In the case of EAP 6, org.rhq.modules.plugins.jbossas7.ServerControl sets ProcessExecution's capture output flag to true. For org.rhq.modules.plugins.jbossas7.ServerControl.Lifecycle.startServer(), this means output is captured indefinitely whenever the process starts successfully and keeps running. Eventually the buffer cannot be extended because there is insufficient Java heap.
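To make the failure mode concrete, here is a minimal Java sketch of the pattern described above. It is not the actual RHQ ProcessExecutor code, and the class name UnboundedGobbler is hypothetical. A drain thread copies the child process's stdout into an in-memory ByteArrayOutputStream until end of stream; for a server that keeps running, end of stream never arrives, so the buffer keeps growing until the heap is exhausted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration of unbounded output capture (not an RHQ class).
class UnboundedGobbler extends Thread {
    private final InputStream source;
    // Every byte the child process prints is retained here, with no upper limit.
    private final ByteArrayOutputStream captured = new ByteArrayOutputStream();

    UnboundedGobbler(InputStream source) {
        super("process-stdout-gobbler");
        this.source = source;
        setDaemon(true);
    }

    @Override
    public void run() {
        byte[] chunk = new byte[8192];
        try {
            int n;
            // Loops until the process closes its stdout. A long-running server
            // keeps the stream open, so this loop (and the buffer) never ends.
            while ((n = source.read(chunk)) != -1) {
                synchronized (captured) {
                    captured.write(chunk, 0, n);
                }
            }
        } catch (IOException e) {
            // Stream closed or broken: stop draining.
        }
    }

    int capturedBytes() {
        synchronized (captured) {
            return captured.size();
        }
    }

    public static void main(String[] args) {
        // Simulate 1 MiB of console output from a finite stream.
        UnboundedGobbler g = new UnboundedGobbler(new ByteArrayInputStream(new byte[1 << 20]));
        g.run(); // drain synchronously for the demo
        System.out.println("retained bytes: " + g.capturedBytes());
    }
}
```

With a finite stream the thread simply retains everything; with a live server process the same code retains everything forever, which matches the heap growth reported above.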

An easy fix would be not to capture output at all in this case, since processExecution.setWaitForCompletion is already set to -1, meaning we do not wait. However, that may not be ideal when the start operation fails. To fix this properly, the stream/process output redirection needs to support either a timeout or a size limit. Additionally, once the operation returns, the capturing thread should be interrupted, as nothing handles the output at that point.
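The size-limit option can be sketched as an OutputStream that retains at most a fixed number of bytes and silently discards the rest. This is a hypothetical illustration (BoundedCaptureStream is not an RHQ class); a drain thread would write into it instead of into an unbounded buffer.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Objects;

// Hypothetical sketch: keeps at most `limit` bytes, counts and drops the rest.
class BoundedCaptureStream extends OutputStream {
    private final int limit;
    private final ByteArrayOutputStream kept = new ByteArrayOutputStream();
    private long discarded; // bytes offered after the limit was reached

    BoundedCaptureStream(int limit) {
        if (limit < 0) throw new IllegalArgumentException("limit must be >= 0");
        this.limit = limit;
    }

    @Override
    public synchronized void write(int b) {
        if (kept.size() < limit) {
            kept.write(b);
        } else {
            discarded++;
        }
    }

    @Override
    public synchronized void write(byte[] b, int off, int len) {
        Objects.checkFromIndexSize(off, len, b.length);
        // kept.size() never exceeds limit, so room is never negative.
        int room = limit - kept.size();
        int toKeep = Math.min(room, len);
        kept.write(b, off, toKeep);
        discarded += len - toKeep;
    }

    synchronized String capturedText() {
        return new String(kept.toByteArray(), StandardCharsets.UTF_8);
    }

    synchronized long discardedBytes() {
        return discarded;
    }

    public static void main(String[] args) {
        BoundedCaptureStream s = new BoundedCaptureStream(16);
        byte[] line = "INFO something happened\n".getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < 1000; i++) {
            s.write(line, 0, line.length);
        }
        System.out.println(s.capturedText().length() + " chars kept, "
                + s.discardedBytes() + " bytes discarded");
    }
}
```

The design choice here is to keep reading and discard rather than stop reading: if the agent stopped draining the pipe, the child process could block once the operating system's pipe buffer fills. Stopping the drain thread when the operation returns is a separate concern; note that in classic java.io, Thread.interrupt() usually does not unblock a thread that is blocked in read(), so closing the stream tends to be the more reliable way to end it.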

Even better would be an option for the user to specify a file to write output into, or to redirect output to the agent's logger. In the event that the user specifies a file, the redirection thread should continue to run even after the operation has returned. Where output is returned as part of the operation result rather than redirected to a file or logger, the thread should be terminated once the operation returns or times out.
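Forwarding to a logger can be sketched as a drain that reads the process output line by line and hands each line to a logger without retaining anything in memory. This is a hypothetical sketch: LogForwardingDrain is not an RHQ class, and java.util.logging stands in for the agent's real logging framework only to keep the example self-contained.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: forwards process output to a logger instead of buffering it.
class LogForwardingDrain implements Runnable {
    private final InputStream source;
    private final Logger logger;
    private final String prefix; // e.g. identifies which managed process produced the line

    LogForwardingDrain(InputStream source, Logger logger, String prefix) {
        this.source = source;
        this.logger = logger;
        this.prefix = prefix;
    }

    @Override
    public void run() {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(source, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Memory use stays constant: each line goes straight to the logger.
                logger.info(prefix + line);
            }
        } catch (IOException e) {
            logger.log(Level.FINE, "process output stream closed", e);
        }
    }

    public static void main(String[] args) {
        InputStream fake = new ByteArrayInputStream(
                "started\nlistening\n".getBytes(StandardCharsets.UTF_8));
        new LogForwardingDrain(fake, Logger.getLogger("agent.process"), "[server] ").run();
    }
}
```

Because nothing is buffered, a drain like this can safely keep running after the operation has returned. When the target is a plain file, java.lang.ProcessBuilder.redirectOutput(File) lets the operating system write the output directly, with no drain thread at all; whether that fits ProcessExecutor's design is not covered by this report.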

Comment 1 Larry O'Leary 2015-04-17 18:19:12 UTC
*** Bug 1212951 has been marked as a duplicate of this bug. ***

Comment 2 Libor Zoubek 2015-04-30 17:10:23 UTC
I created pull request https://github.com/rhq-project/rhq/pull/169

This BZ should be retargeted to JON, as the proposed fix goes into agent internals, not the plug-in.

Comment 3 Larry O'Leary 2015-05-06 11:00:09 UTC
Based on the proposed upstream fix for this issue, the fix would be in the core native-system module. This is part of the base/core agent and not specific to the EAP 6 plug-in. Setting target to 3.3.3 for consideration in next maintenance release.

Comment 4 Libor Zoubek 2015-05-15 23:37:18 UTC
in master

commit 13439fe5ee67ed55e1eef307a08254594f98b9cd
Author: Libor Zoubek <lzoubek>
Date:   Thu Apr 30 18:58:04 2015 +0200

    Bug 1212950 - EAP 6 start operation causes agent to run out of memory due to
    storing console output in an unused buffer
    
    Now process output is captured (if captured) up to 2MB size, once output
    exceeds this limit, it is ignored - so we don't run out of memory (unless
    agent does not start plenty of verbose processes). Default limit can be
    changed via rhq.process-execution.captured-output.limit system property.
    
    This commit also gives more power to plugin writers about capturing process
    outputs. ProcessExecution#setCaptureOutput is now deprecated in favor of new
    CaptureMode setting. CaptureMode can capture to memory and/or forward to
    agent.log as well as setting captured limit.
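The options named in the commit message (capture to memory, forward to agent.log, or both, with a 2MB default limit overridable through the rhq.process-execution.captured-output.limit system property) can be modeled roughly as below. This is a hypothetical sketch: CaptureSettings is not RHQ's CaptureMode class and its factory methods are invented for illustration; only the property name and the 2MB default come from the commit message, and reading the property value as a byte count is an assumption.

```java
// Hypothetical sketch: CaptureSettings is NOT RHQ's CaptureMode API; it only
// models the options the commit message describes.
final class CaptureSettings {
    // Property name from the commit message; treating its value as bytes is an assumption.
    static final String LIMIT_PROPERTY = "rhq.process-execution.captured-output.limit";
    // "2MB" default from the commit message, assumed to mean 2 * 1024 * 1024 bytes.
    static final int DEFAULT_LIMIT_BYTES = 2 * 1024 * 1024;

    final boolean captureToMemory;
    final boolean forwardToAgentLog;
    final int limitBytes; // only meaningful when captureToMemory is true

    private CaptureSettings(boolean captureToMemory, boolean forwardToAgentLog, int limitBytes) {
        this.captureToMemory = captureToMemory;
        this.forwardToAgentLog = forwardToAgentLog;
        this.limitBytes = limitBytes;
    }

    // Integer.getInteger returns the default when the property is unset or not a
    // valid integer; negative values are clamped to 0 (retain nothing).
    static int configuredLimit() {
        return Math.max(0, Integer.getInteger(LIMIT_PROPERTY, DEFAULT_LIMIT_BYTES));
    }

    static CaptureSettings memory()   { return new CaptureSettings(true, false, configuredLimit()); }
    static CaptureSettings agentLog() { return new CaptureSettings(false, true, 0); }
    static CaptureSettings both()     { return new CaptureSettings(true, true, configuredLimit()); }
    static CaptureSettings none()     { return new CaptureSettings(false, false, 0); }

    public static void main(String[] args) {
        CaptureSettings s = CaptureSettings.both();
        System.out.println("memory=" + s.captureToMemory
                + " log=" + s.forwardToAgentLog + " limit=" + s.limitBytes);
    }
}
```

Under these assumptions, an agent could be started with, for example, -Drhq.process-execution.captured-output.limit=4194304 to raise the in-memory cap to 4 MiB for every process it launches.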

Comment 9 Simeon Pinder 2015-07-10 18:55:46 UTC
Available for test with 3.3.3 ER01 build: 
https://brewweb.devel.redhat.com/buildinfo?buildID=446732
 *Note: jon-server-patch-3.3.0.GA.zip maps to ER01 build of
 jon-server-3.3.0.GA-update-03.zip.

Comment 10 Filip Brychta 2015-07-16 12:31:08 UTC
After 24h the heap is stable -> verified
Version: 3.3.0.GA Update 03
Build Number: e4b348a:2f80c8c

Comment 12 errata-xmlrpc 2015-07-30 16:42:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1525.html

