Description of problem:
After invoking the start operation on an EAP 6 based resource, the agent's used heap keeps growing until an OutOfMemoryError is thrown, which terminates the thread that is capturing console output.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Install, configure and start JBoss ON 3.3.0 system.
2. Import RHQ Server resource into inventory.
3. From RHQ Server resource's Subsystems / Core / logging / Console Handlers / CONSOLE set configuration property _Level_ to `ALL`.
4. From RHQ Server resource's Subsystems / Core / logging / Loggers set configuration property _Level_ to `ALL` for all child resources.
5. Invoke the _Restart_ resource operation for the RHQ Server resource.
6. Generate logging output in the EAP standalone server. This can be done using a command similar to:
while true; do curl http://localhost:9999; sleep 1; done
Actual results:
Agent heap continues to grow. Eventually the agent's thread named /usr/bin/nohup-stdout is terminated and the heap drops.
Expected results:
Agent heap should not grow.
When an operation that uses ProcessExecutor is invoked, the process's console output can be captured and used as the operation's result. For the start operation, however, or really any operation that launches a long-running process, the operation returns but the process's console output continues to be stored in a buffer.
In the case of EAP 6, org.rhq.modules.plugins.jbossas7.ServerControl sets ProcessExecution's capture output to true. For org.rhq.modules.plugins.jbossas7.ServerControl.Lifecycle.startServer() this means output is captured forever if the process started properly and continues to run. Eventually, the buffer cannot be extended because the Java heap is exhausted.
An easy fix would be not to capture output at all in this case, given that we already set processExecution.setWaitForCompletion to -1, meaning we do not wait. That may not be ideal when the start operation fails, however, because the output is then lost. A proper fix is for the stream/process output redirection to support either a timeout or a size limit. Additionally, once the operation returns, the thread should be interrupted, as nothing handles the output from that point on.
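To illustrate the size-limit idea, here is a minimal sketch of an in-memory sink that stops growing once a cap is reached. It is not the RHQ implementation: the class name `BoundedCaptureStream` and its methods are hypothetical, and it only assumes that the redirection thread writes the process output into an `OutputStream`.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch (not RHQ code): keeps at most `limit` bytes in memory
 * and silently discards everything written after the limit is reached.
 */
final class BoundedCaptureStream extends OutputStream {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final int limit;
    private long discarded;

    BoundedCaptureStream(int limit) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must be >= 0");
        }
        this.limit = limit;
    }

    @Override
    public synchronized void write(int b) {
        if (buffer.size() < limit) {
            buffer.write(b);
        } else {
            discarded++;
        }
    }

    @Override
    public synchronized void write(byte[] data, int off, int len) {
        // Keep only as many bytes as still fit under the limit.
        int room = limit - buffer.size();
        int kept = Math.min(room, len);
        if (kept > 0) {
            buffer.write(data, off, kept);
        }
        discarded += len - kept;
    }

    /** Returns the retained output, decoded as UTF-8 (an assumption of this sketch). */
    synchronized String captured() {
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Number of bytes dropped because the limit had been reached. */
    synchronized long discardedBytes() {
        return discarded;
    }
}
```

With a sink like this, a long-running process can keep writing to its console indefinitely while the heap used by the capture stays bounded by `limit`. A failed start would still leave the first part of the output available for the operation result.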
Even better would be an option for the user to specify a file to write output into, or to redirect it to the agent's logger. In the event that the user specifies a file, the redirection thread should continue to run even after the operation has returned. In cases where output is being returned as part of the operation result and not being redirected to a file or logger, the thread should be terminated once the operation returns or times out.
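For comparison, plain JDK code can send a child process's console to a file without any redirection thread, using `ProcessBuilder.Redirect`. The sketch below is only an illustration of the file option described above. It does not use RHQ's ProcessExecutor, and the class and method names are hypothetical.

```java
import java.io.File;

/**
 * Hypothetical sketch using only the JDK: stdout and stderr of the child
 * process are appended to a file by the operating system, so nothing is
 * buffered in the launching JVM's heap.
 */
final class FileRedirectSketch {
    /** Builds (but does not start) a process whose console output is appended to logFile. */
    static ProcessBuilder withOutputAppendedTo(File logFile, String... command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr into stdout
        pb.redirectOutput(ProcessBuilder.Redirect.appendTo(logFile));
        return pb;
    }
}
```

Calling `start()` on the returned builder launches the process. Because the file descriptor is handed to the child, the output keeps flowing to the file after the launching operation has returned.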
*** Bug 1212951 has been marked as a duplicate of this bug. ***
I created pull request https://github.com/rhq-project/rhq/pull/169
This BZ should be retargeted to JON, as the proposed fix goes into agent internals and not the plugin.
Based on the proposed upstream fix for this issue, the fix would be in the core native-system module. This is part of the base/core agent and not specific to the EAP 6 plug-in. Setting target to 3.3.3 for consideration in next maintenance release.
Author: Libor Zoubek <firstname.lastname@example.org>
Date: Thu Apr 30 18:58:04 2015 +0200
Bug 1212950 - EAP 6 start operation causes agent to run out of memory due to
storing console output in an unused buffer
Now process output is captured (if captured) up to a 2MB size; once output
exceeds this limit, it is ignored, so we don't run out of memory (unless
the agent starts plenty of verbose processes). The default limit can be
changed via the rhq.process-execution.captured-output.limit system property.
This commit also gives more power to plugin writers about capturing process
outputs. ProcessExecution#setCaptureOutput is now deprecated in favor of new
CaptureMode setting. CaptureMode can capture to memory and/or forward to
agent.log as well as setting captured limit.
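As a rough illustration of how a limit like this could be resolved from the system property named in the commit message, consider the sketch below. The property name and the 2MB default come from the commit message. Everything else is an assumption: the class name is hypothetical, the value is assumed to be a byte count, and the fallback for invalid values is a choice made for this sketch. The actual parsing in the native-system module may differ.

```java
/**
 * Hypothetical sketch (not the RHQ implementation): resolves the capture
 * limit from a system property, falling back to a 2MB default.
 */
final class CaptureLimitSketch {
    static final String LIMIT_PROPERTY = "rhq.process-execution.captured-output.limit";
    static final int DEFAULT_LIMIT_BYTES = 2 * 1024 * 1024;

    /** Assumes the property holds a byte count; unset, unparsable, or non-positive values use the default. */
    static int resolveLimit() {
        // Integer.getInteger returns null when the property is missing or not a number.
        Integer configured = Integer.getInteger(LIMIT_PROPERTY);
        if (configured == null || configured <= 0) {
            return DEFAULT_LIMIT_BYTES;
        }
        return configured;
    }
}
```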
Available for test with 3.3.3 ER01 build:
*Note: jon-server-patch-3.3.0.GA.zip maps to ER01 build of
After 24h the heap is stable -> verified
3.3.0.GA Update 03
Build Number :
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.