Description of problem:
After the start operation is invoked on an EAP 6 based resource, the agent's used heap keeps growing until an OutOfMemoryError is thrown, which terminates the thread that captures the process's console output.

Version-Release number of selected component (if applicable):
3.3.1

How reproducible:
Always

Steps to Reproduce:
1. Install, configure, and start a JBoss ON 3.3.0 system.
2. Import the RHQ Server resource into inventory.
3. From the RHQ Server resource's Subsystems / Core / logging / Console Handlers / CONSOLE, set the configuration property _Level_ to `ALL`.
4. From the RHQ Server resource's Subsystems / Core / logging / Loggers, set the configuration property _Level_ to `ALL` for all child resources.
5. Invoke the _Restart_ resource operation for the RHQ Server resource.
6. Generate logging output in the EAP standalone server. This can be done with a command similar to:
   `while true; do curl http://localhost:9999; sleep 1; done`

Actual results:
The agent heap keeps growing. Eventually the agent's thread named /usr/bin/nohup-stdout is terminated and the heap drops.

Expected results:
The agent heap should not grow.

Additional info:
When an operation that uses ProcessExecutor is invoked, the process's output (console) can be captured and used as the operation's result. For the start operation, however, or any operation that launches a long-running process, the operation returns while the process's console output continues to be stored in a buffer. For EAP 6, org.rhq.modules.plugins.jbossas7.ServerControl sets ProcessExecution's capture-output flag to true. For org.rhq.modules.plugins.jbossas7.ServerControl.Lifecycle.startServer(), this means output is captured indefinitely if the process started properly and keeps running. Eventually the buffer cannot be extended because there is insufficient Java heap.
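To make the failure mode concrete, here is a minimal sketch of the pattern described above, assuming a dedicated pump thread that copies a process stream into an in-memory buffer. `UnboundedCapture` and its methods are hypothetical names used for illustration only; this is not RHQ's ProcessExecutor code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration (not RHQ code): a pump that copies everything a
// child process writes into an in-memory buffer with no upper bound.
class UnboundedCapture implements Runnable {
    private final InputStream source;
    // ByteArrayOutputStream reallocates a larger backing array as data arrives,
    // so its footprint tracks the total output ever produced.
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    UnboundedCapture(InputStream source) {
        this.source = source;
    }

    @Override
    public void run() {
        byte[] chunk = new byte[8192];
        try {
            int n;
            // For a long-running server this loop only ends when the process
            // exits, so it keeps filling the buffer long after the operation
            // that started the process has returned.
            while ((n = source.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            // The stream was closed; stop pumping.
        }
    }

    // ByteArrayOutputStream.size() is synchronized, so this is safe to call
    // from another thread while the pump is running.
    int capturedBytes() {
        return buffer.size();
    }
}
```

Feeding it `new java.io.ByteArrayInputStream(new byte[5_000_000])` and calling `run()` leaves all 5,000,000 bytes in memory; with a process that never exits, the same loop keeps growing the buffer until the heap is exhausted.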
Although an easy fix would be to not capture output at all in this case (we already call processExecution.setWaitForCompletion with -1, meaning we do not wait), that may not be ideal when the start operation fails, since the output would then be unavailable for diagnosing the failure. To fix this properly, the stream/process output redirection needs to support either a timeout or a size limit. Additionally, once the operation returns, the redirection thread should be interrupted, as nothing is left to handle the output at that point. Better still would be an option for the user to specify a file to write the output to, or to redirect it to the agent's logger. If the user specifies a file, the redirection thread should keep running even after the operation has returned. Where output is returned as part of the operation result rather than redirected to a file or logger, the thread should be terminated once the operation returns or times out.
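The size-limit option proposed above can be sketched as a pump that stores at most `limit` bytes and then keeps reading but discards the excess. Continuing to drain matters: if nobody reads the pipe, the child process blocks once the OS pipe buffer fills. `BoundedCapture` is a hypothetical name, and the choice to drop overflow (rather than, say, keep only the most recent output) is an assumption of this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch (not RHQ code): keep the first `limit` bytes of a process
// stream in memory, then keep draining the stream but discard the remainder.
class BoundedCapture implements Runnable {
    private final InputStream source;
    private final int limit;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final AtomicLong discarded = new AtomicLong();

    BoundedCapture(InputStream source, int limit) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must be >= 0");
        }
        this.source = source;
        this.limit = limit;
    }

    @Override
    public void run() {
        byte[] chunk = new byte[8192];
        try {
            int n;
            // Always read to end of stream so the child process never blocks
            // on a full pipe, even after the memory limit has been reached.
            while ((n = source.read(chunk)) != -1) {
                int room = limit - buffer.size(); // never negative: only this thread writes
                int keep = Math.min(room, n);
                buffer.write(chunk, 0, keep);
                discarded.addAndGet(n - keep);
            }
        } catch (IOException e) {
            // The stream was closed; stop pumping.
        }
    }

    int capturedBytes() {
        return buffer.size();
    }

    long discardedBytes() {
        return discarded.get();
    }

    // Decodes the retained bytes; a multi-byte character cut at the limit
    // is replaced by U+FFFD.
    String captured() {
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Reading `hello` with a limit of 3 keeps `hel` and counts 2 discarded bytes. Where output should go to a file instead, as suggested above, the JDK can hand the stream to the OS with `ProcessBuilder.redirectOutput(ProcessBuilder.Redirect.appendTo(file))`, which needs no pump thread or heap buffer at all.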
*** Bug 1212951 has been marked as a duplicate of this bug. ***
I created pull request https://github.com/rhq-project/rhq/pull/169
This BZ should be retargeted to JON, as the proposed fix goes into agent internals rather than the plugin.
Based on the proposed upstream fix for this issue, the fix would be in the core native-system module. This is part of the base/core agent and not specific to the EAP 6 plug-in. Setting the target to 3.3.3 for consideration in the next maintenance release.
in master

commit 13439fe5ee67ed55e1eef307a08254594f98b9cd
Author: Libor Zoubek <lzoubek>
Date: Thu Apr 30 18:58:04 2015 +0200

Bug 1212950 - EAP 6 start operation causes agent to run out of memory due to storing console output in an unused buffer

Process output is now captured (if capture is enabled) up to a size of 2MB; once the output exceeds this limit, the rest is ignored, so the agent does not run out of memory (unless it starts a large number of verbose processes). The default limit can be changed via the rhq.process-execution.captured-output.limit system property.

This commit also gives plugin writers more control over capturing process output. ProcessExecution#setCaptureOutput is now deprecated in favor of the new CaptureMode setting. CaptureMode can capture to memory and/or forward to agent.log, and can also set the capture limit.
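The capture settings described in the commit message can be modelled roughly as follows. This is a minimal sketch, not the actual ProcessExecution/CaptureMode API: `CaptureOptions` and its factory methods are hypothetical, and it assumes the value of rhq.process-execution.captured-output.limit is a byte count and that "2MB" means 2 * 1024 * 1024 bytes.

```java
// Hypothetical sketch modelled on the commit message; NOT the RHQ CaptureMode API.
final class CaptureOptions {
    // System property name taken from the commit message.
    static final String LIMIT_PROPERTY = "rhq.process-execution.captured-output.limit";
    // Assumption: the limit is expressed in bytes and "2MB" means 2 MiB.
    static final int DEFAULT_LIMIT = 2 * 1024 * 1024;

    final boolean captureToMemory; // keep output for the operation result
    final boolean forwardToLog;    // also forward output to agent.log
    final int limit;               // max bytes held in memory

    private CaptureOptions(boolean captureToMemory, boolean forwardToLog, int limit) {
        this.captureToMemory = captureToMemory;
        this.forwardToLog = forwardToLog;
        this.limit = limit;
    }

    // Integer.getInteger falls back to the default when the property is
    // missing or not a valid integer; negative values are clamped to 0.
    static int configuredLimit() {
        return Math.max(0, Integer.getInteger(LIMIT_PROPERTY, DEFAULT_LIMIT));
    }

    static CaptureOptions none() {
        return new CaptureOptions(false, false, 0);
    }

    static CaptureOptions memory() {
        return new CaptureOptions(true, false, configuredLimit());
    }

    static CaptureOptions log() {
        return new CaptureOptions(false, true, 0);
    }

    static CaptureOptions memoryAndLog() {
        return new CaptureOptions(true, true, configuredLimit());
    }

    // Returns a copy with an explicit per-execution limit.
    CaptureOptions withLimit(int newLimit) {
        return new CaptureOptions(captureToMemory, forwardToLog, Math.max(0, newLimit));
    }
}
```

Under this model, a plugin launching a long-running server could choose `log()` so output lands in agent.log without accumulating in memory, and `memory()` for short commands whose output becomes the operation result.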
Available for testing with the 3.3.3 ER01 build: https://brewweb.devel.redhat.com/buildinfo?buildID=446732
*Note: jon-server-patch-3.3.0.GA.zip maps to the ER01 build of jon-server-3.3.0.GA-update-03.zip.
After 24h the heap is stable -> verified

Version : 3.3.0.GA Update 03
Build Number : e4b348a:2f80c8c
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1525.html