Created attachment 1015704
Proposed patch that fixes this

Description of problem:
When starting a server resource using the _Start_ resource operation, such as what is used for a standalone JBoss EAP 6 server or FSW server, the server resource will eventually stop functioning. In other words, starting a server from JBoss ON results in the started server seizing up at some point in the future.

Version-Release number of selected component (if applicable):
3.3.1

How reproducible:
Always

Steps to Reproduce:
1. Install, configure and start JBoss ON 3.3.0 system.
2. Import RHQ Server resource into inventory.
3. From RHQ Server resource's Subsystems / Core / logging / Console Handlers / CONSOLE set configuration property _Level_ to `ALL`.
4. From RHQ Server resource's Subsystems / Core / logging / Loggers set configuration property _Level_ to `ALL` for all child resources.
5. Invoke the _Restart_ resource operation for the RHQ Server resource.
6. Generate logging output in the EAP standalone server. This can be done using a command similar to:

    while true; do curl http://localhost:9999; sleep 1; done

Actual results:
JBoss EAP server stops working after an hour or two. No errors are logged.

Expected results:
JBoss EAP server continues to work as normal. Error is logged to agent.log indicating that an unexpected error occurred while capturing console output.

Additional info:
This issue is a result of not properly handling a potential error in StreamRedirectorRunnable. In the event of a failure that is not of type Exception, the input and output streams are not closed but the thread is stopped. This causes the stdout and stderr streams to buffer until the buffer is filled to capacity. Because nothing is reading from the buffer any longer, the process/thread writing to the stream (this lives in the managed resource, such as the JBoss EAP server) will block. The end result is that the managed server stops functioning.
In this specific scenario, the unexpected error is an OOME that is caused by capturing the process's console output forever without actually doing anything with it. This issue could have been avoided by using a finally block. The attached patch demonstrates the changes necessary to close the streams no matter what. This does not directly fix the OOME, as that will be captured in a separate bug.
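To illustrate the finally-block approach described above, here is a minimal sketch. It is not the attached patch and not the actual StreamRedirectorRunnable source; the class name StreamRedirectorSketch, the buffer size and the closeQuietly helper are hypothetical and exist only for this example.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical stand-in for a stream redirector; not the RHQ class itself.
final class StreamRedirectorSketch implements Runnable {
    private final InputStream in;   // e.g. stdout/stderr of the started process
    private final OutputStream out; // where captured output is written

    StreamRedirectorSketch(InputStream in, OutputStream out) {
        this.in = in;
        this.out = out;
    }

    @Override
    public void run() {
        byte[] buffer = new byte[4096]; // assumed size, chosen for the example
        try {
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } catch (IOException e) {
            System.err.println("Failed to redirect stream: " + e);
        } finally {
            // Runs for Exception and Error alike (including OutOfMemoryError),
            // so the streams are closed even when the thread dies unexpectedly.
            closeQuietly(in);
            closeQuietly(out);
        }
    }

    // Hypothetical helper: close a stream and ignore any failure while closing.
    private static void closeQuietly(Closeable c) {
        try {
            if (c != null) {
                c.close();
            }
        } catch (IOException ignored) {
            // nothing useful can be done at this point
        }
    }
}
```

With this shape, an Error thrown inside the loop still propagates out of run(), but only after both streams have been closed. Whether closing the streams is sufficient to keep the managed process from blocking depends on how the process was started, so treat this as a sketch of the idea rather than a verified fix.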
I think the fix for Bug 1212950 also partially fixes this bug, but an OOM is still possible (at least with the fix for Bug 1212950, an agent can manage 100 EAPs, each filling up the default 2MB of memory with its log). I'll merge the suggested patch from Larry.
I was not able to use the patch, git was telling me it was invalid, so I merged it manually.

branch: master
link: https://github.com/rhq-project/rhq/commit/24bd7c37e
time: 2015-06-05 14:01:24 +0200
commit: 24bd7c37e5a6495025c556779a800566ebda2008
author: Libor Zoubek - lzoubek
message: Bug 1212933 - Resource start operation leads to broken managed server

Close output and input streams in case of Error, not just Exception is thrown.
Available for test with 3.3.3 ER01 build:
https://brewweb.devel.redhat.com/buildinfo?buildID=446732

*Note: jon-server-patch-3.3.0.GA.zip maps to ER01 build of jon-server-3.3.0.GA-update-03.zip.
After 24h it's still ok -> verified
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1525.html