Bug 1212933 - Resource start operation leads to broken managed server
Summary: Resource start operation leads to broken managed server
Status: CLOSED ERRATA
Alias: None
Product: JBoss Operations Network
Classification: JBoss
Component: Agent
Version: JON 3.3.1
Hardware: Unspecified
OS: Unspecified
unspecified
urgent
Target Milestone: ER01
Target Release: JON 3.3.3
Assignee: Libor Zoubek
QA Contact: Filip Brychta
URL:
Whiteboard:
Keywords: Triaged
Depends On:
Blocks:
Reported: 2015-04-17 17:34 UTC by Larry O'Leary
Modified: 2019-05-20 11:44 UTC (History)
5 users

Clone Of:
Last Closed: 2015-07-30 16:42:01 UTC


Attachments (Terms of Use)
Proposed patch that fixes this (5.07 KB, patch)
2015-04-17 17:34 UTC, Larry O'Leary


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2015:1525 normal SHIPPED_LIVE Moderate: Red Hat JBoss Operations Network 3.3.3 update 2015-07-30 20:41:08 UTC
Red Hat Bugzilla 1212950 None None None Never
Red Hat Knowledge Base (Solution) 1409153 None None None Never

Internal Trackers: 1212950

Description Larry O'Leary 2015-04-17 17:34:56 UTC
Created attachment 1015704 [details]
Proposed patch that fixes this

Description of problem:
When a server resource is started using the _Start_ resource operation, such as the one used for a standalone JBoss EAP 6 server or an FSW server, the server resource eventually stops functioning. In other words, starting a server from JBoss ON causes the started server to hang at some later point.

Version-Release number of selected component (if applicable):
3.3.1

How reproducible:
Always

Steps to Reproduce:
1.  Install, configure, and start a JBoss ON 3.3.0 system.
2.  Import the RHQ Server resource into inventory.
3.  From the RHQ Server resource's Subsystems / Core / logging / Console Handlers / CONSOLE, set the configuration property _Level_ to `ALL`.
4.  From the RHQ Server resource's Subsystems / Core / logging / Loggers, set the configuration property _Level_ to `ALL` for all child resources.
5.  Invoke the _Restart_ resource operation for the RHQ Server resource.
6.  Generate logging output in the EAP standalone server. This can be done using a command similar to:

        while true; do curl http://localhost:9999; sleep 1; done

Actual results:
JBoss EAP server stops working after an hour or two. No errors are logged.

Expected results:
JBoss EAP server continues to work as normal. Error is logged to agent.log indicating that an unexpected error occurred while capturing console output.

Additional info:
This issue is the result of not properly handling a potential error in StreamRedirectorRunnable. When a failure that is not of type Exception (that is, an Error) occurs, the thread stops but the input and output streams are not closed. The stdout and stderr streams then keep buffering until the buffer is filled to capacity. Because nothing reads from the buffer any longer, the process or thread writing to the stream (which lives in the managed resource, such as the JBoss EAP server) blocks. The end result is that the managed server stops functioning.

In this specific scenario, the unexpected error is an OOME caused by capturing the process's console output indefinitely without actually doing anything with it. This issue could have been avoided by closing the streams in a finally block.

The attached patch demonstrates the changes needed to close the streams properly no matter what. It does not directly fix the OOME; that will be tracked in a separate bug.
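The finally-block approach described above can be sketched in Java, the language the RHQ agent is written in. This is a minimal, hypothetical sketch, not the RHQ source or the attached patch: `StreamRedirectorSketch`, `FailingSink`, `TrackingInput`, and `RedirectorDemo` are illustrative names, and a thrown `OutOfMemoryError` merely simulates the real OOME. It shows that the streams get closed even when an Error, rather than an Exception, ends the copy loop.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified stand-in for StreamRedirectorRunnable (not the RHQ source).
class StreamRedirectorSketch implements Runnable {
    private final InputStream input;   // e.g. the managed process's stdout
    private final OutputStream output; // destination for captured output

    StreamRedirectorSketch(InputStream input, OutputStream output) {
        this.input = input;
        this.output = output;
    }

    @Override
    public void run() {
        byte[] buffer = new byte[4096];
        try {
            int n;
            while ((n = input.read(buffer)) != -1) {
                output.write(buffer, 0, n);
            }
        } catch (IOException e) {
            // Ordinary I/O failure; a real implementation would log this.
            System.err.println("Stream redirect failed: " + e);
        } finally {
            // Runs for any Throwable, including Errors such as OutOfMemoryError,
            // so the streams are closed instead of being abandoned.
            closeQuietly(input);
            closeQuietly(output);
        }
    }

    private static void closeQuietly(AutoCloseable c) {
        try {
            c.close();
        } catch (Exception ignored) {
            // Nothing useful to do if close() itself fails.
        }
    }
}

// Demo sink (hypothetical) that simulates an OOME on every write and records close().
class FailingSink extends OutputStream {
    boolean closed;

    @Override
    public void write(int b) {
        throw new OutOfMemoryError("simulated");
    }

    @Override
    public void write(byte[] b, int off, int len) {
        throw new OutOfMemoryError("simulated");
    }

    @Override
    public void close() {
        closed = true;
    }
}

// Demo input (hypothetical) that records close().
class TrackingInput extends ByteArrayInputStream {
    boolean closed;

    TrackingInput(byte[] data) {
        super(data);
    }

    @Override
    public void close() {
        closed = true;
    }
}

class RedirectorDemo {
    // True if both streams were closed even though an Error ended the thread.
    static boolean streamsClosedAfterError() throws InterruptedException {
        TrackingInput in = new TrackingInput("console output\n".getBytes(StandardCharsets.UTF_8));
        FailingSink out = new FailingSink();
        Thread t = new Thread(new StreamRedirectorSketch(in, out), "redirector");
        t.setUncaughtExceptionHandler((thread, err) -> { }); // silence the simulated Error
        t.start();
        t.join();
        return in.closed && out.closed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(streamsClosedAfterError()); // prints "true"
    }
}
```

Running `RedirectorDemo.main` prints `true`: the simulated Error still escapes `run()` and kills the thread, but only after the finally block has closed both streams. Without the finally block, the same Error would skip the close calls entirely, which is the failure mode this bug describes.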

Comment 1 Libor Zoubek 2015-06-05 11:11:04 UTC
I think the fix for Bug 1212950 also partially fixes this bug, but an OOME is still possible (at least with the fix for Bug 1212950, the agent can manage 100 EAP servers, each filling up the default 2 MB of memory with its log).
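The 2 MB figure implies a per-server cap on captured output. As a rough illustration, 100 servers at about 2 MB each can still hold around 200 MB of agent heap, which is presumably why an OOME remains possible. One way to bound capture is a ring buffer that keeps only the newest bytes; the Java sketch below is a hypothetical illustration of that idea, not the actual change for Bug 1212950, and `BoundedCaptureStream` and `snapshot()` are invented names.

```java
import java.io.OutputStream;

// Hypothetical sketch (not the Bug 1212950 change): an OutputStream that keeps
// only the most recent `capacity` bytes, so memory per captured stream stays bounded.
class BoundedCaptureStream extends OutputStream {
    private final byte[] ring;
    private int next;   // index where the next byte will be stored
    private long total; // number of bytes written so far

    BoundedCaptureStream(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        ring = new byte[capacity];
    }

    @Override
    public synchronized void write(int b) {
        ring[next] = (byte) b;           // overwrites the oldest byte once the ring is full
        next = (next + 1) % ring.length;
        total++;
    }

    // Returns the retained bytes, oldest first.
    synchronized byte[] snapshot() {
        int size = (int) Math.min(total, ring.length);
        int start = total < ring.length ? 0 : next;
        byte[] out = new byte[size];
        for (int i = 0; i < size; i++) {
            out[i] = ring[(start + i) % ring.length];
        }
        return out;
    }
}
```

For example, writing the bytes of `abcdef` one at a time into a `BoundedCaptureStream(4)` leaves `cdef` in `snapshot()`, because the two oldest bytes are overwritten. The cap limits memory per stream, but the total still grows with the number of managed servers.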

I'll merge the suggested patch from Larry.

Comment 2 Libor Zoubek 2015-06-05 12:06:13 UTC
I was not able to apply the patch (git reported it as invalid), so I merged it manually.

branch:  master
link:    https://github.com/rhq-project/rhq/commit/24bd7c37e
time:    2015-06-05 14:01:24 +0200
commit:  24bd7c37e5a6495025c556779a800566ebda2008
author:  Libor Zoubek - lzoubek@redhat.com
message: Bug 1212933 - Resource start operation leads to broken managed server

         Close output and input streams in case of Error, not just
         Exception is thrown.

Comment 4 Simeon Pinder 2015-07-10 18:55:20 UTC
Available for test with 3.3.3 ER01 build: 
https://brewweb.devel.redhat.com/buildinfo?buildID=446732
 *Note: jon-server-patch-3.3.0.GA.zip maps to ER01 build of
 jon-server-3.3.0.GA-update-03.zip.

Comment 5 Filip Brychta 2015-07-16 12:32:11 UTC
After 24h it's still ok -> verified

Comment 7 errata-xmlrpc 2015-07-30 16:42:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1525.html

