1212933 – Resource start operation leads to broken managed server

Summary: Resource start operation leads to broken managed server

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	JBoss Operations Network
Classification:	JBoss
Component:	Agent
Sub Component:
Version:	JON 3.3.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	urgent
Target Milestone:	ER01
Target Release:	JON 3.3.3
Assignee:	Libor Zoubek
QA Contact:	Filip Brychta
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2015-04-17 17:34 UTC by Larry O'Leary
Modified:	2019-05-20 11:44 UTC
CC List:	5 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2015-07-30 16:42:01 UTC
Type:	Bug
Embargoed:

Attachments
Proposed patch that fixes this (5.07 KB, patch) 2015-04-17 17:34 UTC, Larry O'Leary	no flags

Links
System	ID	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1212950	unspecified	CLOSED	EAP 6 start operation causes agent to run out of memory due to storing console output in an unused buffer	2021-02-22 00:41:40 UTC
Red Hat Knowledge Base (Solution)	1409153	None	None	None	Never
Red Hat Product Errata	RHSA-2015:1525	normal	SHIPPED_LIVE	Moderate: Red Hat JBoss Operations Network 3.3.3 update	2015-07-30 20:41:08 UTC

Internal Links: 1212950

Description Larry O'Leary 2015-04-17 17:34:56 UTC

Created attachment 1015704
Proposed patch that fixes this

Description of problem:
When a server resource, such as a standalone JBoss EAP 6 server or an FSW server, is started using the _Start_ resource operation, the server eventually stops functioning. In other words, starting a server from JBoss ON causes the started server to seize up at some point in the future.

Version-Release number of selected component (if applicable):
3.3.1

How reproducible:
Always

Steps to Reproduce:
1.  Install, configure, and start a JBoss ON 3.3.0 system.
2.  Import the RHQ Server resource into inventory.
3.  From the RHQ Server resource's Subsystems / Core / logging / Console Handlers / CONSOLE, set the configuration property _Level_ to `ALL`.
4.  From the RHQ Server resource's Subsystems / Core / logging / Loggers, set the configuration property _Level_ to `ALL` for all child resources.
5.  Invoke the _Restart_ resource operation for the RHQ Server resource.
6.  Generate logging output in the EAP standalone server. This can be done using a command similar to:

        while true; do curl http://localhost:9999; sleep 1; done

Actual results:
The JBoss EAP server stops working after an hour or two. No errors are logged.

Expected results:
The JBoss EAP server continues to work as normal. If capturing the console output fails, an error is logged to agent.log indicating that an unexpected error occurred while capturing console output.

Additional info:
This issue is the result of not properly handling a potential error in StreamRedirectorRunnable. If a failure occurs that is not of type Exception (for example, an Error), the thread stops but the input and output streams are not closed. The stdout and stderr streams then buffer until the buffer is filled to capacity. Because nothing is reading from the buffer any longer, the process/thread writing to the stream (which lives in the managed resource, such as the JBoss EAP server) blocks. The end result is that the managed server stops functioning.

In this specific scenario, the unexpected error is an OOME caused by capturing the process's console output indefinitely without actually doing anything with it. The hang could have been avoided by closing the streams in a finally block.

The attached patch demonstrates the changes necessary to close the streams no matter what. It does not directly fix the OOME, which will be tracked in a separate bug.
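
For illustration only, the pattern described above might be sketched as follows. This is not the attached patch and not the actual StreamRedirectorRunnable source: the class name `DrainingRedirector`, its constructor, and the use of System.err in place of agent.log are all assumptions made for the example.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch of a stream redirector; NOT the RHQ implementation.
class DrainingRedirector implements Runnable {
    private final InputStream in;   // e.g. the managed process's stdout or stderr
    private final OutputStream out; // wherever the captured output is sent

    DrainingRedirector(InputStream in, OutputStream out) {
        this.in = in;
        this.out = out;
    }

    @Override
    public void run() {
        byte[] buffer = new byte[4096];
        try {
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } catch (Throwable t) {
            // Catches Error (such as OutOfMemoryError) as well as Exception.
            // System.err stands in for the agent log in this sketch.
            System.err.println("Unexpected error while capturing console output: " + t);
        } finally {
            // Runs however the loop ends, so the streams are always closed
            // once this thread stops reading.
            closeQuietly(in);
            closeQuietly(out);
        }
    }

    private static void closeQuietly(Closeable c) {
        try {
            c.close();
        } catch (IOException ignored) {
            // Nothing useful can be done if close itself fails.
        }
    }
}
```

In this sketch the redirector would be run on its own thread, for example `new Thread(new DrainingRedirector(process.getInputStream(), sink)).start()`, where `process` and `sink` are placeholders for a started process and a destination stream.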

Comment 1 Libor Zoubek 2015-06-05 11:11:04 UTC

I think the fix for Bug 1212950 also partially fixes this bug, but an OOM is still possible (for example, even with the fix for Bug 1212950, an agent can manage 100 EAPs, each filling up the default 2MB of memory with its log).

I'll merge the suggested patch from Larry.

Comment 2 Libor Zoubek 2015-06-05 12:06:13 UTC

I was not able to apply the patch (git reported it as invalid), so I merged the changes manually.

branch:  master
link:    https://github.com/rhq-project/rhq/commit/24bd7c37e
time:    2015-06-05 14:01:24 +0200
commit:  24bd7c37e5a6495025c556779a800566ebda2008
author:  Libor Zoubek - lzoubek
message: Bug 1212933 - Resource start operation leads to broken managed server

         Close output and input streams in case of Error, not just
         Exception is thrown.

Comment 4 Simeon Pinder 2015-07-10 18:55:20 UTC

Available for test with 3.3.3 ER01 build: 
https://brewweb.devel.redhat.com/buildinfo?buildID=446732
 *Note: jon-server-patch-3.3.0.GA.zip maps to ER01 build of
 jon-server-3.3.0.GA-update-03.zip.

Comment 5 Filip Brychta 2015-07-16 12:32:11 UTC

After 24h it's still ok -> verified

Comment 7 errata-xmlrpc 2015-07-30 16:42:01 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1525.html
