Bug 1018233 - OOM issue when JON server simultaneously working with multiple agents
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: JBoss Operations Network
Classification: JBoss
Component: Core Server
Version: JON 3.2
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ER05
Target Release: JON 3.2.0
Assignee: Lukas Krejci
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On:
Blocks: 1012435 1025767
Reported: 2013-10-11 14:04 UTC by Radim Hatlapatka
Modified: 2014-01-02 20:35 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-01-02 20:35:40 UTC
Type: Bug
Embargoed:


Attachments
Error shown in JON UI when OOM error occurred (47.56 KB, image/png)
2013-10-11 14:04 UTC, Radim Hatlapatka


Links
System ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla 1023451 | 0 | unspecified | CLOSED | [perf] Large retained heap during inventory report merge (causing OOMs) | 2021-02-22 00:41:40 UTC

Internal Links: 1023451

Description Radim Hatlapatka 2013-10-11 14:04:45 UTC
Created attachment 811138
Error shown in JON UI when OOM error occurred

Description of problem:
The JON server runs out of memory when the EAP 6 JON plugin tests run in parallel across multiple environment settings. It looks like more than 20 or 30 agents running in parallel can trigger an OOM error on the server side.


WARNING [org.jboss.netty.channel.socket.nio.AbstractNioSelector] (New I/O worker #27) Unexpected exception in the selector loop.: java.lang.OutOfMemoryError: GC overhead limit exceeded
00:00:33,821 ERROR [org.rhq.enterprise.server.system.SystemManagerBean] (EJB default - 3) Failed to reload the system config cache - will try again later. Cause: java.lang.OutOfMemoryError: GC overhead limit exceeded
00:00:19,422 WARN  [org.jboss.as.ejb3] (EJB default - 6) JBAS014143: A previous execution of timer [rhq.rhq-server.ServerManagerBean 13291c23-e2e4-4152-a3b6-07b42578ef2a] is still in progress, skipping this overlapping scheduled execution at: Fri Oct 11 00:00:19 EDT 2013
00:00:19,422 WARN  [org.jboss.as.ejb3] (EJB default - 8) JBAS014143: A previous execution of timer [rhq.rhq-server.CacheConsistencyManagerBean 77912439-5119-4369-9633-4b6e59220a8d] is still in progress, skipping this overlapping scheduled execution at: Fri Oct 11 00:00:19 EDT 2013
23:59:57,375 ERROR [org.jboss.as.ejb3.invocation] (EJB default - 2) JBAS014134: EJB Invocation failed on component ServerManagerBean for method public abstract void org.rhq.enterprise.server.cloud.instance.ServerManagerLocal.beat(): javax.ejb.EJBException: JBAS014580: Unexpected Error



Version-Release number of selected component (if applicable):
JON 3.2.0.ER3

How reproducible: Not yet known; trying to reproduce again.


Steps to Reproduce:
    1. Set up one JON 3.2 server.
    2. Connect multiple agents (30-50 should be enough).
    3. Run one EAP6 server and one EAP6 domain per agent.
    4. Run the EAP6 automation on each agent in parallel.

Actual results:
The JON server freezes, and OOM exceptions are shown in the server logs.

Expected results:


Additional info:
The machine itself has more than 10 GB of memory

Comment 3 Heiko W. Rupp 2013-10-11 15:27:30 UTC
We also need all the sizing info: p / s / s (platforms / servers / services), metrics per minute, and so on.
The admin UI has a "dump info to log" button. Please hit it and add the information that is emitted in the server log to this BZ.

Comment 9 Heiko W. Rupp 2013-10-14 16:36:55 UTC
Can we please get a complete *HEAP* dump that we can analyze?

Comment 36 Mike Foley 2013-10-31 18:06:25 UTC
Comment #35 indicates possible items that may need to be documented in the release notes. Adding sunny-dee to the CC list, and adding the keyword

Question: will the commit in comment #35 be in ER5?

Comment 40 Lukas Krejci 2013-11-01 12:49:52 UTC
release/jon3.2.x 6123befdd990458fb1564204c0ba0721c7d3b1ad
master 3e5d0121a5b383762315b6acd9557c3fd70e0106
Author: Lukas Krejci <lkrejci>
Date:   Thu Oct 31 18:32:17 2013 +0100

    [BZ 1018233] - Fix leaks of model controller client instances
    
    This should prevent a deadlock in the java finalizer thread that
    seems to be caused by an attempt to close the leaked clients
    in a constrained environment where the server is (close to) unable
    create new threads.
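
    For illustration only: a minimal Java sketch of the general kind of
    leak the commit message describes, where a closeable client is never
    closed when an operation fails. This is not the code from the commit.
    FakeClient, leaky() and safe() are hypothetical stand-ins invented for
    this example and are not part of the JBoss or RHQ APIs. The sketch
    assumes Java 7+ for try-with-resources; a try/finally block would do
    the same on older JVMs.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ClientLeakSketch {

    /** Hypothetical stand-in for a closeable management client. */
    static final class FakeClient implements AutoCloseable {
        /** Number of clients opened but not yet closed. */
        static final AtomicInteger OPEN = new AtomicInteger();

        FakeClient() {
            OPEN.incrementAndGet();
        }

        String execute(String operation) {
            if (operation.isEmpty()) {
                throw new IllegalArgumentException("empty operation");
            }
            return "ok:" + operation;
        }

        @Override
        public void close() {
            OPEN.decrementAndGet();
        }
    }

    /** Leaky variant: if execute() throws, close() is never reached. */
    static String leaky(String operation) {
        FakeClient client = new FakeClient();
        String result = client.execute(operation);
        client.close();
        return result;
    }

    /** Safe variant: try-with-resources closes the client on every path. */
    static String safe(String operation) {
        try (FakeClient client = new FakeClient()) {
            return client.execute(operation);
        }
    }

    public static void main(String[] args) {
        try {
            leaky("");
        } catch (IllegalArgumentException expected) {
            // The client opened inside leaky() is still counted as open.
        }
        System.out.println("open after leaky failure: " + FakeClient.OPEN.get());

        FakeClient.OPEN.set(0);
        try {
            safe("");
        } catch (IllegalArgumentException expected) {
            // The client was closed before the exception propagated.
        }
        System.out.println("open after safe failure: " + FakeClient.OPEN.get());
    }
}
```

    In the leaky variant the unclosed client is left for the garbage
    collector and finalizer to deal with, which is the situation the
    commit message says should be avoided.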

Comment 41 Simeon Pinder 2013-11-07 02:17:12 UTC
Moving to ON_QA for test with new brew build.

Comment 42 Radim Hatlapatka 2013-11-12 15:09:50 UTC
I no longer see the OOM issue when using JON 3.2.0.ER5

