Bug 1018233 - OOM issue when JON server simultaneously working with multiple agents
Status: CLOSED CURRENTRELEASE
Product: JBoss Operations Network
Classification: JBoss
Component: Core Server
Version: JON 3.2
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ER05
Target Release: JON 3.2.0
Assigned To: Lukas Krejci
QA Contact: Mike Foley
Depends On:
Blocks: 1012435 1025767
Reported: 2013-10-11 10:04 EDT by Radim Hatlapatka
Modified: 2014-01-02 15:35 EST
CC List: 11 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-01-02 15:35:40 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Error shown in JON UI when OOM error occurred (47.56 KB, image/png)
2013-10-11 10:04 EDT, Radim Hatlapatka

Description Radim Hatlapatka 2013-10-11 10:04:45 EDT
Created attachment 811138
Error shown in JON UI when OOM error occurred

Description of problem:
The JON server runs out of memory (OOM) when the EAP 6 JON plugin tests are run in parallel against multiple environment setups. It appears that with more than 20 or 30 agents running in parallel, the server can fail with an OutOfMemoryError.


WARNING [org.jboss.netty.channel.socket.nio.AbstractNioSelector] (New I/O worker #27) Unexpected exception in the selector loop.: java.lang.OutOfMemoryError: GC overhead limit exceeded
00:00:33,821 ERROR [org.rhq.enterprise.server.system.SystemManagerBean] (EJB default - 3) Failed to reload the system config cache - will try again later. Cause: java.lang.OutOfMemoryError: GC overhead limit exceeded
00:00:19,422 WARN  [org.jboss.as.ejb3] (EJB default - 6) JBAS014143: A previous execution of timer [rhq.rhq-server.ServerManagerBean 13291c23-e2e4-4152-a3b6-07b42578ef2a] is still in progress, skipping this overlapping scheduled execution at: Fri Oct 11 00:00:19 EDT 2013
00:00:19,422 WARN  [org.jboss.as.ejb3] (EJB default - 8) JBAS014143: A previous execution of timer [rhq.rhq-server.CacheConsistencyManagerBean 77912439-5119-4369-9633-4b6e59220a8d] is still in progress, skipping this overlapping scheduled execution at: Fri Oct 11 00:00:19 EDT 2013
23:59:57,375 ERROR [org.jboss.as.ejb3.invocation] (EJB default - 2) JBAS014134: EJB Invocation failed on component ServerManagerBean for method public abstract void org.rhq.enterprise.server.cloud.instance.ServerManagerLocal.beat(): javax.ejb.EJBException: JBAS014580: Unexpected Error
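For context, HotSpot raises "GC overhead limit exceeded" when the JVM spends almost all of its time collecting garbage while reclaiming very little heap (by default, more than 98% of time in GC with less than 2% of the heap recovered). The following is a minimal Java sketch, not part of JON, that reads the standard `GarbageCollectorMXBean` counters to estimate what fraction of JVM uptime has gone to GC; the class name `GcOverheadCheck` is hypothetical. It reports a cumulative average since JVM start, so unlike HotSpot's own check, which looks at recent collections, it will react slowly to a sudden spike.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcOverheadCheck {

    /** Fraction of JVM uptime spent in GC, summed over all collectors (0.0 if unknown). */
    static double gcTimeFraction() {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if this collector does not report time
            if (t > 0) {
                gcMillis += t;
            }
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis > 0 ? (double) gcMillis / uptimeMillis : 0.0;
    }

    public static void main(String[] args) {
        // A value approaching 1.0 means the JVM is doing little besides garbage collection.
        System.out.printf("GC time fraction since JVM start: %.4f%n", gcTimeFraction());
    }
}
```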



Version-Release number of selected component (if applicable):
JON 3.2.0.ER3

How reproducible: Not known yet; trying to reproduce it again.


Steps to Reproduce:
    1. Set up one JON 3.2 server.
    2. Connect multiple agents to it (30 to 50 should be enough).
    3. For each agent, set up one EAP 6 server and one EAP 6 domain.
    4. Run the EAP 6 automation against each of them in parallel.

Actual results:
The JON server freezes, and OutOfMemoryError exceptions appear in the server log.

Expected results:


Additional info:
The machine itself has more than 10 GB of memory.
Comment 3 Heiko W. Rupp 2013-10-11 11:27:30 EDT
We also need all the sizing information: number of platforms / servers / services, metrics per minute, and so on.
The admin UI has a "dump info to log" button. Please click it and add the information it writes to the server log to this BZ.
Comment 9 Heiko W. Rupp 2013-10-14 12:36:55 EDT
Can we please get a complete *HEAP* dump that we can analyze?
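One way to capture such a dump programmatically is the HotSpot-specific `com.sun.management.HotSpotDiagnosticMXBean`. The sketch below, with the hypothetical class name `HeapDumper`, writes an HPROF file for the JVM it runs in; it assumes a HotSpot-based JDK 7 or later. For a running server the same result is usually obtained without code, for example with `jmap -dump:live,format=b,file=<file> <pid>` or by starting the JVM with `-XX:+HeapDumpOnOutOfMemoryError` so that a dump is written automatically when the OOM occurs.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

public class HeapDumper {

    /**
     * Writes an HPROF heap dump of the current JVM to the given path.
     * The file must not exist yet, and recent JDKs reject names that
     * do not end in ".hprof". With liveObjectsOnly, a GC runs first and
     * only reachable objects are written.
     */
    static void dumpHeap(String path, boolean liveObjectsOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(path, liveObjectsOnly);
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "server-heap.hprof";
        dumpHeap(path, true);
        System.out.println("Heap dump written to " + path);
    }
}
```

The resulting `.hprof` file can then be opened in a heap analyzer to see which object types dominate the retained heap.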
Comment 36 Mike Foley 2013-10-31 14:06:25 EDT
Comment #35 indicates items that may need to be documented in the release notes. Adding sunny-dee to the CC list and adding the keyword

Question: will the commit from comment #35 be in ER5?
Comment 40 Lukas Krejci 2013-11-01 08:49:52 EDT
release/jon3.2.x 6123befdd990458fb1564204c0ba0721c7d3b1ad
master 3e5d0121a5b383762315b6acd9557c3fd70e0106
Author: Lukas Krejci <lkrejci@redhat.com>
Date:   Thu Oct 31 18:32:17 2013 +0100

    [BZ 1018233] - Fix leaks of model controller client instances
    
    This should prevent a deadlock in the java finalizer thread that
    seems to be caused by an attempt to close the leaked clients
    in a constrained environment where the server is (close to) unable
    to create new threads.
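The commit message describes client instances that were never closed explicitly, so their cleanup was left to the finalizer thread. The sketch below illustrates the general pattern only and is not the actual RHQ change: `ManagedClient`, `leakyCall`, and `safeCall` are hypothetical stand-ins for a closeable management client such as `ModelControllerClient`. Using try-with-resources closes each client deterministically on the calling thread, even when the call throws, instead of relying on garbage collection and finalization.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ClientLifecycle {

    /** Hypothetical stand-in for a closeable management client. */
    static final class ManagedClient implements AutoCloseable {
        static final AtomicInteger OPEN = new AtomicInteger(); // clients not yet closed
        private boolean closed;

        ManagedClient() {
            OPEN.incrementAndGet();
        }

        String execute(String operation) {
            if (closed) {
                throw new IllegalStateException("client already closed");
            }
            return "result of " + operation;
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                OPEN.decrementAndGet();
            }
        }
    }

    /** Leaky pattern: the client is never closed, so cleanup is left to GC/finalization. */
    static String leakyCall(String op) {
        ManagedClient client = new ManagedClient();
        return client.execute(op);
    }

    /** Fixed pattern: try-with-resources closes the client even if execute() throws. */
    static String safeCall(String op) {
        try (ManagedClient client = new ManagedClient()) {
            return client.execute(op);
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            safeCall(":read-resource");
        }
        System.out.println("open after safe calls: " + ManagedClient.OPEN.get());
        for (int i = 0; i < 1000; i++) {
            leakyCall(":read-resource");
        }
        System.out.println("open after leaky calls: " + ManagedClient.OPEN.get());
    }
}
```

Running `main` prints 0 open clients after the safe loop and 1000 after the leaky loop. In a real server, each leaked client may also hold connections or threads until it is finalized, which is why explicit closing matters most under memory and thread pressure.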
Comment 41 Simeon Pinder 2013-11-06 21:17:12 EST
Moving to ON_QA for testing with the new brew build.
Comment 42 Radim Hatlapatka 2013-11-12 10:09:50 EST
I no longer see the OOM issue when using JON 3.2.0.ER5.
