Red Hat Bugzilla – Bug 1018233
OOM issue when JON server simultaneously working with multiple agents
Last modified: 2014-01-02 15:35:40 EST
Created attachment 811138
Error shown in JON UI when the OOM error occurred
Description of problem:
An OOM error occurs when running the EAP 6 JON plugin tests in parallel across multiple environment settings. It looks like running more than 20 to 30 agents in parallel can result in an OOM error on the server side.
WARNING [org.jboss.netty.channel.socket.nio.AbstractNioSelector] (New I/O worker #27) Unexpected exception in the selector loop.: java.lang.OutOfMemoryError: GC overhead limit exceeded
00:00:33,821 ERROR [org.rhq.enterprise.server.system.SystemManagerBean] (EJB default - 3) Failed to reload the system config cache - will try again later. Cause: java.lang.OutOfMemoryError: GC overhead limit exceeded
00:00:19,422 WARN [org.jboss.as.ejb3] (EJB default - 6) JBAS014143: A previous execution of timer [rhq.rhq-server.ServerManagerBean 13291c23-e2e4-4152-a3b6-07b42578ef2a] is still in progress, skipping this overlapping scheduled execution at: Fri Oct 11 00:00:19 EDT 2013
00:00:19,422 WARN [org.jboss.as.ejb3] (EJB default - 8) JBAS014143: A previous execution of timer [rhq.rhq-server.CacheConsistencyManagerBean 77912439-5119-4369-9633-4b6e59220a8d] is still in progress, skipping this overlapping scheduled execution at: Fri Oct 11 00:00:19 EDT 2013
23:59:57,375 ERROR [org.jboss.as.ejb3.invocation] (EJB default - 2) JBAS014134: EJB Invocation failed on component ServerManagerBean for method public abstract void org.rhq.enterprise.server.cloud.instance.ServerManagerLocal.beat(): javax.ejb.EJBException: JBAS014580: Unexpected Error
Version-Release number of selected component (if applicable):
How reproducible: don't know yet, trying to reproduce again
Steps to Reproduce:
1 JON 3.2 server
multiple agents (30-50 should be enough)
1 EAP6 server and 1 EAP6 Domain per agent
run EAP6 automation on each in parallel
The JON server freezes, with OOM exceptions shown in the server logs
The machine itself has more than 10 GB of memory
We also need all the sizing info: p/s/s, metrics per minute, and so on.
The admin UI has a "dump info to log" button. Hit that and add the information it emits in the server log to this BZ, please.
Can we please get a complete *HEAP* dump that we can analyze?
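For reference, one way to capture a complete heap dump from inside a running HotSpot JVM is the HotSpotDiagnosticMXBean. The following is a minimal sketch, not part of JON or RHQ: the class name HeapDumper and the default file name are made up for illustration, and the code dumps the heap of the JVM it runs in. For the JON server process itself, running jmap against the server PID or starting the server with -XX:+HeapDumpOnOutOfMemoryError would be the more usual route.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, not part of JON/RHQ.
public class HeapDumper {

    /**
     * Writes an .hprof heap dump of the current JVM to the given path.
     * With liveOnly set to true, only reachable objects are dumped.
     */
    public static void dumpHeap(Path target, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // dumpHeap fails if the target file already exists, so remove any old dump first.
        Files.deleteIfExists(target);
        bean.dumpHeap(target.toAbsolutePath().toString(), liveOnly);
    }

    public static void main(String[] args) throws IOException {
        // Assumed default file name; pass a path as the first argument to override it.
        Path target = Paths.get(args.length > 0 ? args[0] : "heap.hprof");
        dumpHeap(target, true);
        System.out.println("Heap dump written to " + target.toAbsolutePath());
    }
}
```

The resulting .hprof file can be opened in a heap analyzer to see which object types dominate the retained memory.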
Comment #35 indicates possible items that may need to be documented in the release notes. Adding sunny-dee to the CC list and adding the keyword.
Question: will the commit in comment #35 be in ER5?
Author: Lukas Krejci <email@example.com>
Date: Thu Oct 31 18:32:17 2013 +0100
[BZ 1018233] - Fix leaks of model controller client instances
This should prevent a deadlock in the java finalizer thread that
seems to be caused by an attempt to close the leaked clients
in a constrained environment where the server is (close to) unable
to create new threads.
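The kind of leak the commit message describes can be illustrated with a small sketch. This is not the actual RHQ change: FakeClient is a hypothetical stand-in for a model controller client, assumed here to be closeable like any java.io.Closeable, and the OPEN counter exists only to make the leak visible. The point is that a client opened without a matching close() stays alive until finalization, while try-with-resources (or an equivalent try/finally) releases it deterministically.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

// Illustration only; names are hypothetical and not taken from the RHQ code base.
public class ClientLeakSketch {

    // Number of clients that have been opened but not yet closed.
    static final AtomicInteger OPEN = new AtomicInteger();

    /** Hypothetical stand-in for a client that holds threads or sockets. */
    static final class FakeClient implements Closeable {
        FakeClient() {
            OPEN.incrementAndGet();
        }

        String execute(String operation) {
            return "ok:" + operation;
        }

        @Override
        public void close() {
            OPEN.decrementAndGet();
        }
    }

    /** Leaky pattern: the client is never closed, so cleanup is left to finalization. */
    static String leaky(String operation) {
        FakeClient client = new FakeClient();
        return client.execute(operation);
    }

    /** Safe pattern: the client is closed as soon as the block exits, even on exceptions. */
    static String safe(String operation) {
        try (FakeClient client = new FakeClient()) {
            return client.execute(operation);
        }
    }

    public static void main(String[] args) {
        safe("read-resource");
        System.out.println("open clients after safe():  " + OPEN.get());
        leaky("read-resource");
        System.out.println("open clients after leaky(): " + OPEN.get());
    }
}
```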
Moving to ON_QA for test with new brew build.
I no longer see the OOM issue when using JON 3.2.0.ER5