Created attachment 811138
Error shown in JON UI when OOM error occurred

Description of problem:
OOM issue when running EAP 6 JON plugin tests in parallel for multiple environment settings. It looks like running more than 20 or 30 agents in parallel can result in an OOM error on the server side.

WARNING [org.jboss.netty.channel.socket.nio.AbstractNioSelector] (New I/O worker #27) Unexpected exception in the selector loop.: java.lang.OutOfMemoryError: GC overhead limit exceeded

00:00:33,821 ERROR [org.rhq.enterprise.server.system.SystemManagerBean] (EJB default - 3) Failed to reload the system config cache - will try again later. Cause: java.lang.OutOfMemoryError: GC overhead limit exceeded

00:00:19,422 WARN [org.jboss.as.ejb3] (EJB default - 6) JBAS014143: A previous execution of timer [rhq.rhq-server.ServerManagerBean 13291c23-e2e4-4152-a3b6-07b42578ef2a] is still in progress, skipping this overlapping scheduled execution at: Fri Oct 11 00:00:19 EDT 2013

00:00:19,422 WARN [org.jboss.as.ejb3] (EJB default - 8) JBAS014143: A previous execution of timer [rhq.rhq-server.CacheConsistencyManagerBean 77912439-5119-4369-9633-4b6e59220a8d] is still in progress, skipping this overlapping scheduled execution at: Fri Oct 11 00:00:19 EDT 2013

23:59:57,375 ERROR [org.jboss.as.ejb3.invocation] (EJB default - 2) JBAS014134: EJB Invocation failed on component ServerManagerBean for method public abstract void org.rhq.enterprise.server.cloud.instance.ServerManagerLocal.beat(): javax.ejb.EJBException: JBAS014580: Unexpected Error

Version-Release number of selected component (if applicable):
JON 3.2.0.ER3

How reproducible:
Don't know yet, trying to reproduce again.

Steps to Reproduce:
1 JON 3.2 server
multiple agents (30-50 should be enough)
1 EAP6 server and 1 EAP6 Domain per agent
run EAP6 automation on each in parallel

Actual results:
JON server is frozen, with OOM exceptions shown in the server logs.

Expected results:

Additional info:
The machine itself has more than 10 GB of memory.
We also need all the sizing info: p / s / s, metrics per minute, and so on. The admin UI has a "dump info to log" button. Please hit it and add the information that is emitted in the server log to this BZ.
Can we please get a complete *HEAP* dump that we can analyze?
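For reference, a heap dump of a running server is usually captured with the JDK's `jmap` tool, or automatically via the `-XX:+HeapDumpOnOutOfMemoryError` JVM option. The sketch below shows a third, programmatic route through the HotSpot diagnostic MXBean. It dumps the heap of the JVM it runs in, not of a remote JON server, and it assumes a HotSpot-based JVM; the class name `HeapDumpSketch` and the output location are illustrative only.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

public class HeapDumpSketch {

    /** Writes a heap dump of the current JVM into a fresh temp directory and returns the file. */
    public static Path dumpCurrentJvm(boolean liveObjectsOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // dumpHeap refuses to overwrite an existing file, so use a new directory each time.
        Path target = Files.createTempDirectory("heapdump").resolve("heap.hprof");
        bean.dumpHeap(target.toString(), liveObjectsOnly);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dump = dumpCurrentJvm(true);
        System.out.println("Heap dump written to " + dump + " (" + Files.size(dump) + " bytes)");
    }
}
```

The resulting `.hprof` file can be opened in a heap analyzer such as Eclipse MAT or VisualVM.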
Comment #35 indicates possible items which may need to be documented in release notes. Adding sunny-dee to the :cc list, and adding the keyword.

Question: will the commit on comment #35 be in ER5?
release/jon3.2.x 6123befdd990458fb1564204c0ba0721c7d3b1ad
master 3e5d0121a5b383762315b6acd9557c3fd70e0106

Author: Lukas Krejci <lkrejci>
Date: Thu Oct 31 18:32:17 2013 +0100

    [BZ 1018233] - Fix leaks of model controller client instances

    This should prevent a deadlock in the java finalizer thread that seems to be caused by an attempt to close the leaked clients in a constrained environment where the server is (close to) unable to create new threads.
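To illustrate the general kind of leak the commit message describes, here is a minimal sketch. It is not the actual patch: `TrackedClient` is a hypothetical stand-in for a closeable management client, and the open-client counter exists only to make the leak visible. The point is that a client that is never closed stays open until finalization gets to it, whereas try-with-resources releases it deterministically.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ClientLeakSketch {

    /** Hypothetical stand-in for a management client that holds threads or sockets. */
    static final class TrackedClient implements AutoCloseable {
        static final AtomicInteger OPEN = new AtomicInteger();

        TrackedClient() {
            OPEN.incrementAndGet();
        }

        String execute(String operation) {
            return "ok:" + operation;
        }

        @Override
        public void close() {
            OPEN.decrementAndGet();
        }
    }

    /** Leaky pattern: the client is never closed, so cleanup is left to finalization. */
    public static String leaky(String operation) {
        TrackedClient client = new TrackedClient();
        return client.execute(operation);
    }

    /** Safe pattern: try-with-resources closes the client even if execute() throws. */
    public static String safe(String operation) {
        try (TrackedClient client = new TrackedClient()) {
            return client.execute(operation);
        }
    }

    /** Runs one pattern the given number of times and returns how many clients remain open. */
    public static int openAfter(int calls, boolean useSafe) {
        TrackedClient.OPEN.set(0);
        for (int i = 0; i < calls; i++) {
            if (useSafe) {
                safe("read-resource");
            } else {
                leaky("read-resource");
            }
        }
        return TrackedClient.OPEN.get();
    }

    public static void main(String[] args) {
        System.out.println("leaky: " + openAfter(50, false) + " clients still open");
        System.out.println("safe:  " + openAfter(50, true) + " clients still open");
    }
}
```

With many agents each triggering such calls, unclosed clients accumulate on the server, which is consistent with the problem only showing up at 30-50 agents.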
Moving to ON_QA for test with new brew build.
I no longer see the OOM issue when using JON 3.2.0.ER5