Bug 1043589

Summary: "OutOfMemoryError: PermGen space" during deploy affects the admin console operation
Product: [JBoss] JBoss Enterprise Application Platform 6
Reporter: William Antônio <wsiqueir>
Component: Domain Management
Assignee: Brian Stansberry <brian.stansberry>
Status: CLOSED CURRENTRELEASE
QA Contact: Petr Kremensky <pkremens>
Severity: high
Docs Contact: Russell Dickenson <rdickens>
Priority: high
Version: 6.1.0
CC: acavalla, chuffman, csutherl, emuckenh, jlivings, kkhan, lkonno, steven.post, tim.peeters
Target Milestone: ER6
Target Release: EAP 6.3.0
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-06-28 15:29:55 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1109758
Bug Blocks:
Attachments:
Description: Attached a screenshot of the Web Console from the domain controller that fails to list any servers from the domain after the OOM PermGen during deploy.
Flags: none

Description William Antônio 2013-12-16 18:02:31 UTC
Description of problem:

In a domain mode setup, when a deployment fails on a server with the error "java.lang.OutOfMemoryError: PermGen space", it breaks the management interface. Sometimes the servers are not listed, or they simply disappear from the console.

It seems that the host controller blocks while waiting for the deployment to complete, but the deployment never finishes because the server JVM has become unresponsive. Note: the OOME must occur during a deployment performed through the CLI or the management interface itself.

The environment went back to normal after we restarted the host controller that manages the failed server.

Version-Release number of selected component (if applicable):

6.1


How reproducible:

Easy to reproduce, but requires patience...

Steps to Reproduce:

JVM:

[rmartine@rmartine ~]$ java -version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)

EAP can be 6.1 or 6.2 (the customer is using EAP 6.1, but we could also reproduce the issue with EAP 6.2).

* Configure an EAP environment with one domain controller and two host controllers, each with one server;
* Set a small size for the PermGen space (we used 100 MB in our test);
* Pick a host controller and deploy as many applications as you can until you get "java.lang.OutOfMemoryError: PermGen space" during the deploy action;
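
While repeating the deployments in the steps above, it can help to watch how close the class-metadata pool is to its limit. The following is only a minimal sketch, not part of the original reproduction: the class name PermGenWatch is hypothetical, and the pool-name matching assumes a HotSpot JVM, where the pool name contains "Perm Gen" on Java 6/7 and "Metaspace" on Java 8 and later. It reads the pools of the JVM it runs in, so observing an EAP server process would need the same MXBeans read over a JMX connection to that process, which is not shown here. On the Java 6 HotSpot JVM used in this report, the PermGen limit is the value set with -XX:MaxPermSize.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
public class PermGenWatch {

    // Returns the memory pools that hold class metadata.
    // Assumption: HotSpot pool names ("PS Perm Gen", "CMS Perm Gen", "Metaspace", ...).
    static List<MemoryPoolMXBean> classMetadataPools() {
        List<MemoryPoolMXBean> result = new ArrayList<MemoryPoolMXBean>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            if (name.contains("Perm Gen") || name.contains("Metaspace")) {
                result.add(pool);
            }
        }
        return result;
    }

    // Formats one pool as "name: used=N KB, max=M KB" (or "max=undefined").
    static String describe(MemoryPoolMXBean pool) {
        MemoryUsage usage = pool.getUsage();
        if (usage == null) {
            return pool.getName() + ": pool is no longer valid";
        }
        // getMax() returns -1 when no maximum is defined for the pool.
        String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() / 1024) + " KB";
        return pool.getName() + ": used=" + (usage.getUsed() / 1024) + " KB, max=" + max;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : classMetadataPools()) {
            System.out.println(describe(pool));
        }
    }
}
```

If the "used" value approaches "max" after each deployment, the next deploy is likely to trigger the OOME described in this report.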

Actual results:

* Once you have the OOME, go to the management console. You should notice that it misbehaves: for example, servers are not listed or the metrics are no longer updated;
* After the host controller is restarted, everything works again.

Expected results:

It should not impact the management console. The deployment action should recover from the error or kill the server automatically.

Additional info:

Ricardo, who reproduced the error, captured a thread dump and noticed two threads in WAITING state; see the end of this message.

The customer is using EAP 6.1, but EAP 6.2 has a feature that allows him to kill a server manually using the management console. Also, the PermGen OOME is not an EAP issue in itself: the JEE application seems bigger than usual (it was 90 MB initially and is 14 MB after rework; the size was mainly due to libraries packaged with the application).

--

domain-connection-threads - 5 [WAITING]
sun.misc.Unsafe.park(boolean, long)
java.util.concurrent.locks.LockSupport.park(Object)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt()
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(int)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(int)
java.util.concurrent.CountDownLatch.await()
org.jboss.as.controller.remote.TransactionalProtocolOperationHandler$ProxyOperationControlProxy.operationPrepared(ModelController$OperationTransaction, ModelNode)
org.jboss.as.controller.ModelControllerImpl$2.operationPrepared(ModelController$OperationTransaction, ModelNode)
org.jboss.as.controller.AbstractOperationContext.doCompleteStep()
org.jboss.as.controller.AbstractOperationContext.completeStepInternal()
org.jboss.as.controller.AbstractOperationContext.finishStep(AbstractOperationContext$Step)
org.jboss.as.controller.AbstractOperationContext.executeStep(AbstractOperationContext$Step)
org.jboss.as.controller.AbstractOperationContext.doCompleteStep()
org.jboss.as.controller.AbstractOperationContext.completeStepInternal()
org.jboss.as.controller.AbstractOperationContext.executeOperation()
org.jboss.as.controller.ModelControllerImpl.internalExecute(ModelNode, OperationMessageHandler, ModelController$OperationTransactionControl, OperationAttachments, OperationStepHandler)
org.jboss.as.controller.ModelControllerImpl.execute(ModelNode, OperationMessageHandler, ModelController$OperationTransactionControl, OperationAttachments)
org.jboss.as.controller.remote.TransactionalProtocolOperationHandler$ExecuteRequestHandler.doExecute(ModelNode, int, ManagementRequestContext)
org.jboss.as.controller.remote.TransactionalProtocolOperationHandler$ExecuteRequestHandler$1.run()<2 recursive calls>
java.security.AccessController.doPrivileged(PrivilegedAction, AccessControlContext)
javax.security.auth.Subject.doAs(Subject, PrivilegedAction)
org.jboss.as.controller.AccessAuditContext.doAs(Subject, PrivilegedAction)
org.jboss.as.controller.remote.TransactionalProtocolOperationHandler$ExecuteRequestHandler$2$1.run()<2 recursive calls>
java.security.AccessController.doPrivileged(PrivilegedAction)
org.jboss.as.controller.remote.TransactionalProtocolOperationHandler$ExecuteRequestHandler$2.execute(ManagementRequestContext)
org.jboss.as.protocol.mgmt.AbstractMessageHandler$2$1.doExecute()
org.jboss.as.protocol.mgmt.AbstractMessageHandler$AsyncTaskRunner.run()
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Runnable)
java.util.concurrent.ThreadPoolExecutor$Worker.run()
java.lang.Thread.run()
org.jboss.threads.JBossThread.run()


domain-connection-threads - 6 [WAITING]
sun.misc.Unsafe.park(boolean, long)
java.util.concurrent.locks.LockSupport.park(Object)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await()
java.util.concurrent.ArrayBlockingQueue.take()
org.jboss.as.controller.remote.RemoteProxyController.execute(ModelNode, OperationMessageHandler, ProxyController$ProxyOperationControl, OperationAttachments)
org.jboss.as.controller.TransformingProxyController$TransformingProxyControllerImpl.execute(ModelNode, OperationMessageHandler, ProxyController$ProxyOperationControl, OperationAttachments)
org.jboss.as.controller.ProxyStepHandler.execute(OperationContext, ModelNode)
org.jboss.as.controller.AbstractOperationContext.executeStep(AbstractOperationContext$Step)
org.jboss.as.controller.AbstractOperationContext.doCompleteStep()
org.jboss.as.controller.AbstractOperationContext.completeStepInternal()
org.jboss.as.controller.AbstractOperationContext.executeOperation()
org.jboss.as.controller.ModelControllerImpl.internalExecute(ModelNode, OperationMessageHandler, ModelController$OperationTransactionControl, OperationAttachments, OperationStepHandler)
org.jboss.as.controller.ModelControllerImpl.execute(ModelNode, OperationMessageHandler, ModelController$OperationTransactionControl, OperationAttachments)
org.jboss.as.controller.remote.TransactionalProtocolOperationHandler$ExecuteRequestHandler.doExecute(ModelNode, int, ManagementRequestContext)
org.jboss.as.controller.remote.TransactionalProtocolOperationHandler$ExecuteRequestHandler$1.run()<2 recursive calls>
java.security.AccessController.doPrivileged(PrivilegedAction, AccessControlContext)
javax.security.auth.Subject.doAs(Subject, PrivilegedAction)
org.jboss.as.controller.AccessAuditContext.doAs(Subject, PrivilegedAction)
org.jboss.as.controller.remote.TransactionalProtocolOperationHandler$ExecuteRequestHandler$2$1.run()<2 recursive calls>
java.security.AccessController.doPrivileged(PrivilegedAction)
org.jboss.as.controller.remote.TransactionalProtocolOperationHandler$ExecuteRequestHandler$2.execute(ManagementRequestContext)
org.jboss.as.protocol.mgmt.AbstractMessageHandler$2$1.doExecute()
org.jboss.as.protocol.mgmt.AbstractMessageHandler$AsyncTaskRunner.run()
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Runnable)
java.util.concurrent.ThreadPoolExecutor$Worker.run()
java.lang.Thread.run()
org.jboss.threads.JBossThread.run()
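
Both threads above are parked in waits that have no timeout: CountDownLatch.await() in thread 5 and ArrayBlockingQueue.take() in thread 6. If the other side never responds, these calls never return. The sketch below only illustrates the general difference between unbounded and bounded waits using the standard java.util.concurrent API; it is not the EAP code and not the fix that was applied for this bug. The class and method names (BoundedWaitSketch, awaitPrepared, takeResponse) and the 200 ms timeout are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of bounded waits; not taken from the EAP sources.
public class BoundedWaitSketch {

    // Waits for the latch to reach zero; returns false if the timeout elapses
    // or the thread is interrupted, instead of parking forever like await().
    static boolean awaitPrepared(CountDownLatch latch, long timeout, TimeUnit unit) {
        try {
            return latch.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    // Waits for a queued response; returns null if the timeout elapses
    // or the thread is interrupted, instead of parking forever like take().
    static <T> T takeResponse(BlockingQueue<T> queue, long timeout, TimeUnit unit) {
        try {
            return queue.poll(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return null;
        }
    }

    public static void main(String[] args) {
        // Simulate a peer that never answers, as the failed server JVM does.
        CountDownLatch neverReleased = new CountDownLatch(1);
        BlockingQueue<String> neverFilled = new ArrayBlockingQueue<String>(1);

        boolean prepared = awaitPrepared(neverReleased, 200, TimeUnit.MILLISECONDS);
        String response = takeResponse(neverFilled, 200, TimeUnit.MILLISECONDS);

        // prints "prepared=false, response=null"
        System.out.println("prepared=" + prepared + ", response=" + response);
    }
}
```

With bounded waits the caller gets control back and can report a failure or roll back, whereas the unbounded calls seen in the dump keep the domain-connection-threads parked until the host controller is restarted.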

Comment 1 Tim Peeters 2014-01-23 14:44:27 UTC
I wanted to update this case because my customer ran into the same issue using EAP 6.1 and 6.1.1 (we have not confirmed 6.2 yet). Because of this bug, my customer switched from domain mode to standalone mode.

We see exactly the same symptoms as described above. If an OOM PermGen occurs during a deploy (in our case via JON), the host controller is completely stuck. But worse, in our case this also affects the domain controller. Even when killing the server and host controller that ran into the issue, the domain controller does not recover. The Web Console of the domain controller becomes unresponsive and fails to list any servers (even those of other unrelated host controllers with other server groups).

A Red Hat support case was opened for this issue and the information in this ticket might be useful for whoever is working on this bug:
https://access.redhat.com/support/cases/00924443/

Comment 2 Petr Kremensky 2014-01-23 14:50:43 UTC
I was able to reproduce this on EAP 6.2.

Comment 3 Tim Peeters 2014-01-23 14:54:07 UTC
Created attachment 854422
Attached a screenshot of the Web Console from the domain controller that fails to list any servers from the domain after the OOM PermGen during deploy.

Comment 10 Petr Kremensky 2014-06-18 13:44:04 UTC
Verified on EAP 6.3.0.ER7.