Bug 1147948

Summary: (6.4.z) Hanging EJB threads because of a persistent timer and failed deployment
Product: [JBoss] JBoss Enterprise Application Platform 6
Reporter: Jan Martiska <jmartisk>
Component: EJB
Assignee: Enrique Gonzalez Martinez <egonzale>
Status: CLOSED CURRENTRELEASE
QA Contact: Jan Martiska <jmartisk>
Severity: medium
Priority: unspecified
Version: 6.4.0
CC: bbaranow, bmaxwell, cdewolf, egonzale, jbilek, msochure, ppalaga
Target Milestone: CR1
Target Release: EAP 6.4.12
Hardware: Unspecified
OS: Unspecified
Whiteboard: eap6412-proposed
Doc Type: Bug Fix
Last Closed: 2017-01-17 13:10:12 UTC
Type: Bug
Bug Blocks: 1375585
Attachments: reproducer (flags: none)

Description Jan Martiska 2014-09-30 11:43:14 UTC
Created attachment 942672: reproducer

If an application with a persistent EJB timer is about to be re-deployed while there are queued timeouts for that timer, the tasks serving these timeouts appear to be fired *before* the whole deployment is processed. These threads then wait for the EJB component to be started (so that they can invoke the timeout method). However, if the re-deployment fails for some reason, these threads remain stuck and never return.

Consequently, if 10 or more timeouts are queued before the failed deployment attempt, all EJB service threads get stuck (by default, the EJB subsystem uses a thread pool with a maximum of 10 threads) and EAP is unable to process any EJB calls, including timer timeouts. EAP also hangs when an attempt is made to shut it down.
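The starvation effect described above can be reproduced outside EAP with a plain java.util.concurrent fixed thread pool. This is a minimal sketch, not EAP code: the class PoolStarvationDemo, the method probeRuns, and the use of a CountDownLatch in place of BasicComponent.waitForComponentStart are hypothetical stand-ins. Each queued timeout occupies a worker that waits for a start signal which never arrives; once all 10 workers are blocked, any further task (a normal EJB call, another timeout) just sits in the queue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PoolStarvationDemo {
    // Mirrors the default EJB subsystem limit of 10 threads mentioned in the report.
    static final int POOL_SIZE = 10;

    // Returns true if a "probe" task (standing in for a regular EJB call)
    // gets a worker thread within the given timeout.
    static boolean probeRuns(int queuedTimeouts, long timeoutMillis) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(POOL_SIZE);
        // Hypothetical "component started" signal; never released, as after a failed redeployment.
        CountDownLatch componentStarted = new CountDownLatch(1);
        try {
            for (int i = 0; i < queuedTimeouts; i++) {
                pool.submit(() -> {
                    componentStarted.await(); // parks the worker, like waitForComponentStart()
                    return null;
                });
            }
            Future<?> probe = pool.submit(() -> { });
            try {
                probe.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false; // every worker is blocked; the probe never left the queue
            }
        } finally {
            pool.shutdownNow(); // interrupts the blocked workers so the demo JVM can exit
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("9 queued timeouts, probe ran: " + probeRuns(9, 500));   // true
        System.out.println("10 queued timeouts, probe ran: " + probeRuns(10, 500)); // false
    }
}
```

With 9 blocked workers the probe still gets the 10th thread; with 10 it never runs, which matches the threshold of 10 queued timeouts given in the report. The shutdownNow() call exists only so the demo terminates; in the reported hang nothing releases the waiting threads.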

The stack trace of a stuck thread looks like this:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:502)
org.jboss.as.ee.component.BasicComponent.waitForComponentStart(BasicComponent.java:117)
org.jboss.as.ee.component.BasicComponent.constructComponentInstance(BasicComponent.java:147)
org.jboss.as.ee.component.BasicComponent.constructComponentInstance(BasicComponent.java:135)
org.jboss.as.ee.component.BasicComponent.createInstance(BasicComponent.java:90)
org.jboss.as.ejb3.component.stateless.StatelessSessionComponent$1.create(StatelessSessionComponent.java:64)
org.jboss.as.ejb3.component.stateless.StatelessSessionComponent$1.create(StatelessSessionComponent.java:61)
org.jboss.as.ejb3.pool.AbstractPool.create(AbstractPool.java:60)
org.jboss.as.ejb3.pool.strictmax.StrictMaxPool.get(StrictMaxPool.java:123)
org.jboss.as.ejb3.component.pool.PooledInstanceInterceptor.processInvocation(PooledInstanceInterceptor.java:47)
org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInOurTx(CMTTxInterceptor.java:274)
org.jboss.as.ejb3.tx.CMTTxInterceptor.required(CMTTxInterceptor.java:341)
org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:240)
org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41)
org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
org.jboss.as.ejb3.component.interceptors.ShutDownInterceptorFactory$1.processInvocation(ShutDownInterceptorFactory.java:64)
org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50)
org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
org.jboss.as.ejb3.component.interceptors.AdditionalSetupInterceptor.processInvocation(AdditionalSetupInterceptor.java:55)
org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45)
org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
org.jboss.as.ejb3.timerservice.TimedObjectInvokerImpl.callTimeout(TimedObjectInvokerImpl.java:101)
org.jboss.as.ejb3.timerservice.task.CalendarTimerTask.callTimeout(CalendarTimerTask.java:60)
org.jboss.as.ejb3.timerservice.task.TimerTask.run(TimerTask.java:132)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.util.concurrent.FutureTask.run(FutureTask.java:266)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
org.jboss.threads.JBossThread.run(JBossThread.java:122)

The attached reproducer contains two deployments, good/ejb.jar and bad/ejb.jar (they need to have the same filename). To reproduce:
- deploy the 'good' one; it creates a persistent timer firing every second
- undeploy it and wait ~10 seconds for the timeouts to queue up
- try to deploy the 'bad' one; the deployment fails because a @Stateless EJB carries a @RequestScoped annotation, which violates the CDI specification (this is the only difference from the 'good' one)
- observe the stuck EJB service threads; EAP cannot be stopped with Ctrl-C

Comment 1 Enrique Gonzalez Martinez 2014-11-20 16:31:37 UTC
The TimerService starts executing the pending timeouts after the BasicComponentCreateService starts. This happens before the deployment fails because of the WeldBootstrapService failure (in this case).

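The timer threads start waiting before the ComponentStartService starts. This service invokes BasicComponent::start, which is responsible for unlocking any thread waiting in BasicComponent.waitForComponentStart. Because the deployment fails, ComponentStartService never starts, so the waiting threads are never released.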
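The ordering described above can be modeled with a simple monitor-based gate. The sketch below is hypothetical and simplified: ComponentGate, fail() and simulate() are illustrative names, not the BasicComponent API and not the change made in the linked pull requests. start() plays the role of BasicComponent::start as called by ComponentStartService, and waitForComponentStart() is where the timer threads park. The fail() method shows one possible way to release waiters when the deployment fails; without such a path, a failed deployment leaves them in wait() forever, as in the reported stack trace.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ComponentGate {
    private boolean started; // set by start(), the ComponentStartService analogue
    private boolean failed;  // hypothetical: set when the deployment fails

    // Invocation threads (e.g. timer tasks) park here before creating an instance.
    public synchronized void waitForComponentStart() throws InterruptedException {
        while (!started && !failed) {
            wait();
        }
        if (!started) {
            throw new IllegalStateException("component never started, deployment failed");
        }
    }

    // Analogue of BasicComponent::start: opens the gate and wakes every waiter.
    public synchronized void start() {
        started = true;
        notifyAll();
    }

    // Hypothetical failure path: wakes waiters so they fail fast instead of hanging.
    public synchronized void fail() {
        failed = true;
        notifyAll();
    }

    // Runs one "timer task" against a fresh gate and reports what happened to it.
    static String simulate(boolean deploymentSucceeds) throws InterruptedException {
        ComponentGate gate = new ComponentGate();
        AtomicReference<String> outcome = new AtomicReference<>("still waiting");
        Thread timerTask = new Thread(() -> {
            try {
                gate.waitForComponentStart();
                outcome.set("timeout method invoked");
            } catch (IllegalStateException e) {
                outcome.set("released: " + e.getMessage());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        timerTask.start();
        if (deploymentSucceeds) {
            gate.start();
        } else {
            gate.fail();
        }
        timerTask.join(1000);
        return outcome.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(simulate(true));  // timeout method invoked
        System.out.println(simulate(false)); // released: component never started, deployment failed
    }
}
```

simulate(true) corresponds to a successful re-deployment and simulate(false) to the failing 'bad' deployment. Waiters re-check both flags in a while loop, so the outcome does not depend on whether start() or fail() runs before or after the thread reaches wait().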

Comment 2 Enrique Gonzalez Martinez 2015-03-13 09:38:00 UTC
upstream: https://github.com/wildfly/wildfly/pull/7006
6.4.x: https://github.com/jbossas/jboss-eap/pull/2351

Comment 4 Mike McCune 2016-03-28 22:33:16 UTC
This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune with any questions

Comment 6 Jiří Bílek 2016-11-04 14:26:30 UTC
Verified with EAP 6.4.12.CP.CR1

Comment 7 Petr Penicka 2017-01-17 13:10:12 UTC
Retroactively bulk-closing issues from released EAP 6.4 cumulative patches.