Bug 1252327 - TransactionLockInterceptor cannot unlock on transaction timeout
Summary: TransactionLockInterceptor cannot unlock on transaction timeout
Keywords:
Status: CLOSED EOL
Alias: None
Product: JBoss BPMS Platform 6
Classification: Retired
Component: jBPM Core
Version: 6.1.0
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ER4
: 6.2.0
Assignee: Alessandro Lazarotti
QA Contact: Radovan Synek
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2015-08-11 07:51 UTC by Toshiya Kobayashi
Modified: 2020-03-27 20:11 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-27 20:11:43 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
helloTaskRepo.zip (60.27 KB, application/zip)
2015-08-12 07:32 UTC, Toshiya Kobayashi
no flags Details
slowAfterTLIrelease.rule (220 bytes, text/plain)
2015-08-12 07:33 UTC, Toshiya Kobayashi
no flags Details
taskAndScriptSleep.bpmn2 (12.36 KB, application/xml)
2015-08-14 08:03 UTC, Toshiya Kobayashi
no flags Details
rewards-reproducer.zip (53.66 KB, application/zip)
2015-08-18 05:18 UTC, Toshiya Kobayashi
no flags Details
rewards-basic-tx-timeout-workaround.zip (19.87 KB, application/zip)
2015-08-19 14:27 UTC, Toshiya Kobayashi
no flags Details

Description Toshiya Kobayashi 2015-08-11 07:51:59 UTC
Description of problem:

When transaction timeout happened in JBoss EAP, TransactionSynchronization.afterCompletion() is called in "Transaction Reaper" thread which is not the holder of the TransactionLockInterceptor.lock. So it fails to unlock the lock:

=====
10:10:43,929 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper) ARJUNA012117: TransactionReaper::check timeout for TX 0:ffff8c6e8f22:-484f9624:55c9502d:22293 in state  RUN
10:10:43,939 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper Worker 0) ARJUNA012095: Abort of action id 0:ffff8c6e8f22:-484f9624:55c9502d:22293 invoked while multiple threads active within it.
10:10:43,940 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper Worker 0) ARJUNA012108: CheckedAction::check - atomic action 0:ffff8c6e8f22:-484f9624:55c9502d:22293 aborting with 1 threads active!
10:10:43,944 WARN  [com.arjuna.ats.jta] (Transaction Reaper Worker 0) ARJUNA016029: SynchronizationImple.afterCompletion - failed for org.drools.persistence.jta.JtaTransactionSynchronizationAdapter@1f56fc6c with exception: java.lang.IllegalMonitorStateException
        at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:155) [rt.jar:1.7.0_67]
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1260) [rt.jar:1.7.0_67]
        at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:460) [rt.jar:1.7.0_67]
        at org.drools.persistence.jta.TransactionLockInterceptor$1.afterCompletion(TransactionLockInterceptor.java:92)
        at org.drools.persistence.TransactionSynchronizationContainer.afterCompletion(TransactionSynchronizationContainer.java:26)
        at org.drools.persistence.jta.JtaTransactionSynchronizationAdapter.afterCompletion(JtaTransactionSynchronizationAdapter.java:25)
        at com.arjuna.ats.internal.jta.resources.arjunacore.SynchronizationImple.afterCompletion(SynchronizationImple.java:96)
        at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.afterCompletion(TwoPhaseCoordinator.java:402)
        at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.afterCompletion(TwoPhaseCoordinator.java:334)
        at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.cancel(TwoPhaseCoordinator.java:120)
        at com.arjuna.ats.arjuna.AtomicAction.cancel(AtomicAction.java:215)
        at com.arjuna.ats.arjuna.coordinator.TransactionReaper.doCancellations(TransactionReaper.java:370)
        at com.arjuna.ats.internal.arjuna.coordinator.ReaperWorkerThread.run(ReaperWorkerThread.java:78)
====

So no one will unlock the lock and then subsequent threads will be stuck (= system unresponsive).

Steps to Reproduce:

--- Sorry, I have not yet confirmed reproduce steps. I will update this BZ once confirmed.

Comment 3 Toshiya Kobayashi 2015-08-12 07:32:30 UTC
Created attachment 1061824 [details]
helloTaskRepo.zip

Comment 4 Toshiya Kobayashi 2015-08-12 07:33:05 UTC
Created attachment 1061825 [details]
slowAfterTLIrelease.rule

Comment 5 Toshiya Kobayashi 2015-08-12 07:35:56 UTC
Here are the reproduce steps in BPMS 6.1.0.

1. Change the default timeout value to 5 seconds.

	<subsystem xmlns="urn:jboss:domain:transactions:1.5">
	    <core-environment>
	        <process-id>
	            <uuid/>
	        </process-id>
	    </core-environment>
	    <recovery-environment socket-binding="txn-recovery-environment" status-socket-binding="txn-status-manager"/>
	    <coordinator-environment default-timeout="5"/>
	</subsystem>

2. Start BPMS
3. Login to business-central
4. Go to [Authoring]->[Administration]
5. Clone the attached repository (helloTaskRepo.zip)
6. Go to [Authoring]->[Project Authoring]->[Open Project Editor]
7. If your login user is not 'bpmsAdmin', edit 'helloTask' bpmn2 to set your username in 'Actors' of 'Task1' and save.
8. Press [Build & Deploy]
9. Go to [Process Management]->[Process Definitions] 
10. Start "helloProcess"
11. Go to [Tasks]->[Task List]
12. Start the Task
13. Attach Byteman
$ bin/bminstall.sh <PID>
$ bin/bmsubmit.sh -l slowAfterTLIrelease.rule
14. Complete the Task
 -> "TransactionLockInterceptor.release() exitting" is logged and sleeps 10 seconds.
 -> Transaction timeout happens

Actual Result:

Transaction Reaper thread fails to unlock.

======
16:05:27,153 WARN  [com.arjuna.ats.jta] (Transaction Reaper Worker 0) ARJUNA016029: SynchronizationImple.afterCompletion - failed for org.drools.persistence.jta.JtaTransactionSynchronizationAdapter@425855aa with exception: java.lang.IllegalMonitorStateException
	at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:155) [rt.jar:1.7.0_67]
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1260) [rt.jar:1.7.0_67]
	at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:460) [rt.jar:1.7.0_67]
	at org.drools.persistence.jta.TransactionLockInterceptor$1.afterCompletion(TransactionLockInterceptor.java:99)
	at org.drools.persistence.TransactionSynchronizationContainer.afterCompletion(TransactionSynchronizationContainer.java:33)
	at org.drools.persistence.jta.JtaTransactionSynchronizationAdapter.afterCompletion(JtaTransactionSynchronizationAdapter.java:25)
	at com.arjuna.ats.internal.jta.resources.arjunacore.SynchronizationImple.afterCompletion(SynchronizationImple.java:96) [jbossjts-jacorb-4.17.29.Final-redhat-1.jar:4.17.29.Final-redhat-1]
	at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.afterCompletion(TwoPhaseCoordinator.java:402) [jbossjts-jacorb-4.17.29.Final-redhat-1.jar:4.17.29.Final-redhat-1]
	at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.afterCompletion(TwoPhaseCoordinator.java:334) [jbossjts-jacorb-4.17.29.Final-redhat-1.jar:4.17.29.Final-redhat-1]
	at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.cancel(TwoPhaseCoordinator.java:120) [jbossjts-jacorb-4.17.29.Final-redhat-1.jar:4.17.29.Final-redhat-1]
	at com.arjuna.ats.arjuna.AtomicAction.cancel(AtomicAction.java:215) [jbossjts-jacorb-4.17.29.Final-redhat-1.jar:4.17.29.Final-redhat-1]
	at com.arjuna.ats.arjuna.coordinator.TransactionReaper.doCancellations(TransactionReaper.java:370) [jbossjts-jacorb-4.17.29.Final-redhat-1.jar:4.17.29.Final-redhat-1]
	at com.arjuna.ats.internal.arjuna.coordinator.ReaperWorkerThread.run(ReaperWorkerThread.java:78) [jbossjts-jacorb-4.17.29.Final-redhat-1.jar:4.17.29.Final-redhat-1]
======

Then, if you try to start another process instances via business-central, no process instance will be created. If you capture a thread dump then, you will see that threads are blocked at TransactionLockInterceptor lock or SingleSessionCommandService synchronization.

Expected Result:

TransactionLockInterceptor lock is unlocked and subsequent requests will be processed normally.

Comment 6 Maciej Swiderski 2015-08-13 11:47:11 UTC
Toshiya,

the issue with the reproducer is that you explicitly by pass the lock release by holding up on the release method after it registered tx synchronization and actually performed all work.

When you look at the TransactionLockInterceptor.execute method it is capable of releasing lock in case of timeouts and this will be done by finally block.

Here is what happens on transaction timeout (assuming timeout is caught when engine executes the process - meaning it is still in executeNext of that interceptor)
- engine is still in the executeNext method meaning process tasks
- transaction timeout occurs 
- reaper rollbacks transaction
- reaper thread goes into the picture and calls synchronizations (afterCompletion)
- since the executeNext was not completed there is no tx sync for releasing the lock so there should be no IllegalMonitorStateException
- since transaction is rolled back it will the leave executeNext method of TransactionLockInterceptor and go into finally block
- then it will attempt to register tx sync for releasing the lock but that will fail as there is no active transaction any more and will fall back to catch block of release method and unlock it directly

I verified it working correctly on simple case with 5 sec default transaction timeout and process with scrip task that holds the execution for 10 seconds. Bit of results:

##############
13:01:18,185 INFO  [stdout] (http-/127.0.0.1:8080-4) About to wait 10s
13:01:23,162 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper) ARJUNA012117: TransactionReaper::check timeout for TX 0:ffffc0a80067:-12731ea7:55cc77a7:51d in state  RUN
13:01:23,164 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper Worker 0) ARJUNA012095: Abort of action id 0:ffffc0a80067:-12731ea7:55cc77a7:51d invoked while multiple threads active within it.
13:01:23,164 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper Worker 0) ARJUNA012108: CheckedAction::check - atomic action 0:ffffc0a80067:-12731ea7:55cc77a7:51d aborting with 1 threads active!
13:01:23,165 WARN  [org.hibernate.engine.transaction.synchronization.internal.SynchronizationCallbackCoordinatorTrackingImpl] (Transaction Reaper Worker 0) HHH000451: Transaction afterCompletion called by a background thread; delaying afterCompletion processing until the original thread can handle it. [status=4]
13:01:23,166 WARN  [org.hibernate.engine.transaction.synchronization.internal.SynchronizationCallbackCoordinatorTrackingImpl] (Transaction Reaper Worker 0) HHH000451: Transaction afterCompletion called by a background thread; delaying afterCompletion processing until the original thread can handle it. [status=4]
13:01:23,166 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper Worker 0) ARJUNA012121: TransactionReaper::doCancellations worker Thread[Transaction Reaper Worker 0,5,main] successfully canceled TX 0:ffffc0a80067:-12731ea7:55cc77a7:51d
13:01:28,188 INFO  [stdout] (http-/127.0.0.1:8080-4) Done waiting...
13:01:28,189 WARN  [org.drools.persistence.TransactionSynchronizationRegistryHelper] (http-/127.0.0.1:8080-4) Unable to put resource app-updateable-resource value [] due to No transaction is running
13:01:28,189 WARN  [org.drools.persistence.SingleSessionCommandService] (http-/127.0.0.1:8080-4) Could not commit session: java.lang.IllegalStateException: Process instance 2242[Evaluation.timeouttest] is disconnected.
##############

so this clearly fails to complete the process execution but at the same time properly releases the lock.

Does this make sense to you?

Comment 7 Toshiya Kobayashi 2015-08-14 08:02:50 UTC
Thank you for looking into this, Maciej.

> so this clearly fails to complete the process execution but at the same time properly releases the lock.

Yes, this issue only happens when a transaction timeout is triggered AFTER TransactionLockInterceptor.release() and BEFORE transaction commit. (e.g. txm.commit( transactionOwner ); in TaskTransactionInterceptor.execute())

I attached taskAndScriptSleep.bpmn2 which is more common use case where [Human Task] -> [Script (sleep 10 sec)].

You can reproduce this issue with the steps: (now you don't need Byteman)

1. Change the default timeout value to 5 seconds.
2. Start BPMS
3. Create a Business process and import the taskAndScriptSleep.bpmn2
4. If your login user is not 'bpmsAdmin', set your username in 'Actors' of 'Task1' and save.
5. Go to [Authoring]->[Project Authoring]->[Open Project Editor]
6. Press [Build & Deploy]
7. Go to [Process Management]->[Process Definitions] 
8. Start "helloProcess"
9. Go to [Tasks]->[Task List]
10. Start the Task
11. Complete the Task
 -> "going to sleep..." is logged and sleeps 10 seconds.
 -> Transaction timeout happens

In this case, ExternalTaskEventListener.processTaskState() calls session.getEnvironment() before WorkItemManager.completeWorkItem(). So TransactionLockInterceptor.release() is called before the process execution (including the 10 sec sleep).

https://github.com/droolsjbpm/jbpm/blob/6.2.x/jbpm-human-task/jbpm-human-task-workitems/src/main/java/org/jbpm/services/task/wih/ExternalTaskEventListener.java#L64

Comment 8 Toshiya Kobayashi 2015-08-14 08:03:46 UTC
Created attachment 1062917 [details]
taskAndScriptSleep.bpmn2

Comment 9 Maciej Swiderski 2015-08-14 15:10:32 UTC
Toshiya,

created PR with fix for this issue: https://github.com/droolsjbpm/drools/pull/479

please have a look and let me know if that works out for you.

Comment 10 Toshiya Kobayashi 2015-08-18 05:18:51 UTC
Created attachment 1064163 [details]
rewards-reproducer.zip

Comment 11 Toshiya Kobayashi 2015-08-18 05:20:30 UTC
Hi Maciej,

I tested the reproducers (both byteman and taskAndScriptSleep.bpmn2) and verified that the issue is resolved with the fix.

A confirmation;

The forceUnlock logic expects that the lock owner thread is executed again after the transaction timeout. If there is a case like this:

1. Transaction begin
2. Process is executed (TrasactionLockInterceptor enlists ReleaseLockTransactionSynchronization)
3. Slowness (e.g. slow application logic)
4. Transaction timeout -> adds registrationThreadId to forceUnlock list.
5. Transaction commit (actually, it will fail with no transaction)

So there is no chance to unlock until another web request comes into the same thread. Am I correct? (I attached rewards-reproducer.zip for the scenario)

Sorry about sticking to this but I wanted to clarify the risk as it could result in a system-wide unresponsive issue.

I wonder if we can provide any rescue (e.g. re-creating the lock) when an application hits the above issue. If there is no way, we would guide users to reboot the server...?

Comment 12 Maciej Swiderski 2015-08-19 09:53:30 UTC
Toshiya,

since the code is outside of jBPM control we cannot do much about it. It's not the engine that takes the time to timeout transaction. So there could be one way to deal with it in custom code - catch exception and invoke any operation on jBPM api - like get process instance or whatever that will hit the TransactionLockInterceptor and unlock it based on the item in the forceUnlock set.

I updated PR with releaseAfterFailure to be invoked before attempting to lock it to avoid deadlocks for the same thread as initially it was only in finally block.

Let me know what you think.

Comment 13 Toshiya Kobayashi 2015-08-19 14:27:47 UTC
Created attachment 1064894 [details]
rewards-basic-tx-timeout-workaround.zip

Comment 14 Toshiya Kobayashi 2015-08-19 14:37:22 UTC
Thank you, Maciej. That makes sense.

I tested your suggestion and confirmed that worked. It's also useful to sweep out the remaining hibernate synchronizations :)

Attached as rewards-basic-tx-timeout-workaround.zip. I will spend some more time to test this and get back to you soon. Thanks!

Comment 15 Maciej Swiderski 2015-08-19 15:01:05 UTC
thanks Toshiya, I'll wait with merging of that PR to get your confirmation.

Comment 17 Toshiya Kobayashi 2015-08-25 08:01:01 UTC
Hi Maciej,

The customer hit a deadlock with the fix (test patch) on top of BPMS 6.0.3.

-------------------------------------------
Found one Java-level deadlock:
=============================
"pool-50-thread-2":
  waiting for ownable synchronizer 0x000000065669b2c8, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by "HTTP-99"
"HTTP-99":
  waiting to lock monitor 0x00007f4ed002f218 (object 0x000000063676a300, a org.drools.persistence.SingleSessionCommandService),
  which is held by "pool-50-thread-2"

Java stack information for the threads listed above:
===================================================
"pool-50-thread-2":
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x000000065669b2c8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
        at org.drools.persistence.jta.TransactionLockInterceptor.execute(TransactionLockInterceptor.java:74)
        at org.drools.persistence.SingleSessionCommandService.execute(SingleSessionCommandService.java:391)
        - locked <0x000000063676a300> (a org.drools.persistence.SingleSessionCommandService)
        at org.jbpm.process.core.timer.impl.GlobalTimerService$DisposableCommandService.execute(GlobalTimerService.java:280)
        at org.jbpm.persistence.timer.GlobalJpaTimerJobInstance.call(GlobalJpaTimerJobInstance.java:62)
        at org.jbpm.persistence.timer.GlobalJpaTimerJobInstance.call(GlobalJpaTimerJobInstance.java:42)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
"HTTP-99":
        at org.drools.persistence.SingleSessionCommandService.execute(SingleSessionCommandService.java:391)
        - waiting to lock <0x000000063676a300> (a org.drools.persistence.SingleSessionCommandService)
        at org.drools.core.command.impl.CommandBasedStatefulKnowledgeSession$1.completeWorkItem(CommandBasedStatefulKnowledgeSession.java:147)
        at org.jbpm.services.task.wih.ExternalTaskEventListener.processTaskState(	.java:75)
        at org.jbpm.services.task.wih.ExternalTaskEventListener.afterTaskCompletedEvent(ExternalTaskEventListener.java:119)
        at org.jbpm.services.task.events.TaskEventSupport.fireAfterTaskCompleted(TaskEventSupport.java:189)
        at org.jbpm.services.task.internals.lifecycle.MVELLifeCycleManager.taskOperation(MVELLifeCycleManager.java:340)
        at org.jbpm.services.task.identity.UserGroupLifeCycleManagerDecorator.taskOperation(UserGroupLifeCycleManagerDecorator.java:46)
        at org.jbpm.services.task.impl.TaskInstanceServiceImpl.complete(TaskInstanceServiceImpl.java:142)
        at org.jbpm.services.task.commands.CompleteTaskCommand.execute(CompleteTaskCommand.java:70)
        at org.jbpm.services.task.commands.CompleteTaskCommand.execute(CompleteTaskCommand.java:37)
        at org.jbpm.services.task.commands.CompositeCommand.execute(CompositeCommand.java:38)
        at org.jbpm.services.task.commands.TaskCommandExecutorImpl$SelfExecutionCommandService.execute(TaskCommandExecutorImpl.java:65)
        at org.drools.core.command.impl.AbstractInterceptor.executeNext(AbstractInterceptor.java:41)
        at org.jbpm.services.task.persistence.TaskTransactionInterceptor.execute(TaskTransactionInterceptor.java:54)
        - locked <0x0000000684b10aa0> (a org.jbpm.services.task.persistence.TaskTransactionInterceptor)
        at org.jbpm.services.task.commands.TaskCommandExecutorImpl.execute(TaskCommandExecutorImpl.java:40)
        at org.jbpm.services.task.impl.command.CommandBasedTaskService.execute(CommandBasedTaskService.java:128)
        at org.kie.remote.services.cdi.ProcessRequestBean.doTaskOperation(ProcessRequestBean.java:249)
        at org.kie.remote.services.cdi.ProcessRequestBean.processCommand(ProcessRequestBean.java:121)
        at org.kie.remote.services.cdi.ProcessRequestBean$Proxy$_$$_WeldClientProxy.processCommand(ProcessRequestBean$Proxy$_$$_WeldClientProxy.java)
        at org.kie.remote.services.rest.ResourceBase.restProcessJaxbCommandsRequest(ResourceBase.java:69)
        at org.kie.remote.services.rest.RuntimeResource.execute(RuntimeResource.java:95)
        at org.kie.remote.services.rest.RuntimeResource$Proxy$_$$_WeldClientProxy.execute(RuntimeResource$Proxy$_$$_WeldClientProxy.java)
        at sun.reflect.GeneratedMethodAccessor505.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.jboss.resteasy.core.MethodInjectorImpl.invoke(MethodInjectorImpl.java:167)
        at org.jboss.resteasy.core.ResourceMethod.invokeOnTarget(ResourceMethod.java:269)
        at org.jboss.resteasy.core.ResourceMethod.invoke(ResourceMethod.java:227)
        at org.jboss.resteasy.core.ResourceMethod.invoke(ResourceMethod.java:216)
        at org.jboss.resteasy.core.SynchronousDispatcher.getResponse(SynchronousDispatcher.java:542)
        at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:524)
        at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:126)
        at org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.service(ServletContainerDispatcher.java:208)
        at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:55)
        at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:50)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:295)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
        at org.jboss.solder.servlet.exception.CatchExceptionFilter.doFilter(CatchExceptionFilter.java:65)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
        at org.jboss.solder.servlet.event.ServletEventBridgeFilter.doFilter(ServletEventBridgeFilter.java:74)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
        at org.kie.remote.services.rest.jaxb.DynamicJaxbContextFilter.doFilter(DynamicJaxbContextFilter.java:29)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
        at org.jbpm.designer.web.filter.impl.PluggableFilter.doFilter(PluggableFilter.java:70)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
        at org.uberfire.security.server.UberFireSecurityFilter.doFilter(UberFireSecurityFilter.java:266)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:230)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:149)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:407)
        at org.jboss.as.jpa.interceptor.WebNonTxEmCloserValve.invoke(WebNonTxEmCloserValve.java:50)
        at org.jboss.as.jpa.interceptor.WebNonTxEmCloserValve.invoke(WebNonTxEmCloserValve.java:50)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:559)
        at org.jboss.as.web.security.SecurityContextAssociationValve.invoke(SecurityContextAssociationValve.java:169)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:145)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:97)
        at org.apache.catalina.authenticator.SingleSignOn.invoke(SingleSignOn.java:408)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:559)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:102)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:336)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:856)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:653)
        at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:508)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
        at org.jboss.threads.JBossThread.run(JBossThread.java:122)

Found 1 deadlock.
---------------------------------

As this is a patch on top of BPMS 6.0.3 (6.0.3-redhat-6), the line number in the stack trace may not match with yours. But I hope you can understand the sequence:

1. "HTTP-99" was going to do a Task complete operation via REST
2. "HTTP-99" acquired a lock of TransactionLockInterceptor and acquired a lock of SingleSessionCommandService to do something (e.g. session.getEnvironment() in ExternalTaskEventListener.processTaskState())
3. "HTTP-99" released a lock of SingleSessionCommandService but still kept the lock of TransactionLockInterceptor because the transaction was not committed yet.
4. "pool-50-thread-2" was triggered by Timer. Acquired the lock of SingleSessionCommandService but waiting for the lock of TransactionLockInterceptor
5. "HTTP-99" called workItemManager.completeWorkItem() so waiting for the lock of SingleSessionCommandService.

SingleSessionCommandService.java (test patch)
====
390     public synchronized <T> T execute(Command<T> command) {
391         return commandService.execute(command);
392     }
====

Please let me know if you need any further information.

Comment 20 Toshiya Kobayashi 2015-09-01 10:34:34 UTC
Hi Maciej,

I have created a reproducer for the deadlock issue described in comment#17. But certainly this is a different issue from the original one ("unlock on transaction timeout"). I'll file another BZ for it so please go ahead to merge and set to "MODIFIED".

Comment 22 Toshiya Kobayashi 2015-09-03 09:22:54 UTC
Thanks Maciej,

I file a new JIRA https://issues.jboss.org/browse/JBPM-4755 for the deadlock issue.

Comment 23 Tomas Livora 2015-11-23 15:20:34 UTC
Verified on BPMS 6.2.0 CR1

However, I have not written any test for this so far. I do not think we need all the test cases suggested in comment 18. Especially the first reproducer which should use REST API is not a good idea since we would need to set the transaction timeout slightly longer than our current test timeout (5 minutes for WebSphere and 1 minute for other containers) and write a test running longer than this which would prolong our test suite execution time by 10%.

In my opinion the best and sufficient solution is to write a simple test on the engine level. Toshiya, all your reproducers seem to be container-based. You mentioned that it is not possible to write a simple unit test using Bitronix because it does not call TransactionSynchronization.afterCompletion(). So do you have any idea how to bypass this a write a simple reproducer?

And what exactly do you mean by "High load test in business-central including REST API Task complete and frequent process execution triggered by Timer" ?


Note You need to log in before you can comment on or make changes to this bug.