Bug 778804 (SOA-1254) - Deadlock in JobExecutor on Oracle10g
Summary: Deadlock in JobExecutor on Oracle10g
Keywords:
Status: CLOSED NEXTRELEASE
Alias: SOA-1254
Product: JBoss Enterprise SOA Platform 4
Classification: JBoss
Component: JBPM - within SOA
Version: 4.3 CP01
Hardware: Unspecified
OS: Unspecified
urgent
urgent
Target Milestone: ---
: 4.3 CP01
Assignee: Martin Vecera
QA Contact:
URL: http://jira.jboss.org/jira/browse/SOA...
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2009-03-26 15:27 UTC by Martin Vecera
Modified: 2009-04-02 14:54 UTC (History)
1 user (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Oracle10g, other databases (MySQL, PostgreSQL) work fine
Last Closed: 2009-04-02 14:54:28 UTC
Type: Bug


Attachments (Terms of Use)
Performance4.esb (15.18 KB, application/octet-stream)
2009-03-26 15:28 UTC, Martin Vecera
no flags Details
server.log.bz2 (273.65 KB, application/x-bzip)
2009-03-26 15:28 UTC, Martin Vecera
no flags Details
server.tracker.log.bz2 (548.07 KB, application/x-bzip)
2009-03-27 13:45 UTC, Martin Vecera
no flags Details
indexes.diff (22.88 KB, application/octet-stream)
2009-03-31 09:02 UTC, Alejandro Guizar
no flags Details
bpm_orchestration_deadlock.tar.bz2 (37.66 KB, application/x-bzip)
2009-03-31 14:31 UTC, Martin Vecera
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker JBPM-2094 0 Major Resolved Unindexed Foreign Keys cause deadlocks in oracle 2013-12-30 07:06:30 UTC
Red Hat Issue Tracker SOA-1254 0 Blocker Closed Deadlock in JobExecutor on Oracle10g 2013-12-30 07:06:30 UTC

Description Martin Vecera 2009-03-26 15:27:38 UTC
Date of First Response: 2009-03-28 04:21:05
project_key: SOA

We found the following issue when trying to execute performance test based on bpm_orchestration4. We are using the attached .esb archive (including sources) and sending 10 messages (transacted JMS session, persistent messages, messages are of type TextMessage).

2009-03-26 16:15:37,469 ERROR [org.hibernate.event.def.AbstractFlushingEventListener] (JbpmJobExector:10.34.32.154:1) Could not synchronize database state with session
org.hibernate.exception.LockAcquisitionException: Could not execute JDBC batch update
        at org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:87)
        at org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:43)
        at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:254)
        at org.hibernate.jdbc.AbstractBatcher.prepareStatement(AbstractBatcher.java:93)
        at org.hibernate.jdbc.AbstractBatcher.prepareStatement(AbstractBatcher.java:88)
        at org.hibernate.persister.entity.AbstractEntityPersister.delete(AbstractEntityPersister.java:2487)
        at org.hibernate.persister.entity.AbstractEntityPersister.delete(AbstractEntityPersister.java:2697)
        at org.hibernate.action.EntityDeleteAction.execute(EntityDeleteAction.java:74)
        at org.hibernate.engine.ActionQueue.execute(ActionQueue.java:250)
        at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:234)
        at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:146)
        at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:298)
        at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:27)
        at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1000)
        at org.hibernate.impl.SessionImpl.managedFlush(SessionImpl.java:338)
        at org.hibernate.transaction.CacheSynchronization.beforeCompletion(CacheSynchronization.java:59)
        at com.arjuna.ats.internal.jta.resources.arjunacore.SynchronizationImple.beforeCompletion(SynchronizationImple.java:114)
        at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.beforeCompletion(TwoPhaseCoordinator.java:247)
        at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.end(TwoPhaseCoordinator.java:86)
        at com.arjuna.ats.arjuna.AtomicAction.commit(AtomicAction.java:177)
        at com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.commitAndDisassociate(TransactionImple.java:1406)
        at com.arjuna.ats.internal.jta.transaction.arjunacore.BaseTransaction.commit(BaseTransaction.java:135)
        at com.arjuna.ats.jbossatx.BaseTransactionManagerDelegate.commit(BaseTransactionManagerDelegate.java:87)
        at org.jboss.tm.usertx.client.ServerVMClientUserTransaction.commit(ServerVMClientUserTransaction.java:146)
        at org.jbpm.persistence.jta.JtaDbPersistenceService.endUserTransaction(JtaDbPersistenceService.java:94)
        at org.jbpm.persistence.jta.JtaDbPersistenceService.close(JtaDbPersistenceService.java:63)
        at org.jbpm.svc.Services.close(Services.java:294)
        at org.jbpm.JbpmContext.close(JbpmContext.java:136)
        at org.jbpm.job.executor.JobExecutorThread.executeJob(JobExecutorThread.java:194)
        at org.jbpm.job.executor.JobExecutorThread.run(JobExecutorThread.java:60)
Caused by: java.sql.BatchUpdateException: ORA-00060: deadlock detected while waiting for resource

        at oracle.jdbc.driver.DatabaseError.throwBatchUpdateException(DatabaseError.java:343)
        at oracle.jdbc.driver.OraclePreparedStatement.executeBatch(OraclePreparedStatement.java:10698)
        at org.jboss.resource.adapter.jdbc.WrappedStatement.executeBatch(WrappedStatement.java:620)
        at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:48)
        at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:247)
        ... 27 more

The messages cannot be delivered and end in DLQ.

We tried to use jms-jca-provider but it didn't help. Attached is a server log showing the problem with DEBUG level.

Comment 1 Martin Vecera 2009-03-26 15:28:14 UTC
Attachment: Added: Performance4.esb


Comment 2 Martin Vecera 2009-03-26 15:28:32 UTC
Attachment: Added: server.log.bz2


Comment 3 Martin Vecera 2009-03-26 15:31:38 UTC
The issue was introduced in CR4 when jBPM was updated to fix the performance problem.

Comment 4 Martin Vecera 2009-03-27 13:45:51 UTC
Attached server log with tracker JDBC driver installed on JbpmDS.

Comment 5 Martin Vecera 2009-03-27 13:45:51 UTC
Attachment: Added: server.tracker.log.bz2


Comment 6 Kevin Conner 2009-03-28 08:21:05 UTC
My current theory, from looking through the last log, is that the deadlock is caused by a modification to the JBPM_BYTEARRAY table, specifically 'delete from JBPM_BYTEARRAY where ID_=?.'

From what I can see there are two tables which reference this table as a foreign key, JBPM_BYTEBLOCK and JBPM_VARIABLEINSTANCE, but I cannot find any indexes for their constraints.  I believe the outcome of this is that Oracle will be forced into enforcing a table lock onto these tables whenever a row is deleted from JBPM_BYTEARRAY.

My knowledge of Oracle is limited so advice should be sought from other sources, however I believe that this theory could easily be tested from a command line client.


Comment 7 Kevin Conner 2009-03-28 08:22:32 UTC
I should also add that I believe a solution would be to create the indexes on the foreign key constraints.

We should also check all other constraints to see whether they require indexes.


Comment 8 Alejandro Guizar 2009-03-28 15:36:55 UTC
Kevin, thank you for your insights. In the original log that Martin posted, the error occurs around inserting a ByteArray. Deadlocks during both insertion and deletion can be explained by a missing index resulting in locking the whole table/block.

I am going to review the index declarations in the relationships you mention and others.

Comment 9 Kevin Conner 2009-03-29 08:41:59 UTC
Hiya Alejandro.

The original log shows the same issue as the second, with the last statement being executed on the deadlocked thread being the full table lock on delete of JBPM_BYTEARRAY.

The insert you are referring to is from a different thread and is not relevant to the deadlock, that thread should continue once the attempt to acquire a full table lock has been broken.

Kev


Comment 10 Alejandro Guizar 2009-03-30 00:49:39 UTC
Link: Added: This issue depends JBPM-1367


Comment 11 Alejandro Guizar 2009-03-30 20:46:48 UTC
One of our users seems to have hit this very same problem recently, and proposed some index additions: JBPM-2094

Comment 12 Alejandro Guizar 2009-03-30 20:46:48 UTC
Link: Added: This issue related JBPM-2094


Comment 13 Alejandro Guizar 2009-03-31 08:46:35 UTC
I committed a test case based on the process definition in bpm_orchestration 4. 

http://fisheye.jboss.com/changelog/JbpmSvn/?cs=4363

Unfortunately I could not reproduce the problem.

http://jbpm.dyndns.org:8180/hudson/job/jBPM3-JDK1.5/5/

Any thoughts as to what is missing in my test case? A pointer to the original test case would help too.

Comment 14 Alejandro Guizar 2009-03-31 09:02:07 UTC
The attached patch provides the missing foreign key indexes in many-to-one mappings. It is ready to be applied once the deadlock is reproduced.

Comment 15 Alejandro Guizar 2009-03-31 09:02:07 UTC
Attachment: Added: indexes.diff


Comment 16 trev 2009-03-31 11:27:37 UTC
 The messages must be persistent and there must be a  fork/join in the process

Comment 17 trev 2009-03-31 11:28:14 UTC
We are applying the patch and will hand it over to QE to try the test aginst it

Comment 18 Kevin Conner 2009-03-31 12:08:07 UTC
The real trigger for this instance of the bug is concurrent access (from any source) to the BYTEARRAY variable tables within the context of a transaction.

This is not necessarily from the same process instance although the fork/join within the bpm orchestration 4 does provoke it quickly.

Comment 19 Martin Vecera 2009-03-31 14:31:05 UTC
Quickstart to reproduce the problem.
Run with ant deploy, ant deployProcess, ant startProcess

Comment 20 Martin Vecera 2009-03-31 14:31:05 UTC
Attachment: Added: bpm_orchestration_deadlock.tar.bz2


Comment 21 Alejandro Guizar 2009-03-31 22:51:52 UTC
I was able to reproduce at last. I was missing the byte array variable updates as Kevin mentioned.
The good news is that after applying the patch the problem went away :-)

Comment 22 Martin Vecera 2009-04-01 13:14:53 UTC
The fix has been verified with test build from Julian and Trevor. Waiting for another CR to resolve the issue.

Comment 23 Martin Vecera 2009-04-02 14:54:28 UTC
Verified with CR6.


Note You need to log in before you can comment on or make changes to this bug.