Description of problem: When using AsyncWorkItemHandler with Oracle, a job can be picked up by two threads when the thread pool size is larger than 1. For queries that use rownum together with FOR UPDATE, Hibernate does not issue a single query. It splits it into two: a paging select, followed by a separate "select ... for update" on the chosen id. This allows the same job to be processed by different AvailableJobsExecutor threads:

2015-06-21 12:53:58,489 DEBUG [org.jbpm.executor.impl.AvailableJobsExecutor] (EJB default - 10) Executor Thread org.jbpm.executor.impl.AvailableJobsExecutor@434eb850 Waking Up!!!
2015-06-21 12:53:58,489 DEBUG [org.jbpm.executor.impl.AvailableJobsExecutor] (EJB default - 9) Executor Thread org.jbpm.executor.impl.AvailableJobsExecutor@6d182f77 Waking Up!!!
2015-06-21 12:53:58,490 INFO [stdout] (EJB default - 10) Hibernate: select * from ( select requestinf0_.id...) ) where rownum <= ?
2015-06-21 12:53:58,492 INFO [stdout] (EJB default - 9) Hibernate: select * from ( select requestinf0_.id ...) ) where rownum <= ?
2015-06-21 12:53:58,556 INFO [stdout] (EJB default - 10) Hibernate: select id from RequestInfo where id =? for update
2015-06-21 12:53:58,559 INFO [stdout] (EJB default - 9) Hibernate: select id from RequestInfo where id =? for update

This shows that both threads can obtain the same job instance, as outlined in the related Hibernate issue: "Yes it does allow a slight possibility that a row might be updated or locked between the initial select (paging) and the subsequent lock attempt." from https://hibernate.atlassian.net/browse/HHH-1168?focusedCommentId=48846&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-48846
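To make the race in the log concrete, here is a toy in-memory model in plain Java. It is not jBPM or Hibernate code: a TreeMap stands in for the RequestInfo table, its monitor stands in for the row lock, and all class and method names are hypothetical. pickTwoStep mimics follow-on locking (read the page without a lock, then lock the chosen id without re-checking its status), while pickAtomic reads and locks in one step. A barrier forces both threads through the unlocked read before either one locks, which reproduces the timing seen in the log above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model only: the TreeMap is a stand-in for the RequestInfo table,
// and its monitor is a stand-in for Oracle row locks.
public class FollowOnLockingRace {
    private final Map<Long, String> table = new TreeMap<>();

    FollowOnLockingRace(int jobCount) {
        for (long id = 1; id <= jobCount; id++) {
            table.put(id, "QUEUED");
        }
    }

    // Lowest QUEUED id, or null; the caller must hold the table monitor.
    private Long firstQueued() {
        for (Map.Entry<Long, String> e : table.entrySet()) {
            if ("QUEUED".equals(e.getValue())) {
                return e.getKey();
            }
        }
        return null;
    }

    // Follow-on style: read the page without a lock, then lock the chosen id separately.
    Long pickTwoStep(CyclicBarrier gap) throws Exception {
        Long candidate;
        synchronized (table) {
            candidate = firstQueued();           // like: select * from (...) where rownum <= ?
        }
        gap.await();                             // both threads finish step 1 before either locks
        synchronized (table) {                   // like: select id ... where id = ? for update
            if (candidate != null) {
                table.put(candidate, "RUNNING"); // status is NOT re-checked under the lock
            }
        }
        return candidate;
    }

    // Single-step style: the read and the lock happen together.
    Long pickAtomic(CyclicBarrier gap) throws Exception {
        gap.await();
        synchronized (table) {
            Long candidate = firstQueued();
            if (candidate != null) {
                table.put(candidate, "RUNNING");
            }
            return candidate;
        }
    }

    // Runs two "executor threads" against a fresh table and returns the job ids they picked.
    public static List<Long> runTwoThreads(boolean twoStep) {
        FollowOnLockingRace db = new FollowOnLockingRace(5);
        CyclicBarrier gap = new CyclicBarrier(2);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < 2; i++) {
                futures.add(pool.submit(() -> twoStep ? db.pickTwoStep(gap) : db.pickAtomic(gap)));
            }
            List<Long> picked = new ArrayList<>();
            for (Future<Long> f : futures) {
                picked.add(f.get());
            }
            return picked;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("two-step: " + runTwoThreads(true));  // both threads pick job 1
        System.out.println("atomic:   " + runTwoThreads(false)); // jobs 1 and 2, in either order
    }
}
```

In the two-step variant both threads end up with job 1, matching the duplicate pickup in the log; in the single-step variant they always get distinct jobs.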
The solution is to provide a configurable initial delay for the executor worker threads, so that they do not fire at exactly the same time, plus a custom Oracle10gDialect that disables the follow-on locking Hibernate uses by default. Implemented on master with the following commits:
jbpm master:
https://github.com/droolsjbpm/jbpm/commit/5de80525e91db774b988718b05046cba64e40120
https://github.com/droolsjbpm/jbpm/commit/7bbe6d993f7e6d30bf1c95a71cf38a7cefd4b960
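The initial-delay part of the fix can be sketched with the JDK's ScheduledExecutorService. This is an illustration, not the code from the commits above: the even-spacing policy in initialDelayMs and the class name StaggeredExecutorStart are hypothetical. Each of N pollers gets an initial delay of i * interval / N, so two workers polling every 1000 ms wake up about 500 ms apart instead of together.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class StaggeredExecutorStart {

    // Hypothetical policy: spread workerCount pollers evenly across one polling interval.
    public static long initialDelayMs(int workerIndex, int workerCount, long intervalMs) {
        if (workerCount <= 0 || workerIndex < 0 || workerIndex >= workerCount) {
            throw new IllegalArgumentException("workerIndex must be in [0, workerCount)");
        }
        return workerIndex * intervalMs / workerCount;
    }

    // Schedules one polling task per worker, each with its own initial delay.
    public static List<ScheduledFuture<?>> start(ScheduledExecutorService scheduler,
                                                 int workerCount, long intervalMs, Runnable poll) {
        List<ScheduledFuture<?>> handles = new ArrayList<>();
        for (int i = 0; i < workerCount; i++) {
            handles.add(scheduler.scheduleAtFixedRate(
                    poll, initialDelayMs(i, workerCount, intervalMs), intervalMs, TimeUnit.MILLISECONDS));
        }
        return handles;
    }

    public static void main(String[] args) throws InterruptedException {
        int workers = 2;
        long intervalMs = 1000;
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(workers);
        start(scheduler, workers, intervalMs, () -> System.out.println(
                System.currentTimeMillis() + " " + Thread.currentThread().getName() + " waking up"));
        Thread.sleep(2500); // wake-ups should appear roughly 500 ms apart, not in pairs
        scheduler.shutdownNow();
    }
}
```

On its own, staggering only makes simultaneous wake-ups less likely (a slow poll can still overlap the next worker's poll), which is why it is paired with the dialect change that removes the separate lock step.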
Maciej, I have just tested these changes on a one-off patch (bug 1240665) and they seem to fix the issue. However, I am curious how we are going to roll this out going forward. Will all our customers be advised to use this new Hibernate dialect? Should we do all our Oracle testing with this dialect? That would complicate the testing process considerably, since we rely on the default dialects. Is the new Hibernate dialect the only way to resolve this issue?
Tomas, I would say that users who have high-volume requirements on the executor and run on Oracle should use this dialect. Ideally, the Hibernate team would include this dialect in Hibernate itself so it could be used as one of the default dialects. With the default settings, where only a single executor thread is running, the issue does not occur. Increasing the number of threads to a reasonable size, such as the number of cores, should still work well. So in my opinion it should not be the default, and should be used only if the default is not capable of handling the load.
Verified on BPMS 6.2.0 ER5 https://github.com/droolsjbpm/jbpm/pull/328
This should be documented, since with this fix customers should use org.jbpm.persistence.jpa.hibernate.DisabledFollowOnLockOracle10gDialect instead of org.hibernate.dialect.Oracle10gDialect if they want to avoid these problems with the job executor on Oracle databases.
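For documentation purposes, the switch comes down to the value of the standard Hibernate property hibernate.dialect. In persistence.xml that is a property entry with name "hibernate.dialect"; programmatically, the same key can go into the override map passed to Persistence.createEntityManagerFactory. The sketch below only builds that map with java.util.Properties. The boolean flag and the class name ExecutorDialectChoice are hypothetical, standing in for whatever deployment setting marks a high-volume, multi-threaded executor setup.

```java
import java.util.Properties;

public class ExecutorDialectChoice {
    static final String DEFAULT_ORACLE_DIALECT = "org.hibernate.dialect.Oracle10gDialect";
    static final String NO_FOLLOW_ON_LOCK_DIALECT =
            "org.jbpm.persistence.jpa.hibernate.DisabledFollowOnLockOracle10gDialect";

    // Builds JPA overrides for an Oracle-backed persistence unit. The flag is a
    // hypothetical deployment switch, not an existing jBPM setting.
    public static Properties jpaProperties(boolean disableFollowOnLocking) {
        Properties props = new Properties();
        props.setProperty("hibernate.dialect",
                disableFollowOnLocking ? NO_FOLLOW_ON_LOCK_DIALECT : DEFAULT_ORACLE_DIALECT);
        return props;
    }

    public static void main(String[] args) {
        // These properties could be passed as overrides when creating the EntityManagerFactory.
        System.out.println(jpaProperties(true).getProperty("hibernate.dialect"));
        System.out.println(jpaProperties(false).getProperty("hibernate.dialect"));
    }
}
```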