1234592 – Race condition with multiple job executor threads on Oracle

Summary: Race condition with multiple job executor threads on Oracle

Keywords:
Status:	CLOSED EOL
Alias:	None
Product:	JBoss BPMS Platform 6
Classification:	Retired
Component:	jBPM Core
Sub Component:
Version:	6.1.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	ER1
Target Release:	6.2.0
Assignee:	Alessandro Lazarotti
QA Contact:	Radovan Synek
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1234806 1240665

Reported:	2015-06-22 18:40 UTC by Martin Weiler
Modified:	2020-03-27 20:07 UTC
CC List:	6 users
Fixed In Version:
Clone Of:
Clones:	1234806 1240665
Environment:
Last Closed:	2020-03-27 20:07:28 UTC
Type:	Bug
Embargoed:

Description Martin Weiler 2015-06-22 18:40:03 UTC

Description of problem:
When using AsyncWorkItemHandler with Oracle, a job can be picked up by two threads when the thread pool size is larger than 1. For queries that combine rownum with for update, Hibernate does not issue a single query but splits it into two: a paged select, followed by a separate select ... for update on the returned id. This allows the same job to be processed by different AvailableJobsExecutor threads:

2015-06-21 12:53:58,489 DEBUG [org.jbpm.executor.impl.AvailableJobsExecutor] (EJB default - 10) Executor Thread org.jbpm.executor.impl.AvailableJobsExecutor@434eb850 Waking Up!!!
2015-06-21 12:53:58,489 DEBUG [org.jbpm.executor.impl.AvailableJobsExecutor] (EJB default - 9) Executor Thread org.jbpm.executor.impl.AvailableJobsExecutor@6d182f77 Waking Up!!!
2015-06-21 12:53:58,490 INFO  [stdout] (EJB default - 10) Hibernate: select * from ( select requestinf0_.id...) ) where rownum <= ?

2015-06-21 12:53:58,492 INFO  [stdout] (EJB default - 9) Hibernate: select * from ( select requestinf0_.id ...) ) where rownum <= ?

2015-06-21 12:53:58,556 INFO  [stdout] (EJB default - 10) Hibernate: select id from RequestInfo where id =? for update

2015-06-21 12:53:58,559 INFO  [stdout] (EJB default - 9) Hibernate: select id from RequestInfo where id =? for update

This shows that both threads can obtain the same job instance, as outlined in the related Hibernate issue:
"Yes it does allow a slight possibility that a row might be updated or locked between the initial select (paging) and the subsequent lock attempt." from https://hibernate.atlassian.net/browse/HHH-1168?focusedCommentId=48846&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-48846
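
The interleaving in the log above can be reproduced outside the database. What follows is only a minimal in-memory sketch, not jBPM or Hibernate code: JobRow, its fields and both run methods are hypothetical names, a ReentrantLock stands in for the Oracle row lock, and a CyclicBarrier forces both workers to finish the unlocked select before either takes the lock.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class FollowOnLockRace {

    // Hypothetical in-memory stand-in for a single RequestInfo row.
    static final class JobRow {
        final long id = 1L;
        final ReentrantLock rowLock = new ReentrantLock(); // models the row lock taken by "for update"
        volatile boolean queued = true;
        final AtomicInteger executions = new AtomicInteger();
    }

    // Two statements: an unlocked paged select, then a lock by id that does not re-check the status.
    static int runFollowOnLocking() throws InterruptedException {
        JobRow row = new JobRow();
        CyclicBarrier bothSelected = new CyclicBarrier(2);
        return runTwoWorkers(row, () -> {
            Long candidate = row.queued ? row.id : null; // step 1: select ... where rownum <= ?
            try {
                bothSelected.await(); // force the unlucky interleaving seen in the log
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
            if (candidate == null) {
                return;
            }
            row.rowLock.lock(); // step 2: select id ... where id = ? for update
            try {
                row.queued = false;
                row.executions.incrementAndGet(); // the job is executed
            } finally {
                row.rowLock.unlock();
            }
        });
    }

    // One statement: the status check happens while the row lock is held.
    static int runSingleStatementLocking() throws InterruptedException {
        JobRow row = new JobRow();
        return runTwoWorkers(row, () -> {
            row.rowLock.lock();
            try {
                if (row.queued) {
                    row.queued = false;
                    row.executions.incrementAndGet();
                }
            } finally {
                row.rowLock.unlock();
            }
        });
    }

    static int runTwoWorkers(JobRow row, Runnable worker) throws InterruptedException {
        Thread a = new Thread(worker);
        Thread b = new Thread(worker);
        a.start();
        b.start();
        a.join();
        b.join();
        return row.executions.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("follow-on locking executions: " + runFollowOnLocking());
        System.out.println("single-statement executions: " + runSingleStatementLocking());
    }
}
```

In this sketch the two-step variant executes the job twice, while the single-statement variant executes it once, because the second worker sees the updated status only when the check is made under the lock.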

Comment 2 Maciej Swiderski 2015-07-07 12:48:26 UTC

The solution is twofold: a configurable initial delay for the executor worker threads, so that they do not all fire at exactly the same time, and a custom Oracle10gDialect that disables the follow-on locking that Hibernate uses by default.

Implemented on master with the following commits:

jbpm
master:
https://github.com/droolsjbpm/jbpm/commit/5de80525e91db774b988718b05046cba64e40120
https://github.com/droolsjbpm/jbpm/commit/7bbe6d993f7e6d30bf1c95a71cf38a7cefd4b960
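
The initial-delay part of the fix could look roughly like the sketch below. It is not taken from the commits above: the class, the method names and the delay values are assumptions chosen for illustration, and only java.util.concurrent is used. Each worker is scheduled with the same polling interval but a different initial delay, so the workers do not wake up in the same millisecond.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class StaggeredExecutorStart {

    // Hypothetical helper: worker i starts after i * stepMillis.
    static long[] staggeredDelays(int threads, long stepMillis) {
        long[] delays = new long[threads];
        for (int i = 0; i < threads; i++) {
            delays[i] = i * stepMillis;
        }
        return delays;
    }

    // Schedules one periodic poller per worker, each with its own initial delay.
    static List<ScheduledFuture<?>> schedule(ScheduledExecutorService scheduler, Runnable poller,
                                             int threads, long stepMillis, long intervalMillis) {
        List<ScheduledFuture<?>> handles = new ArrayList<>();
        for (long delay : staggeredDelays(threads, stepMillis)) {
            handles.add(scheduler.scheduleAtFixedRate(poller, delay, intervalMillis, TimeUnit.MILLISECONDS));
        }
        return handles;
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(3);
        long startedAt = System.nanoTime();
        Runnable poller = () -> System.out.printf("%s polling at ~%d ms%n",
                Thread.currentThread().getName(), (System.nanoTime() - startedAt) / 1_000_000);
        schedule(scheduler, poller, 3, 100, 1000); // assumed values: 3 workers, 100 ms step, 1 s interval
        Thread.sleep(500);
        scheduler.shutdownNow();
    }
}
```

Staggering only makes the collision less likely. It does not close the window between the paged select and the follow-on lock, which is why the dialect change is part of the fix as well.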

Comment 3 Tomas Livora 2015-07-10 14:52:36 UTC

Maciej,

I have just tested these changes on a one-off patch (bug 1240665) and they seem to fix the issue. However, I am curious how we are going to propagate this in the future. Will all our customers be advised to use this new Hibernate dialect? Should we do all our Oracle testing with this dialect? That would complicate the testing process considerably, since we rely on the default dialects. Is the new Hibernate dialect the only way to resolve this issue?

Comment 4 Maciej Swiderski 2015-07-10 15:59:37 UTC

Tomas,

I would say that users who have high-volume requirements on the executor and run on Oracle should use this dialect.

Ideally, the Hibernate team would include this dialect in Hibernate itself, so that it can be used as one of the standard dialects.

With the default settings, where only a single thread is running, the issue does not occur. Increasing the number of threads to a reasonable size, such as the number of cores, should still work well. So in my opinion it should not be the default dialect; it should be used only if the default one cannot handle the load.

Comment 5 Tomas Livora 2015-11-09 18:03:26 UTC

Verified on BPMS 6.2.0 ER5

https://github.com/droolsjbpm/jbpm/pull/328

Comment 6 Tomas Livora 2015-11-16 13:44:20 UTC

This should be documented, since the fix means that customers who want to avoid these problems with the job executor on Oracle databases should use org.jbpm.persistence.jpa.hibernate.DisabledFollowOnLockOracle10gDialect instead of org.hibernate.dialect.Oracle10gDialect.
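
For the documentation, the switch would presumably be made through the standard hibernate.dialect property of the persistence unit. The fragment below is a sketch: the file location and the surrounding persistence-unit definition depend on the deployment and are not shown, and only the dialect class name comes from this bug.

```xml
<!-- persistence.xml fragment; the enclosing persistence-unit is assumed to exist already -->
<properties>
  <property name="hibernate.dialect"
            value="org.jbpm.persistence.jpa.hibernate.DisabledFollowOnLockOracle10gDialect"/>
</properties>
```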
