Bug 779738 (SOA-2100)

Summary: jBPM tests are failing with deadlock at the database level
Product: [JBoss] JBoss Enterprise SOA Platform 5
Reporter: Jiri Pechanec <jpechane>
Component: JBPM - within SOA
Assignee: Alejandro Guizar <alex.guizar>
Status: CLOSED NEXTRELEASE
QA Contact:
Severity: urgent
Docs Contact:
Priority: urgent    
Version: 5.0.0 GA   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
URL: http://jira.jboss.org/jira/browse/SOA-2100
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment: 5.0.2 CR1
Last Closed: 2010-06-22 21:28:50 UTC
Type: Bug
Attachments:
Description Flags
jbpmproblem.tar.gz none

Description Jiri Pechanec 2010-05-28 09:31:07 UTC
Date of First Response: 2010-05-31 22:41:21
project_key: SOA

I have executed the jBPM tests using SOA-P 5.0.2 CR1 on different databases and JDKs:

http://hudson.qa.jboss.com/hudson/view/SOA-Release/job/soa-jbpm-db-OpenJDK-maven/22/
http://hudson.qa.jboss.com/hudson/view/SOA-Release/job/soa-jbpm-db-jdk16-maven/91/

As you can see from the results, tests are randomly failing with deadlocks at the database level. The typical occurrence is in the locking test.

These results have to be thoroughly investigated before release.

Comment 1 Jiri Pechanec 2010-05-28 09:32:11 UTC
Attached are selected logs for different databases and JDKs. Search for the keyword "deadlock".

Comment 2 Jiri Pechanec 2010-05-28 09:32:12 UTC
Attachment: Added: jbpmproblem.tar.gz


Comment 3 Jiri Pechanec 2010-05-28 09:34:47 UTC
It might be possible that the issue is limited to JobExecutor (which has only limited support) but it has to be confirmed.

Comment 4 Alejandro Guizar 2010-06-01 02:41:21 UTC
Re: http://hudson.qa.jboss.com/hudson/view/SOA-Release/job/soa-jbpm-db-OpenJDK-maven/22/
* JBPM522Test assumed that test resources were unpacked; I've fixed that in rXXXX
* Several tests (postgresql82, JBPM2375Test | JBPM1072Test | JBPM1135Test), (postgresql83, JBPM1135Test | JBPM2094Test | JBPM2812Test), (oracle10g, JBPM1135Test) throw an exception directly attributable to the remote data source. A quick search returned a (possibly) related issue: JBAS-7666
java.lang.IllegalAccessException: Failed to find connection: 1952576006
	at org.jboss.resource.adapter.jdbc.remote.WrapperDataSourceService.invoke(WrapperDataSourceService.java:212)
* Only a handful of tests (mysql, JBPM1071Test | JBPM2094Test | JBPM2489Test), (postgresql82, JBPM1071Test), (postgresql83, JBPM1071Test) are linked to deadlocks: I'll check those.


Comment 5 Alejandro Guizar 2010-06-01 02:43:01 UTC
Link: Added: This issue related JBAS-7666


Comment 6 Jiri Pechanec 2010-06-02 11:04:52 UTC
Link: Added: This issue related SOA-2106


Comment 7 Len DiMaggio 2010-06-02 12:15:58 UTC
Link: Added: This issue is related to JBAS-7666


Comment 8 Len DiMaggio 2010-06-02 12:16:00 UTC
Link: Added: This issue is related to SOA-2106


Comment 10 Len DiMaggio 2010-06-03 02:07:06 UTC
Link: Added: This issue related JBPAPP-3970


Comment 11 Len DiMaggio 2010-06-03 02:07:28 UTC
Link: Removed: This issue is related to SOA-2106 


Comment 12 Len DiMaggio 2010-06-03 02:07:37 UTC
Link: Removed: This issue is related to JBAS-7666 


Comment 18 Alejandro Guizar 2010-06-11 17:45:45 UTC
Come to think of it, we can still increase the jbpm.job.retries configuration entry from 3 to, say, 10, and see if that helps. While I can change the default configuration in the jbpm-3.2-soa branch, I believe the SOA-P build overwrites the jBPM configuration at some point. If so, the jbpm.job.retries entry has to be changed in the platform configuration as well.

Note that, as mentioned on IRC, the hudson jobs that connect via JDBC do not exhibit deadlock failures. Since they run on the same nodes and databases as the soa-jbpm jobs, the difference seems to lie in the remote data source.

Comment 20 Alejandro Guizar 2010-06-12 02:32:11 UTC
Checked in the increase of jbpm.job.retries to 10 on branch jbpm-3.2-soa, r6406.

Comment 21 Jiri Pechanec 2010-06-14 13:49:21 UTC
Thanks for the feedback. Just two more questions:
1) Is it a fix or a workaround? Is it expected, in the natural order of things, that there will be deadlocks at the database level? Should there be deadlocks at all?
2) "connection closed, probably a remoting issue" - could you elaborate on this? Do you mean JBoss Remoting, or some other kind of remoting? This was never a problem with the remote JNDI datasource.

Comment 22 Alejandro Guizar 2010-06-15 20:36:56 UTC
1) Every app that permits concurrent updates to its database is subject to deadlocks. These are not a big deal, no bigger than a closed network connection or any other temporary database failure, except when their frequency affects throughput. jBPM mitigates the occurrence of deadlocks by relying on optimistic concurrency control for everything but joining tokens. Without pessimistic locks, the join node risks missing updates and leaving process instances stuck forever. It is worth noting that, looking at the hudson jobs, these deadlocks only occur on MySQL, and then only in the most demanding concurrency tests (JBPM2094Test, JBPM2489Test, JBPM2787Test).
In sum, increasing the number of retries is a workaround for a mitigable though not completely avoidable problem.
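To illustrate the general idea of retrying after a deadlock, here is a minimal sketch. It is not the JobExecutor implementation: DeadlockRetry, execute, simulate and maxAttempts are hypothetical names made up for this example, and the in-place loop is an assumption chosen for brevity. It treats SQLState 40001 (reported by MySQL for deadlocks) and 40P01 (PostgreSQL's deadlock_detected) as retryable and rethrows everything else.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

/** Hypothetical helper, not part of jBPM: reruns a unit of work when the database reports a deadlock. */
final class DeadlockRetry {

    /** True for SQLState 40001 (MySQL deadlock) or 40P01 (PostgreSQL deadlock_detected). */
    static boolean isDeadlock(SQLException e) {
        String state = e.getSQLState();
        return "40001".equals(state) || "40P01".equals(state);
    }

    /** Runs the work up to maxAttempts times; only deadlocks are retried. */
    static <T> T execute(Callable<T> work, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return work.call();
            } catch (SQLException e) {
                if (!isDeadlock(e)) {
                    throw e; // not a deadlock: do not retry
                }
                last = e; // deadlock: the transaction was rolled back, try again
            }
        }
        throw last; // every attempt ended in a deadlock
    }

    /** Demo only: work that deadlocks 'failures' times first. Returns attempts used, or -1 if attempts ran out. */
    static int simulate(int failures, int maxAttempts) {
        int[] calls = {0};
        try {
            execute(() -> {
                if (++calls[0] <= failures) {
                    throw new SQLException("simulated deadlock", "40001");
                }
                return "committed";
            }, maxAttempts);
            return calls[0];
        } catch (Exception e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate(5, 3));  // attempts run out
        System.out.println(simulate(5, 10)); // succeeds on a later attempt
    }
}
```

With five simulated deadlocks in a row, three attempts are not enough while ten are, which is the effect the higher retry count is meant to have. The sketch only makes sense when the whole unit of work can safely be rerun from the start.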

2) I should have been clearer. I meant a problem with the remote data source. On a second look, the error message reads "Database is already closed (to disable automatic closing at VM shutdown, add ";DB_CLOSE_ON_EXIT=FALSE" to the db URL)". So this is not the problem with the data source returning the same connection to different threads described in JBAS-7811. How the database was closed is beyond me - jBPM never explicitly attempts to close the database.
I found this related issue that Kevin resolved: JBESB-1712. There he disabled the H2 shutdown hook, presumably by setting DB_CLOSE_ON_EXIT to FALSE in the connection URL. From what I read in the console output, the connection URL [ jdbc:h2:tcp://localhost:9092/jbpmDB;MVCC=TRUE ] does not specify this parameter. I do not know whether this applies to the current situation, though.
http://hudson.qa.jboss.com/hudson/view/SOA-Release/job/soa-jbpm-os-jdk16-maven/jdk=java16_default,label=sol9_sparc/45/consoleFull
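For reference, a small sketch of what the connection URL would look like with the parameter quoted in the error message appended. H2Url and withCloseOnExitDisabled are hypothetical names used only for this illustration. Whether the setting would actually prevent the "Database is already closed" failure in these jobs has not been established here.

```java
import java.util.Locale;

/** Hypothetical helper for illustration only; not part of jBPM, JBoss ESB or H2. */
final class H2Url {

    /** Appends ;DB_CLOSE_ON_EXIT=FALSE unless the URL already sets that option. */
    static String withCloseOnExitDisabled(String url) {
        if (url.toUpperCase(Locale.ROOT).contains(";DB_CLOSE_ON_EXIT=")) {
            return url; // option already present, leave the URL untouched
        }
        return url + ";DB_CLOSE_ON_EXIT=FALSE";
    }

    public static void main(String[] args) {
        // URL taken from the console output quoted above
        System.out.println(withCloseOnExitDisabled("jdbc:h2:tcp://localhost:9092/jbpmDB;MVCC=TRUE"));
        // prints jdbc:h2:tcp://localhost:9092/jbpmDB;MVCC=TRUE;DB_CLOSE_ON_EXIT=FALSE
    }
}
```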

Comment 23 Jiri Pechanec 2010-06-18 04:46:06 UTC
Please resolve the issue so I can close it. The tests are no longer failing. I have seen the "Database is already closed" problem once on Sparc 9, but I will open a new issue just for that if it turns out to be a real problem. Right now I consider this issue to be fixed and verified on SOA-P 5.0.2 CR2.

Comment 24 Alejandro Guizar 2010-06-22 21:28:48 UTC
Resolving. Increasing the number of job retries countered the deadlocks that high-concurrency test cases (JBPM-1071, JBPM-2094, JBPM-2489) experience under MySQL and, to a lesser degree, PostgreSQL.