Date of First Response: 2010-05-31 22:41:21 project_key: SOA
I have executed the jBPM tests using SOA-P 5.0.2 CR1 on different databases and JDKs:
http://hudson.qa.jboss.com/hudson/view/SOA-Release/job/soa-jbpm-db-OpenJDK-maven/22/
http://hudson.qa.jboss.com/hudson/view/SOA-Release/job/soa-jbpm-db-jdk16-maven/91/
As you can see from the results, tests are randomly failing with deadlocks at the database level - a typical occurrence is in the locking test. These results have to be thoroughly investigated before release.
Attached are selected logs for different databases and JDKs - search for the keyword "deadlock".
Attachment: Added: jbpmproblem.tar.gz
The issue may be limited to the JobExecutor (which has only limited support), but that has to be confirmed.
Re: http://hudson.qa.jboss.com/hudson/view/SOA-Release/job/soa-jbpm-db-OpenJDK-maven/22/
* JBPM522Test assumed that test resources were unpacked; I've fixed that in rXXXX
* Several tests (postgresql82, JBPM2375Test | JBPM1072Test | JBPM1135Test), (postgresql83, JBPM1135Test | JBPM2094Test | JBPM2812Test), (oracle10g, JBPM1135Test) throw an exception directly attributable to the remote data source. A quick search returned a (possibly) related issue: JBAS-7666
  java.lang.IllegalAccessException: Failed to find connection: 1952576006
    at org.jboss.resource.adapter.jdbc.remote.WrapperDataSourceService.invoke(WrapperDataSourceService.java:212)
* Only a handful of tests (mysql, JBPM1071Test | JBPM2094Test | JBPM2489Test), (postgresql82, JBPM1071Test), (postgresql83, JBPM1071Test) are linked to deadlocks: I'll check those.
Link: Added: This issue related JBAS-7666
Link: Added: This issue related SOA-2106
Link: Added: This issue is related to JBAS-7666
Link: Added: This issue is related to SOA-2106
Link: Added: This issue related JBPAPP-3970
Link: Removed: This issue is related to SOA-2106
Link: Removed: This issue is related to JBAS-7666
I have tried to re-run the tests with plain JDBC. It is better, but two issues are still present:
http://hudson.qa.jboss.com/hudson/view/SOA-Release/job/soa-jbpm-os-jdk16-maven/jdk=java16_default,label=sol9_sparc/45/
http://hudson.qa.jboss.com/hudson/view/SOA-Release/job/soa-jbpm-db-OpenJDK-maven/DATABASE=mysql51,jdk=openjdk-local,label=RHEL5_x86_64_res/24/
Come to think of it, we can still increase the jbpm.job.retries configuration entry from 3 to, say, 10, and see if that helps. While I can change the default configuration in the jbpm-3.2-soa branch, I believe the SOA-P build overwrites the jBPM configuration at some point. If so, the jbpm.job.retries entry has to be changed in the platform configuration as well. Note that, as mentioned on IRC, the hudson jobs that connect via JDBC do not exhibit deadlock failures. Since they run on the same nodes and databases as the soa-jbpm jobs, the difference seems to lie in the remote data source.
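To illustrate why a larger retry budget can absorb transient deadlocks, here is a minimal Java sketch. It is not the jBPM JobExecutor code: the class JobRetrySketch, the method runWithRetries and the parameter maxAttempts are hypothetical names, and the sketch assumes the driver reports a deadlock through SQLSTATE 40001 (as MySQL does) or 40P01 (PostgreSQL).

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

// Hypothetical illustration only; jBPM's own job retry handling differs.
final class JobRetrySketch {

    // Runs the job, trying again when the database reports a deadlock,
    // up to maxAttempts attempts in total.
    static <T> T runWithRetries(Callable<T> job, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return job.call();
            } catch (SQLException e) {
                if (!isDeadlock(e)) {
                    throw e; // not a transient deadlock: fail immediately
                }
                last = e; // deadlock victim: the transaction was rolled back, try again
            }
        }
        throw last; // attempts exhausted
    }

    // Assumption: 40001 is the SQLSTATE MySQL uses for deadlocks,
    // 40P01 is PostgreSQL's deadlock_detected.
    static boolean isDeadlock(SQLException e) {
        String state = e.getSQLState();
        return "40001".equals(state) || "40P01".equals(state);
    }
}
```

With a budget of 3, a job that loses the deadlock lottery three times in a row fails for good; with 10, the same job has seven more chances to get through.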
@Jiri regarding the two failing tests:
http://hudson.qa.jboss.com/hudson/view/SOA-Release/job/soa-jbpm-os-jdk16-maven/jdk=java16_default,label=sol9_sparc/45/ <-- connection closed, probably a remoting issue
http://hudson.qa.jboss.com/hudson/view/SOA-Release/job/soa-jbpm-db-OpenJDK-maven/DATABASE=mysql51,jdk=openjdk-local,label=RHEL5_x86_64_res/24/ <-- deadlock
Checked in increase of jbpm.job.retries to 10 to branch jbpm-3.2-soa, r6406.
Thanks for the feedback, just two more questions:
1) Is it a fix or a workaround? Is it expected, in the natural order of things, that there will be deadlocks at the database level? Should there be deadlocks?
2) "connection closed, probably a remoting issue" - could you elaborate on this? Do you mean JBoss Remoting, or some other kind of remoting? This was never a problem with the remote JNDI datasource.
1) Every app that permits concurrent updates to its database is subject to deadlocks. These are not a big deal, no bigger than a closed network connection or any other temporary database failure, except when their frequency affects throughput. jBPM mitigates the occurrence of deadlocks by relying on optimistic concurrency control for everything but joining tokens. Without pessimistic locks, the join node risks missing updates and leaving process instances stuck forever. It is worth noting that, looking at the hudson jobs, these deadlocks only occur on MySQL, and then only in the most demanding concurrency tests (JBPM2094Test, JBPM2489Test, JBPM2787Test). In sum, increasing the number of retries is a workaround for a mitigable though not completely avoidable problem.
2) I should have been clearer. I meant a problem with the remote data source. On a second look, the error message reads "Database is already closed (to disable automatic closing at VM shutdown, add ";DB_CLOSE_ON_EXIT=FALSE" to the db URL)". So this is not the problem described in JBAS-7811, where the data source returns the same connection to different threads. How the database was closed is beyond me - jBPM never explicitly attempts to close the database. I found this related issue that Kevin resolved: JBESB-1712. There he disabled the H2 shutdown hook, presumably by setting DB_CLOSE_ON_EXIT to FALSE in the connection URL. From what I read in the console output, the connection URL [ jdbc:h2:tcp://localhost:9092/jbpmDB;MVCC=TRUE ] does not specify this parameter. I do not know whether this applies to the current situation, though.
http://hudson.qa.jboss.com/hudson/view/SOA-Release/job/soa-jbpm-os-jdk16-maven/jdk=java16_default,label=sol9_sparc/45/consoleFull
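For reference, the option named in the H2 error message is appended to the connection URL as one more semicolon-separated setting. The Java sketch below shows what that would look like for the URL from the console output. The class H2UrlSketch and the method disableCloseOnExit are hypothetical names, and whether the setting would have prevented the failure on this job is not established.

```java
import java.util.Locale;

// Hypothetical helper, for illustration only.
final class H2UrlSketch {

    // Appends DB_CLOSE_ON_EXIT=FALSE unless the URL already sets that option.
    static String disableCloseOnExit(String url) {
        // Upper-case before searching so the check does not depend on how the setting was typed.
        if (url.toUpperCase(Locale.ROOT).contains(";DB_CLOSE_ON_EXIT=")) {
            return url; // leave an explicit setting untouched
        }
        return url + ";DB_CLOSE_ON_EXIT=FALSE";
    }
}
```

Applied to jdbc:h2:tcp://localhost:9092/jbpmDB;MVCC=TRUE, the helper returns jdbc:h2:tcp://localhost:9092/jbpmDB;MVCC=TRUE;DB_CLOSE_ON_EXIT=FALSE.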
Please resolve the issue so I can close it - the tests are no longer failing. I have seen the "Database is already closed" problem once on Sparc 9, but I will open a new issue just for that if it turns out to be a real problem. Right now I consider this issue to be fixed and verified on SOA-P 5.0.2 CR2.
Resolving. Increasing the number of job retries countered the deadlocks that high-concurrency test cases (JBPM-1071, JBPM-2094, JBPM-2489) experience under MySQL and, to a lesser degree, PostgreSQL.