Created attachment 788628 server.log
Description of problem: When JMS receives many incoming messages at once, the StartProcessCommand fails with an NPE in ProcessInstanceImpl.startProcess(). See the attached server log for the full stack trace.
This one blocks test execution as we cannot get precise performance readings because of it.
Marek, do you know the following:
1. Which runtime were you using? (Singleton/per process instance/per request)
2. Which persistence were you using? (H2? PostgreSQL?)
If you have a git link for, or more info about, the test that caused this (from jbossqe/brms? process-flood?), that would also be great. Thanks.
Will retest with a supported DB.
This bug is weird. Looking at the stack trace, we see the following:
at org.drools.persistence.SingleSessionCommandService.execute(SingleSessionCommandService.java:395) [drools-persistence-jpa-6.0.0-redhat-1.jar:6.0.0-redhat-1]
The SingleSessionCommandService.execute(..) method (SSCS.execute from here on) is synchronized, and, to paraphrase Goetz (and the JLS), synchronization guarantees memory visibility. In other words, any changes one thread makes to shared variables or fields before leaving the synchronized block are guaranteed to be visible to any other thread that subsequently synchronizes on the same lock.
The NPE shown in the log happens because the ProcessInstanceImpl.kruntime field is null. This field is only set (and unset) inside the SSCS.execute block, sort of. Long story short, this field is set to null after the commit, in the afterCompletion() method of a (JTA) transaction synchronization. "Normally", the commit happens inside the synchronized block. However, somehow, the commit is happening outside of the synchronized execute method, so the afterCompletion() call that nulls kruntime is no longer protected by the lock and can race with another thread's execute().
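The "normal" case described above can be illustrated with a small, self-contained Java sketch. It does not use the real Drools/jBPM classes: CommandService, ProcessInstance, FakeTransaction and TxSynchronization are hypothetical stand-ins for SingleSessionCommandService, ProcessInstanceImpl, a JTA transaction and a JTA Synchronization. In this sketch the commit, and therefore the afterCompletion() callback that clears kruntime, runs inside the synchronized execute() method, so concurrent callers never observe a null kruntime.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SynchronizedCommitSketch {

    // Hypothetical stand-in for a JTA Synchronization callback.
    interface TxSynchronization {
        void afterCompletion();
    }

    // Hypothetical stand-in for a transaction that fires its callbacks on commit.
    static final class FakeTransaction {
        private final List<TxSynchronization> syncs = new ArrayList<>();

        void register(TxSynchronization sync) {
            syncs.add(sync);
        }

        void commit() {
            for (TxSynchronization sync : syncs) {
                sync.afterCompletion();
            }
        }
    }

    // Hypothetical stand-in for ProcessInstanceImpl and its kruntime field.
    static final class ProcessInstance {
        Object kruntime;

        void startProcess() {
            // Throws NullPointerException if kruntime was cleared, like the reported NPE.
            Objects.requireNonNull(kruntime, "kruntime");
        }
    }

    // Hypothetical command service shared by all threads (one session for everyone).
    static final class CommandService {
        private final ProcessInstance instance = new ProcessInstance();
        private final Object runtime = new Object();

        // Set kruntime, run the command and commit while holding the lock, so the
        // afterCompletion() callback that clears kruntime cannot interleave with
        // another thread's execute().
        synchronized void execute() {
            FakeTransaction tx = new FakeTransaction();
            instance.kruntime = runtime;
            tx.register(() -> instance.kruntime = null);
            instance.startProcess();
            tx.commit();
        }
    }

    // Submits `tasks` concurrent executions; returns how many finished without an exception.
    public static int runConcurrently(int tasks) {
        CommandService service = new CommandService();
        AtomicInteger succeeded = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < tasks; i++) {
            pool.submit(() -> {
                service.execute();           // an exception here is captured by the Future
                succeeded.incrementAndGet(); // only reached if execute() did not throw
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return succeeded.get();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(1000)); // expected: 1000
    }
}
```

If tx.commit() were instead left to the caller after execute() returns (the situation suspected here), one thread's afterCompletion() could null kruntime while another thread is already inside execute(), and some tasks would intermittently fail with a NullPointerException from startProcess().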
I've finished the organizational work and wrapped up test development. Do you still need me to try this on a production database?
Fixed with these commits:
https://github.com/droolsjbpm/droolsjbpm-integration/commit/6e9301ec489e212555f42e6bbe562a9bad933157
https://github.com/droolsjbpm/droolsjbpm-integration/commit/d59bf29680cd60957538c2f53a01f7259ad1761e
Marek, there's no need to try it against a production database: I'm 99% sure it's a race condition caused by the transaction commit happening outside of KieSession control. However, the sooner you can verify that these commits fix this issue, the better.
Also, the following commit addresses the NPE referenced in the logs:
https://github.com/droolsjbpm/jbpm/commit/750de38123f280e5fea737a2015457a2be0f836a
From the description, I don't know how to reproduce the problem. It probably has the same cause as these similar, already verified bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1004375
https://bugzilla.redhat.com/show_bug.cgi?id=1004761
Verified in BPMS 6 ER4.