Bug 999149

Summary: JMS Api fails to start process under heavy load.
Product: [Retired] JBoss BPMS Platform 6
Component: jBPM Core
Version: 6.0.0
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Keywords: TestBlocker
Target Milestone: ER4
Target Release: 6.0.0
Hardware: Unspecified
OS: Unspecified
Reporter: Marek Baluch <mbaluch>
Assignee: Marco Rietveld <mrietvel>
QA Contact: Jiri Svitak <jsvitak>
CC: kverlaen, mbaluch, smcgowan
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-08-06 20:10:48 UTC
Attachments: server.log (flags: none)

Description Marek Baluch 2013-08-20 19:59:15 UTC
Created attachment 788628: server.log

Description of problem:

When the JMS API receives many incoming messages at once, the StartProcessCommand fails with an NPE in ProcessInstanceImpl.startProcess().

See the attached server log for the full stack trace.

Comment 2 Marek Baluch 2013-08-20 20:01:07 UTC
This one blocks test execution as we cannot get precise performance readings because of it.

Comment 3 Marco Rietveld 2013-08-30 10:22:21 UTC
Marek, do you know the following: 

1. Which runtime were you using? (Singleton/per process instance/per request)
2. Which persistence were you using? (H2? Postgresql?)

If you have a git link for, or more info about the test that caused this (from jbossqe/brms? process-flood?), that would also be great. 

Thanks.

Comment 4 Marek Baluch 2013-08-30 10:36:22 UTC
Will retest with a supported database.

Comment 5 Marco Rietveld 2013-09-03 09:19:53 UTC
This bug is weird: 

Looking at the stack trace, we see the following: 

at org.drools.persistence.SingleSessionCommandService.execute(SingleSessionCommandService.java:395) [drools-persistence-jpa-6.0.0-redhat-1.jar:6.0.0-redhat-1]

The SingleSessionCommandService.execute(..) method is synchronized and, to paraphrase Goetz (and the JLS), synchronization guarantees memory visibility.

In other words, any changes that one thread makes to shared variables or fields before leaving the synchronized block are guaranteed to be visible to any other thread that subsequently enters a block synchronized on the same monitor.

The NPE shown in the log happens because the ProcessInstanceImpl.kruntime field is null. This field is only set (and unset) inside the SSCS.execute block -- sort of. 

Long story short, this field is set to null after the commit, in the afterCompletion() method of a (JTA) transaction synchronization. "Normally", the commit happens inside the synchronized block. Here, however, the commit is somehow happening outside of the synchronized execute method.
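
To make the suspected interleaving concrete, here is a minimal, self-contained Java sketch. It is not jBPM or Drools code: FakeRuntime, FakeProcessInstance, FakeCommandService and commitOutsideLock() are hypothetical stand-ins, and the latches exist only to force the bad ordering deterministically. It assumes, as described above, that the field is set inside a synchronized execute() while the cleanup callback runs on another thread without holding the same monitor.

```java
import java.util.concurrent.CountDownLatch;

class CommitOutsideLockSketch {

    /** Hypothetical stand-in for the runtime a process instance depends on. */
    static final class FakeRuntime {
        String start() { return "started"; }
    }

    /** Hypothetical stand-in for a process instance holding a kruntime field. */
    static final class FakeProcessInstance {
        private FakeRuntime kruntime;

        void setRuntime(FakeRuntime runtime) { kruntime = runtime; }

        // Mimics a transaction afterCompletion() callback clearing the field.
        void afterCompletion() { kruntime = null; }

        // Throws NullPointerException if the field was cleared in the meantime.
        String startProcess() { return kruntime.start(); }
    }

    /** Hypothetical stand-in for a command service with a synchronized execute(). */
    static final class FakeCommandService {
        private final FakeProcessInstance instance = new FakeProcessInstance();
        private final CountDownLatch runtimeSet = new CountDownLatch(1);
        private final CountDownLatch fieldCleared = new CountDownLatch(1);

        // Holds the monitor for the whole call, like a synchronized execute(..).
        synchronized String execute() {
            instance.setRuntime(new FakeRuntime());
            runtimeSet.countDown();
            awaitQuietly(fieldCleared); // test-only: wait for the stray cleanup
            return instance.startProcess();
        }

        // Runs WITHOUT the monitor, so the lock in execute() cannot exclude it.
        void commitOutsideLock() {
            awaitQuietly(runtimeSet);
            instance.afterCompletion();
            fieldCleared.countDown();
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if execute() failed with an NPE under the forced interleaving. */
    static boolean reproduceNpe() {
        FakeCommandService service = new FakeCommandService();
        Thread committer = new Thread(service::commitOutsideLock);
        committer.start();
        boolean npeSeen;
        try {
            service.execute();
            npeSeen = false;
        } catch (NullPointerException expected) {
            npeSeen = true;
        }
        try {
            committer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return npeSeen;
    }

    public static void main(String[] args) {
        System.out.println("NPE reproduced: " + reproduceNpe());
    }
}
```

The synchronized keyword on execute() only excludes other callers that take the same monitor. commitOutsideLock() never does, so it can clear the field while execute() is still in progress. Under real load the ordering would be nondeterministic, which would be consistent with the failure only showing up when many messages arrive at once.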

Comment 6 Marek Baluch 2013-09-03 09:26:05 UTC
Finished organizational work and wrapped up test devel. Do you still require me to try it out on a production database?

Comment 7 Marco Rietveld 2013-09-04 10:16:13 UTC
Fixed with these commits: 

https://github.com/droolsjbpm/droolsjbpm-integration/commit/6e9301ec489e212555f42e6bbe562a9bad933157
https://github.com/droolsjbpm/droolsjbpm-integration/commit/d59bf29680cd60957538c2f53a01f7259ad1761e

Marek, no need to try it out against a production database: I'm 99% sure that it's a race condition caused by the fact that the tx commit happened outside of KieSession control.  

However, the sooner you can verify that these commits fix this issue, the better.

Comment 8 Marco Rietveld 2013-09-05 12:29:20 UTC
Also, the following commit addresses the NPE referenced in the logs: 

https://github.com/droolsjbpm/jbpm/commit/750de38123f280e5fea737a2015457a2be0f836a

Comment 12 Jiri Svitak 2013-10-15 15:33:42 UTC
I don't know from the description how to reproduce the problem. It's probably the same cause as in similar verified bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1004375
https://bugzilla.redhat.com/show_bug.cgi?id=1004761

Verified in BPMS 6 ER4.