Bug 999149 - JMS Api fails to start process under heavy load.
Summary: JMS Api fails to start process under heavy load.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: JBoss BPMS Platform 6
Classification: Retired
Component: jBPM Core
Version: 6.0.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ER4
Target Release: 6.0.0
Assignee: Marco Rietveld
QA Contact: Jiri Svitak
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-08-20 19:59 UTC by Marek Baluch
Modified: 2016-09-20 05:04 UTC
CC: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-08-06 20:10:48 UTC
Type: Bug
Embargoed:


Attachments
server.log (6.47 MB, text/x-log)
2013-08-20 19:59 UTC, Marek Baluch

Description Marek Baluch 2013-08-20 19:59:15 UTC
Created attachment 788628
server.log

Description of problem:

When the JMS API receives many incoming messages at once, the StartProcessCommand fails with an NPE in ProcessInstanceImpl.startProcess().

See attached server log for the full stack.

Comment 2 Marek Baluch 2013-08-20 20:01:07 UTC
This one blocks test execution as we cannot get precise performance readings because of it.

Comment 3 Marco Rietveld 2013-08-30 10:22:21 UTC
Marek, do you know the following: 

1. Which runtime were you using? (Singleton/per process instance/per request)
2. Which persistence were you using? (H2? Postgresql?)

If you have a git link for, or more info about the test that caused this (from jbossqe/brms? process-flood?), that would also be great. 

Thanks.

Comment 4 Marek Baluch 2013-08-30 10:36:22 UTC
Will retest with a supported DB.

Comment 5 Marco Rietveld 2013-09-03 09:19:53 UTC
This bug is weird: 

Looking at the stack trace, we see the following: 

at org.drools.persistence.SingleSessionCommandService.execute(SingleSessionCommandService.java:395) [drools-persistence-jpa-6.0.0-redhat-1.jar:6.0.0-redhat-1]

The SingleSessionCommandService.execute(..) method is synchronized and to paraphrase Goetz (and the JLS): synchronization guarantees memory visibility. 

In other words, any changes made to any shared variables or fields before the end of the synchronized block (by one thread) are guaranteed to be visible to all other threads that have access to said variables or fields. 
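A minimal Java sketch of that guarantee, using a hypothetical SharedCounter class (not part of Drools or jBPM): several threads update a field only through a synchronized method, and a read through another synchronized method on the same monitor sees every one of those writes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class for illustration only; not a Drools/jBPM type.
class SharedCounter {
    private int value; // guarded by the monitor of "this"

    synchronized void increment() {
        value++;
    }

    synchronized int get() {
        return value;
    }
}

class SyncVisibilityDemo {
    // Starts `threads` workers, each incrementing the shared counter
    // `incrementsPerThread` times, then returns the final value.
    static int run(int threads, int incrementsPerThread) {
        SharedCounter counter = new SharedCounter();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.increment();
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        // Because increment() and get() lock the same monitor, every write
        // made inside increment() is visible to this read: no lost updates.
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(run(8, 10_000)); // prints 80000
    }
}
```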

The NPE shown in the log happens because the ProcessInstanceImpl.kruntime field is null. This field is only set (and unset) inside the SSCS.execute block -- sort of. 

Long story short, this field is set to null after the commit, in a (JTA) transaction synchronization's afterCompletion() method. "Normally", the commit happens inside the synchronized block. However, somehow, the commit is happening outside of the synchronized execute method.
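A minimal sketch of the ordering problem described above, assuming simplified stand-ins: CommandServiceSketch, ProcessInstanceStub, KnowledgeRuntimeStub and afterCompletion() here are hypothetical analogues of SingleSessionCommandService, the ProcessInstanceImpl.kruntime field and the JTA afterCompletion() callback, not the real classes, and this does not reproduce the actual code in the linked commits. executeOnly() followed by commitOutsideLock() mirrors the racy path; executeAndCommit() keeps the assignment, the use and the clearing of the field under one monitor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins; NOT the real Drools/jBPM classes.
class KnowledgeRuntimeStub {
    String name() { return "kruntime"; }
}

class ProcessInstanceStub {
    KnowledgeRuntimeStub kruntime; // set before the command runs, cleared after commit

    void startProcess() {
        kruntime.name(); // NullPointerException if kruntime was cleared concurrently
    }
}

class CommandServiceSketch {
    private final ProcessInstanceStub instance = new ProcessInstanceStub(); // shared across calls
    private final KnowledgeRuntimeStub runtime = new KnowledgeRuntimeStub();

    // Analogue of a transaction synchronization's afterCompletion(): clears the field.
    private void afterCompletion() {
        instance.kruntime = null;
    }

    // Racy ordering: execute returns first, and the caller "commits" later
    // without holding the monitor, so its afterCompletion() can null the
    // field while another thread is inside executeOnly().
    synchronized void executeOnly() {
        instance.kruntime = runtime;
        instance.startProcess();
    }

    void commitOutsideLock() {
        afterCompletion();
    }

    // Safe ordering: set, use, commit and clear, all under the same monitor.
    synchronized void executeAndCommit() {
        instance.kruntime = runtime;
        instance.startProcess();
        afterCompletion();
    }
}

class RaceDemo {
    // Calls the safe ordering from several threads and counts NPEs.
    static int countNpes(int threads, int callsPerThread) {
        CommandServiceSketch service = new CommandServiceSketch();
        AtomicInteger npes = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    try {
                        service.executeAndCommit();
                    } catch (NullPointerException e) {
                        npes.incrementAndGet();
                    }
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return npes.get();
    }

    public static void main(String[] args) {
        System.out.println(countNpes(8, 10_000)); // prints 0
    }
}
```

Calling the racy pair (executeOnly() then commitOutsideLock()) from several threads in the same way may intermittently throw NullPointerException, because the unsynchronized write in afterCompletion() can land between another thread's assignment and its dereference; whether it happens on a given run depends on thread scheduling.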

Comment 6 Marek Baluch 2013-09-03 09:26:05 UTC
Finished organizational work and wrapped up test development. Do you still need me to try it out on a production database?

Comment 7 Marco Rietveld 2013-09-04 10:16:13 UTC
Fixed with these commits: 

https://github.com/droolsjbpm/droolsjbpm-integration/commit/6e9301ec489e212555f42e6bbe562a9bad933157
https://github.com/droolsjbpm/droolsjbpm-integration/commit/d59bf29680cd60957538c2f53a01f7259ad1761e

Marek, no need to try it out against a production database: I'm 99% sure that it's a race condition caused by the fact that the tx commit happened outside of KieSession control.  

However, the sooner you can verify that these commits fix this issue, the better.

Comment 8 Marco Rietveld 2013-09-05 12:29:20 UTC
Also, the following commit addresses the NPE referenced in the logs: 

https://github.com/droolsjbpm/jbpm/commit/750de38123f280e5fea737a2015457a2be0f836a

Comment 12 Jiri Svitak 2013-10-15 15:33:42 UTC
I don't know from the description how to reproduce the problem. It's probably the same cause as in similar verified bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1004375
https://bugzilla.redhat.com/show_bug.cgi?id=1004761

Verified in BPMS 6 ER4.

