Bug 1022587 - BPEL History requests time out with production-level historical data in db
BPEL History requests time out with production-level historical data in db
Status: VERIFIED
Product: JBoss Enterprise SOA Platform 5
Classification: JBoss
Component: riftsaw (Show other bugs)
5.3.1
Unspecified Unspecified
unspecified Severity unspecified
: GA
: ---
Assigned To: Gary Brown
:
Depends On:
Blocks: 1073034
  Show dependency treegraph
 
Reported: 2013-10-23 11:15 EDT by Rick Wagner
Modified: 2014-05-28 10:26 EDT (History)
5 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1024013 (view as bug list)
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Rick Wagner 2013-10-23 11:15:27 EDT
The BPEL History console times out and gives an error message when production-level amounts of data are in the database.

Reproducer:  

- Set up SOA-P, point it to a database other than the default.  (i.e. MySQL)
- Deploy bpel_hello_world quickstart
- Deploy bpel_esb_helloworld quickstart - Run 'ant deploy' ->  Using the BPEL console, check the history.  (Should show ok)
- Now, change bpel_esb_helloworld/src/../SendEsbMessage to send 1000 messages:
    	for(int i=0; i<1000; i++) {
        	esbMessage.getBody().add(args[2] + " NUMBER #" + i);        	
            Message respMessage = new ServiceInvoker(args[0], args[1]).deliverSync(esbMessage, 5000);        	
            System.out.println("REPLY: "+respMessage.getBody().get());
        }
- Check history again (should fail)

----------------------------------------------------------------------------
Alternative reproducer:  Use SOAPUI to make a 'testcase' with > 1k repeats.


The problem seems to be in queries to the EVENT_DATA_TABLE:

Enabled SQL logging shows the following events:
1. DefaultBPAFDataSource.getInstanceEvents() method calls the following SQL:
29193:2013-10-23 15:10:39,322 DEBUG [org.hibernate.SQL] (http-127.0.0.1-8080-1) select event0_.EID as EID132_, event0_.ACTIVITY_DEFINITION_ID as ACTIVITY2_132_, event0_.ACTIVITY_INSTANCE_ID as ACTIVITY3_132_, event0_.ACTIVITY_NAME as ACTIVITY4_132_, event0_.CU
RRENT_STATE as CURRENT5_132_, event0_.PREVIOUS_STATE as PREVIOUS6_132_, event0_.PROCESS_DEFINITION_ID as PROCESS7_132_, event0_.PROCESS_INSTANCE_ID as PROCESS8_132_, event0_.PROCESS_NAME as PROCESS9_132_, event0_.SERVER_ID as SERVER10_132_, event0_.TIMESTAMP a
s TIMESTAMP132_ from BPAF_EVENT event0_, BPAF_EVENT event1_ where event0_.PROCESS_DEFINITION_ID=event1_.PROCESS_DEFINITION_ID and event0_.PROCESS_INSTANCE_ID=event1_.PROCESS_INSTANCE_ID and (event0_.CURRENT_STATE=? and event1_.CURRENT_STATE=? or event1_.CURREN
T_STATE=? and event0_.CURRENT_STATE=?) and (event0_.ACTIVITY_DEFINITION_ID is null) and (event1_.ACTIVITY_DEFINITION_ID is null) and event0_.PROCESS_DEFINITION_ID='{http://www.jboss.org/bpel/examples}HelloWorld-1' and event0_.TIMESTAMP>=? and event1_.TIMESTAMP
<=? order by event0_.EID
29194:2013-10-23 15:10:39,324 DEBUG [org.hibernate.type.EnumType] (http-127.0.0.1-8080-1) Binding 'Open_Running' to parameter: 1
29195:2013-10-23 15:10:39,324 DEBUG [org.hibernate.type.EnumType] (http-127.0.0.1-8080-1) Binding 'Closed_Completed' to parameter: 2
29196:2013-10-23 15:10:39,324 DEBUG [org.hibernate.type.EnumType] (http-127.0.0.1-8080-1) Binding 'Open_Running' to parameter: 3
29197:2013-10-23 15:10:39,324 DEBUG [org.hibernate.type.EnumType] (http-127.0.0.1-8080-1) Binding 'Closed_Completed' to parameter: 4
29198:2013-10-23 15:10:39,324 TRACE [org.hibernate.type.LongType] (http-127.0.0.1-8080-1) binding '1044054000000' to parameter: 5
29199:2013-10-23 15:10:39,324 TRACE [org.hibernate.type.LongType] (http-127.0.0.1-8080-1) binding '1382527990209' to parameter: 6

and returns in ~1 second => OK

2. ProcessHistoryPluginImpl then seems to call the following SQL in a loop (~20.000 times for ~1000 instances):
DEBUG [org.hibernate.SQL] (http-127.0.0.1-8080-1) select dataelemen0_.EVENT_ID as EVENT4_132_1_, dataelemen0_.TID as TID1_, dataelemen0_.TID as TID133_0_, dataelemen0_.EVENT_ID as EVENT4_133_0_, dataelemen0_.NAME as NAME133_0_, da
taelemen0_.VALUE as VALUE133_0_ from BPAF_EVENT_DATA dataelemen0_ where dataelemen0_.EVENT_ID=?

Step 2 above takes a long time, resulting in a timeout at the BPEL Console.
Comment 1 Gary Brown 2013-10-29 13:10:25 EDT
Although the query has been simplified a bit, the nature of the information being retrieved makes it difficult to do a simple optimised query. Therefore the main solution to this problem has been to introduce pagination in the BPEL console to reduce the number of process instances retrieved for each request.

The fix is available in the following component/versions:

http://anonsvn.jboss.org/repos/riftsaw/tags/RiftSaw-2.3.9.Final/
http://anonsvn.jboss.org/repos/riftsaw/tags/RiftSaw-ODE-2.3.9.Final/
https://github.com/bpmc/bpaf-explorer/releases/tag/1.0.2.Final
https://github.com/bpmc/bpm-console/releases/tag/2.2.12.Final
Comment 2 Rick Wagner 2014-03-05 11:24:57 EST
This BZ was thought to be included in Roll up SOA-P_5.3.1_4_2013, but it was not.  Bumping to SOA-P_5.3.1_1_2014.
Comment 3 Gary Brown 2014-03-05 12:22:32 EST
Hi Rick,

I'm unclear what the problem is, as 1024013 (which is essentially the same issue but for FSW6) has been QA'ed and marked verified.

Are you sure the fix was included in the version you tested? i.e. was the pagination present in the BPEL console?

If so, do you have the SQL debug info to indicate how many queries it was attempting to run as described in step 2 of the original problem description?
Comment 4 Rick Wagner 2014-03-05 14:34:42 EST
(In reply to Gary Brown from comment #3)

Hi Gary,

Thanks for the quick response.

This fix was supposed to have been included in the previous 5.3.1 Roll up, but we've been informed it didn't make it.  (The FSW fix is orthogonal to this.)

So I think the answer to your question about inclusion is 'No, we made a mistake and did not include it'.

I think we just need to make sure the code you cited in comment #1 above is included in this next 5.3.1 Roll up.  No new code is requested this time around.  

Thanks,

Rick
Comment 5 Gary Brown 2014-03-06 04:12:12 EST
Ah ok - thanks for the clarification Rick.

As the verified status had been set to FailedQA I thought possibly there was a problem with the fix :)

Should I set the status to modified? Or will someone else do that when the next roll-up is being built?
Comment 6 Rick Wagner 2014-03-06 08:40:35 EST
Hi Gary,

I got it.  Thanks for checking up.

Rick


(In reply to Gary Brown from comment #5)
> Ah ok - thanks for the clarification Rick.
> 
> As the verified status had been set to FailedQA I thought possibly there was
> a problem with the fix :)
> 
> Should I set the status to modified? Or will someone else do that when the
> next roll-up is being built?
Comment 7 ppecka 2014-05-28 10:26:54 EDT
Issue has been VERIFIED with bits from BZ1073034

Note You need to log in before you can comment on or make changes to this bug.