Bug 802755 - Performance regression when using coincides, during fusion operators in simple rules.
Performance regression when using coincides, during fusion operators in simpl...
Product: JBoss Enterprise BRMS Platform 5
Classification: JBoss
Component: BRE (Expert, Fusion) (Show other bugs)
BRMS 5.3.0.GA
Unspecified Unspecified
unspecified Severity unspecified
: ---
: ---
Assigned To: Edson Tirelli
Petr Široký
: Regression
Depends On:
  Show dependency treegraph
Reported: 2012-03-13 08:49 EDT by Petr Široký
Modified: 2013-10-07 22:00 EDT (History)
7 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed:
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)
Reproducer (simple maven project). (12.70 KB, application/zip)
2012-03-13 08:49 EDT, Petr Široký
no flags Details

  None (edit)
Description Petr Široký 2012-03-13 08:49:30 EDT
Created attachment 569669 [details]
Reproducer (simple maven project).

Description of problem:
Knowledge base is constructed from simple rules that contains coincides and during fusion operators. When inserting events into knowledge session and then firing rules, the time needed for theses operations is much higher when using 5.3.0 ER4 comparing to 5.2 GA. This has been tested on three different machines.

Example: time with 5.2 GA:  2958ms
         time with 5.3 ER4: 3956ms
The difference is approximately 34 %.

See the attached maven project for details.

Version-Release number of selected component (if applicable):
5.3.0 ER4

Steps to Reproduce:
1. Run the test in attached Maven project (mvn test).
2. Change the brms.version property in pom.xml to 5.3.0.BRMS-ER4 and run the tests again.

Actual results:
The time needed to insert the events and fire rules is much higher when using 5.3 ER4 version.

Expected results:
The time is lower or at least same with 5.3 ER4 comparing to 5.2 GA.
Comment 3 Edson Tirelli 2012-03-15 09:52:54 EDT
The actual operator implementation has not changed since December, 2008:


So this regression is a side effect of something else. 

I think this is worth investigating, but if it is not affecting anything else, it should have a low priority. Temporal operators other than "before" and "after" are rarely used.
Comment 9 Edson Tirelli 2012-08-27 18:48:39 EDT

I investigated this issue and as I mentioned before, it is not specifically related to the "coincides" operator, but a side effect of a larger change we did in 5.3. 

Before 5.3, Drools only supported constraints in the form:

<field> <operator> <value>

This was extremely limiting for real use cases, but it was what we could do initially, as Drools would resolve the <field> and the <value> independently and apply the <operator> to them. 

In 5.3 and forward, we decided to support any valid JAVA/MVEL boolean expressions as constraints, what we call free-form constraints. It basically means that users can write more expressive constraints, with formulas in it, without having to use eval() blocks to escape it. For instance, until 5.2, users could not write something like:

$variable + 10 == someField

This change was a conscious design decision we made, and the solution we adopted was to use MVEL to resolve these constraints. This obviously adds overhead, but the added expressiveness of the language and the overall benefits to the users pay off for the performance slow down.

The performance loss is more noticeable in pluggable operators because in order to keep supporting them (and all fusion operators are pluggable operators), we now have to convert those operators into method calls. So, an expression of the form:

<val1> <operator> <val2>

Becomes after compilation:

<operator>.eval( <val1>, <val2> )

This is the reason the performance loss was noticed with the use of "coincides", but not so much with the use of, for instance, ==, that is indexed and native to MVEL.

Finally, our final goal is to not have these expressions to be "interpreted" by MVEL, but instead, to JIT them in runtime. This was achieved in Drools 5.5 and the results show as you can see bellow. 

The provided test is a good specific performance test case, but it does not reflect real case scenarios. In real case scenarios, the use of pluggable operators is a small percentage of all the constraints used, while in the provided test it corresponds to 25% of all the constraints. Also, the other 75% of the constraints in the test are indexed == constraints, that have constant O(1) performance. That is the reason you found 30% performance degradation, while in real use cases, the performance degradation should not be more than 10% in the worst cases.

Nevertheless, we will continue to work on improving the overall performance of the engine, but for 5.3, we don't feel that we can do much more as it is a case of using more resources because it is doing more than it was doing before. For 5.5, as can be seen below, things are already better and we will continue improving.

Test Results using my machine (Avg/Med):

5.2 => 976/976  
5.3 => 1283/1291 
5.5 => 1175/1176 

Hope this clarifies the subject.
Comment 12 Edson Tirelli 2012-08-31 17:13:47 EDT
Petr, thank you for the additional tests and data you provided. After additional investigation I found a problem in 5.2 code that was incorrectly setting the expiration offset of the TransactionCreatedEvent to 0 (meaning immediate expiration) while in 5.3 it is being correctly set to -1 (no expiration). This way, 5.3 is keeping more events longer in memory, and that is the reason it runs slower than 5.2 when using coincides/during operators.

Nevertheless, during my investigation I found that the Propagation node was not being properly shared, and by fixing that, it actually improved the performance of 5.3.x, getting better numbers than even 5.2.x.

In my machine, using the tests you provided:

* Using after/before:
5.2.0 => 6597ms
5.3.0 => 3689ms
5.3.x => 2329ms

* Using coincides/during:
5.2.0 => 2419ms
5.3.0 => 3692ms
5.3.x => 2341ms

* Using starts/started by:
5.2.0 => 501ms
5.3.0 => 61ms
5.3.x => 61ms

Take a look and let me know if you think it is ok now.
Comment 13 Petr Široký 2012-09-03 10:47:31 EDT
Hi Edson,

it looks more than ok, thanks!. I will verify the "fix" with next build and close this bug if everything is ok.

Comment 14 Petr Široký 2012-09-15 07:34:54 EDT
Verified in 5.3.1-ER1. Temporal operators in our use cases are much faster now.

Note You need to log in before you can comment on or make changes to this bug.