Red Hat Bugzilla – Bug 802755
Performance regression when using coincides, during fusion operators in simple rules.
Last modified: 2013-10-07 22:00:11 EDT
Created attachment 569669 [details]
Reproducer (simple maven project).
Description of problem:
Knowledge base is constructed from simple rules that contains coincides and during fusion operators. When inserting events into knowledge session and then firing rules, the time needed for theses operations is much higher when using 5.3.0 ER4 comparing to 5.2 GA. This has been tested on three different machines.
Example: time with 5.2 GA: 2958ms
time with 5.3 ER4: 3956ms
The difference is approximately 34 %.
See the attached maven project for details.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Run the test in attached Maven project (mvn test).
2. Change the brms.version property in pom.xml to 5.3.0.BRMS-ER4 and run the tests again.
The time needed to insert the events and fire rules is much higher when using 5.3 ER4 version.
The time is lower or at least same with 5.3 ER4 comparing to 5.2 GA.
The actual operator implementation has not changed since December, 2008:
So this regression is a side effect of something else.
I think this is worth investigating, but if it is not affecting anything else, it should have a low priority. Temporal operators other than "before" and "after" are rarely used.
I investigated this issue and as I mentioned before, it is not specifically related to the "coincides" operator, but a side effect of a larger change we did in 5.3.
Before 5.3, Drools only supported constraints in the form:
<field> <operator> <value>
This was extremely limiting for real use cases, but it was what we could do initially, as Drools would resolve the <field> and the <value> independently and apply the <operator> to them.
In 5.3 and forward, we decided to support any valid JAVA/MVEL boolean expressions as constraints, what we call free-form constraints. It basically means that users can write more expressive constraints, with formulas in it, without having to use eval() blocks to escape it. For instance, until 5.2, users could not write something like:
$variable + 10 == someField
This change was a conscious design decision we made, and the solution we adopted was to use MVEL to resolve these constraints. This obviously adds overhead, but the added expressiveness of the language and the overall benefits to the users pay off for the performance slow down.
The performance loss is more noticeable in pluggable operators because in order to keep supporting them (and all fusion operators are pluggable operators), we now have to convert those operators into method calls. So, an expression of the form:
<val1> <operator> <val2>
Becomes after compilation:
<operator>.eval( <val1>, <val2> )
This is the reason the performance loss was noticed with the use of "coincides", but not so much with the use of, for instance, ==, that is indexed and native to MVEL.
Finally, our final goal is to not have these expressions to be "interpreted" by MVEL, but instead, to JIT them in runtime. This was achieved in Drools 5.5 and the results show as you can see bellow.
The provided test is a good specific performance test case, but it does not reflect real case scenarios. In real case scenarios, the use of pluggable operators is a small percentage of all the constraints used, while in the provided test it corresponds to 25% of all the constraints. Also, the other 75% of the constraints in the test are indexed == constraints, that have constant O(1) performance. That is the reason you found 30% performance degradation, while in real use cases, the performance degradation should not be more than 10% in the worst cases.
Nevertheless, we will continue to work on improving the overall performance of the engine, but for 5.3, we don't feel that we can do much more as it is a case of using more resources because it is doing more than it was doing before. For 5.5, as can be seen below, things are already better and we will continue improving.
Test Results using my machine (Avg/Med):
5.2 => 976/976
5.3 => 1283/1291
5.5 => 1175/1176
Hope this clarifies the subject.
Petr, thank you for the additional tests and data you provided. After additional investigation I found a problem in 5.2 code that was incorrectly setting the expiration offset of the TransactionCreatedEvent to 0 (meaning immediate expiration) while in 5.3 it is being correctly set to -1 (no expiration). This way, 5.3 is keeping more events longer in memory, and that is the reason it runs slower than 5.2 when using coincides/during operators.
Nevertheless, during my investigation I found that the Propagation node was not being properly shared, and by fixing that, it actually improved the performance of 5.3.x, getting better numbers than even 5.2.x.
In my machine, using the tests you provided:
* Using after/before:
5.2.0 => 6597ms
5.3.0 => 3689ms
5.3.x => 2329ms
* Using coincides/during:
5.2.0 => 2419ms
5.3.0 => 3692ms
5.3.x => 2341ms
* Using starts/started by:
5.2.0 => 501ms
5.3.0 => 61ms
5.3.x => 61ms
Take a look and let me know if you think it is ok now.
it looks more than ok, thanks!. I will verify the "fix" with next build and close this bug if everything is ok.
Verified in 5.3.1-ER1. Temporal operators in our use cases are much faster now.