Bug 1284026 - TailSwapMove experiences 30% performance loss.
TailSwapMove experiences 30% performance loss.
Status: ASSIGNED
Product: JBoss BRMS Platform 6
Classification: JBoss
Component: OptaPlanner (Show other bugs)
6.3.0
Unspecified Unspecified
high Severity medium
: DR1
: 6.3.0
Assigned To: Geoffrey De Smet
Jiri Locker
: Reopened
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2015-11-20 10:14 EST by jvahala
Modified: 2016-04-28 01:08 EDT (History)
2 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-03-07 09:57:16 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description jvahala 2015-11-20 10:14:09 EST
Description of problem:
TailSwapMove experiences 30% performance loss which means that it's not a good pick for chained solutions in general. This might be related to little speed degradation on all chained solutions.

This is not as urgent as 1273783 BZ
Comment 2 Geoffrey De Smet 2015-11-23 04:58:09 EST
For others reading this, please note that the 30% perf leak is the worst case scenario on a specific use case on a specific dataset, without taking the perf gain on the other moves into account (which make up for this loss).

Nevertheless, this is a big loss and must be fixed because it's a lost opportunity for a good perf gain if not done. Looking into it. Thanks for finding this perf loss, Jiri :)
Comment 3 Geoffrey De Smet 2015-11-23 08:44:39 EST
It seems that homberger_1000_RC110_1 isn't affected
CR1: Solving ended: time spent (121164), best score (0hard/-51501723soft), average calculate count per second (191962), environment mode (REPRODUCIBLE).
6.4.0-SNAPSHOT: Solving ended: time spent (121411), best score (0hard/-51501723soft), average calculate count per second (191255), environment mode (REPRODUCIBLE).
100.3%
Comment 4 Geoffrey De Smet 2015-11-23 08:56:38 EST
On my machine, with belgium-tw-n2750-k55:
CR1: Solving ended: time spent (122084), best score (0hard/-402316soft), average calculate count per second (41742), environment mode (REPRODUCIBLE).
6.4-S: Solving ended: time spent (122445), best score (0hard/-401887soft), average calculate count per second (46339), environment mode (REPRODUCIBLE).
90% = 1/111%, so the new version is 11% faster
Comment 5 Geoffrey De Smet 2015-11-23 10:00:15 EST
Nor using <acceptedCountLimit>1000000</acceptedCountLimit>, nor using JDK 8 reproduces it
Comment 6 jvahala 2015-11-23 13:02:58 EST
here comes fast analysis of tailSwapMove using TSP problem. 

 - I am using very simple solution initializer (!!! - there is difference between versions and it is scoreDirector.triggerVariableListeners();) but I've checked output solution and it's the same (circle around entities) 
 
 - results: 

1 000 doable moves

CR1:
Benchmark                                          (dataset)  Mode  Cnt     Score     Error  Units
TSPTailChainSwapMoveSelectorBenchmark.benchmark       LU_980    ss   10   285.534 ±  24.174  ms/op
TSPTailChainSwapMoveSelectorBenchmark.benchmark  USA_CA_2716    ss   10   795.526 ±  30.503  ms/op
TSPTailChainSwapMoveSelectorBenchmark.benchmark  GREECE_9882    ss   10  3644.275 ± 183.555  ms/op

SNAPSHOT:
Benchmark                                          (dataset)  Mode  Cnt     Score    Error  Units
TSPTailChainSwapMoveSelectorBenchmark.benchmark       LU_980    ss   10   400.763 ± 17.447  ms/op
TSPTailChainSwapMoveSelectorBenchmark.benchmark  USA_CA_2716    ss   10  1135.014 ± 26.212  ms/op
TSPTailChainSwapMoveSelectorBenchmark.benchmark  GREECE_9882    ss   10  5013.708 ± 90.055  ms/op

------------------------------------

10 000 doable moves
CR1:
Benchmark                                          (dataset)  Mode  Cnt      Score      Error  Units
TSPTailChainSwapMoveSelectorBenchmark.benchmark       LU_980    ss    6   2668.098 ±  403.768  ms/op
TSPTailChainSwapMoveSelectorBenchmark.benchmark  USA_CA_2716    ss    6   8065.210 ±  404.990  ms/op
TSPTailChainSwapMoveSelectorBenchmark.benchmark  GREECE_9882    ss    6  36048.139 ± 3225.749  ms/op

SNAPSHOT:
Benchmark                                          (dataset)  Mode  Cnt      Score      Error  Units
TSPTailChainSwapMoveSelectorBenchmark.benchmark       LU_980    ss    6   3761.815 ±  380.438  ms/op
TSPTailChainSwapMoveSelectorBenchmark.benchmark  USA_CA_2716    ss    6  11452.173 ±  399.867  ms/op
TSPTailChainSwapMoveSelectorBenchmark.benchmark  GREECE_9882    ss    6  49502.664 ± 7603.110  ms/op

-----------------------------------

we can see no diff in scalability between 1000 and 10000 moves. 

how to reproduce: 
1. download optaplanner perf repo, link can be found on optaplaner Hub, 
2. easiest way: install jmh plugin into Idea https://plugins.jetbrains.com/plugin/7529?pr=, then it's possible to run single benchmark methods from IDE
3. mvn clean install
4. right click on TSPTailChainSwapMoveSelectorBenchmark -> Run
5. you get results for SNAPSHOT
6. git checkout bfd7a3c783f41f8f014ccd47d348bca2dc9ed820
7. step 3 and 4
8. you get results for CR1. 

here is diff commit: https://gitlab.mw.lab.eng.bos.redhat.com/jvahala/optaplanner-perf/commit/ae764b9dce37af2c0efcd775f27122fbf8b1efbb
Comment 7 Geoffrey De Smet 2015-11-24 11:56:03 EST
I can reproduce on TSP with lu 980.
CR1: Solving ended: time spent (687287), best score (-14251771), average calculate count per second (5222), environment mode (REPRODUCIBLE).
master: Solving ended: time spent (687232), best score (-14284861), average calculate count per second (3896), environment mode (REPRODUCIBLE).
master/CR1 = 75%

I 'll profile TSP now to find the cause of <tailChainSwapMoveSelector/>'s loss and fix it. Hopefully that will fix it for VRPTW too. Note that TSP is a toy, VRPTW is the real customer value, so reproducing on VRPTW would be nicer to have.
Comment 8 jvahala 2015-11-24 12:55:05 EST
I know, but we see tailSwapMove issue in all chaining solutions, so finding it for TSP should resolve it for VRP. 

btw, you can ofc reproduce it the same, just use different benchmark test. I don't have it here right now, but it should be called "VRPTWTailChainSwapMoveSelectorBenchmark.
Comment 9 Geoffrey De Smet 2015-11-25 04:07:20 EST
I suspect the new reverseChain() to be the bottleneck. I am investigating that is this new jira I've spin-off:
  https://issues.jboss.org/browse/PLANNER-479
Comment 10 Geoffrey De Smet 2015-11-25 10:18:40 EST
reverseChain() is probably innocent.

But I did found a way to improve TSP by around 30%... :)
https://issues.jboss.org/browse/PLANNER-480

That won't affect VRP though, so I need to figure out how to reproduce this issue on VRP with tailchainswapmoves.
Comment 11 Geoffrey De Smet 2015-11-25 10:37:37 EST
IMPORTANT: PLANNER-480 is fixed on master, but not yet backported to 6.3.x. I need a blocker tag to do that. Because it only affects TSP in theory, and few (or none) customers will benefit from PLANNER-480, and there is some risk related to that change, I don't favor backporting it - I would just let it bake in community first.


The workaround is to use @InverseRelationShadowVariable as in the VRP example.
Comment 12 Geoffrey De Smet 2015-12-01 09:16:33 EST
Fixing this issue requires fixing 2 community jira's:
  https://issues.jboss.org/browse/PLANNER-480 (fixed on master, not backported to 6.3.x due to risk)
  https://issues.jboss.org/browse/PLANNER-482 (Not fixed. Might be rejected as the added complexity cost might not be worth the gain. I don't want to rush this into the design at this point)

I propose to CLOSE this issue as incomplete because no further work will be delivered now: PLANNER-480 is fixed on master and PLANNER-482 is postponed. We've already proven that in BRMS 6.2, there are no significant performance regressions in general (the 30% loss on TSP/VRPTW is compensated by a 35% gain from the other move selectors).
Comment 13 jvahala 2016-03-07 09:57:16 EST
Closing since there is no way to fix this now.
Comment 14 Geoffrey De Smet 2016-03-30 09:38:36 EDT
Although there have been several improvements that reduce the perf loss (it's not 30% any more IIRC), there is still a reasonable loss.

I believe this issue can fix this:
  https://issues.jboss.org/browse/PLANNER-482
We need to in the 7.0 roadmap.
Comment 15 Geoffrey De Smet 2016-03-30 09:50:58 EDT
Reopening this, so we can track it. I suggest we target this for 7.0.

Note You need to log in before you can comment on or make changes to this bug.