Bug 1284026

Summary:	TailSwapMove experiences 30% performance loss.
Product:	[Retired] JBoss BRMS Platform 6	Reporter:	jvahala
Component:	OptaPlanner	Assignee:	Geoffrey De Smet <gdesmet>
Status:	CLOSED EOL	QA Contact:	Jiri Locker <jlocker>
Severity:	medium	Docs Contact:
Priority:	high
Version:	6.3.0	CC:	kverlaen, lpetrovi
Target Milestone:	DR1	Keywords:	Reopened
Target Release:	6.3.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2020-03-27 19:04:50 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description jvahala 2015-11-20 15:14:09 UTC

Description of problem:
TailSwapMove experiences 30% performance loss which means that it's not a good pick for chained solutions in general. This might be related to little speed degradation on all chained solutions.

This is not as urgent as 1273783 BZ

Comment 2 Geoffrey De Smet 2015-11-23 09:58:09 UTC

For others reading this, please note that the 30% perf leak is the worst case scenario on a specific use case on a specific dataset, without taking the perf gain on the other moves into account (which make up for this loss).

Nevertheless, this is a big loss and must be fixed because it's a lost opportunity for a good perf gain if not done. Looking into it. Thanks for finding this perf loss, Jiri :)

Comment 3 Geoffrey De Smet 2015-11-23 13:44:39 UTC

It seems that homberger_1000_RC110_1 isn't affected
CR1: Solving ended: time spent (121164), best score (0hard/-51501723soft), average calculate count per second (191962), environment mode (REPRODUCIBLE).
6.4.0-SNAPSHOT: Solving ended: time spent (121411), best score (0hard/-51501723soft), average calculate count per second (191255), environment mode (REPRODUCIBLE).
100.3%

Comment 4 Geoffrey De Smet 2015-11-23 13:56:38 UTC

On my machine, with belgium-tw-n2750-k55:
CR1: Solving ended: time spent (122084), best score (0hard/-402316soft), average calculate count per second (41742), environment mode (REPRODUCIBLE).
6.4-S: Solving ended: time spent (122445), best score (0hard/-401887soft), average calculate count per second (46339), environment mode (REPRODUCIBLE).
90% = 1/111%, so the new version is 11% faster

Comment 5 Geoffrey De Smet 2015-11-23 15:00:15 UTC

Nor using <acceptedCountLimit>1000000</acceptedCountLimit>, nor using JDK 8 reproduces it

Comment 6 jvahala 2015-11-23 18:02:58 UTC

here comes fast analysis of tailSwapMove using TSP problem. 

 - I am using very simple solution initializer (!!! - there is difference between versions and it is scoreDirector.triggerVariableListeners();) but I've checked output solution and it's the same (circle around entities) 
 
 - results: 

1 000 doable moves

CR1:
Benchmark                                          (dataset)  Mode  Cnt     Score     Error  Units
TSPTailChainSwapMoveSelectorBenchmark.benchmark       LU_980    ss   10   285.534 ±  24.174  ms/op
TSPTailChainSwapMoveSelectorBenchmark.benchmark  USA_CA_2716    ss   10   795.526 ±  30.503  ms/op
TSPTailChainSwapMoveSelectorBenchmark.benchmark  GREECE_9882    ss   10  3644.275 ± 183.555  ms/op

SNAPSHOT:
Benchmark                                          (dataset)  Mode  Cnt     Score    Error  Units
TSPTailChainSwapMoveSelectorBenchmark.benchmark       LU_980    ss   10   400.763 ± 17.447  ms/op
TSPTailChainSwapMoveSelectorBenchmark.benchmark  USA_CA_2716    ss   10  1135.014 ± 26.212  ms/op
TSPTailChainSwapMoveSelectorBenchmark.benchmark  GREECE_9882    ss   10  5013.708 ± 90.055  ms/op

------------------------------------

10 000 doable moves
CR1:
Benchmark                                          (dataset)  Mode  Cnt      Score      Error  Units
TSPTailChainSwapMoveSelectorBenchmark.benchmark       LU_980    ss    6   2668.098 ±  403.768  ms/op
TSPTailChainSwapMoveSelectorBenchmark.benchmark  USA_CA_2716    ss    6   8065.210 ±  404.990  ms/op
TSPTailChainSwapMoveSelectorBenchmark.benchmark  GREECE_9882    ss    6  36048.139 ± 3225.749  ms/op

SNAPSHOT:
Benchmark                                          (dataset)  Mode  Cnt      Score      Error  Units
TSPTailChainSwapMoveSelectorBenchmark.benchmark       LU_980    ss    6   3761.815 ±  380.438  ms/op
TSPTailChainSwapMoveSelectorBenchmark.benchmark  USA_CA_2716    ss    6  11452.173 ±  399.867  ms/op
TSPTailChainSwapMoveSelectorBenchmark.benchmark  GREECE_9882    ss    6  49502.664 ± 7603.110  ms/op

-----------------------------------

we can see no diff in scalability between 1000 and 10000 moves. 

how to reproduce: 
1. download optaplanner perf repo, link can be found on optaplaner Hub, 
2. easiest way: install jmh plugin into Idea https://plugins.jetbrains.com/plugin/7529?pr=, then it's possible to run single benchmark methods from IDE
3. mvn clean install
4. right click on TSPTailChainSwapMoveSelectorBenchmark -> Run
5. you get results for SNAPSHOT
6. git checkout bfd7a3c783f41f8f014ccd47d348bca2dc9ed820
7. step 3 and 4
8. you get results for CR1. 

here is diff commit: https://gitlab.mw.lab.eng.bos.redhat.com/jvahala/optaplanner-perf/commit/ae764b9dce37af2c0efcd775f27122fbf8b1efbb

Comment 7 Geoffrey De Smet 2015-11-24 16:56:03 UTC

I can reproduce on TSP with lu 980.
CR1: Solving ended: time spent (687287), best score (-14251771), average calculate count per second (5222), environment mode (REPRODUCIBLE).
master: Solving ended: time spent (687232), best score (-14284861), average calculate count per second (3896), environment mode (REPRODUCIBLE).
master/CR1 = 75%

I 'll profile TSP now to find the cause of <tailChainSwapMoveSelector/>'s loss and fix it. Hopefully that will fix it for VRPTW too. Note that TSP is a toy, VRPTW is the real customer value, so reproducing on VRPTW would be nicer to have.

Comment 8 jvahala 2015-11-24 17:55:05 UTC

I know, but we see tailSwapMove issue in all chaining solutions, so finding it for TSP should resolve it for VRP. 

btw, you can ofc reproduce it the same, just use different benchmark test. I don't have it here right now, but it should be called "VRPTWTailChainSwapMoveSelectorBenchmark.

Comment 9 Geoffrey De Smet 2015-11-25 09:07:20 UTC

I suspect the new reverseChain() to be the bottleneck. I am investigating that is this new jira I've spin-off:
  https://issues.jboss.org/browse/PLANNER-479

Comment 10 Geoffrey De Smet 2015-11-25 15:18:40 UTC

reverseChain() is probably innocent.

But I did found a way to improve TSP by around 30%... :)
https://issues.jboss.org/browse/PLANNER-480

That won't affect VRP though, so I need to figure out how to reproduce this issue on VRP with tailchainswapmoves.

Comment 11 Geoffrey De Smet 2015-11-25 15:37:37 UTC

IMPORTANT: PLANNER-480 is fixed on master, but not yet backported to 6.3.x. I need a blocker tag to do that. Because it only affects TSP in theory, and few (or none) customers will benefit from PLANNER-480, and there is some risk related to that change, I don't favor backporting it - I would just let it bake in community first.


The workaround is to use @InverseRelationShadowVariable as in the VRP example.

Comment 12 Geoffrey De Smet 2015-12-01 14:16:33 UTC

Fixing this issue requires fixing 2 community jira's:
  https://issues.jboss.org/browse/PLANNER-480 (fixed on master, not backported to 6.3.x due to risk)
  https://issues.jboss.org/browse/PLANNER-482 (Not fixed. Might be rejected as the added complexity cost might not be worth the gain. I don't want to rush this into the design at this point)

I propose to CLOSE this issue as incomplete because no further work will be delivered now: PLANNER-480 is fixed on master and PLANNER-482 is postponed. We've already proven that in BRMS 6.2, there are no significant performance regressions in general (the 30% loss on TSP/VRPTW is compensated by a 35% gain from the other move selectors).

Comment 13 jvahala 2016-03-07 14:57:16 UTC

Closing since there is no way to fix this now.

Comment 14 Geoffrey De Smet 2016-03-30 13:38:36 UTC

Although there have been several improvements that reduce the perf loss (it's not 30% any more IIRC), there is still a reasonable loss.

I believe this issue can fix this:
  https://issues.jboss.org/browse/PLANNER-482
We need to in the 7.0 roadmap.

Comment 15 Geoffrey De Smet 2016-03-30 13:50:58 UTC

Reopening this, so we can track it. I suggest we target this for 7.0.