Red Hat Bugzilla – Bug 968327
OptaPlanner ends in infinite loop when using only SubChainSwapMoveSelector configured with appropriate subchain length
Last modified: 2014-08-06 16:17:50 EDT
Using OptaPlanner on domain with chained planning entity. Solution is initialized by custom phase command, local search contains only one move selector config:
This selector seems to produce only one concrete move in the loop. Problem is the move is undoable (left subchain contains entit(y/ies) which the right subchain contains too), so OptaPlanner ends in infinite loop.
The fact min and max values are the same is not the necessary condition to this issue.
here is a pull request with the reproducer:
I realized that another necessary condition for this issue is probably solution initialized as a single one chain of entities.
Verified issue: Undoable moves should be filtered in the selector, not later. That way
1) the JIT bailOut detection kicks in (by default 10 times the maximum size of number of possible moves - this might seem similar though at first)
2) cacheType PHASE or STEP can be used to avoid the JIT bailOut detection and just only select the exact number of distinct, doable moves (0 in this case).
Note: in practice, the configuration in the regression test is always doomed, because it only has the course grained moves (which is an anti-pattern). Specifically, because it only has a (subchain) swap move, any vehicle that doesn't have an chain yet, will never get one. Of course - the regression test is perfect for what it's designed to do: prove that the current behavior is wrong.
Just added support for cacheType=STEP on subChainSwapMoveSelector
As expected, when cached it doesn't go into the infinite loop.
This bug exposed a bigger issue. It makes no sense that only this move has been selected:
TRACE Move index (0) not doable, ignoring move ([2(after 1), 3(after 2), 4(after 3), 5(after 4), 6(after 5), 7(after 6), 8(after 7), 9(after 8), 10(after 9), 11(after 10)] <=> [2(after 1), 3(after 2)]).
The left side is normal: there is only 1 subchain of 10 entities long in a chain of 10 entities.
The right side however, is broken. There are multiple subchains of 2 entities long in a chain of 10 entities, yet only 1 is selected.
I've extracted this new issue here and I 'll solve that first:
Fixed for 6.0.0.Beta4. I am not allowed to push the fix to github (due to ongoing release) - probably until next Tuesday. Regression test added in DefaultSubChainSelectorTest.
This bugzilla == PLANNER-165.
One of my presumptions was false: that there were only 10 entities in play (while there are 30+ in play). That means there are indeed doable moves - planner never selected them due to the bug PLANNER-165.
Here's how the output looks like of rsynek's regression test:
2013-05-31 15:46:46,400 [main] INFO Loaded: data/vehiclerouting/unsolved/A-n32-k5.xml
2013-05-31 15:46:47,779 [main] INFO Solving started: time spend (66), score (null), new best score (0hard/0soft), random seed (not fixed).
2013-05-31 15:46:47,803 [main] DEBUG Step index (0), time spend (90), score (-310hard/-1927soft), new best score (-310hard/-1927soft).
2013-05-31 15:46:47,804 [main] INFO Phase (0) custom ended: step total (1), time spend (91), best score (-310hard/-1927soft).
2013-05-31 15:46:47,817 [main] TRACE Move index (0) not doable, ignoring move ([19(after 18), 20(after 19), 21(after 20), 22(after 21), 23(after 22), 24(after 23), 25(after 24), 26(after 25), 27(after 26), 28(after 27)] <=> [25(after 24), 26(after 25)]).
2013-05-31 15:46:47,823 [main] TRACE Move index (1), score (-310hard/-2060soft), accepted (true) for move ([12(after 11), 13(after 12), 14(after 13), 15(after 14), 16(after 15), 17(after 16), 18(after 17), 19(after 18), 20(after 19), 21(after 20)] <=> [2(after 1), 3(after 2)]).
2013-05-31 15:46:47,826 [main] DEBUG Step index (0), time spend (113), score (-310hard/-2060soft), best score (-310hard/-1927soft), accepted/selected move count (1/1) for picked step ([12(after 11), 13(after 12), 14(after 13), 15(after 14), 16(after 15), 17(after 16), 18(after 17), 19(after 18), 20(after 19), 21(after 20)] <=> [2(after 1), 3(after 2)]).
2013-05-31 15:46:47,826 [main] INFO Phase (1) localSearch ended: step total (1), time spend (113), best score (-310hard/-1927soft).
2013-05-31 15:46:47,826 [main] INFO Solving ended: time spend (113), best score (-310hard/-1927soft), average calculate count per second (53).
Note: I originally confused this issue with having "an infinite loop until termination if no doable moves are selectable", which I've extracted here:
Verified on BRMS 6.0.0.ER2