Bug 1201700

Summary: When no next move is doable, OptaPlanner gets stuck in a step and, unless termination is time-based, cycles forever
Product: [Retired] JBoss BRMS Platform 6
Reporter: jvahala
Component: OptaPlanner
Assignee: Geoffrey De Smet <gdesmet>
Status: CLOSED EOL
QA Contact: Jiri Locker <jlocker>
Severity: high
Priority: high
Version: 6.1.0
CC: kverlaen, lpetrovi
Hardware: Unspecified   
OS: Unspecified   
Doc Type: Bug Fix
Last Closed: 2020-03-27 19:11:32 UTC
Type: Bug
Attachments:
reproducer (flags: none)

Description jvahala 2015-03-13 09:56:39 UTC
Consider a TSP with one domicile and one entity. The construction heuristic (CH) builds the best solution, and then local search is performed.

Say that local search termination is configured like this: 

<termination>
  <stepCountLimit>1</stepCountLimit>
</termination>

Obviously, there is no move the planner can make, and that is the problem. Once the planner is in a state with no way out, it stays stuck forever, and nothing except a time-based termination can stop it.
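
A stopgap sketch, not a fix: since a time-based termination is the only kind reported here to interrupt a stuck step, a time limit can sit next to the step limit. The element name secondsSpentLimit and the 10 second value are assumptions (the name follows the stepCountLimit naming style; check it against the schema of the OptaPlanner version in use). Multiple limits inside one termination element are assumed to combine with OR, so whichever is reached first ends the phase.

```xml
<termination>
  <!-- Intended limit: stop after one local search step. -->
  <stepCountLimit>1</stepCountLimit>
  <!-- Assumed safety net: end the phase after 10 seconds even if no step ever completes. -->
  <secondsSpentLimit>10</secondsSpentLimit>
</termination>
```

This only bounds how long the phase can hang; it does not make local search detect that no move is doable.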

Comment 2 Geoffrey De Smet 2015-03-19 14:48:26 UTC
There's no intrinsic requirement that a step finish within a given amount of time. But it would indeed be helpful if LocalSearch recognized that there are no doable moves. I think it does if you configure cacheType PHASE or STEP instead of the default, JIT.

It's not just doable moves, but also filtered moves that can cause this issue (although filtering has a bail-out). That case too only applies to JIT selection as far as I know.

Jiri, could you verify that move selection with cacheType PHASE or STEP doesn't suffer from this issue? If it does, that would worry me and I'd fix it ASAP. As for JIT (the default cacheType), I currently don't see a reasonable way of fixing it (without killing the scalability gain of JIT selection).
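
For reference, a sketch of what moving the change move selector off the default JIT cache type might look like. It uses only elements that already appear in this report (localSearch, changeMoveSelector, cacheType); STEP is shown, and PHASE would go in the same place. Whether this actually avoids the hang is exactly the open question here, so treat it as something to test rather than a confirmed workaround.

```xml
<localSearch>
  <changeMoveSelector>
    <!-- STEP: moves are created at the start of each step and cached for that step,
         instead of being generated just in time (JIT). -->
    <cacheType>STEP</cacheType>
  </changeMoveSelector>
</localSearch>
```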

Comment 3 jvahala 2015-03-20 10:38:44 UTC
Geoffrey, 

I think the easiest way to provide a reproducer is to make small alterations to any TSP example you have.

1. Take any TSP Solution instance and remove all locations except two, so that only one Domicile and one Visit remain.

2. Use this localSearch configuration:

<localSearch>
  <termination>
    <stepCountLimit>1</stepCountLimit>
  </termination>
  <changeMoveSelector>
    <cacheType>PHASE</cacheType>
  </changeMoveSelector>
</localSearch>

3. Run the solver.

I hope this is enough to reproduce the problem.

Comment 4 Geoffrey De Smet 2015-03-25 08:31:39 UTC
Jiri, have you checked whether it reproduces with cacheType PHASE or STEP? That would be a high-priority bug.
The fact that it reproduces with JIT selection is a less important known issue (because it's intrinsic to JIT selection and any fix might be worse than the problem).

Comment 5 jvahala 2015-03-31 11:54:34 UTC
Created attachment 1008979 [details]
reproducer

Just run mvn clean install.

There is a 10-second timeout. One step on such a small problem should take less than 10 seconds.