Bug 1201700

Summary: When none of the next moves are doable, OptaPlanner gets stuck in a step and, unless termination is time-based, loops forever
Product: [Retired] JBoss BRMS Platform 6
Reporter: jvahala
Component: OptaPlanner
Assignee: Geoffrey De Smet <gdesmet>
Status: CLOSED EOL
QA Contact: Jiri Locker <jlocker>
Severity: high
Priority: high
Version: 6.1.0
CC: kverlaen, lpetrovi
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2020-03-27 19:11:32 UTC
Type: Bug

Attachments:

Description	Flags
reproducer	none

Description jvahala 2015-03-13 09:56:39 UTC

Consider a TSP with one domicile and one entity (a single visit). The construction heuristic (CH) builds the best solution, and then local search runs.

Say that local search termination is configured like this: 

<termination>
  <stepCountLimit>1</stepCountLimit>
</termination>

Obviously, there is no move that the planner could make, and that is the problem. If the planner reaches a state with no way out, it gets stuck forever, and nothing can terminate it except a time-based termination.
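The situation described above can be illustrated with a small toy model in Java. This is only a sketch under stated assumptions, not OptaPlanner code: the class `StuckStepDemo`, the method `pickDoableTarget` and the `maxAttempts` guard are all hypothetical names invented for this illustration. It draws random candidate moves the way a never-ending random selector might and skips the ones that change nothing; the guard exists only so that the demo terminates.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.Random;

public class StuckStepDemo {

    /**
     * Toy model (not OptaPlanner code): draws random candidate values for the
     * visit's previous standstill and skips candidates that change nothing.
     * maxAttempts is a hypothetical guard so that this demo terminates.
     */
    public static Optional<String> pickDoableTarget(
            List<String> candidates, String current, Random random, long maxAttempts) {
        for (long attempt = 0; attempt < maxAttempts; attempt++) {
            String target = candidates.get(random.nextInt(candidates.size()));
            if (!target.equals(current)) {
                return Optional.of(target); // doable: it actually changes the solution
            }
            // Not doable: no step is completed, so a step counter never advances.
        }
        return Optional.empty(); // reached only because of the artificial guard
    }

    public static void main(String[] args) {
        Random random = new Random(0);

        // One domicile and one visit: the visit can only follow the domicile.
        List<String> onlyDomicile = Collections.singletonList("domicile");
        System.out.println("doable move found: "
                + pickDoableTarget(onlyDomicile, "domicile", random, 1_000_000).isPresent());

        // With a second candidate, a doable move exists and is found quickly.
        List<String> twoOptions = Arrays.asList("domicile", "visit2");
        System.out.println("doable move found: "
                + pickDoableTarget(twoOptions, "domicile", random, 1_000_000).isPresent());
    }
}
```

Without the guard, the first call would spin inside a single step, which is why a step-count limit alone would never get a chance to fire in this model.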

Comment 2 Geoffrey De Smet 2015-03-19 14:48:26 UTC

There's no intrinsic requirement that a step should finish within x amount of time. But it would indeed be helpful that LocalSearch recognizes that there are no doable moves. I think it does if you configure a cacheType PHASE or STEP, instead of the default of JIT.

It's not just doable moves, but also filtered moves that can cause this issue (although filtering has a bail-out). That case too only applies to JIT selection as far as I know.

Jiri, could you verify that move selection with cacheType PHASE or STEP doesn't suffer from this issue? If it does, that would worry me and I'd fix it ASAP. As for JIT (the default cacheType), I currently don't see a reasonable way of fixing it (without killing the scalability gain of JIT selection).
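To illustrate why a cached move list could, in principle, make "no doable moves" detectable, here is a minimal Java sketch. It is a toy model only and says nothing about how OptaPlanner's cached selectors actually behave; the class `CachedScanDemo` and the method `firstDoableTarget` are hypothetical names. The point is that a finite, materialized list can be scanned once, and exhausting it proves that nothing is doable.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

public class CachedScanDemo {

    /**
     * Toy model (not OptaPlanner code): scans a finite, pre-built candidate list
     * exactly once. An empty result means the list was exhausted without finding
     * any candidate that differs from the current value.
     */
    public static Optional<String> firstDoableTarget(List<String> cachedCandidates, String current) {
        for (String target : cachedCandidates) {
            if (!target.equals(current)) {
                return Optional.of(target);
            }
        }
        return Optional.empty(); // list exhausted: no doable move exists
    }

    public static void main(String[] args) {
        // One domicile and one visit: the only candidate equals the current value.
        List<String> onlyDomicile = Collections.singletonList("domicile");
        System.out.println("doable move found: "
                + firstDoableTarget(onlyDomicile, "domicile").isPresent());

        // A second candidate makes a doable move available.
        List<String> twoOptions = Arrays.asList("domicile", "visit2");
        System.out.println("doable move found: "
                + firstDoableTarget(twoOptions, "domicile").isPresent());
    }
}
```

The cost of this approach is that the whole list must exist up front, which is the scalability trade-off against just-in-time selection mentioned above.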

Comment 3 jvahala 2015-03-20 10:38:44 UTC

Geoffrey, 

I think the easiest way to provide a reproducer is to make small alterations to any TSP example you have.

1. Take any TSP Solution instance and remove all locations except two, so that only one Domicile and one Visit remain.

2. Use this localSearch configuration:

<localSearch>
  <termination>
    <stepCountLimit>1</stepCountLimit>
  </termination>
  <changeMoveSelector>
    <cacheType>PHASE</cacheType>
  </changeMoveSelector>
</localSearch>

3. Run the solver.

I hope this is enough to reproduce the problem.

Comment 4 Geoffrey De Smet 2015-03-25 08:31:39 UTC

Jiri, have you checked whether it reproduces with cacheType PHASE or STEP? That would be a high-priority bug.
The fact that it reproduces with JIT selection is a less important, known issue (because it's intrinsic to JIT selection and any fix might be worse than the problem).

Comment 5 jvahala 2015-03-31 11:54:34 UTC

Created attachment 1008979
reproducer

Just run mvn clean install.

The test has a 10-second timeout. A single step on such a small problem should take less than 10 seconds.