Description of problem:
On some occasions, the task data can get into a state that prevents the system from recovering properly from an inconsistent state.

How reproducible:
Rarely

Steps to Reproduce:
1. kill -9 the dynflow executor while some task is running
2. foreman-tasks restart

Actual results:
The task still acts as if it were being run by the killed executor.

Expected results:
The task is marked as paused or running by the new executor.
Bug fixed upstream with https://github.com/Dynflow/dynflow/pull/197

If the system is already in the invalid state, the foreman-tasks service might need to be restarted twice after applying the fix:

service foreman-tasks restart
echo "we need to restart it twice to converge to better state, waiting a minute for tasks to start"
sleep 60
service foreman-tasks restart
How reproducible:
It is easier to reproduce this on a weaker machine (1 or 2 cores and 4 GB of RAM seems reasonable for this).

1) Start a long-running task (e.g. repository synchronization for a new repository).

2) As soon as possible, run the following command to kill all foreman-tasks processes. The key to reproducing this issue is killing the dynflow executor while the task is in its run phase.

for pid in $(ps -eo pid,args | grep dynflow_executor | grep -v grep | awk '{ print $1 }'); do
  kill -9 $pid
done

3) Navigate to the details of the task, go to the raw tab and copy the external id.

4) Run the following command to put the task into a wrong state. Replace << EXTERNAL_TASK_ID_HERE >> with the id retrieved in step 3.

export EXTERNAL_TASK_ID="<< EXTERNAL_TASK_ID_HERE >>"
foreman-rake console <<END
task_id="$EXTERNAL_TASK_ID"
plan = ForemanTasks.dynflow.world.persistence.load_execution_plan(task_id)
plan.state = :stopped
plan.save
END

5) Restart the foreman-tasks service.

systemctl restart foreman-tasks

6) Watch production.log for lines looking like:

[foreman-tasks/dynflow] [E] invalid worlds found {"d18522e8-943e-4486-8ca4-38befa405d30"=>"invalid state transition stopped >> paused in #<Dynflow::ExecutionPlan:0x00000004a16fb8>"

Note: The "invalid worlds found" error will show up even when using the patched version, but all the values will be either :valid or :invalidated and won't mention an invalid state transition.
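The production.log check described above can be scripted instead of watched by hand. The following is only a rough sketch: the function name count_invalid_transitions is made up for illustration, and the default log path /var/log/foreman/production.log is an assumption about the installation, so pass the real path as the first argument if yours differs.

```shell
#!/bin/sh
# Hypothetical helper (not part of foreman-tasks or dynflow): count the
# log lines that mention an invalid state transition.
# Usage: count_invalid_transitions [LOG_FILE]
count_invalid_transitions() {
    # Assumed default location of production.log; adjust for your setup.
    log_file="${1:-/var/log/foreman/production.log}"
    # grep -c prints the number of matching lines. It exits non-zero when
    # nothing matches, so "|| true" keeps the function from aborting a
    # script that runs under "set -e".
    grep -c 'invalid state transition' "$log_file" || true
}
```

Calling count_invalid_transitions after the restart should print 0 on a patched system, since per the note above the remaining "invalid worlds found" lines only carry :valid or :invalidated values. A non-zero count suggests the plan is still stuck in the wrong state.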
Yes, as the fix for 1390933 should cover both cases.

*** This bug has been marked as a duplicate of bug 1390933 ***