Bug 1379820

Summary: invalid state transition stopped >> paused on foreman-tasks restart
Product: Red Hat Satellite
Reporter: Ivan Necas <inecas>
Component: Tasks Plugin
Assignee: Ivan Necas <inecas>
Status: CLOSED DUPLICATE
QA Contact: Renzo Nuccitelli <rnuccite>
Severity: high
Docs Contact:
Priority: high
Version: 6.2.0
CC: anerurka, aruzicka, bbuckingham, bkearney, chrobert, jcallaha, kdixon, rnuccite
Target Milestone: Unspecified
Keywords: Triaged
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-12-01 22:37:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Ivan Necas 2016-09-27 18:08:48 UTC
Description of problem:
On some occasions, the task data can get into a state that prevents the system from recovering properly from an inconsistent state.

How reproducible:
Rarely

Steps to Reproduce:
1. Kill the dynflow executor with kill -9 while a task is running
2. foreman-tasks restart

Actual results:
The task still behaves as if it were being run by the killed executor.

Expected results:
The task is marked as paused, or is running on the new executor.

Comment 1 Ivan Necas 2016-09-27 18:11:28 UTC
Bug fixed upstream with https://github.com/Dynflow/dynflow/pull/197

If the system is already in the invalid state, the foreman-tasks service might need to be restarted twice after applying the fix:

service foreman-tasks restart
echo "we need to restart it twice to converge to better state, waiting a minute for tasks to start"
sleep 60
service foreman-tasks restart

Comment 7 Adam Ruzicka 2016-11-28 13:42:07 UTC
How reproducible:

It is easier to reproduce this on a weaker machine (1 or 2 cores and 4 GB of RAM seem reasonable for this).

1) Start a long running task (e.g. repository synchronization for a new repository)

2) As soon as possible, run the following command to kill all foreman-tasks (dynflow executor) processes. The key to reproducing this issue is killing the dynflow executor while the task is in its run phase

for pid in $(ps -eo pid,args | grep dynflow_executor | grep -v grep | awk ' { print $1 } '); do
    kill -9 $pid
done

3) Navigate to the details of the task, go to the raw tab and copy the external id

4) Run the following command to put the task into a wrong state, replacing << EXTERNAL_TASK_ID_HERE >> with the external id copied in the previous step

export EXTERNAL_TASK_ID="<< EXTERNAL_TASK_ID_HERE >>"
foreman-rake console <<END
task_id="$EXTERNAL_TASK_ID"
plan = ForemanTasks.dynflow.world.persistence.load_execution_plan(task_id)
plan.state = :stopped
plan.save
END
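
Before running the console snippet above, it may help to confirm that the placeholder was actually replaced. The following is a minimal sketch, not part of the original reproducer: the helper name is_uuid is hypothetical, and it assumes the external id is a UUID (8-4-4-4-12 hex digits), which should be verified against the value shown in the raw tab.

```shell
# Hypothetical helper: returns 0 when the argument looks like a UUID.
# Assumption: the external task id uses the 8-4-4-4-12 hex UUID format.
is_uuid() {
    printf '%s\n' "$1" |
        grep -Eqx '[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}'
}

# Example guard to place before the foreman-rake console call:
# is_uuid "$EXTERNAL_TASK_ID" || { echo "EXTERNAL_TASK_ID is not a valid id" >&2; exit 1; }
```

With such a guard, an unreplaced placeholder stops the script before the console is started, rather than relying on how the console handles an unknown id.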

5) Restart the foreman-tasks service

systemctl restart foreman-tasks

6) Watch production.log for lines like the following

[foreman-tasks/dynflow] [E] invalid worlds found {"d18522e8-943e-4486-8ca4-38befa405d30"=>"invalid state transition stopped >> paused in #<Dynflow::ExecutionPlan:0x00000004a16fb8>"

Note: The "invalid worlds found" error will show up even when using the patched version, but all the values will be either :valid or :invalidated and won't mention an invalid state transition.
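
Since "invalid worlds found" alone does not indicate the bug, searching for the "invalid state transition" text is a more precise check. A minimal sketch follows; the function name find_invalid_transitions is hypothetical, and the default log path /var/log/foreman/production.log is an assumption that may need adjusting for a given installation.

```shell
# Hypothetical helper: print log lines that mention an invalid state
# transition. Returns 0 if any are found, 1 if none are found.
# Assumption: the default log location below; pass another file as $1.
find_invalid_transitions() {
    log_file="${1:-/var/log/foreman/production.log}"
    grep -F 'invalid state transition' "$log_file"
}

# Example use after restarting foreman-tasks:
# if find_invalid_transitions; then echo "issue reproduced"; fi
```

Entries where every value is :valid or :invalidated produce no match, so on a patched system the function is expected to print nothing.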

Comment 9 Ivan Necas 2016-12-01 22:37:26 UTC
Yes, as the fix for bug 1390933 should cover both cases.

*** This bug has been marked as a duplicate of bug 1390933 ***