Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1379820

Summary: invalid state transition stopped >> paused on foreman-tasks restart
Product: Red Hat Satellite
Reporter: Ivan Necas <inecas>
Component: Tasks Plugin
Assignee: Ivan Necas <inecas>
Status: CLOSED DUPLICATE
QA Contact: Renzo Nuccitelli <rnuccite>
Severity: high
Docs Contact:
Priority: high
Version: 6.2.0
CC: anerurka, aruzicka, bbuckingham, bkearney, chrobert, jcallaha, kdixon, rnuccite
Target Milestone: Unspecified
Keywords: Triaged
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-12-01 22:37:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Ivan Necas 2016-09-27 18:08:48 UTC
Description of problem:
On some occasions, the task data can get into a state that prevents the system from recovering properly from an inconsistent state.

How reproducible:
Rarely

Steps to Reproduce:
1. Kill the dynflow executor with kill -9 while a task is running.
2. Restart the foreman-tasks service.

Actual results:
The task still acts as if it were being run by the killed executor.

Expected results:
The task is marked as paused or is running under the new executor.

Comment 1 Ivan Necas 2016-09-27 18:11:28 UTC
Bug fixed upstream with https://github.com/Dynflow/dynflow/pull/197

If the system is already in the invalid state, the foreman-tasks service might need to be restarted twice after applying the fix:

service foreman-tasks restart
echo "we need to restart it twice to converge to a better state, waiting a minute for tasks to start"
sleep 60
service foreman-tasks restart
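The same workaround can be wrapped in a small POSIX shell function. This is only a sketch: the function name restart_twice and the WAIT_SECONDS variable are hypothetical additions. With no arguments it runs the "service foreman-tasks restart" command from above, and passing another command (for example "echo restarted") gives a dry run.

```shell
# Sketch of the "restart twice" workaround as a function (hypothetical name).
# With no arguments it runs "service foreman-tasks restart"; any arguments
# given are run instead, e.g. "restart_twice echo restarted" for a dry run.
# WAIT_SECONDS (hypothetical, default 60) is the pause between the restarts.
restart_twice() {
  if [ "$#" -eq 0 ]; then
    set -- service foreman-tasks restart
  fi
  # Stop early if the first restart fails.
  "$@" || return 1
  echo "waiting ${WAIT_SECONDS:-60}s for tasks to start before restarting again" >&2
  sleep "${WAIT_SECONDS:-60}"
  "$@"
}
```

Running restart_twice as root with no arguments matches the commands above. The fixed one-minute pause is kept from the original rather than polling for readiness, because the goal is to give the tasks time to start, not just the service.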

Comment 7 Adam Ruzicka 2016-11-28 13:42:07 UTC
How reproducible:

The issue is easier to reproduce on a weaker machine (1 or 2 cores and 4 GB of RAM seem reasonable).

1) Start a long-running task (e.g. a repository synchronization for a new repository).

2) As soon as possible, run the following command to kill all foreman-tasks (dynflow executor) processes. The key to reproducing this issue is killing the dynflow executor while the task is in its run phase.

for pid in $(ps -eo pid,args | grep dynflow_executor | grep -v grep | awk ' { print $1 } '); do
    kill -9 $pid
done
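The grep / grep -v / awk pipeline in the loop above can be collapsed into a single awk filter. In this sketch the function name dynflow_executor_pids is hypothetical; it reads "ps -eo pid,args" output on standard input and prints the matching PIDs. Writing the pattern as [d]ynflow_executor keeps awk from matching its own command line, which is the job "grep -v grep" does in the original loop.

```shell
# Print the PID (first column) of every "ps -eo pid,args" line whose command
# line contains "dynflow_executor" (including longer names that contain it,
# as the original grep does). The regex [d]ynflow_executor still matches
# "dynflow_executor", but not the literal text "[d]ynflow_executor" in awk's
# own arguments, so awk never reports itself.
dynflow_executor_pids() {
  awk '/[d]ynflow_executor/ { print $1 }'
}
```

Usage: ps -eo pid,args | dynflow_executor_pids | xargs -r kill -9. On systems with procps-ng, pkill -9 -f dynflow_executor does roughly the same in one command.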

3) Navigate to the details of the task, open the Raw tab, and copy the external id.

4) Run the following commands to put the task into the wrong state, replacing << EXTERNAL_TASK_ID_HERE >> with the id retrieved in step 3:

export EXTERNAL_TASK_ID="<< EXTERNAL_TASK_ID_HERE >>"
foreman-rake console <<END
task_id="$EXTERNAL_TASK_ID"
plan = ForemanTasks.dynflow.world.persistence.load_execution_plan(task_id)
plan.state = :stopped
plan.save
END

5) Restart the foreman-tasks service:

systemctl restart foreman-tasks

6) Watch production.log for lines like the following:

[foreman-tasks/dynflow] [E] invalid worlds found {"d18522e8-943e-4486-8ca4-38befa405d30"=>"invalid state transition stopped >> paused in #<Dynflow::ExecutionPlan:0x00000004a16fb8>"

Note: The "invalid worlds found" error shows up even with the patched version, but there all the values are either :valid or :invalidated, and none of them mentions an invalid state transition.
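Because "invalid worlds found" is logged on patched versions too, it may be more telling to count the entries that mention an invalid state transition. A minimal sketch, assuming the log lives at /var/log/foreman/production.log (the usual location on a Satellite server; pass another path as the first argument otherwise). The function name count_invalid_transitions is hypothetical.

```shell
# Print how many lines of the Foreman production log mention an invalid state
# transition. "invalid worlds found" alone is not enough, since it is also
# logged on patched versions. The default log path is an assumption.
count_invalid_transitions() {
  grep -c 'invalid state transition' "${1:-/var/log/foreman/production.log}"
}
```

A count that stays at 0 after the restart suggests the fix is in place. To follow new entries live instead, tail -F the same file and pipe it through grep --line-buffered 'invalid worlds found'.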

Comment 9 Ivan Necas 2016-12-01 22:37:26 UTC
Yes, as the fix for bug 1390933 should cover both cases.

*** This bug has been marked as a duplicate of bug 1390933 ***