Bug 1379820 - invalid state transition stopped >> paused on foreman-tasks restart
Summary: invalid state transition stopped >> paused on foreman-tasks restart
Keywords:
Status: CLOSED DUPLICATE of bug 1390933
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Tasks Plugin
Version: 6.2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: Unspecified
Assignee: Ivan Necas
QA Contact: Renzo Nuccitelli
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-09-27 18:08 UTC by Ivan Necas
Modified: 2019-12-16 06:55 UTC (History)
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-12-01 22:37:26 UTC
Target Upstream Version:
Embargoed:

Description Ivan Necas 2016-09-27 18:08:48 UTC
Description of problem:
On some occasions, the task data can get into a state that prevents the system from recovering properly from an inconsistent state.

How reproducible:
Rarely

Steps to Reproduce:
1. kill -9 dynflow executor while some task is running
2. foreman-tasks restart

Actual results:
The task still appears to be run by the killed executor.

Expected results:
The task is marked as paused or is run by the new executor.

Comment 1 Ivan Necas 2016-09-27 18:11:28 UTC
Bug fixed upstream with https://github.com/Dynflow/dynflow/pull/197

If the system is already in the invalid state, the foreman-tasks service might need to be restarted twice after applying the fix:

service foreman-tasks restart
echo "we need to restart it twice to converge to better state, waiting a minute for tasks to start"
sleep 60
service foreman-tasks restart

Comment 7 Adam Ruzicka 2016-11-28 13:42:07 UTC
How reproducible:

It is easier to reproduce this on a weaker machine (1 or 2 cores and 4 GB of RAM seems reasonable for this).

1) Start a long running task (e.g. repository synchronization for a new repository)

2) As soon as possible, run the following command to kill all foreman-tasks processes. The key to reproducing this issue is killing the dynflow executor while the task is in its run phase

for pid in $(ps -eo pid,args | grep dynflow_executor | grep -v grep | awk ' { print $1 } '); do
    kill -9 $pid
done

3) Navigate to the details of the task, go to the raw tab and copy the external id

4) Run the following command to put the task into a wrong state, replacing << EXTERNAL_TASK_ID_HERE >> with the id retrieved in step 3

export EXTERNAL_TASK_ID="<< EXTERNAL_TASK_ID_HERE >>"
foreman-rake console <<END
task_id="$EXTERNAL_TASK_ID"
plan = ForemanTasks.dynflow.world.persistence.load_execution_plan(task_id)
plan.state = :stopped
plan.save
END

5) Restart the foreman-tasks service

systemctl restart foreman-tasks

6) Watch production.log for lines looking like

[foreman-tasks/dynflow] [E] invalid worlds found {"d18522e8-943e-4486-8ca4-38befa405d30"=>"invalid state transition stopped >> paused in #<Dynflow::ExecutionPlan:0x00000004a16fb8>"

Note: The "invalid worlds found" error will show up even when using the patched version, but all the values will be either :valid or :invalidated and none will mention an invalid state transition.
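
Since "invalid worlds found" appears in the log with or without the patch, the distinguishing text is "invalid state transition". A minimal sketch of a check for it follows. The function name check_invalid_transition is hypothetical, and the default path /var/log/foreman/production.log is an assumption about where production.log lives on the system under test; pass a different path if needed.

```shell
# Hypothetical helper: report whether a log file contains the
# "invalid state transition" message described in this bug.
# The default log path is an assumption; pass another path as $1.
check_invalid_transition() {
    log_file="${1:-/var/log/foreman/production.log}"

    # Fail loudly instead of reporting "clean" for an unreadable file.
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 2
    fi

    # -q: only the exit status matters, no matching lines are printed.
    if grep -q 'invalid state transition' "$log_file"; then
        echo "affected"
    else
        echo "clean"
    fi
}
```

Calling check_invalid_transition with no arguments after the foreman-tasks restart prints "affected" if the message is anywhere in the file, so older entries from before the fix can also trigger it; checking a freshly rotated log avoids that.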

Comment 9 Ivan Necas 2016-12-01 22:37:26 UTC
Yes, as the fix for bug 1390933 should cover both cases.

*** This bug has been marked as a duplicate of bug 1390933 ***
