Bug 1687771

Summary: restarting dynflowd with a task in planning phase can leave the task "planning" forever
Product: Red Hat Satellite
Component: Tasks Plugin
Version: 6.4.2
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: medium
Target Milestone: 6.7.0
Target Release: Unused
Keywords: Triaged
Type: Bug
Reporter: Pavel Moravec <pmoravec>
Assignee: satellite6-bugs <satellite6-bugs>
QA Contact: Peter Ondrejka <pondrejk>
CC: aruzicka, baitken, bkearney, inecas, mmccune, pcreech, pdragun, satellite6-bugs
Last Closed: 2020-04-14 13:24:10 UTC

Description Pavel Moravec 2019-03-12 10:50:20 UTC
Description of problem:
In some scenarios (caused by a race condition, or possibly by the call flow used to trigger the task), a task stays in the planning phase for a while. If dynflowd is restarted during that time, the task remains in "planning" forever.

A particular example (very visible without the fix for bz1673447): see the reproducer steps.

It is assumed that https://github.com/Dynflow/dynflow/pull/303 fixes this.


Version-Release number of selected component (if applicable):
6.4.2 (or anything older)


How reproducible:
Very likely (scale the test up for a better chance of hitting it)


Steps to Reproduce:
1. Have several repos, several lifecycle environments (LEs) and a few Capsules

2. Create and publish many content views (CVs); they can even have identical content (e.g. one or two small repos)

3. Promote many CVs to the next LE, e.g. via:

for i in $(seq 1 20); do 
  hammer content-view version promote --content-view CV_${i} --organization-id 1 --from-lifecycle-environment-id 1 --to-lifecycle-environment-id 2 --async &
  sleep 1
done

4. Monitor the task status summary, e.g. via:

sudo su - postgres -c "psql -d foreman -c 'select label,count(label),state,result from foreman_tasks_tasks where state <> '\''stopped'\'' group by label,state,result ORDER BY label;'"

5. Once several Actions::Katello::CapsuleContent::Sync tasks are in the planning state, restart dynflowd:

service dynflowd restart

6. Monitor the tasks status summary until all Capsule Sync tasks terminate
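
The monitoring in step 6 can be scripted rather than rerun by hand. The following is a minimal sketch, not part of the original reproducer: the function names, the timeout and the polling interval are assumptions chosen for illustration, and the count query is a narrowed variant of the summary query from step 4.

```shell
#!/usr/bin/env bash
# Sketch: poll until no Capsule sync task is left unfinished, or give up.
# Function names, timeout and interval are illustrative assumptions.

# Print the number of Capsule sync tasks that are not yet stopped, using
# the same foreman_tasks_tasks table as the summary query in step 4.
count_unfinished_syncs() {
  sudo su - postgres -c "psql -d foreman -At -c \"select count(*) from foreman_tasks_tasks where label = 'Actions::Katello::CapsuleContent::Sync' and state <> 'stopped';\""
}

# wait_for_syncs COUNT_CMD TIMEOUT_SECONDS INTERVAL_SECONDS
# Returns 0 once COUNT_CMD prints 0, or 1 if the timeout expires first.
wait_for_syncs() {
  local count_cmd=$1 timeout=$2 interval=$3
  local waited=0 remaining
  while :; do
    remaining=$("$count_cmd")
    if [ "$remaining" = "0" ]; then
      echo "all Capsule sync tasks stopped after ${waited}s"
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${waited}s with ${remaining} task(s) still not stopped"
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```

For example, `wait_for_syncs count_unfinished_syncs 3600 30` would poll every 30 seconds for up to an hour. On an affected system the call is expected to time out, because the tasks never leave the planning state.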


Actual results:
6. The wait never ends (waiting for Godot): the Capsule Sync tasks stay in the planning state forever


Expected results:
6. All Sync tasks complete successfully after a reasonable time


Additional info:

Comment 4 Mike McCune 2019-04-23 21:07:31 UTC
Created redmine issue https://projects.theforeman.org/issues/26666 from this bug

Comment 8 Peter Ondrejka 2020-01-03 15:38:03 UTC
Verified on Satellite 6.7 snap 7 using the reproduction steps from the problem description. The planned tasks are cleaned up properly after the service restart.

Comment 11 errata-xmlrpc 2020-04-14 13:24:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1454