Bug 1557067 - [RFE] Have a Mechanism to Proactively Detect and Clean "orphaned" Dynflow Tasks
Summary: [RFE] Have a Mechanism to Proactively Detect and Clean "orphaned" Dynflow Tasks
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite 6
Classification: Red Hat
Component: Tasks Plugin
Version: 6.3.0
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: 6.4.0
Assignee: Adam Ruzicka
QA Contact: Ivan Necas
URL:
Whiteboard:
Duplicates: 1583894 (view as bug list)
Depends On:
Blocks:
 
Reported: 2018-03-15 23:53 UTC by Christian Marineau
Modified: 2019-11-05 23:14 UTC (History)
4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-16 15:30:28 UTC
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:2927 None None None 2018-10-16 15:31:52 UTC
Foreman Issue Tracker 22915 None None None 2018-03-16 09:30:38 UTC
Red Hat Knowledge Base (Solution) 2755731 None None None 2018-05-30 07:27:02 UTC

Internal Links: 1684051

Description Christian Marineau 2018-03-15 23:53:25 UTC
Description of problem:
-----------------------
If some running tasks were destroyed in the past via the foreman-rake console, it is very possible that some "orphaned" Dynflow entries are still present in the database.

Also, since older versions of Satellite 6 had no "foreman_tasks:cleanup" rake task, it is likely that some tasks were destroyed with a command similar to the following at some point during the lifetime of the Satellite:

e.g.:
# foreman-rake console
> ForemanTasks::Task.where(:state => :running).destroy_all

This situation is known to cause significant issues over time, especially with "Register Host" tasks, which can take longer and longer to complete until clients reach a timeout. These Dynflow entries remain in the database and survive Satellite upgrades.
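Note: on versions of Satellite that ship the "foreman_tasks:cleanup" rake task, that task is the supported way to remove old tasks instead of destroy_all in the console. A hedged sketch (the TASK_SEARCH/STATES/NOOP variables are assumed to be supported by the installed foreman-tasks version; NOOP=true only reports what would be deleted):

# foreman-rake foreman_tasks:cleanup TASK_SEARCH='label = Actions::Katello::Repository::Sync' STATES='stopped' NOOP=true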


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
6.3


How reproducible:
-----------------
100%


Steps to Reproduce:
-------------------
1. Trigger a lot of tasks. One easy way is to launch a Sync Plan, which will trigger multiple repository sync tasks.

2. While the tasks are running, destroy them with the following:
# foreman-rake console
> ForemanTasks::Task.where(:state => :running).where(:label => "Actions::Katello::Repository::Sync").destroy_all


Actual results:
---------------
=> There is currently no proactive mechanism that detects this situation, which makes troubleshooting tedious, as many other aspects are usually investigated before this problem is identified.

=> Also, there is currently no built-in mechanism to clean up these entries.


Additional info:
----------------
=> We can manually detect these "orphaned" tasks with the following command:
# su - postgres -c 'psql -d foreman -c\
     "SELECT foreman_tasks_tasks.label,
      count(foreman_tasks_tasks.id) tasks_total,
      count(dynflow_actions.id) actions_total
      FROM dynflow_actions
      LEFT JOIN foreman_tasks_tasks
      ON (foreman_tasks_tasks.external_id = dynflow_actions.execution_plan_uuid)
      GROUP BY foreman_tasks_tasks.label ORDER BY actions_total DESC LIMIT 30"'

=> Then, look for the entry where the label is empty:
e.g.:
                        label                         | tasks_total | actions_total 
------------------------------------------------------+-------------+---------------
                                                      |           0 |          5839
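=> A direct count of the orphaned execution plans can also be obtained with the same LEFT JOIN the workaround below uses, restricted to rows with no matching task:

# su - postgres -c 'psql -d foreman -c\
     "SELECT count(dynflow_execution_plans.uuid)
      FROM dynflow_execution_plans
      LEFT JOIN foreman_tasks_tasks
      ON (dynflow_execution_plans.uuid = foreman_tasks_tasks.external_id)
      WHERE foreman_tasks_tasks.id IS NULL"'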


=> To fix the current situation, the known workaround is the following:
# cat <<EOF | foreman-rake console
persistence = ForemanTasks.dynflow.world.persistence
adapter = persistence.adapter

batch_size = 5
# Count execution plans that have no matching foreman_tasks_tasks row
total = adapter.db.fetch("select count(dynflow_execution_plans.uuid) from dynflow_execution_plans left join foreman_tasks_tasks on (dynflow_execution_plans.uuid = foreman_tasks_tasks.external_id) where foreman_tasks_tasks.id IS NULL").first[:count]
deleted = 0
puts "about to delete #{total} execution plans"
# Fetch orphaned plan UUIDs in batches and delete them until none remain
while (plans_without_tasks = adapter.db.fetch("select dynflow_execution_plans.uuid from dynflow_execution_plans left join foreman_tasks_tasks on (dynflow_execution_plans.uuid = foreman_tasks_tasks.external_id) where foreman_tasks_tasks.id IS NULL LIMIT #{batch_size}").all.map { |x| x[:uuid] }) && !plans_without_tasks.empty?
  persistence.delete_execution_plans({ 'uuid' => plans_without_tasks }, batch_size)
  deleted += plans_without_tasks.count
  puts "deleted #{deleted} out of #{total}"
end
EOF
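The loop above repeatedly fetches up to batch_size orphaned plan UUIDs and deletes them until the query returns an empty set. The same batching pattern can be sketched in plain Ruby (hypothetical in-memory UUIDs standing in for the database query and for persistence.delete_execution_plans):

```ruby
# In-memory stand-in for the orphan query: UUIDs of execution plans
# that have no matching foreman_tasks_tasks row (hypothetical data).
orphans = (1..12).map { |i| format("uuid-%02d", i) }

batch_size = 5
deleted = 0
total = orphans.size

# Mirror the workaround's loop: take up to batch_size orphans,
# delete them, and repeat until nothing is left.
until (batch = orphans.first(batch_size)).empty?
  orphans -= batch            # stands in for persistence.delete_execution_plans
  deleted += batch.size
  puts "deleted #{deleted} out of #{total}"
end
```

Deleting in small batches keeps each transaction short, which matters when tens of thousands of orphaned plans have accumulated.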

Comment 1 Adam Ruzicka 2018-03-16 09:30:36 UTC
Created redmine issue http://projects.theforeman.org/issues/22915 from this bug

Comment 2 pm-sat@redhat.com 2018-03-16 10:18:19 UTC
Upstream bug assigned to aruzicka@redhat.com

Comment 3 pm-sat@redhat.com 2018-03-16 10:18:22 UTC
Upstream bug assigned to aruzicka@redhat.com

Comment 5 pm-sat@redhat.com 2018-03-22 00:18:53 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/22915 has been resolved.

Comment 6 Adam Ruzicka 2018-05-30 07:27:02 UTC
*** Bug 1583894 has been marked as a duplicate of this bug. ***

Comment 13 errata-xmlrpc 2018-10-16 15:30:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2927

