Bug 1557067

Summary: [RFE] Have a Mechanism to Proactively Detect and Clean "orphaned" Dynflow Tasks
Product: Red Hat Satellite
Reporter: Christian Marineau <cmarinea>
Component: Tasks Plugin
Assignee: Adam Ruzicka <aruzicka>
Status: CLOSED ERRATA
QA Contact: Ivan Necas <inecas>
Severity: high
Docs Contact:
Priority: unspecified
Version: 6.3.0
CC: aruzicka, dgross, hyu, inecas
Target Milestone: 6.4.0
Keywords: FutureFeature, Triaged
Target Release: Unused
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-10-16 15:30:28 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Christian Marineau 2018-03-15 23:53:25 UTC
Description of problem:
-----------------------
If running tasks were destroyed in the past via the foreman-rake console, it is very likely that some "orphaned" dynflow tasks are still in the database.

Also, older versions of Satellite 6 had no "foreman_tasks:cleanup" rake task, so tasks were most likely destroyed with a command like the following at some point during the lifetime of the Satellite:

e.g.:
# foreman-rake console
> ForemanTasks::Task.where(:state => :running).destroy_all

This situation is known to cause serious issues over time, especially with "Register Host" tasks, which take longer and longer until the clients hit a timeout. These dynflow entries remain in the database and survive Satellite upgrades.


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
6.3


How reproducible:
-----------------
100%


Steps to Reproduce:
-------------------
1. Trigger a lot of tasks. One easy way is to launch a Sync Plan, which triggers multiple repository sync tasks.

2. While the tasks are running, destroy them with the following:
# foreman-rake console
> ForemanTasks::Task.where(:state => :running).where(:label => "Actions::Katello::Repository::Sync").destroy_all


Actual results:
---------------
=> There is currently no proactive mechanism that detects this situation, which makes troubleshooting more tedious because many other aspects get investigated before this problem is found.

=> Also, there is currently no built-in mechanism to clean these entries. 


Additional info:
----------------
=> We can manually detect these "orphaned" tasks with the following command:
# su - postgres -c 'psql -d foreman -c\
     "SELECT foreman_tasks_tasks.label,
      count(foreman_tasks_tasks.id) tasks_total,
      count(dynflow_actions.id) actions_total
      FROM dynflow_actions
      LEFT JOIN foreman_tasks_tasks
      ON (foreman_tasks_tasks.external_id = dynflow_actions.execution_plan_uuid)
      GROUP BY foreman_tasks_tasks.label ORDER BY actions_total DESC LIMIT 30"'

=> Then, look for the entry where the label is empty:
e.g.:
                        label                         | tasks_total | actions_total 
------------------------------------------------------+-------------+---------------
                                                      |           0 |          5839
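
=> For readers less familiar with LEFT JOIN, the following is a minimal sketch in plain Ruby of what the query above computes. The hashes are hypothetical in-memory stand-ins for rows of foreman_tasks_tasks and dynflow_actions (the uuids and the "Example::OtherAction" label are made up for illustration); it does not touch the database. Actions whose execution plan has no matching task end up grouped under an empty (nil) label, which is the row to look for.

```ruby
# Hypothetical in-memory stand-ins for the two tables (illustration only).
tasks = [
  { id: 1, external_id: "plan-a", label: "Actions::Katello::Repository::Sync" },
  { id: 2, external_id: "plan-b", label: "Example::OtherAction" } # made-up label
]
actions = [
  { id: 10, execution_plan_uuid: "plan-a" },
  { id: 11, execution_plan_uuid: "plan-a" },
  { id: 12, execution_plan_uuid: "plan-b" },
  { id: 13, execution_plan_uuid: "plan-x" }, # no task references plan-x
  { id: 14, execution_plan_uuid: "plan-x" }
]

# Index task labels by external_id, mirroring the JOIN condition
# foreman_tasks_tasks.external_id = dynflow_actions.execution_plan_uuid.
label_by_plan = tasks.map { |t| [t[:external_id], t[:label]] }.to_h

# LEFT JOIN + GROUP BY label: actions without a task fall under the nil label.
actions_per_label = actions
  .group_by { |a| label_by_plan[a[:execution_plan_uuid]] }
  .transform_values(&:count)

orphaned_actions = actions_per_label.fetch(nil, 0)
puts "actions without a task: #{orphaned_actions}"
```

A non-zero count under the empty label corresponds to the "actions_total" value in the example output above.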


=> To fix the current situation, the known workaround is the following:
# cat <<EOF | foreman-rake console
persistence = ForemanTasks.dynflow.world.persistence
adapter = persistence.adapter

batch_size = 5
total = adapter.db.fetch("select count(dynflow_execution_plans.uuid) from dynflow_execution_plans left join foreman_tasks_tasks on (dynflow_execution_plans.uuid = foreman_tasks_tasks.external_id) where foreman_tasks_tasks.id IS NULL").first[:count]
deleted = 0
puts "about to delete #{total} execution plans"
while (plans_without_tasks = adapter.db.fetch("select dynflow_execution_plans.uuid from dynflow_execution_plans left join foreman_tasks_tasks on (dynflow_execution_plans.uuid = foreman_tasks_tasks.external_id) where foreman_tasks_tasks.id IS NULL LIMIT #{batch_size}").all.map { |x| x[:uuid] }) && !plans_without_tasks.empty?
  persistence.delete_execution_plans({ 'uuid' => plans_without_tasks }, batch_size)
  deleted += plans_without_tasks.count
  puts "deleted #{deleted} out of #{total}"
end
EOF
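
=> The control flow of the workaround above (find execution plans that no task references, then delete them in small batches until none remain) can be sketched in plain Ruby. Everything here is a hypothetical in-memory stand-in: the arrays replace the two tables and delete_if replaces persistence.delete_execution_plans, so this only illustrates the loop and is not the Dynflow API.

```ruby
# Hypothetical in-memory stand-ins (illustration only, not the Dynflow API).
task_external_ids = ["plan-a", "plan-b"]
execution_plans   = %w[plan-a plan-b plan-c plan-d plan-e plan-f plan-g]

batch_size = 3

# Anti-join: up to `limit` plan uuids that no task references.
find_orphans = lambda do |limit|
  execution_plans.reject { |uuid| task_external_ids.include?(uuid) }.first(limit)
end

total = execution_plans.count { |uuid| !task_external_ids.include?(uuid) }
deleted = 0
puts "about to delete #{total} execution plans"

# Re-query after every batch, exactly as the workaround re-runs its SELECT.
until (batch = find_orphans.call(batch_size)).empty?
  execution_plans.delete_if { |uuid| batch.include?(uuid) } # stand-in for the delete call
  deleted += batch.count
  puts "deleted #{deleted} out of #{total}"
end
```

Re-running the lookup on every iteration, rather than fetching all uuids up front, keeps each step small, which is presumably why the workaround uses a low batch size.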

Comment 1 Adam Ruzicka 2018-03-16 09:30:36 UTC
Created redmine issue http://projects.theforeman.org/issues/22915 from this bug

Comment 2 Satellite Program 2018-03-16 10:18:19 UTC
Upstream bug assigned to aruzicka

Comment 3 Satellite Program 2018-03-16 10:18:22 UTC
Upstream bug assigned to aruzicka

Comment 5 Satellite Program 2018-03-22 00:18:53 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/22915 has been resolved.

Comment 6 Adam Ruzicka 2018-05-30 07:27:02 UTC
*** Bug 1583894 has been marked as a duplicate of this bug. ***

Comment 13 errata-xmlrpc 2018-10-16 15:30:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2927