Bug 1557067 - [RFE] Have a Mechanism to Proactively Detect and Clean "orphaned" Dynflow Tasks
Status: CLOSED ERRATA
Product: Red Hat Satellite 6
Classification: Red Hat
Component: Tasks Plugin
Version: 6.3.0
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: Beta
Target Release: Unused
Assigned To: Adam Ruzicka
QA Contact: Ivan Necas
Keywords: FutureFeature, Triaged
Duplicates: 1583894
Depends On:
Blocks:

Reported: 2018-03-15 19:53 EDT by Christian Marineau
Modified: 2018-10-16 11:31 EDT
CC: 4 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-10-16 11:30:28 EDT
Type: Bug
Regression: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---




External Trackers:
  Red Hat Knowledge Base (Solution) 2755731 - last updated 2018-05-30 03:27 EDT
  Foreman Issue Tracker 22915 - last updated 2018-03-16 05:30 EDT
  Red Hat Product Errata RHSA-2018:2927 - last updated 2018-10-16 11:31 EDT

Description Christian Marineau 2018-03-15 19:53:25 EDT
Description of problem:
-----------------------
If running tasks have been destroyed in the past via foreman-rake console, it is quite likely that some "orphaned" dynflow tasks are still present in the database.

Also, since older versions of Satellite 6 had no "foreman_tasks:cleanup" rake task, some tasks were most likely destroyed with a command similar to the following at some point during the lifetime of the Satellite:

e.g.:
# foreman-rake console
> ForemanTasks::Task.where(:state => :running).destroy_all

This situation is known to cause significant issues over time, especially with "Register Host" tasks, which can take longer and longer until the clients reach a timeout. These orphaned dynflow entries stay in the database indefinitely and survive Satellite upgrades.
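
For context on why the entries linger: the destroy_all above removes rows from foreman_tasks_tasks, but the matching dynflow execution plans (linked only through foreman_tasks_tasks.external_id = dynflow_execution_plans.uuid) are left in place. A minimal sketch to compare the two counts from foreman-rake console (the adapter.db.fetch call follows the same pattern as the workaround script in "Additional info" below; the comparison is only a rough signal, assuming roughly one execution plan per task):

# cat <<EOF | foreman-rake console
adapter = ForemanTasks.dynflow.world.persistence.adapter
# Count task rows on the foreman-tasks side and plan rows on the dynflow side.
tasks = ForemanTasks::Task.count
plans = adapter.db.fetch("select count(*) from dynflow_execution_plans").first[:count]
puts "tasks: #{tasks}, execution plans: #{plans}"
EOF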


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
6.3


How reproducible:
-----------------
100%


Steps to Reproduce:
-------------------
1. Trigger a lot of tasks. One easy way is to launch a Sync Plan, which will trigger multiple repository sync tasks.

2. While the tasks are running, destroy them with the following:
# foreman-rake console
> ForemanTasks::Task.where(:state => :running).where(:label => "Actions::Katello::Repository::Sync").destroy_all
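
3. (Optional) Verify that orphaned execution plans were left behind. This is only a sketch; it reuses the exact LEFT JOIN query from the workaround in "Additional info" below:
# cat <<EOF | foreman-rake console
adapter = ForemanTasks.dynflow.world.persistence.adapter
# Count execution plans whose task row no longer exists.
orphans = adapter.db.fetch("select count(dynflow_execution_plans.uuid) from dynflow_execution_plans left join foreman_tasks_tasks on (dynflow_execution_plans.uuid = foreman_tasks_tasks.external_id) where foreman_tasks_tasks.id IS NULL").first[:count]
puts "orphaned execution plans: #{orphans}"
EOF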


Actual results:
---------------
=> There is currently no proactive mechanism that detects this situation, which makes troubleshooting more tedious, as many other aspects will be investigated before this problem is identified.

=> Also, there is currently no built-in mechanism to clean these entries. 


Additional info:
----------------
=> We can manually detect these "orphaned" tasks with the following command:
# su - postgres -c 'psql -d foreman -c\
     "SELECT foreman_tasks_tasks.label,
      count(foreman_tasks_tasks.id) tasks_total,
      count(dynflow_actions.id) actions_total
      FROM dynflow_actions
      LEFT JOIN foreman_tasks_tasks
      ON (foreman_tasks_tasks.external_id = dynflow_actions.execution_plan_uuid)
      GROUP BY foreman_tasks_tasks.label ORDER BY actions_total DESC LIMIT 30"'

=> Then, look at the entry where the label is empty:
e.g.:
                        label                         | tasks_total | actions_total 
------------------------------------------------------+-------------+---------------
                                                      |           0 |          5839
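
=> To list the orphaned execution plan UUIDs directly (a sketch built on the same LEFT JOIN the workaround below uses; the LIMIT only keeps the output short):
# su - postgres -c 'psql -d foreman -c\
     "SELECT dynflow_execution_plans.uuid
      FROM dynflow_execution_plans
      LEFT JOIN foreman_tasks_tasks
      ON (foreman_tasks_tasks.external_id = dynflow_execution_plans.uuid)
      WHERE foreman_tasks_tasks.id IS NULL
      LIMIT 30"'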


=> To fix the current situation, the known workaround is the following:
# cat <<EOF | foreman-rake console
persistence = ForemanTasks.dynflow.world.persistence
adapter = persistence.adapter

# Delete in small batches to keep each delete call short.
batch_size = 5
# Count execution plans that no longer have a matching task row.
total = adapter.db.fetch("select count(dynflow_execution_plans.uuid) from dynflow_execution_plans left join foreman_tasks_tasks on (dynflow_execution_plans.uuid = foreman_tasks_tasks.external_id) where foreman_tasks_tasks.id IS NULL").first[:count]
deleted = 0
puts "about to delete #{total} execution plans"
# Fetch one batch of orphaned plan UUIDs per iteration and delete it,
# until the query returns no more rows.
while (plans_without_tasks = adapter.db.fetch("select dynflow_execution_plans.uuid from dynflow_execution_plans left join foreman_tasks_tasks on (dynflow_execution_plans.uuid = foreman_tasks_tasks.external_id) where foreman_tasks_tasks.id IS NULL LIMIT #{batch_size}").all.map { |x| x[:uuid] }) && !plans_without_tasks.empty?
  persistence.delete_execution_plans({ 'uuid' => plans_without_tasks }, batch_size)
  deleted += plans_without_tasks.count
  puts "deleted #{deleted} out of #{total}"
end
EOF
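
=> Once the loop finishes, a quick sanity check (the same join as above, counting what is left) should return 0:
# su - postgres -c 'psql -d foreman -c\
     "SELECT count(dynflow_execution_plans.uuid)
      FROM dynflow_execution_plans
      LEFT JOIN foreman_tasks_tasks
      ON (foreman_tasks_tasks.external_id = dynflow_execution_plans.uuid)
      WHERE foreman_tasks_tasks.id IS NULL"'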
Comment 1 Adam Ruzicka 2018-03-16 05:30:36 EDT
Created redmine issue http://projects.theforeman.org/issues/22915 from this bug
Comment 2 pm-sat@redhat.com 2018-03-16 06:18:19 EDT
Upstream bug assigned to aruzicka@redhat.com
Comment 3 pm-sat@redhat.com 2018-03-16 06:18:22 EDT
Upstream bug assigned to aruzicka@redhat.com
Comment 5 pm-sat@redhat.com 2018-03-21 20:18:53 EDT
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/22915 has been resolved.
Comment 6 Adam Ruzicka 2018-05-30 03:27:02 EDT
*** Bug 1583894 has been marked as a duplicate of this bug. ***
Comment 13 errata-xmlrpc 2018-10-16 11:30:28 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2927
