Red Hat Bugzilla – Bug 1557067
[RFE] Have a Mechanism to Proactively Detect and Clean "orphaned" Dynflow Tasks
Last modified: 2018-10-16 11:31:52 EDT
Description of problem:
-----------------------
If running tasks have been destroyed in the past via foreman-rake console, it is very possible that some "orphaned" dynflow entries are still inside the database. Also, since older versions of Satellite 6 had no "foreman_tasks:cleanup" rake task, tasks were most likely destroyed at some point during the existence of Satellite with a command similar to:

# foreman-rake console
> ForemanTasks::Task.where(:state => :running).destroy_all

This situation is known to cause significant issues over time, especially with "Register Host" tasks, which can take longer and longer until the clients reach a timeout. These orphaned dynflow entries stay in the database and survive Satellite upgrades.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
6.3

How reproducible:
-----------------
100%

Steps to Reproduce:
-------------------
1. Trigger a lot of tasks. One easy way is to launch a Sync Plan, which will trigger multiple "Repository Sync" tasks.
2. While the tasks are running, destroy them with the following:

# foreman-rake console
> ForemanTasks::Task.where(:state => :running).where(:label => "Actions::Katello::Repository::Sync").destroy_all

Actual results:
---------------
=> There is currently no proactive mechanism that detects this situation, which makes troubleshooting tedious: many other aspects will be investigated before this problem is found.
=> Also, there is currently no built-in mechanism to clean these entries.
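To make the failure mode concrete: foreman_tasks_tasks rows are linked to dynflow_execution_plans rows only through external_id = execution_plan_uuid, so destroying the task rows alone leaves the plan rows behind. The following is a minimal, self-contained Ruby sketch of that mechanism using hypothetical in-memory data (Task, tasks, execution_plan_uuids are illustrative stand-ins, not Foreman's API):

```ruby
# Hypothetical in-memory model of the two tables involved. In Foreman,
# foreman_tasks_tasks.external_id links a task to dynflow_execution_plans.uuid.
Task = Struct.new(:id, :external_id, :state, :label)

tasks = [
  Task.new(1, "uuid-a", :running, "Actions::Katello::Repository::Sync"),
  Task.new(2, "uuid-b", :stopped, "Actions::Katello::Host::Register"),
]
execution_plan_uuids = ["uuid-a", "uuid-b", "uuid-c", "uuid-d"]

# Simulate the problematic cleanup: destroy only the task rows.
tasks.reject! { |t| t.state == :running }

# Detect orphans: plans whose uuid no longer matches any task's external_id
# (the in-memory equivalent of LEFT JOIN ... WHERE foreman_tasks_tasks.id IS NULL).
known = tasks.map(&:external_id)
orphans = execution_plan_uuids.reject { |uuid| known.include?(uuid) }
puts orphans.inspect  # "uuid-a" is now orphaned, alongside the never-linked uuids
```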
Additional info:
----------------
=> We can manually detect these "orphaned" entries with the following command:

# su - postgres -c 'psql -d foreman -c \
"SELECT foreman_tasks_tasks.label, count(foreman_tasks_tasks.id) tasks_total, count(dynflow_actions.id) actions_total FROM dynflow_actions LEFT JOIN foreman_tasks_tasks ON (foreman_tasks_tasks.external_id = dynflow_actions.execution_plan_uuid) GROUP BY foreman_tasks_tasks.label ORDER BY actions_total DESC LIMIT 30"'

=> Then, look for the entry where the label is empty, e.g.:

                        label                         | tasks_total | actions_total
------------------------------------------------------+-------------+---------------
                                                      |           0 |          5839

=> To fix the current situation, the known workaround is the following:

# cat <<EOF | foreman-rake console
persistence = ForemanTasks.dynflow.world.persistence
adapter = persistence.adapter
batch_size = 5
total = adapter.db.fetch("select count(dynflow_execution_plans.uuid) from dynflow_execution_plans left join foreman_tasks_tasks on (dynflow_execution_plans.uuid = foreman_tasks_tasks.external_id) where foreman_tasks_tasks.id IS NULL").first[:count]
deleted = 0
puts "about to delete #{total} execution plans"
while (plans_without_tasks = adapter.db.fetch("select dynflow_execution_plans.uuid from dynflow_execution_plans left join foreman_tasks_tasks on (dynflow_execution_plans.uuid = foreman_tasks_tasks.external_id) where foreman_tasks_tasks.id IS NULL LIMIT #{batch_size}").all.map { |x| x[:uuid] }) && !plans_without_tasks.empty?
  persistence.delete_execution_plans({ 'uuid' => plans_without_tasks }, batch_size)
  deleted += plans_without_tasks.count
  puts "deleted #{deleted} out of #{total}"
end
EOF
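The workaround's loop follows a standard batching pattern: fetch at most batch_size orphaned uuids, delete them, repeat until the query returns nothing, so a very large orphan set is never loaded into memory at once. A pure-Ruby sketch of that pattern, with an in-memory orphaned_uuids array standing in for the LEFT JOIN query and the array subtraction standing in for persistence.delete_execution_plans (both stand-ins are illustrative, not Foreman's API):

```ruby
# Hypothetical orphan set; in the real workaround this comes from the database.
orphaned_uuids = (1..13).map { |i| "uuid-#{i}" }
batch_size = 5
total = orphaned_uuids.size
deleted = 0

puts "about to delete #{total} execution plans"
# Keep taking up to batch_size entries until none remain, mirroring the
# "fetch ... LIMIT batch_size" loop in the workaround above.
while (batch = orphaned_uuids.first(batch_size)) && !batch.empty?
  orphaned_uuids -= batch   # stands in for persistence.delete_execution_plans
  deleted += batch.count
  puts "deleted #{deleted} out of #{total}"
end
```

With 13 orphans and a batch size of 5, the loop runs three times (5, 5, then 3), so progress is reported even on a slow deletion of a large backlog.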
Created redmine issue http://projects.theforeman.org/issues/22915 from this bug
Upstream bug assigned to aruzicka@redhat.com
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/22915 has been resolved.
*** Bug 1583894 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2927