Description by Christian Marineau, 2018-03-15 23:53:25 UTC
Description of problem:
-----------------------
If running tasks have been destroyed in the past via foreman-rake console, it is very possible that some "orphaned" dynflow entries are still present in the database.
Also, since older versions of Satellite 6 had no "foreman_tasks:cleanup" rake task, tasks were most likely destroyed with a command similar to the following at some point during the lifetime of the Satellite:
e.g.:
# foreman-rake console
> ForemanTasks::Task.where(:state => :running).destroy_all
This situation is known to cause significant issues over time, especially with "Register Host" tasks, which can take longer and longer to complete until the clients reach a timeout. These orphaned dynflow entries remain in the database and survive Satellite upgrades.
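For context, each Foreman task is linked to its dynflow execution plan through the task's external_id column (matched against dynflow_execution_plans.uuid, as in the queries below), so destroying only the task row leaves the plan behind. A minimal sketch for inspecting that link from the console (the UUID shown is a made-up example):
# foreman-rake console
> t = ForemanTasks::Task.last
> t.external_id
=> "0e145f2b-..."  # equals dynflow_execution_plans.uuid; destroy_all removes the task, not this plan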
Version-Release number of selected component (if applicable):
-------------------------------------------------------------
6.3
How reproducible:
-----------------
100%
Steps to Reproduce:
-------------------
1. Trigger a lot of tasks. One easy way is to launch a Sync Plan, which will trigger multiple Repository Sync tasks (see the sketch after these steps).
2. While the tasks are running, destroy them with the following:
# foreman-rake console
> ForemanTasks::Task.where(:state => :running).where(:label => "Actions::Katello::Repository::Sync").destroy_all
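For step 1, one way to queue many sync tasks at once is a loop over hammer. This is a hedged sketch: the repository IDs are made up, and the --async flag should be verified against your hammer version:
# for id in 1 2 3 4 5; do hammer repository synchronize --id "$id" --async; done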
Actual results:
---------------
=> There is currently no proactive mechanism that detects this situation, which makes troubleshooting tedious, as many other areas are typically investigated before this problem is identified.
=> Also, there is currently no built-in mechanism to clean these entries.
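=> Note: the "foreman_tasks:cleanup" rake task mentioned above exists in newer versions for routine task cleanup, e.g. (a sketch; the TASK_SEARCH/AFTER/VERBOSE parameters follow the foreman-tasks documentation and should be verified against your version):
# foreman-rake foreman_tasks:cleanup TASK_SEARCH='label = "Actions::Katello::Repository::Sync"' AFTER='30d' VERBOSE=true
However, it operates on existing tasks, so it does not remove dynflow entries that have already been orphaned.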
Additional info:
----------------
=> We can manually detect these "orphaned" tasks with the following command:
# su - postgres -c 'psql -d foreman -c\
"SELECT foreman_tasks_tasks.label,
count(foreman_tasks_tasks.id) tasks_total,
count(dynflow_actions.id) actions_total
FROM dynflow_actions
LEFT JOIN foreman_tasks_tasks
ON (foreman_tasks_tasks.external_id = dynflow_actions.execution_plan_uuid)
GROUP BY foreman_tasks_tasks.label ORDER BY actions_total DESC LIMIT 30"'
=> Then, look for the entry where the label is empty:
e.g.:
                         label                         | tasks_total | actions_total
-------------------------------------------------------+-------------+---------------
                                                       |           0 |          5839
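=> A direct count of the orphaned execution plans can also be obtained with the same LEFT JOIN that the workaround below uses:
# su - postgres -c 'psql -d foreman -c\
"SELECT count(dynflow_execution_plans.uuid)
FROM dynflow_execution_plans
LEFT JOIN foreman_tasks_tasks
ON (dynflow_execution_plans.uuid = foreman_tasks_tasks.external_id)
WHERE foreman_tasks_tasks.id IS NULL"'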
=> To fix the current situation, the known workaround is the following:
# cat <<EOF | foreman-rake console
persistence = ForemanTasks.dynflow.world.persistence
adapter = persistence.adapter
batch_size = 5
# Count the execution plans that no longer have a matching foreman_tasks_tasks row
total = adapter.db.fetch("select count(dynflow_execution_plans.uuid) from dynflow_execution_plans left join foreman_tasks_tasks on (dynflow_execution_plans.uuid = foreman_tasks_tasks.external_id) where foreman_tasks_tasks.id IS NULL").first[:count]
deleted = 0
puts "about to delete #{total} execution plans"
# Select the orphaned plans in batches and delete them until none remain
while (plans_without_tasks = adapter.db.fetch("select dynflow_execution_plans.uuid from dynflow_execution_plans left join foreman_tasks_tasks on (dynflow_execution_plans.uuid = foreman_tasks_tasks.external_id) where foreman_tasks_tasks.id IS NULL LIMIT #{batch_size}").all.map { |x| x[:uuid] }) && !plans_without_tasks.empty?
  persistence.delete_execution_plans({ 'uuid' => plans_without_tasks }, batch_size)
  deleted += plans_without_tasks.count
  puts "deleted #{deleted} out of #{total}"
end
EOF
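=> Once the workaround completes, re-running the detection query above should no longer show a row with an empty label.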
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2018:2927