Description of problem:

When trying to run an Insights plan with one rule for "Kernel vulnerable to local privilege escalation via DCCP module (CVE-2017-6074)" on nearly 6K hosts, execution of the plan is very slow. While this plan is executing, no other kinds of jobs are running in parallel (repo sync, client registration, etc.). The job has currently executed on only 300 hosts in an hour, and looking at the database, the planning of tasks is very slow: only 301 tasks have been planned.

SELECT COUNT(*) FROM foreman_tasks_tasks WHERE parent_task_id='0f814ce4-2a8e-4be6-83a5-4359bcbb98eb';
-[ RECORD 1 ]
count | 301

I'm not sure which component this issue should go under, so I'm marking it for the tasks plugin. Please correct it if necessary.

Version-Release number of selected component (if applicable):
Satellite 6.4 Snap 10

How reproducible:
Always

Steps to Reproduce:
1. Create an Insights plan with applicability to about 6K hosts.
2. Execute the plan by clicking on Run Playbook.

Actual results:
The planning of the tasks for the execution plan is slow, and the tasks run nearly serialized: if there are two tasks A and B, then B starts only after A has finished.

Additional info:
Foreman debug: http://debugs.theforeman.org/foreman-debug-vX0MA.tar.xz
The code causing the slowness lives in foreman-ansible, so I'm changing the component, even though the fix may land elsewhere.

Notes:

The slowness is caused by the way we render the template to run for Insights. When retrieving the playbook from the Insights service, we get a single uber-playbook that should remediate all the issues for all the hosts in the plan. When rendering a template for a single host, we retrieve the uber-playbook, pick out the parts relevant to that host, and run them.

The issue occurs when there are a lot of hosts: we render the template for each host separately, and therefore retrieve the uber-playbook N times for N hosts. As the number of hosts in the Insights plan grows, the uber-playbook, and the time needed to retrieve it, grows as well. In the testing setup, a single retrieval took roughly 15 seconds for 6K hosts. Assuming the uber-playbook won't change within a single job invocation and for a single plan, we should use those two identifiers as keys to cache the uber-playbook.
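A minimal sketch of the proposed caching, keyed on (job invocation, plan); all names here (fetch_uber_playbook, retrieve) are hypothetical and this is not the actual Foreman code:

```python
# Illustrative sketch only: cache the uber-playbook per
# (job_invocation_id, plan_id) so that rendering templates for N hosts
# triggers a single expensive retrieval instead of N.
# The function and parameter names are hypothetical, not Foreman's.

_playbook_cache = {}

def fetch_uber_playbook(job_invocation_id, plan_id, retrieve):
    """Return the uber-playbook for this (job invocation, plan) pair,
    calling `retrieve` (the slow Insights-service call) only on a miss."""
    key = (job_invocation_id, plan_id)
    if key not in _playbook_cache:
        _playbook_cache[key] = retrieve(plan_id)
    return _playbook_cache[key]

# Simulated slow retrieval; records how many times it is actually called.
calls = []

def retrieve(plan_id):
    calls.append(plan_id)
    return f"uber-playbook for plan {plan_id}"

# Rendering templates for 6000 hosts hits the Insights service only once.
for _host in range(6000):
    playbook = fetch_uber_playbook("job-1", "plan-1", retrieve)

print(len(calls))  # the expensive retrieval happened exactly once
```

With this shape, per-host rendering still picks its relevant parts out of the playbook; only the retrieval is shared, and the cache entry naturally expires with the job invocation.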
Created redmine issue http://projects.theforeman.org/issues/24262 from this bug
As discussed with Adam, it would be good to start caching the playbook for each host since in our case it's always the same.
Upstream bug assigned to mhulan
Moving this bug to POST for triage into Satellite 6 since the upstream issue https://projects.theforeman.org/issues/24262 has been resolved.
https://github.com/theforeman/foreman-packaging/pull/2980 - rex
https://github.com/theforeman/foreman-packaging/pull/2982 - ansible
Both package versions (or greater) are now downstream.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2927