Bug 1600920

Summary: Insights Plan Execution is slow at scale
Product: Red Hat Satellite
Component: Ansible - Configuration Management
Reporter: sbadhwar
Assignee: Marek Hulan <mhulan>
QA Contact: sbadhwar
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
Version: 6.4
CC: aruzicka, inecas, jhutar, mhulan, pcreech, psuriset
Target Milestone: 6.4.0
Target Release: Unused
Keywords: Performance, Triaged, UserExperience
Hardware: Unspecified
OS: Unspecified
Fixed In Version: foreman_ansible-2.2.7-1, foreman_remote_execution-1.5.6-1
Type: Bug
Last Closed: 2018-10-16 19:14:39 UTC

Description sbadhwar 2018-07-13 11:40:17 UTC
Description of problem:
When trying to run an Insights plan with 1 rule for "Kernel vulnerable to local privilege escalation via DCCP module (CVE-2017-6074)" on nearly 6K hosts, the execution of the plan seems to be very slow.

While this plan is executing, no other kinds of jobs are running in parallel (repo sync, client registration, etc.).

After an hour, the job has executed on only about 300 hosts, and a look at the database shows that planning the tasks is the bottleneck: only 301 tasks have been planned.

SELECT COUNT(*) FROM foreman_tasks_tasks WHERE parent_task_id='0f814ce4-2a8e-4be6-83a5-4359bcbb98eb';
-[ RECORD 1 ]
count | 301

I am not sure which component this issue should go under, so I am marking it for the tasks plugin. Please correct it if necessary.

Version-Release number of selected component (if applicable):
Satellite 6.4 Snap 10

How reproducible:
Always


Steps to Reproduce:
1. Create an Insights plan with applicability to about 6K hosts.
2. Execute the plan by clicking on Run Playbook.

Actual results:
Planning of the tasks for the execution plan is slow, and the tasks run nearly serialized (if there are two tasks A and B, B will start only after A has finished).


Additional info:
Foreman debug: http://debugs.theforeman.org/foreman-debug-vX0MA.tar.xz

Comment 3 Adam Ruzicka 2018-07-16 12:07:52 UTC
The code causing the slowness lives in foreman_ansible, so I am changing the component, even though the fix for this may land elsewhere.

Notes:
The slowness is caused by the way we render the template that runs for Insights. When retrieving the playbook from the Insights service, we get a single uber-playbook that should remediate all the issues for all the hosts in the plan. When rendering the template for a single host, we retrieve the uber-playbook, pick out the parts relevant to that host, and run them.

The issue occurs when there are a lot of hosts. We render the template for each host separately, and therefore for N hosts we retrieve the uber-playbook N times. As the number of hosts in the Insights plan grows, the uber-playbook, and the time needed to retrieve it, grows as well. In the testing setup a single retrieval took roughly 15 seconds for 6k hosts, which by itself largely accounts for the observed throughput of roughly 300 planned tasks per hour.
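
To make the pattern concrete, here is a rough Ruby sketch of the per-host rendering described above. The names (render_insights_template, fetch_uber_playbook, extract_host_sections) are hypothetical stand-ins, not the actual foreman_ansible API:

# Hypothetical sketch of the current behaviour, not the real
# foreman_ansible code: every per-host render re-downloads the full
# uber-playbook from the Insights service.
def render_insights_template(host, plan_id)
  # Repeated once per host: ~15 s at 6k hosts, so N hosts pay N fetches.
  playbook = fetch_uber_playbook(plan_id)
  # Keep only the plays/sections that apply to this particular host.
  extract_host_sections(playbook, host)
end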

Assuming the uber-playbook does not change within a single job invocation and for a single plan, we should try to use these two keys to cache the uber-playbook; a sketch of this follows below.
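
A minimal sketch of that caching idea, assuming the playbook really is stable for the lifetime of one (job invocation, plan) pair. Again, the helper names and the cache layout are hypothetical, not the actual implementation:

# Hypothetical sketch: memoize the uber-playbook keyed by job invocation
# and plan, so N hosts cost one retrieval instead of N.
PLAYBOOK_CACHE = {}

def render_insights_template(host, plan_id, job_invocation_id)
  key = [job_invocation_id, plan_id]
  # Fetch once per (job invocation, plan) pair, reuse for every host.
  playbook = PLAYBOOK_CACHE[key] ||= fetch_uber_playbook(plan_id)
  extract_host_sections(playbook, host)
end

With a cache like this, the 6k-host plan would pay the ~15 second retrieval once rather than roughly 6000 times.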

Comment 4 Adam Ruzicka 2018-07-16 12:10:19 UTC
Created redmine issue http://projects.theforeman.org/issues/24262 from this bug

Comment 5 Marek Hulan 2018-08-06 14:22:47 UTC
As discussed with Adam, it would be good to start caching the playbook used for each host, since in our case it is always the same.

Comment 6 Satellite Program 2018-09-07 16:07:03 UTC
Upstream bug assigned to mhulan

Comment 8 Satellite Program 2018-09-13 08:06:56 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue https://projects.theforeman.org/issues/24262 has been resolved.

Comment 10 Patrick Creech 2018-09-21 00:42:39 UTC
Both package versions or greater are downstream.

Comment 12 Bryan Kearney 2018-10-16 19:14:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2927