Bug 1509965 - very slow publishing of a content view with filters containing many errata
Summary: very slow publishing of a content view with filters containing many errata
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Content Views
Version: Unspecified
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: Unspecified
Assignee: Partha Aji
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-11-06 13:12 UTC by Pavel Moravec
Modified: 2021-09-09 12:48 UTC (History)
CC List: 16 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-02-21 16:54:37 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Foreman Issue Tracker | 21727 | 0 | Normal | Closed | very slow publishing of a content view with filters containing many errata | 2021-02-15 14:50:56 UTC
Red Hat Knowledge Base (Solution) | 3232951 | 0 | None | None | None | 2017-11-06 13:30:41 UTC

Description Pavel Moravec 2017-11-06 13:12:09 UTC
Description of problem:
When a Content View has a filter that includes or excludes thousands of errata, publishing the CV takes too long (about 2+ minutes for each repo in the CV that has the filter applied).

For example, for a CV with 10 repos with such filters, planning the task takes approx. 30 minutes (and executing it, including CopyRpm or DistributorPublish, then takes just a few minutes).

That is bad for two reasons:
1) overall performance is bad (planning the task takes several times longer than executing it)
2) the user sees practically nothing for most of the task lifecycle. When they check why the publish task takes so long, the task details are empty (since the task is still in planning).

Particular code that takes so long:

https://github.com/Katello/katello/blob/master/app/lib/actions/katello/repository/clone_yum_content.rb#L17 (clause_gen.generate)

that is this method:

https://github.com/Katello/katello/blob/master/app/lib/katello/util/filter_clause_generator.rb#L9-L12

The method is inefficient for arguments containing thousands of errata.


Today, a workaround exists: use the opposite filter, i.e. instead of "include all errata older than a month", use "exclude any newer errata" (and deal with packages outside errata). However, this workaround will become less and less applicable as the overall number of errata in a repo grows over time.
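
To illustrate the workaround, here is a minimal sketch in plain Ruby. It does not use any Katello API: the Package and Erratum structs, the erratum IDs, the package names and the cutoff date are all made-up sample data, and the include/exclude semantics are a simplified assumption. It only shows that the "exclude" form needs to list far fewer errata, and that the two forms differ for packages that belong to no erratum.

```ruby
require 'date'

# Hypothetical sample data (not Katello models): each package belongs to
# one erratum or to none.
Package = Struct.new(:name, :erratum_id)
Erratum = Struct.new(:id, :issued)

errata = [
  Erratum.new('ERRATUM-OLD-1', Date.new(2017, 1, 10)),
  Erratum.new('ERRATUM-OLD-2', Date.new(2017, 6, 5)),
  Erratum.new('ERRATUM-NEW-1', Date.new(2017, 11, 1))
]
packages = [
  Package.new('pkg-a', 'ERRATUM-OLD-1'),
  Package.new('pkg-b', 'ERRATUM-OLD-2'),
  Package.new('pkg-c', 'ERRATUM-NEW-1'),
  Package.new('pkg-base', nil) # shipped outside any erratum
]
cutoff = Date.new(2017, 10, 6)

old_ids = errata.select { |e| e.issued < cutoff }.map(&:id)
new_ids = errata.reject { |e| e.issued < cutoff }.map(&:id)

# "Include errata older than cutoff": the filter has to list every old erratum.
included = packages.select { |pkg| old_ids.include?(pkg.erratum_id) }.map(&:name)

# "Exclude errata newer than cutoff": the filter lists only the few new errata.
kept = packages.reject { |pkg| new_ids.include?(pkg.erratum_id) }.map(&:name)

puts "include filter lists #{old_ids.size} errata, selects #{included.inspect}"
puts "exclude filter lists #{new_ids.size} errata, selects #{kept.inspect}"
puts "only in the exclude result: #{(kept - included).inspect}"
```

In this simplified model the only difference between the two results is the package outside any erratum, which is the part that has to be dealt with separately.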


Version-Release number of selected component (if applicable):
Sat 6.2.12


How reproducible:
100%


Steps to Reproduce:
1. Sync several larger repos with many errata.
2. Create a CV and add all the repos to it.
3. Add a filter "include all errata older than date X.Y." with the date about a month back, so that most of the errata are included in the CV.
4. Publish the CV.
5. Check how long publishing the CV takes (and when the task leaves the planning phase and starts executing the very first step).


Actual results:
5. CV publish takes 30+ minutes; most of the time is spent in planning (the very first dynflow step is kicked off only after a long time)


Expected results:
5. A reasonably short planning phase


Additional info:
Add some debugging statements around the line https://github.com/Katello/katello/blob/master/app/lib/actions/katello/repository/clone_yum_content.rb#L17 to see that the delay is right there.
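
A minimal sketch of such a debugging statement, in plain Ruby. The `timed` helper and the `slow_generate` stand-in are hypothetical names introduced only for this illustration; in the real action the block would wrap the clause_gen.generate call, and the output would more likely go to the Rails logger than to stderr.

```ruby
# Hypothetical helper: run a block, log how long it took, and return the
# block's result unchanged.
def timed(label, logger: $stderr)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  logger.puts format('%s took %.3f s', label, elapsed)
  result
end

# Stand-in for the slow call (assumption: the real code would call
# clause_gen.generate here instead).
slow_generate = lambda do
  sleep 0.05
  :clauses
end

value = timed('clause_gen.generate') { slow_generate.call }
```

Using the monotonic clock keeps the measurement unaffected by wall-clock adjustments during a long planning phase.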

Comment 4 Pavel Moravec 2017-11-06 15:22:51 UTC
Also worth knowing: the errata list is processed quite inefficiently when there are multiple repos.

Assume I have a big repo (e.g. RHEL6 6Server) and several small repos, and apply the errata filter "include all errata older than today". Then, after adding the debug statements per "Additional info", one can see that publishing that CV spends the *same* (surprisingly high) time in the method

https://github.com/Katello/katello/blob/master/app/lib/actions/katello/repository/clone_yum_content.rb#L17

for *each and every* repo the errata filter is applied to, even if the repo has just a few errata. So the calculation is disproportionately expensive for small repos (if there is a big repo as well), and it seems the calculation is repeated from scratch for each and every repo in the CV.
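
To illustrate the observation only (this is not the actual Katello code or its fix), here is a small plain-Ruby sketch of computing an expensive result once and reusing it for every repo. It assumes that some part of the calculation depends only on the filters and not on the repository, which the report does not confirm. `ClauseCache`, the filter IDs and the repo names are hypothetical.

```ruby
# Hypothetical cache for the expensive, repo-independent part of the
# calculation (assumption: such a part exists).
class ClauseCache
  attr_reader :computations

  def initialize(&expensive)
    @expensive = expensive
    @cache = {}
    @computations = 0
  end

  # Returns the cached result for filter_ids, computing it only on first use.
  def fetch(filter_ids)
    key = filter_ids.sort
    @cache.fetch(key) do
      @computations += 1
      @cache[key] = @expensive.call(key)
    end
  end
end

# Stand-in for the slow work; a real implementation would resolve errata here.
cache = ClauseCache.new { |ids| ids.map { |id| "clauses-for-filter-#{id}" } }

repos = %w[big-repo small-repo-a small-repo-b]
results = repos.map { |repo| [repo, cache.fetch([1, 2])] }.to_h

puts cache.computations # prints 1: computed once, reused for all three repos
```

With this shape, a small repo would pay only for its own repo-specific work instead of repeating the full calculation.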

Comment 6 Magnus Glantz 2017-11-10 16:43:23 UTC
I can replicate this, and I also have several customers who are seeing this.
This is turning Satellite 6 into something which is not very useful.

Comment 10 Partha Aji 2017-11-21 19:29:49 UTC
Connecting redmine issue http://projects.theforeman.org/issues/21727 from this bug

Comment 11 Satellite Program 2017-11-29 21:12:28 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/21727 has been resolved.

Comment 12 Justin Sherrill 2017-12-07 16:59:14 UTC
Re-proposing for 6.3, as this has a high impact.

Comment 15 Zach Huntington-Meath 2017-12-13 22:05:48 UTC
Partha, would you mind taking a look at cherry-picking this?

Comment 17 Satellite Program 2018-02-21 16:54:37 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0336

