Bug 1593480

Summary: IndexContent step can take 20+ minutes during initial sync of a large repo
Product: Red Hat Satellite
Reporter: Chris Duryee <cduryee>
Component: Repositories
Assignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED ERRATA
QA Contact: vijsingh
Severity: medium
Docs Contact:
Priority: high
Version: 6.3.1
CC: zhunting
Target Milestone: 6.6.0
Keywords: FieldEngineering, Performance, PrioBumpField, PrioBumpQA, Triaged
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-10-22 12:46:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Chris Duryee 2018-06-20 23:54:51 UTC
Description of problem:

If you on-demand sync a large repo such as RHEL 7 Server x86_64, the Pulp sync itself takes about 15-20 minutes. The IndexContent step that follows takes even longer, roughly 20-30 minutes.

It looks like most of the time is spent in import_all (logging added by me):

2018-06-20T21:53:50 [W|app|] begin import_all Katello::Pulp::Rpm
2018-06-20T22:06:47 [W|app|] end import_all Katello::Pulp::Rpm
2018-06-20T22:06:49 [W|app|] begin import_all Katello::Pulp::Srpm
2018-06-20T22:06:49 [W|app|] end import_all Katello::Pulp::Srpm
2018-06-20T22:06:51 [W|app|] begin import_all Katello::Pulp::Erratum
2018-06-20T22:18:21 [W|app|] end import_all Katello::Pulp::Erratum
2018-06-20T22:18:21 [W|app|] begin import_all Katello::Pulp::PackageGroup
2018-06-20T22:18:24 [W|app|] end import_all Katello::Pulp::PackageGroup
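
The per-type durations implied by these timestamps can be worked out with a short script. This is a minimal sketch in plain Ruby; the method name import_durations and the constant names are made up for illustration, and the regular expression assumes only the line layout shown in the log above.

```ruby
require "time"

# Log lines copied verbatim from the report.
IMPORT_LOG = <<~LOG
  2018-06-20T21:53:50 [W|app|] begin import_all Katello::Pulp::Rpm
  2018-06-20T22:06:47 [W|app|] end import_all Katello::Pulp::Rpm
  2018-06-20T22:06:49 [W|app|] begin import_all Katello::Pulp::Srpm
  2018-06-20T22:06:49 [W|app|] end import_all Katello::Pulp::Srpm
  2018-06-20T22:06:51 [W|app|] begin import_all Katello::Pulp::Erratum
  2018-06-20T22:18:21 [W|app|] end import_all Katello::Pulp::Erratum
  2018-06-20T22:18:21 [W|app|] begin import_all Katello::Pulp::PackageGroup
  2018-06-20T22:18:24 [W|app|] end import_all Katello::Pulp::PackageGroup
LOG

# Hypothetical helper: maps each unit type to the whole seconds elapsed
# between its "begin import_all" and "end import_all" lines.
def import_durations(log)
  started = {}
  durations = {}
  log.each_line do |line|
    match = line.match(/\A(\S+) \[[^\]]*\] (begin|end) import_all (\S+)/)
    next unless match

    stamp = Time.strptime(match[1], "%Y-%m-%dT%H:%M:%S")
    if match[2] == "begin"
      started[match[3]] = stamp
    else
      durations[match[3]] = (stamp - started.fetch(match[3])).round
    end
  end
  durations
end

DURATIONS = import_durations(IMPORT_LOG)
DURATIONS.each { |type, secs| puts format("%-28s %4d s", type, secs) }
```

By this arithmetic, Rpm (777 s) and Erratum (690 s) account for nearly all of the roughly 24.5 minutes between the first and last log lines; Srpm and PackageGroup take 0 s and 3 s.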

I think most of the time is spent in the loop in https://github.com/Katello/katello/blob/master/app/models/katello/concerns/pulp_database_unit.rb#L51-L57, where each unit is loaded individually. Resyncs do not take nearly as long.
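
To illustrate why loading each unit individually would be costly, here is a minimal sketch in plain Ruby. It is not Katello's code and not the upstream fix: FakeBackend, find, find_all, request_counts and the batch size are all hypothetical names and values, and the backend only counts round trips instead of talking to Pulp.

```ruby
# Hypothetical stand-in for a remote content store; it only counts round trips.
class FakeBackend
  attr_reader :requests

  def initialize(units)
    @units = units # id => attributes hash
    @requests = 0
  end

  # One round trip per unit.
  def find(id)
    @requests += 1
    @units.fetch(id)
  end

  # One round trip for a whole batch of ids.
  def find_all(ids)
    @requests += 1
    ids.map { |id| @units.fetch(id) }
  end
end

# Pattern suspected in the report: one fetch per unit.
def import_one_by_one(backend, ids)
  ids.map { |id| backend.find(id) }
end

# Alternative: fetch the same units in slices of batch_size ids.
def import_in_batches(backend, ids, batch_size)
  ids.each_slice(batch_size).flat_map { |batch| backend.find_all(batch) }
end

# Returns [requests made one by one, requests made in batches].
def request_counts(unit_count, batch_size)
  units = (1..unit_count).map { |i| ["unit-#{i}", { name: "pkg#{i}" }] }.to_h
  slow = FakeBackend.new(units)
  fast = FakeBackend.new(units)
  one_by_one = import_one_by_one(slow, units.keys)
  batched = import_in_batches(fast, units.keys, batch_size)
  raise "results differ" unless one_by_one == batched

  [slow.requests, fast.requests]
end

slow_count, fast_count = request_counts(1000, 200)
puts "one by one: #{slow_count} requests"
puts "batched:    #{fast_count} requests"
```

With 1000 units and a batch size of 200, the per-unit loop makes 1000 round trips while the batched version makes 5. If each round trip has a fixed overhead, the total grows with the number of units in the first case and with the number of batches in the second.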

Syncing one or more large repos is extremely common when setting up a Katello installation, so any time savings here would be a big deal.


Version-Release number of selected component (if applicable): 6.3.1


How reproducible: every time


Steps to Reproduce:
1. load a manifest
2. enable rhel 7 server repo
3. sync repo

Actual results: IndexContent step takes 20-30 minutes

Expected results: IndexContent step takes 10 minutes or less

Comment 3 Satellite Program 2018-12-06 21:12:00 UTC
Upstream bug assigned to cfouant

Comment 4 Satellite Program 2018-12-06 21:12:03 UTC
Upstream bug assigned to cfouant

Comment 5 Satellite Program 2018-12-11 03:11:23 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue https://projects.theforeman.org/issues/24024 has been resolved.

Comment 8 vijsingh 2019-06-04 09:48:00 UTC
ON_QA Verified

@Satellite 6.6.0 snap 5.0

Steps:

 1. Enabled the 'RHEL 7Server' and 'RHEL 6Server' repos.
 2. Synced the repos.

Observation:

 1. The 'IndexContent' step took between 4 and 5 minutes for each repo.


> Repo 7Server step execution details 
~~~~~~~~~~~~~~~~~
6: Actions::Katello::Repository::IndexContent (success) [ 281.98s / 281.98s ]
Started at: 2019-06-04 07:50:52 UTC

Ended at: 2019-06-04 07:55:34 UTC

Real time: 281.98s

Execution time (excluding suspended state): 281.98s
~~~~~~~~~~~~~~~~~

> Repo 6Server step execution details
~~~~~~~~~~~~~~~~~
Started at: 2019-06-04 08:32:07 UTC

Ended at: 2019-06-04 08:36:29 UTC

Real time: 262.09s

Execution time (excluding suspended state): 262.09s
~~~~~~~~~~~~~~~~~

Comment 10 vijsingh 2019-06-04 10:00:36 UTC
The download policy for both repos was 'on-demand'.

Comment 13 errata-xmlrpc 2019-10-22 12:46:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:3172