Bug 2030434

Summary: Repository sync downloads all metadata files on every sync, even when there are no new packages
Product: Red Hat Satellite Reporter: Joniel Pasqualetto <jpasqual>
Component: Pulp Assignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED ERRATA QA Contact: Sam Bible <sbible>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.10.1 CC: ahumbe, dalley, ggainey, jsherril, pcreech, rchan, ttereshc, wpinheir
Target Milestone: 6.11.0 Keywords: Triaged
Target Release: Unused   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2022-07-05 14:31:01 UTC Type: Bug

Description Joniel Pasqualetto 2021-12-08 18:46:51 UTC
Description of problem:
Every time a repository sync is executed, Pulp downloads all the metadata files from the upstream repository. Depending on the size of the repo, this produces large downloads, and a lot of time is wasted processing the same repodata again.

Version-Release number of selected component (if applicable):

satellite-6.10.1-2.el7sat.noarch

How reproducible:

Always

Steps to Reproduce:
1. Enable any repository (used Red Hat Enterprise Linux 8 for x86_64 - BaseOS RPMs 8 in my reproducer)
2. Sync it once
3. Sync it again. Check the details of the task from the second sync. Note the step "Downloading Metadata Files"

~~~
  - message: Downloading Metadata Files
    code: sync.downloading.metadata
    state: completed
    done: 10                     <===== downloaded 10 files
~~~ 
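To check this without reading the task details by eye, one can look for the progress report whose code is `sync.downloading.metadata` and read its `done` count. The sketch below assumes the progress reports have already been loaded into Python dictionaries with the field names shown above; the helper name `metadata_files_downloaded` is hypothetical and not part of Pulp or Satellite.

```python
from typing import Iterable, Mapping, Optional

METADATA_CODE = "sync.downloading.metadata"  # code shown in the task details above


def metadata_files_downloaded(reports: Iterable[Mapping]) -> Optional[int]:
    """Return the 'done' count of the metadata download step, or None if absent.

    Hypothetical helper for illustration only.
    """
    for report in reports:
        if report.get("code") == METADATA_CODE:
            return report.get("done")
    return None


# Progress report copied from the second sync in this reproducer.
second_sync = [
    {
        "message": "Downloading Metadata Files",
        "code": METADATA_CODE,
        "state": "completed",
        "done": 10,
    }
]

count = metadata_files_downloaded(second_sync)
if count:
    print(f"metadata was re-downloaded: {count} files")
```

A non-zero count on a second sync of an unchanged repository is the symptom reported here.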

Actual results:
Multiple metadata files are downloaded on every sync, consuming bandwidth and the resources needed to process them.

Expected results:
Download all the metadata only if the repository has changed, keeping download and processing time to a minimum.

Additional info:

Comment 1 Daniel Alley 2021-12-08 20:00:46 UTC
A little more background:

Pulp tries to avoid doing this by using a couple of heuristics. In pulp_rpm 3.14, which is shipped with Satellite 6.10, the heuristics were as follows:

* If no repository version has been created since the one created by the last sync
* If the repomd.xml is the same
* If the remote has not been changed since the last sync.

THEN the sync is skipped.

This is fine in isolation, but in the Katello context, the remote is *always* updated just prior to the sync in order to refresh the TLS client certificates. Thus the third condition is never satisfied and the sync always occurs.
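The three conditions and the Katello interaction can be sketched as follows. This is a minimal illustration of the logic described in this comment, not the pulp_rpm implementation; the `SyncState` class, its field names, and `can_skip_sync` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SyncState:
    """Hypothetical snapshot of what the optimization compares; not pulp_rpm's model."""
    new_repo_version_since_last_sync: bool
    repomd_unchanged: bool
    remote_changed_since_last_sync: bool


def can_skip_sync(state: SyncState) -> bool:
    """All three conditions must hold for the sync to be skipped."""
    return (
        not state.new_repo_version_since_last_sync
        and state.repomd_unchanged
        and not state.remote_changed_since_last_sync
    )


# Standalone Pulp: nothing changed, so the sync is skipped.
standalone = SyncState(False, True, False)
# Katello: the remote is updated before every sync (TLS client certificates).
katello = SyncState(False, True, True)

print(can_skip_sync(standalone))  # True
print(can_skip_sync(katello))     # False
```

Because the conditions are combined with a logical AND, the remote update alone is enough to force a full metadata download.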

Comment 3 pulp-infra@redhat.com 2021-12-08 20:09:45 UTC
The Pulp upstream bug status is at CLOSED - CURRENTRELEASE. Updating the external tracker on this bug.

Comment 4 pulp-infra@redhat.com 2021-12-08 20:09:46 UTC
The Pulp upstream bug priority is at Normal. Updating the external tracker on this bug.

Comment 5 pulp-infra@redhat.com 2021-12-08 21:08:11 UTC
All upstream Pulp bugs are at MODIFIED+. Moving this bug to POST.

Comment 6 Sam Bible 2022-01-11 20:36:30 UTC
Verified on 7.0.0 - 4

Steps to Reproduce:
1. Enable any repository (Red Hat Enterprise Linux 8 for x86_64 - BaseOS RPMs was used in testing)
2. Sync it once
3. Sync it again. 

Actual results:
10 files synced the first time, none the second time.

Expected results:
Download all the metadata only if there were changes on the repository

Comment 9 errata-xmlrpc 2022-07-05 14:31:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Satellite 6.11 Release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5498