
Bug 1932735

Summary: Improve the speed of syncing repository
Product: Red Hat Satellite
Reporter: Hao Chang Yu <hyu>
Component: Pulp
Assignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED CURRENTRELEASE
QA Contact: Lai <ltran>
Severity: medium
Docs Contact:
Priority: high
Version: 6.8.0
CC: ahumbe, bmbouter, ggainey, gscarbor, ipanova, jjansky, jjeffers, ktordeur, mmccune, patalber, pmoravec, rchan, sboyron, ttereshc, wpinheir
Target Milestone: Unspecified
Keywords: PrioBumpGSS
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1943255 1952609 (view as bug list)
Environment:
Last Closed: 2021-05-24 14:17:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Hao Chang Yu 2021-02-25 05:22:00 UTC
Description of problem:
Because we have been seeing more and more slow Capsule sync issues, I decided to spend some time fixing the Pulp 2 code to improve this, since we still have some time before migrating to Pulp 3.

Below are the changes I made. With these changes, the RHEL 7 Server rpm repository syncs 40% - 50% quicker. Some small repositories, such as the Satellite Tools and RHEL 7 Extras repositories, which currently take a few minutes to sync, will take 1 minute or less to finish.

The code changes fix the following issues:
- Avoid reading unwanted repo metadata fields (such as Primary.xml, Updateinfo.xml) while determining units to download
- Avoid reading unwanted repo metadata fields (such as Primary.xml, Updateinfo.xml) when removing missing units (Mirror on Sync)
- Improve the query to purge duplicate units. Previously, it read the whole units_rpm collection (the largest collection). Once the collection reached millions of records, it became very slow.
- Skip repository publishing if Errata, Yum repo metadata and Comps are not changed. Previously, it would be triggered on every full sync.
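
To illustrate the last item in the list, here is a minimal sketch of a "publish only when something changed" check. It is not the actual Pulp 2 patch: the function names, the fingerprint approach, and the `units_changed` flag are all assumptions made for illustration.

```python
import hashlib
import json


def metadata_fingerprint(errata_ids, comps_ids, repomd_checksums):
    """Return a stable digest of the content a publish depends on.

    Hypothetical helper: the three inputs stand in for Errata, Comps and
    Yum repo metadata; how Pulp tracks them internally is not shown here.
    """
    payload = {
        "errata": sorted(errata_ids),
        "comps": sorted(comps_ids),
        "repomd": repomd_checksums,
    }
    # sort_keys makes the JSON text independent of dict insertion order.
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def needs_publish(previous_fingerprint, current_fingerprint, units_changed):
    """Publish if units changed or the metadata fingerprint differs."""
    if units_changed:
        return True
    return previous_fingerprint != current_fingerprint
```

With a check like this, a full sync that brings in no new units and leaves the fingerprint unchanged would skip the publish step instead of triggering it every time.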

Comment 1 Pavel Moravec 2021-02-25 11:37:36 UTC
Requesting a HF or at least inclusion in 6.8.5. The customer behind this request suffers from slow Capsule syncs: a full Capsule sync takes 3+ days and an optimised one takes many hours (while syncing only a few repos in the end).


Particular reproducer:
- have multiple CVs with RHEL7 7Server repo (optionally with some filters), promoted to multiple LEs
  - the RHEL7 repo is the best example here; just have that one repo in a CV
- Sync the content to a Capsule
- check that sync tasks take e.g. 10+ minutes each (for the customer, it is even 20-30 minutes, just for the sync itself)
- mongod logs aggregate commands with huge response times to /var/log/messages, like:

Feb 24 12:04:11 pmoravec-caps68-rhev mongod.27017[10550]: [conn481] command pulp_database.units_rpm command: aggregate { aggregate: "units_rpm", pipeline: [ { $sort: { release: 1, epoch: 1, version: 1, arch: 1, name: 1 } }, { $project: { release: 1, epoch: 1, version: 1, arch: 1, name: 1 } } ], cursor: { batchSize: 5 }, allowDiskUse: true } planSummary: COLLSCAN cursorid:188994508839 keysExamined:0 docsExamined:67979 hasSortStage:1 numYields:1753 nreturned:5 reslen:893 locks:{ Global: { acquireCount: { r: 3526 } }, Database: { acquireCount: { r: 1763 } }, Collection: { acquireCount: { r: 1762 } } } protocol:op_query 40484ms
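
To spot such entries without reading the log by eye, a small filter along these lines may help. It is only a sketch based on the log format shown above; the function names and the 10-second default threshold are assumptions, not part of any Satellite tooling.

```python
import re

# Marker text copied from the mongod log line above.
MARKER = "command pulp_database.units_rpm command: aggregate"

PLAN = re.compile(r"planSummary: (\S+)")
DOCS = re.compile(r"docsExamined:(\d+)")
MILLIS = re.compile(r"(\d+)ms\s*$")


def parse_slow_aggregate(line):
    """Return plan, docs examined and duration for a units_rpm aggregate.

    Returns None for lines that are not units_rpm aggregate entries or
    that lack any of the three fields.
    """
    if MARKER not in line:
        return None
    plan = PLAN.search(line)
    docs = DOCS.search(line)
    millis = MILLIS.search(line)
    if not (plan and docs and millis):
        return None
    return {
        "plan": plan.group(1),
        "docs_examined": int(docs.group(1)),
        "millis": int(millis.group(1)),
    }


def slow_aggregates(lines, threshold_ms=10000):
    """Yield parsed entries at or above threshold_ms (assumed default)."""
    for line in lines:
        parsed = parse_slow_aggregate(line)
        if parsed is not None and parsed["millis"] >= threshold_ms:
            yield parsed
```

Typical usage would be to open /var/log/messages and pass the file object to `slow_aggregates`; entries with `plan` equal to `COLLSCAN` are the full-collection scans described in this bug.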


After applying the Hao++ patches, individual sync times decreased severalfold, and the overall sync was faster by 50%.

Comment 3 pulp-infra@redhat.com 2021-03-02 13:05:37 UTC
The Pulp upstream bug status is at POST. Updating the external tracker on this bug.

Comment 4 pulp-infra@redhat.com 2021-03-02 13:05:39 UTC
The Pulp upstream bug priority is at Normal. Updating the external tracker on this bug.

Comment 5 Pavel Moravec 2021-03-08 09:36:35 UTC
Yet another customer hitting the same performance bug:

Mar  8 05:33:58 capsuleObfuscated mongod.27017[29197]: [ID - user.info] [conn76] command pulp_database.units_rpm command: aggregate { aggregate: "units_rpm", pipeline: [ { $sort: { release: 1, epoch: 1, version: 1, arch: 1, name: 1 } }, { $project: { release: 1, epoch: 1, version: 1, arch: 1, name: 1 } } ], cursor: { batchSize: 5 }, allowDiskUse: true } planSummary: COLLSCAN cursorid:206312856255 keysExamined:0 docsExamined:213683 hasSortStage:1 numYields:3487 nreturned:5 reslen:884 locks:{ Global: { acquireCount: { r: 7022 } }, Database: { acquireCount: { r: 3511 }, acquireWaitCount: { r: 1180 }, timeAcquiringMicros: { r: 3897482 } }, Collection: { acquireCount: { r: 3510 } } } protocol:op_query 79828ms
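
Both logged aggregates sort and project the entire units_rpm collection (planSummary: COLLSCAN, with docsExamined of 67979 and 213683). One possible way to narrow such a query is to put a `$match` stage in front so that only the units of the repository being synced are sorted. The sketch below only builds the pipeline documents; it is an illustration under that assumption and not necessarily what the patch does. The function names and the choice to match on `_id` are hypothetical.

```python
# Field order copied from the $sort stage in the logged pipeline.
NEVRA_FIELDS = ("release", "epoch", "version", "arch", "name")


def full_scan_pipeline():
    """Pipeline equivalent to the one in the log: sort, then project."""
    spec = {field: 1 for field in NEVRA_FIELDS}
    # Copies keep the two stages independent of each other.
    return [{"$sort": dict(spec)}, {"$project": dict(spec)}]


def scoped_pipeline(unit_ids):
    """Same pipeline, restricted to the given unit ids (assumption).

    unit_ids would be the ids of units associated with one repository;
    how they are obtained is outside this sketch.
    """
    match_stage = {"$match": {"_id": {"$in": list(unit_ids)}}}
    return [match_stage] + full_scan_pipeline()
```

The sketch relies on Python 3.7+ dictionaries preserving insertion order, which matters because the key order of `$sort` defines the sort precedence; on an older interpreter an ordered mapping type would be needed instead.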

Comment 7 pulp-infra@redhat.com 2021-03-16 14:12:12 UTC
The Pulp upstream bug status is at MODIFIED. Updating the external tracker on this bug.

Comment 8 pulp-infra@redhat.com 2021-03-16 15:11:10 UTC
All upstream Pulp bugs are at MODIFIED+. Moving this bug to POST.

Comment 9 pulp-infra@redhat.com 2021-03-16 22:08:47 UTC
The Pulp upstream bug status is at POST. Updating the external tracker on this bug.

Comment 10 James Jeffers 2021-04-01 13:30:25 UTC
*** Bug 1943255 has been marked as a duplicate of this bug. ***

Comment 11 Gary Scarborough 2021-04-09 15:10:20 UTC
Can we get a set of patches for 6.7.5?  IHAC who has really slow syncs that are causing production problems.

Comment 12 Gary Scarborough 2021-04-09 15:11:22 UTC
Sorry, the case mentioned above was 02744080.

Comment 14 Mike McCune 2021-04-09 21:53:11 UTC
Created attachment 1770805 [details]
pulp-rpm-plugins-2.21.3.3-2.HFRHBZ1932735.el7sat.noarch.rpm

Comment 15 Mike McCune 2021-04-12 17:15:38 UTC
Created attachment 1771385 [details]
pulp-server-2.21.3.3-2.HFRHBZ1932735.el7sat.noarch.rpm

Comment 16 pulp-infra@redhat.com 2021-04-15 14:11:48 UTC
The Pulp upstream bug status is at CLOSED - WONTFIX. Updating the external tracker on this bug.

Comment 22 Brad Buckingham 2021-05-24 14:17:12 UTC
Closing as CLOSED:CURRENTRELEASE as the solution is available in 6.9.2.