Description of problem:
Since we have been facing more and more slow Capsule sync issues, I decided to take some time to fix the Pulp 2 code, as we still have some time before migrating to Pulp 3.
Below are the changes I made. With them, the RHEL 7 Server RPM repository syncs 40%-50% faster. Some small repositories, such as Satellite Tools and RHEL 7 Extras, which currently take a few minutes to sync, will finish in 1 minute or less.
The changes fix the following issues:
- Avoid reading unneeded repo metadata files (such as primary.xml and updateinfo.xml) while determining which units to download
- Avoid reading unneeded repo metadata files (such as primary.xml and updateinfo.xml) when removing missing units (Mirror on Sync)
- Improve the query that purges duplicate units. Previously, it read the whole units_rpm collection (the largest collection); once the collection reached millions of records, it became very slow.
- Skip repository publishing if Errata, Yum repo metadata and Comps have not changed. Previously, publishing was triggered on every full sync.
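The first two items and the last one share an idea: decide from cheap metadata what work is actually needed. The minimal Python sketch below is not the actual patch. It assumes change detection can be driven by the small repomd.xml index alone, comparing the per-type checksums it records with those from the previous sync. The watched types (`primary`, `updateinfo`, and `group` for comps) and the function names `metadata_checksums` and `needs_publish` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, Optional

# Namespace used by Yum repomd.xml index files.
REPO_NS = {"repo": "http://linux.duke.edu/metadata/repo"}

# Hypothetical choice of metadata types whose changes should trigger a publish:
# packages (primary), errata (updateinfo) and comps (group).
WATCHED_TYPES = ("primary", "updateinfo", "group")


def metadata_checksums(repomd_xml: str,
                       types: Iterable[str] = WATCHED_TYPES) -> Dict[str, Optional[str]]:
    """Return {type: checksum} for the watched types, parsing only repomd.xml.

    The large files (primary.xml, updateinfo.xml, ...) are never opened here;
    their checksums are read from the small index instead.
    """
    wanted = set(types)
    root = ET.fromstring(repomd_xml)
    checksums: Dict[str, Optional[str]] = {}
    for data in root.findall("repo:data", REPO_NS):
        data_type = data.get("type")
        if data_type not in wanted:
            continue  # e.g. filelists/other are ignored for this decision
        checksum = data.find("repo:checksum", REPO_NS)
        if checksum is not None and checksum.text:
            checksums[data_type] = checksum.text.strip()
        else:
            checksums[data_type] = None
    return checksums


def needs_publish(previous: Dict[str, Optional[str]],
                  current: Dict[str, Optional[str]]) -> bool:
    """Publish only if a watched metadata type was added, removed or changed."""
    return previous != current


if __name__ == "__main__":
    old = """<repomd xmlns="http://linux.duke.edu/metadata/repo">
      <data type="primary"><checksum type="sha256">aaa</checksum></data>
      <data type="updateinfo"><checksum type="sha256">bbb</checksum></data>
      <data type="group"><checksum type="sha256">ccc</checksum></data>
    </repomd>"""
    new = old.replace("bbb", "bbb2")  # only the errata (updateinfo) changed
    print(needs_publish(metadata_checksums(old), metadata_checksums(new)))  # True
```

The sketch leaves out how the previous checksums would be persisted between syncs. That storage is an assumption outside its scope.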
Requesting a hotfix (HF), or at least inclusion in 6.8.5. The affected customer suffers from slow Capsule syncs: a full Capsule sync takes 3+ days and an optimized one takes many hours (while syncing only a few repos at the end).
Specific reproducer:
- have multiple CVs with RHEL7 7Server repo (optionally with some filters), promoted to multiple LEs
- the RHEL7 repo is the best example here; just have that one repo in a CV
- Sync the content to a Capsule
- check that sync tasks take e.g. 10+ minutes each (for the customer, it is even 20-30 minutes, just for the sync itself)
- mongod logs very long aggregate response times to /var/log/messages, for example:
Feb 24 12:04:11 pmoravec-caps68-rhev mongod.27017[10550]: [conn481] command pulp_database.units_rpm command: aggregate { aggregate: "units_rpm", pipeline: [ { $sort: { release: 1, epoch: 1, version: 1, arch: 1, name: 1 } }, { $project: { release: 1, epoch: 1, version: 1, arch: 1, name: 1 } } ], cursor: { batchSize: 5 }, allowDiskUse: true } planSummary: COLLSCAN cursorid:188994508839 keysExamined:0 docsExamined:67979 hasSortStage:1 numYields:1753 nreturned:5 reslen:893 locks:{ Global: { acquireCount: { r: 3526 } }, Database: { acquireCount: { r: 1763 } }, Collection: { acquireCount: { r: 1762 } } } protocol:op_query 40484ms
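The logged aggregate appears to be the duplicate-purge query described above. It sorts and projects the NEVRA fields of every units_rpm document (planSummary COLLSCAN, docsExamined:67979, about 40 s). The minimal sketch below shows a scoped alternative under these assumptions:

- Duplicates are units sharing the same NEVRA tuple, which are the fields in the logged `$sort`.
- The caller already knows the IDs of the units in the repository being synced.

`find_duplicate_nevras` and `scoped_duplicate_pipeline` are hypothetical names, not Pulp functions. The second helper only builds an illustrative pipeline. In MongoDB, a `$match` placed first in a pipeline can use an index, which avoids the full scan and sort.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Fields taken from the $sort stage in the logged aggregate.
NEVRA_FIELDS = ("name", "epoch", "version", "release", "arch")


def find_duplicate_nevras(units_by_id: Dict[str, dict],
                          repo_unit_ids: Iterable[str]) -> Dict[Tuple, List[str]]:
    """Group only the repository's units by NEVRA; return groups with >1 unit.

    Each unit is looked up by key, so the cost grows with the size of the
    repository rather than with the size of the whole units collection.
    """
    groups: Dict[Tuple, List[str]] = defaultdict(list)
    for unit_id in repo_unit_ids:
        unit = units_by_id[unit_id]
        nevra = tuple(unit[field] for field in NEVRA_FIELDS)
        groups[nevra].append(unit_id)
    return {nevra: ids for nevra, ids in groups.items() if len(ids) > 1}


def scoped_duplicate_pipeline(repo_unit_ids: Iterable[str]) -> List[dict]:
    """Build an equivalent MongoDB aggregation pipeline (illustrative only).

    The leading $match on _id lets MongoDB use the _id index instead of
    scanning and sorting the entire collection.
    """
    return [
        {"$match": {"_id": {"$in": list(repo_unit_ids)}}},
        {"$group": {
            "_id": {field: "$" + field for field in NEVRA_FIELDS},
            "unit_ids": {"$push": "$_id"},
            "count": {"$sum": 1},
        }},
        {"$match": {"count": {"$gt": 1}}},
    ]
```

The trade-off is that the work now scales with the repository's unit count rather than with units_rpm as a whole. That matters once the collection holds millions of records.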
After applying Hao's patches (Hao++), individual sync times dropped several-fold, and the overall sync completed 50% faster.
Comment 3 pulp-infra@redhat.com
2021-03-02 13:05:37 UTC
The Pulp upstream bug status is at POST. Updating the external tracker on this bug.
Comment 4 pulp-infra@redhat.com
2021-03-02 13:05:39 UTC
The Pulp upstream bug priority is at Normal. Updating the external tracker on this bug.