Red Hat Bugzilla – Bug 1315326
Capsule sync redundantly generates metadata for all repos
Last modified: 2017-08-22 07:31:17 EDT
Description of problem:
Invoking capsule sync:
1) Sat orders capsule to sync _all_ repositories in _all_ content views / lifecycle environments to the capsule
2) for every such repository, pulp on the capsule generates new repo metadata
Assume a use case where Sat has a few hundred repositories in different content views, and only one repo might need to be synced (because there are doubts whether it was synced properly). A capsule sync would do a large amount of redundant work, taking hours.
To see the scope of ridiculous work being done:
- assume a use case where large repos (say rhel5-7 base) are present in many content views as a base: repo metadata takes a nontrivial time to compute, and that time is multiplied by the number of repos
- particular example: a customer behind this bug has 1250 repos and capsule sync takes 3-4 hours(!) doing nothing
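For a rough sense of scale, the customer figures above can be turned into an average cost per repository. This is only back-of-the-envelope arithmetic: it assumes the whole sync time is spread evenly across repos, which the report does not state, and the function name is made up for illustration.

```python
def per_repo_seconds(total_hours: float, repo_count: int) -> float:
    """Average wall-clock seconds per repository for one capsule sync.

    Assumption: the sync time is spread evenly across all repositories.
    """
    return total_hours * 3600 / repo_count


REPOS = 1250  # repo count from the customer case
low = per_repo_seconds(3, REPOS)   # 3-hour sync
high = per_repo_seconds(4, REPOS)  # 4-hour sync
print(f"{low:.1f}-{high:.1f} s per repo")  # prints "8.6-11.5 s per repo"
```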
Please optimize either 1 or 2. While I understand Sat does not know which repos need to be synced to the capsule and which do not (i.e. 1 sounds legit), there should be an option to e.g. fetch metadata from Sat to Caps, compare them, and do nothing if they are the same (and do the sync if they differ or are missing on Caps).
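The compare-then-skip idea could look roughly like the sketch below. It is not how Satellite or Pulp implement anything: it assumes both copies of the repodata directory are reachable as local paths (in practice the Satellite copy would have to be fetched first), it uses the repomd.xml file as the fingerprint of the metadata, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Optional


def repomd_digest(repodata_dir: Path) -> Optional[str]:
    """SHA-256 of repomd.xml in repodata_dir, or None if the file is missing."""
    repomd = repodata_dir / "repomd.xml"
    if not repomd.is_file():
        return None
    return hashlib.sha256(repomd.read_bytes()).hexdigest()


def needs_sync(satellite_repodata: Path, capsule_repodata: Path) -> bool:
    """True when the capsule copy is missing or differs from the Satellite copy."""
    source = repomd_digest(satellite_repodata)
    target = repomd_digest(capsule_repodata)
    return target is None or source != target
```

A caller would run `needs_sync()` per repository and only trigger the sync (and the metadata generation that follows it) when it returns True.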
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Have several pulp repos enabled (e.g. have sat61-tools repo in 10 published content views)
2. Run capsule sync repeatedly, without any repo / content view manipulation in the meantime
3. Check the execution times of the 1st sync and of the subsequent syncs
4. ll /var/lib/pulp/published/yum/https/repos/Default_Organization/Library/ContentViewName1/content/dist/rhel/server/7/7Server/x86_64/sat-tools/6.1/os/repodata/
Actual results:
3. syncs take the same / very similar, nontrivial time
4. repodata recalculated every time
Expected results:
3. sync takes little time if there is no work to be done
4. repodata not updated / recalculated when there is no need to
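Step 4 can be checked without eyeballing `ll` output by snapshotting the repodata directory before and after a sync. The following is a minimal sketch, assuming file modification times are a good enough signal that the metadata was regenerated; the helper names are made up for illustration.

```python
from pathlib import Path
from typing import Dict, List


def snapshot(repodata_dir: Path) -> Dict[str, float]:
    """Map each file name in repodata_dir to its modification time."""
    return {p.name: p.stat().st_mtime for p in repodata_dir.iterdir() if p.is_file()}


def changed_files(before: Dict[str, float], after: Dict[str, float]) -> List[str]:
    """Names that were added, removed, or rewritten between two snapshots."""
    names = before.keys() | after.keys()
    return sorted(n for n in names if before.get(n) != after.get(n))
```

Take one `snapshot()` of the repodata directory from step 4 before the capsule sync and one after; an empty `changed_files()` result means the metadata was left alone.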
After discussion with Ina Panova from Pulp team we found out that satellite creates all repository distributors with auto-publish set to true. That results in the publish action (which re-generates the metadata) being executed on every sync.
We should be able to fix the issue by turning auto-publish off and publishing repos from the Satellite side only if some content was synced.
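The proposed flow (auto-publish off, publish only when the sync brought in changes) could be sketched as follows. This is an illustration only, not the actual patch: `sync` and `publish` are hypothetical callables standing in for the real Pulp calls, and the report keys `added_count`, `updated_count` and `removed_count` are assumed names for the per-sync unit counters.

```python
from typing import Callable, Mapping

# Assumed names of the unit counters in a sync report.
CHANGE_KEYS = ("added_count", "updated_count", "removed_count")


def content_changed(sync_report: Mapping[str, int]) -> bool:
    """True if the sync added, updated, or removed at least one unit."""
    return any(sync_report.get(key, 0) > 0 for key in CHANGE_KEYS)


def sync_then_maybe_publish(
    repo_id: str,
    sync: Callable[[str], Mapping[str, int]],
    publish: Callable[[str], None],
) -> bool:
    """Sync repo_id and publish it only when its content changed.

    Returns True when a publish was triggered.
    """
    report = sync(repo_id)
    if content_changed(report):
        publish(repo_id)
        return True
    return False
```

A real implementation would also have to publish a repository that has never been published before, even if the sync reports no changes.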
Created redmine issue http://projects.theforeman.org/issues/14807 from this bug
Upstream bug component is Repositories
I'm removing the linked upstream issue. Optimizations on Pulp's side have been made and the metadata generation process is much faster now. Therefore it's not necessary to skip it upstream.
Moving to POST since upstream bug http://projects.theforeman.org/issues/14807 has been closed
Hi Tomas, It appears that the upstream PR has been closed. Will there be additional changes to address this issue in Satellite 6.2? If not, should we move this to ON_QA or CLOSED?
Hi Brad, this is a Sat 6.1-only issue. Upstream and 6.2 are not affected. That's why I closed the upstream PR. I didn't know about the optimizations that had been done in Pulp when I was writing the upstream patch. Please see the discussion in the upstream PR for details.
This change makes sense only in Sat 6.1, where there's an older Pulp and a different approach to how we handle capsule content synchronization. I also don't think we should track this as sat-6.2.0+.
Tomas, Thanks! Based on the feedback, I am going to close this bug on 6.2. The fix/plans for 6.1.z will be tracked in the associated clone bug 1327338.