Bug 1315326

Summary: Capsule sync redundantly generates metadata for all repos
Product: Red Hat Satellite
Reporter: Pavel Moravec <pmoravec>
Component: Repositories
Assignee: Tomas Strachota <tstrachota>
Status: CLOSED NEXTRELEASE
QA Contact: Katello QA List <katello-qa-list>
Severity: high
Priority: medium
Version: 6.1.6
CC: bbuckingham, bkearney, ktordeur, oshtaier, sthirugn, tstrachota
Target Milestone: Unspecified
Keywords: Triaged
Target Release: Unused
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Clones: 1327338
Last Closed: 2016-05-06 18:48:07 UTC
Type: Bug
Bug Blocks: 1327338

Description Pavel Moravec 2016-03-07 13:35:40 UTC
Description of problem:
Invoking capsule sync:
1) Sat orders capsule to sync _all_ repositories in _all_ content views / lifecycle environments to the capsule
2) for every such repository, pulp on the capsule generates new repo metadata

Assume a use case where Sat has a few hundred repositories in different content views, and just one repo might need to be synced (as there are doubts whether it was synced properly). A capsule sync would then do a lot of redundant work, taking hours.

To see the scope of ridiculous work being done:
- assume a use case where large repos (say rhel5-7 base) are present in many content views as a base: repo metadata takes nontrivial time to compute, and that time is multiplied by the number of repos
- particular example: the customer behind this bug has 1250 repos and a capsule sync takes 3-4 hours(!) without transferring any new content

Please optimize either 1 or 2. While I understand Sat does not know which repos need to be synced to the capsule and which do not (i.e. 1 sounds legit), there should be an option to e.g. fetch the metadata from Sat to the capsule, compare the two copies, and do nothing if they are the same (and do the sync if they differ or the metadata is missing on the capsule).
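
The compare-then-skip idea proposed above could look roughly like the sketch below. It assumes that each published yum repo exposes a repodata/repomd.xml file and that byte-identical repomd.xml on both sides means the capsule copy is current. The function names are hypothetical and are not part of Satellite or Pulp.

```python
import hashlib
from typing import Optional


def repomd_digest(repomd_bytes: bytes) -> str:
    """Return the SHA-256 hex digest of a repomd.xml payload."""
    return hashlib.sha256(repomd_bytes).hexdigest()


def needs_sync(sat_repomd: bytes, capsule_repomd: Optional[bytes]) -> bool:
    """Decide whether a repo has to be synced to the capsule.

    Hypothetical helper: capsule_repomd is None when repomd.xml is
    missing on the capsule, which always forces a sync.
    """
    if capsule_repomd is None:
        return True
    # Sync only when the two metadata copies differ.
    return repomd_digest(sat_repomd) != repomd_digest(capsule_repomd)
```

Comparing digests rather than raw bytes means only a short checksum would need to travel between Sat and the capsule, if the transport allows it.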


Version-Release number of selected component (if applicable):
Sat 6.1.7


How reproducible:
100%


Steps to Reproduce:
1. Have multiple pulp repos enabled (e.g. have the sat61-tools repo in 10 published content views)
2. Run capsule sync repeatedly, without any repo / content view manipulation in the meantime
3. Compare the execution times of the 1st sync and the subsequent syncs
4. ll /var/lib/pulp/published/yum/https/repos/Default_Organization/Library/ContentViewName1/content/dist/rhel/server/7/7Server/x86_64/sat-tools/6.1/os/repodata/
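
Instead of eyeballing the listing from step 4, the check can be scripted. The sketch below snapshots file names, modification times and sizes in a repodata directory so that two snapshots taken before and after a capsule sync can be compared. The helper names are hypothetical, and the directory path is supplied by the caller.

```python
import os
from typing import Dict, Tuple

Snapshot = Dict[str, Tuple[int, int]]


def snapshot_repodata(repodata_dir: str) -> Snapshot:
    """Map each file name in repodata_dir to (mtime in ns, size in bytes)."""
    result: Snapshot = {}
    with os.scandir(repodata_dir) as entries:
        for entry in entries:
            if entry.is_file():
                info = entry.stat()
                result[entry.name] = (info.st_mtime_ns, info.st_size)
    return result


def repodata_regenerated(before: Snapshot, after: Snapshot) -> bool:
    """True if any file was added, removed, or rewritten between snapshots."""
    return before != after
```

Take one snapshot before the sync and one after it; if repodata_regenerated returns True although no content changed, the metadata was recalculated needlessly.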


Actual results:
3. all syncs take the same or a very similar, nontrivial time
4. repodata is recalculated every time


Expected results:
3. sync takes little time if there is no work to be done
4. repodata is not updated / recalculated every time (only when there is a need to)


Additional info:

Comment 2 Tomas Strachota 2016-04-06 09:15:14 UTC
After discussion with Ina Panova from Pulp team we found out that satellite creates all repository distributors with auto-publish set to true. That results in the publish action (which re-generates the metadata) being executed on every sync.
We should be able to fix the issue by turning auto-publish off and publishing repos from the satellite side only if some content was synced.
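
A minimal sketch of the "publish only if some content was synced" logic described in this comment. The sync report layout (the added_count, updated_count and removed_count keys) and the function names are assumptions made for illustration, not the actual Satellite or Pulp code.

```python
from typing import Callable, Mapping


def content_changed(sync_report: Mapping[str, int]) -> bool:
    """True if the sync added, updated, or removed at least one unit.

    The *_count keys are assumed field names for this sketch.
    """
    keys = ("added_count", "updated_count", "removed_count")
    return any(sync_report.get(key, 0) > 0 for key in keys)


def publish_if_needed(sync_report: Mapping[str, int],
                      publish: Callable[[], None]) -> bool:
    """Call publish() only when the sync changed content.

    Returns True if a publish was triggered, False if it was skipped.
    """
    if content_changed(sync_report):
        publish()
        return True
    return False
```

With auto-publish disabled on the distributor, the caller would run this after each repo sync, so an unchanged repo never reaches the metadata generation step.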

Comment 4 Tomas Strachota 2016-04-26 07:56:56 UTC
Created redmine issue http://projects.theforeman.org/issues/14807 from this bug

Comment 5 Bryan Kearney 2016-04-26 08:14:44 UTC
Upstream bug component is Repositories

Comment 7 Tomas Strachota 2016-04-29 15:22:52 UTC
I'm removing the linked upstream issue. Optimizations have been made on Pulp's side and the metadata generation process is much faster now. Therefore it's not necessary to skip it upstream.

Comment 8 Bryan Kearney 2016-04-29 16:13:26 UTC
Moving to POST since upstream bug http://projects.theforeman.org/issues/14807 has been closed

Comment 9 Brad Buckingham 2016-05-03 13:52:00 UTC
Hi Tomas,  It appears that the upstream PR has been closed.  Will there be additional changes to address this issue in Satellite 6.2?  If not, should we move this to ON_QA or CLOSED?

Comment 10 Tomas Strachota 2016-05-06 10:19:21 UTC
Hi Brad, this is a sat 6.1 issue only. Upstream and 6.2 are not affected. That's why I closed the upstream PR. I didn't know about the optimizations that had been done in Pulp at the time I was writing the upstream patch. Please see the discussion in the upstream PR for details.

This change makes sense only in sat 6.1, where there's an older Pulp and a different approach to how we handle capsule content synchronization. I also don't think we should track this as sat-6.2.0+.

Comment 11 Brad Buckingham 2016-05-06 18:48:07 UTC
Tomas,  Thanks!  Based on the feedback, I am going to close this bug on 6.2.  The fix/plans for 6.1.z will be tracked in the associated clone bug 1327338.