Bug 2249913 - [Improvement] RefreshRepos step in Capsule Sync to refresh just repos to sync
Summary: [Improvement] RefreshRepos step in Capsule Sync to refresh just repos to sync
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Capsule - Content
Version: 6.14.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: 6.15.0
Assignee: Pavel Moravec
QA Contact: Vladimír Sedmík
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-11-15 22:00 UTC by Pavel Moravec
Modified: 2024-04-23 17:15 UTC (History)
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2260525 2260526 (view as bug list)
Environment:
Last Closed: 2024-04-23 17:15:45 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Foreman Issue Tracker 36926 0 Normal Ready For Testing [Improvement] RefreshRepos step in Capsule Sync to refresh just repos to sync 2023-11-16 17:46:51 UTC
Red Hat Issue Tracker SAT-21327 0 None None None 2023-11-15 22:02:05 UTC
Red Hat Issue Tracker SAT-21386 0 None None None 2023-11-17 19:17:55 UTC
Red Hat Product Errata RHSA-2024:2010 0 None None None 2024-04-23 17:15:47 UTC

Description Pavel Moravec 2023-11-15 22:00:30 UTC
Description of problem:
Actions::Pulp3::Orchestration::Repository::RefreshRepos can be invoked for all of a Capsule's repositories during Capsule sync, even though only a few repos need refreshing. This needlessly lengthens the Capsule sync (by up to a few minutes).

Scenario (seen at large scale in the linked customer case):
- having "Sync Capsules after content view promotion" disabled
- having many hundreds to thousands of repos already synced on the Capsule
- promoting just a small CV as the only new content to be synced to Caps

Then invoking Capsule sync triggers RefreshRepos "unscoped" (not restricted to a CV, LE, or repo), so all of the hundreds to thousands of Remote objects on the Capsule are checked and updated. That can take many minutes.


Idea for a possible fix: the current ordering of dynflow steps is:

3: Actions::Pulp3::ContentGuard::Refresh (success) [ 0.14s / 0.14s ]
5: Actions::Pulp3::Orchestration::Repository::RefreshRepos (success) [ 7.05s / 7.05s ]
7: Actions::Katello::CapsuleContent::SyncCapsule (success) [ 0.16s / 0.16s ]
  9: Actions::Pulp3::CapsuleContent::Sync (success) [ 2.26s / 0.59s ]
  11: Actions::Pulp3::CapsuleContent::GenerateMetadata (success) [ 0.16s / 0.16s ]
  13: Actions::Pulp3::CapsuleContent::RefreshDistribution (success) [ 1.64s / 0.68s ] 

(The triple 9, 11, 13 is repeated for every repo that needs to be synced.)

That is defined in app/lib/actions/katello/capsule_content/sync.rb :

          sequence do
            if smart_proxy.has_feature?(SmartProxy::PULP3_FEATURE)
              plan_action(Actions::Pulp3::ContentGuard::Refresh, smart_proxy)
              plan_action(Actions::Pulp3::Orchestration::Repository::RefreshRepos, smart_proxy, refresh_options)
            end
            plan_action(SyncCapsule, smart_proxy, refresh_options)

Can't we move the RefreshRepos call *into* SyncCapsule (still honoring the PULP3_FEATURE check), like this:

        def plan(smart_proxy, options = {})
          plan_self(:smart_proxy_id => smart_proxy.id)
          action_subject(smart_proxy)
          environment = options[:environment]
          content_view = options[:content_view]
          repository = options[:repository]
          skip_metadata_check = options.fetch(:skip_metadata_check, false)
          sequence do
            repos = repos_to_sync(smart_proxy, environment, content_view, repository, skip_metadata_check)
            return nil if repos.empty?

            # HERE: call RefreshRepos (scoped to repos)

            repos.in_groups_of(Setting[:foreman_proxy_content_batch_size], false) do |repo_batch|
              ..


We would just have to extend RefreshRepos to accept options[:repository_list] in addition to options[:repository]?
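
The scoping idea can be sketched in plain Ruby, with no Katello or Dynflow dependency. This is only an illustration under stated assumptions: `Repo` and `refresh_targets` are hypothetical names, and `:repository_list` is the option proposed above, not an existing one. The real action would resolve repositories through Katello's own models.

```ruby
# Hypothetical stand-in for a repository already present on the Capsule.
Repo = Struct.new(:id, :name)

# Return the repos whose remotes should be refreshed.
# :repository_list (the proposed, hypothetical option) takes precedence,
# then :repository; with neither given, the refresh stays unscoped.
def refresh_targets(all_repos, options = {})
  if options[:repository_list]
    wanted = options[:repository_list].map(&:id)
    all_repos.select { |repo| wanted.include?(repo.id) }
  elsif options[:repository]
    all_repos.select { |repo| repo.id == options[:repository].id }
  else
    all_repos
  end
end

# 1000 repos already on the Capsule, one of them in the newly promoted CV.
all_repos = (1..1000).map { |i| Repo.new(i, "repo-#{i}") }
to_sync = [all_repos[41]]

puts refresh_targets(all_repos).size                            # unscoped: 1000
puts refresh_targets(all_repos, repository_list: to_sync).size  # scoped: 1
```

With the list passed in, the number of refreshed remotes depends only on what is being synced, not on how many repos the Capsule already holds.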


Version-Release number of selected component (if applicable):
Sat 6.12 or newer, incl. 6.14


How reproducible:
100%


Steps to Reproduce:
1. Have "Sync Capsules after content view promotion" disabled
2. Have many CVs in many LEs with many repos, all synced to a Capsule
3. Have a CV with even one repo, also synced to the Capsule; now publish+promote a new version (such that Remote is *updated*, not created)
4. Sync the Capsule
5. Check the Actions::Pulp3::Orchestration::Repository::RefreshRepos dynflow step and count the entries under pulp_tasks:. Alternatively, monitor /var/log/httpd/rhsm-pulpcore-https-443_access_ssl.log on the Capsule for pairs of requests such as:

1.2.3.4 - - [15/Nov/2023:22:38:39 +0100] "PATCH /pulp/api/v3/remotes/rpm/rpm/141295ff-9731-417d-a6e5-0359f527fa51/ HTTP/1.1" 202 67 "-" "OpenAPI-Generator/3.19.6/ruby"
1.2.3.4 - - [15/Nov/2023:22:38:41 +0100] "GET /pulp/api/v3/remotes/rpm/rpm/?name=1-cv_onerepo-DEV-02c5c6a5-228d-456b-81da-c7c4ffa9f62e HTTP/1.1" 200 6432 "-" "OpenAPI-Generator/3.19.6/ruby"

Each pair corresponds to one pulp_task from the dynflow step.
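
Counting those request pairs by hand gets tedious on a large Capsule. Below is a minimal sketch that tallies PATCH and GET requests against /pulp/api/v3/remotes/ in access-log lines. The helper name `count_remote_requests` is hypothetical, and the sample lines are the two shown above. GET requests on remotes may also come from other operations, so the PATCH count is likely the better estimate of how many remotes were refreshed.

```ruby
# Matches the method of any request that targets a Pulp remote.
REMOTE_REQUEST = %r{"(PATCH|GET) /pulp/api/v3/remotes/}

# Tally remote requests per HTTP method across the given log lines.
def count_remote_requests(lines)
  counts = Hash.new(0)
  lines.each do |line|
    match = REMOTE_REQUEST.match(line)
    counts[match[1]] += 1 if match
  end
  counts
end

sample = [
  '1.2.3.4 - - [15/Nov/2023:22:38:39 +0100] "PATCH /pulp/api/v3/remotes/rpm/rpm/141295ff-9731-417d-a6e5-0359f527fa51/ HTTP/1.1" 202 67 "-" "OpenAPI-Generator/3.19.6/ruby"',
  '1.2.3.4 - - [15/Nov/2023:22:38:41 +0100] "GET /pulp/api/v3/remotes/rpm/rpm/?name=1-cv_onerepo-DEV-02c5c6a5-228d-456b-81da-c7c4ffa9f62e HTTP/1.1" 200 6432 "-" "OpenAPI-Generator/3.19.6/ruby"'
]

counts = count_remote_requests(sample)
puts "PATCH: #{counts['PATCH']}, GET: #{counts['GET']}"
```

To run it against the real log, pass `File.foreach("/var/log/httpd/rhsm-pulpcore-https-443_access_ssl.log")` instead of `sample`.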


Actual results:
5. You see as many pulp tasks (or pairs of requests) as there are repos on the Capsule. The overall duration of the RefreshRepos step grows linearly with the number of repos on the Capsule.


Expected results:
5. Just one pair of requests / one pulp task to be triggered. RefreshRepos runs in constant time regardless of # of repos already present on the Capsule.


Additional info:

Comment 1 Bryan Kearney 2023-11-16 20:02:57 UTC
Upstream bug assigned to pmoravec

Comment 2 Bryan Kearney 2023-11-16 20:03:00 UTC
Upstream bug assigned to pmoravec

Comment 3 Bryan Kearney 2023-11-27 16:02:41 UTC
Moving this bug to POST for triage into Satellite since the upstream issue https://projects.theforeman.org/issues/36926 has been resolved.

Comment 5 Vladimír Sedmík 2024-01-19 15:05:05 UTC
Verified on 6.15.0 snap 6.0

For verification I used two identical setups, one with (6.15.0 snap 6) and one without (6.14.2 snap 1) the fix, to compare the improvement.

Steps to verify:
1. Have a Satellite with registered Capsule (unassigned LCE right now).
2. Prepare the content to sync
  a) Sync several repos; for example, I used these:
     ansible-2.9-for-rhel-8-x86_64-rpms                 
     rhel-7-server-ansible-2.9-rpms             
     rhel-7-server-kickstart                             
     rhel-7-server-rpms                                     
     rhel-8-for-x86_64-baseos-kickstart                   
     rhel-8-for-x86_64-baseos-rpms                                 
     rhel-9-for-x86_64-baseos-kickstart                   
     rhel-9-for-x86_64-baseos-rpms                                 
     rhel-6-server-els-satellite-client-6-rpms  
     rhel-7-server-satellite-client-6-rpms     
     satellite-client-6-for-rhel-8-x86_64-rpms         
     satellite-client-6-for-rhel-9-x86_64-rpms    
  b) Create 10 LCEs.
  c) Create 10 CVs, add all the repos, publish and promote to all LCEs from 2b).
3. Assign the Capsule with Library and all the LCEs from 2b).
4. Sync the Capsule.
5. Prepare one more CV with only one repo added, publish and promote to all LCEs.
6. Sync the Capsule, check Dynflow and measure the sync time.

Result:
On the fixed instance the sync task took 21.6 seconds, of which 5.95 seconds were spent in the Actions::Pulp3::Orchestration::Repository::RefreshRepos subtask, and only the repos from step 5 were refreshed.
On the unfixed instance the sync task took 353 seconds, of which 340 seconds were spent in the Actions::Pulp3::Orchestration::Repository::RefreshRepos subtask, and all repos were refreshed.

The improvement in sync time in this scenario is huge: RefreshRepos is 57x faster and the whole sync is 16x faster. For even bigger setups with more repos, the impact will be even greater.
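
The speedup figures follow directly from the four timings reported above. A short check, using only those numbers (the variable names are illustrative):

```ruby
# Timings from the verification above, in seconds.
fixed   = { refresh: 5.95, total: 21.6 }
unfixed = { refresh: 340.0, total: 353.0 }

refresh_speedup = unfixed[:refresh] / fixed[:refresh]
total_speedup   = unfixed[:total] / fixed[:total]

puts format("RefreshRepos: %.1fx faster", refresh_speedup)  # 57.1x
puts format("Whole sync:   %.1fx faster", total_speedup)    # 16.3x

# Share of the whole sync spent in RefreshRepos, before and after the fix.
share_before = 100 * unfixed[:refresh] / unfixed[:total]
share_after  = 100 * fixed[:refresh] / fixed[:total]
puts format("RefreshRepos share: %.0f%% -> %.0f%%", share_before, share_after)  # 96% -> 28%
```

On the unfixed instance nearly the whole sync was spent refreshing remotes, which is why scoping RefreshRepos alone accounts for almost all of the gain.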

Comment 11 errata-xmlrpc 2024-04-23 17:15:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Satellite 6.15.0 release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:2010

