Bug 2284027
| Summary: | Update Content Counts task does not scale at all | ||
|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | Pavel Moravec <pmoravec> |
| Component: | Capsule - Content | Assignee: | satellite6-bugs <satellite6-bugs> |
| Status: | CLOSED MIGRATED | QA Contact: | Satellite QE Team <sat-qe-bz-list> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 6.15.0 | CC: | iballou, sajha, saydas, wpinheir |
| Target Milestone: | Unspecified | Keywords: | MigratedToJIRA |
| Target Release: | Unused | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2024-06-06 17:40:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Pavel, has it been discovered which part of the content count updating process is taking up time? The decision to update all repositories was made because of the originating reason for the feature: users were worried that their smart proxy repositories in Pulp were changing unexpectedly between syncs, so all counts are recalculated. The fetching of content counts from Pulp should be very lightweight. Only a count of content units is being fetched from Pulp, which should be much faster than listing actual content units since we (as far as I remember) pass a per_page limit of 1 to Pulp. We also made the content count updating asynchronous to not block current syncing tasks, but if the content count tasks are getting stuck I can see that bogging down Dynflow.

If it is found that the amount of data to collect from Pulp really is too large to update with every sync, we could consider removing the automated updating of content counts and rely on users to start the content count updating via a button click on the smart proxy UI. I'd like to see first, though, which part of the update is taking up all the time. Is it the API call to Pulp, the database update, or perhaps something else? Or, if the middle-ground solution of updating only the repositories that are being synced is fast enough, we can go that route as well.

I see - let me investigate where the slowness comes from; there can be numerous sources (e.g. Dynflow worker count, Pulp API worker count, slow PostgreSQL, HTTP limits, network latency, ...). I see in the attached case that the customer runs frequent CV promotes that trigger Capsule syncs that trigger these tasks - and the tasks take 20-38 minutes (my quickly prepared reproducer takes 12 seconds, which is still redundant for a "just one repo bumped" scenario).
Anyway, if the intent of the feature is to double-check content on the Capsule for most probably untouched repos, I would expect the feature to be optional (or maybe have an option "after Capsule sync, check nothing OR just affected repos OR everything"?). Let's see what I come up with from my investigation.

OK, the huge durations were partially caused by an overloaded single sidekiq worker (more workers helped to some extent), BUT the feature still does not scale in environments with latency between Satellite and Capsule. Assume the latency between a geographically separated Satellite and Capsule is 0.5 s. Since one repository needs 2 requests like:

    GET /pulp/api/v3/repositories/rpm/rpm/?limit=2000&name=1-cv_zoo-PROD-ab7d93a3-94dc-4f1d-bf26-74a1ba5b03a7&offset=0
    GET /pulp/api/v3/repositories/rpm/rpm/018fcae0-61c3-72f2-a903-c2300e5835e4/versions/1/

we are at 1 s per repo. With just 600 repos on a Capsule, the task runs for 10 minutes (and it does need a sidekiq worker most of the time, waiting for the responses, I *guess* - or is the waiting asynchronous as well?).

We might make these "get me counts of a repo" requests concurrent (e.g. like Capsule sync handles repos concurrently in batches of 100) - but would that require too complex orchestration? We should make the feature optional (or maybe with a "just for updated repos" option?). Any other ideas on how to improve it (esp. for deployments with many repos and/or Sat<->Caps latency)?

Glad to see having more workers helped, but yeah, there is certainly room for improvement. My thought overall would be to:
1) Have the automated count updating be made optional via a setting
2) Introduce a global button for refreshing counts
-> There are currently buttons for per-cv updating which update all counts.
-> These buttons were made for the expected future improvement of being able to update counts per-cv
3) Update the counts for only the repositories / CVs that are being synced to the content view
I think improvement (1) is the one that would be best for a backportable solution. Users can use the existing buttons to update the counts manually.
Improvements (2) and (3) are essentially RFEs, so they'd be better suited to a new y-release of Satellite rather than a backport.
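
The latency estimate from the earlier comment (2 requests per repository at 0.5 s each, 600 repositories) can be written down as a small back-of-the-envelope model. This is only a sketch: the constant and method names below are hypothetical and do not exist in Katello, and the batched figure is idealized, since it assumes every request in a batch completes in a single round trip and that Pulp is not slowed down by the extra concurrency.

```ruby
# Hypothetical model of the estimate in the thread; not Katello code.
REQUESTS_PER_REPO = 2    # the two GET requests quoted in the thread
LATENCY_SECONDS   = 0.5  # assumed Satellite <-> Capsule latency per request

# Sequential fetching: every request waits for the previous one to finish.
def sequential_seconds(repo_count, latency: LATENCY_SECONDS)
  repo_count * REQUESTS_PER_REPO * latency
end

# Batched fetching (idealized): repositories inside a batch are queried
# concurrently, and the batches themselves run one after another.
def batched_seconds(repo_count, batch_size:, latency: LATENCY_SECONDS)
  batches = (repo_count.to_f / batch_size).ceil
  batches * REQUESTS_PER_REPO * latency
end

puts sequential_seconds(600)                # 600.0 seconds, i.e. 10 minutes
puts batched_seconds(600, batch_size: 100)  # 6.0 seconds in the ideal case
```

Under these assumptions the task duration is dominated by the number of sequential round trips, so either reducing the set of repositories or fetching in batches would shorten it; the real gain from batching would depend on worker counts on both sides.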
Making the content count fetching concurrent would be more possible once we have the code in place to be able to update counts for repos / CVs alone. Multiple Dynflow actions could perhaps be dispatched to work on a different set of repos or CVs.

This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there. Due to differences in account names between systems, some fields were not replicated. Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information. To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "SAT-" followed by an integer. You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like: "Bugzilla Bug" = 1234567 In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.
Description of problem:
After a successful Capsule sync, the Update Content Counts task is triggered. It updates content statistics of each and every repository on the Capsule. That does not scale at all.

If one promotes a small CV to an LE, a new Capsule sync is triggered (by default), which synchronizes to the Capsule just the repos from the CV/LE. But content stats of *all* repos are updated.

Version-Release number of selected component (if applicable):
Sat 6.15

How reproducible:
100%

Steps to Reproduce:
1. Have multiple CVs promoted to an LE associated with a Capsule
2. Have the Capsule fully synchronized
3. Promote a new CV to the LE and let the content sync automatically to the Capsule
4. See the duration of the "Update Content Counts" task.
5. Run a Complete Capsule Sync, just for the sake of comparing the "Update Content Counts" task duration.
6. Optionally, add one extra log line to app/models/katello/concerns/smart_proxy_extensions.rb (at the beginning of the whole reproducer):

    def update_content_counts!
      # {:content_view_versions=>{87=>{:repositories=>{1=>{:metadata=>{},:counts=>{:rpms=>98, :module_streams=>9898}}}}}
      new_content_counts = { content_view_versions: {} }
      smart_proxy_helper = ::Katello::SmartProxyHelper.new(self)
      repos = smart_proxy_helper.repositories_available_to_capsule
      Rails.logger.info("XXX update_content_counts for Capsule #{self}: repos:#{repos.pluck(:id)}")

Actual results:
The duration of the "Update Content Counts" task is the same in steps 4 and 5 (and is linear in the number of repos on the Capsule, not the number of repos newly synced). Optional step 6 shows that although the Capsule sync updated just one repo, all repos are recalculated.

Expected results:
"Update Content Counts" task duration proportional to the size of the repos just synced to the Capsule.

Additional info:
Required fix:
1) In app/lib/actions/katello/capsule_content/sync_capsule.rb, the line

    execution_plan_hooks.use :update_content_counts, :on => :success

needs some enhancement to pass the "repos = repos_to_sync(..)" list from the planning step as an argument.
2) That argument needs to be passed via the app/lib/actions/katello/capsule_content/update_content_counts.rb task to the update_content_counts! method in app/models/katello/concerns/smart_proxy_extensions.rb.
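
The shape of the proposed fix can be illustrated with a self-contained stand-in. This is a sketch under stated assumptions, not the Katello implementation: `FakeSmartProxy`, its constructor, and the optional `repo_ids` argument are all hypothetical, and the real change would also have to carry the list through the Dynflow hook and, presumably, merge the refreshed counts into the previously stored ones instead of rebuilding the whole hash.

```ruby
# Hypothetical stand-in for a smart proxy; not Katello code.
class FakeSmartProxy
  attr_reader :counted_repo_ids

  def initialize(all_repo_ids)
    @all_repo_ids = all_repo_ids   # every repository available to the Capsule
    @counted_repo_ids = []         # records which repositories were recounted
  end

  # With no argument, every repository is recounted (the behaviour reported
  # in this bug). With a list, only the listed repositories that actually
  # exist on the Capsule are recounted.
  def update_content_counts!(repo_ids = nil)
    targets = repo_ids.nil? ? @all_repo_ids : (@all_repo_ids & repo_ids)
    targets.each { |id| @counted_repo_ids << id }
    targets.size
  end
end

proxy = FakeSmartProxy.new((1..600).to_a)
puts proxy.update_content_counts!([42])  # only the repository that was just synced
puts proxy.update_content_counts!        # all 600 repositories
```

Keeping the argument optional would let a Complete Capsule Sync or a manual refresh continue to recount everything, while a sync triggered by a CV promotion passes only the repositories it touched.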