Bug 2251200
| Summary: | getting versions of an ansible collections does not scale | ||
|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | Pavel Moravec <pmoravec> |
| Component: | Pulp | Assignee: | satellite6-bugs <satellite6-bugs> |
| Status: | CLOSED ERRATA | QA Contact: | Gaurav Talreja <gtalreja> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 6.13.6 | CC: | ahumbe, dalley, dkliban, ggainey, gubben, hakon.gislason, ikaur, lvrtelov, pmendezh, rchan, rlavi, shwsingh, smajumda, vijsingh, zhunting |
| Target Milestone: | 6.15.0 | Keywords: | Triaged |
| Target Release: | Unused | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | pulp-ansible-0.18.0 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2024-04-23 17:15:52 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Pavel Moravec
2023-11-23 12:17:37 UTC
Seems very relevant https://github.com/pulp/pulp_ansible/issues/1410 , maybe sufficient improvement via https://github.com/pulp/pulp_ansible/pull/1476 . >I understand there is usually no need to sync all collections and we should use Requirements to filter collections of interest, but then state in documentation (https://access.redhat.com/documentation/en-us/red_hat_satellite/6.13/html-single/managing_configurations_using_ansible_integration_in_red_hat_satellite/index#synchronizing-ansible-collections_ansible) we support only filtered content.
Usually, maybe.
In our case, synchronizing individual collections and/or versions is not an option. We can't maintain a list of collections users want to use or expect users to create a ticket asking for collection.xyz version 1.2.3 to be added to the mirror. Imagine doing that with RPM packages.
The majority of our systems have no internet access and rely entirely on Satellite (and other servers) for content like RPM packages, ansible collections, etc...
Even systems with internet access are set to prefer the internal mirror due to:
1) not wanting to hit rate-limits on galaxy.ansible.com
2) keeping content installation (rpm/collections/etc..) as fast as possible.
(think: ci/cd pipelines using containers that may execute multiple 'ansible-galaxy collection install' in each job for each stage of the pipeline, resulting in 10+ 'ansible-galaxy collection install community.general' per pipeline run. Expectations would be that it should take seconds, a similar amount of time like 'dnf install htop', not minutes)
Our ansible lint/test/deploy pipelines would take 10 minutes in total (2 minutes to lint, 8 minutes to test+deploy) before hitting this issue, but now take 45-60 minutes.
These bits from #1476 will *definitely* help in this bugzilla report: https://github.com/pulp/pulp_ansible/blame/main/pulp_ansible/app/galaxy/v3/views.py#L943-L955 : queryset = queryset.only( "pk", "content_ptr_id", "marks", "namespace", "name", "version", "pulp_created", "pulp_last_updated", "requires_ansible", "collection", ) limiting the queryset to just some relevant fields. Since at the end, the API response contains a few fields per an item, not whole `ansible_collectionversion` record at all (which is huge). (In reply to Pavel Moravec from comment #3) > These bits from #1476 will *definitely* help in this bugzilla report: > > https://github.com/pulp/pulp_ansible/blame/main/pulp_ansible/app/galaxy/v3/ > views.py#L943-L955 : > > queryset = queryset.only( > "pk", > "content_ptr_id", > "marks", > "namespace", > "name", > "version", > "pulp_created", > "pulp_last_updated", > "requires_ansible", > "collection", > ) > > limiting the queryset to just some relevant fields. Since at the end, the > API response contains a few fields per an item, not whole > `ansible_collectionversion` record at all (which is huge). Just plain adding this to /usr/lib/python3.9/site-packages/pulp_ansible/app/galaxy/v3/views.py : def list(self, request, *args, **kwargs): """ Returns paginated CollectionVersions list. """ import logging log = logging.getLogger(__name__) log.info(f"PavelM: collection list, args={args}") queryset = self.filter_queryset(self.get_queryset()) queryset = queryset.only( "pk", "content_ptr_id", "marks", "namespace", "name", "version", "pulp_created", "pulp_last_updated", "requires_ansible", "collection", ) log.info(f"PavelM: collection list, args={args} queryset={queryset}") queryset = sorted( queryset, key=lambda obj: semantic_version.Version(obj.version), reverse=True ) log.info(f"PavelM: collection list, sorted") does drastically help: from 22s to 0.38s, from 3GB memory to no memory increase spotted. (this change alone can have harmful impact to other requests so I discourage to patching a system with it, though, it is just an observation where the culprit is) maybe this one is related: https://bugzilla.redhat.com/show_bug.cgi?id=2247864 > maybe this one is related: https://bugzilla.redhat.com/show_bug.cgi?id=2247864 I don't see how that would be related @gerrod Do you think that https://github.com/pulp/pulp_ansible/pull/1476/ covers this BZ fully? Are there other recent optimization PRs that may also help? And is it possible to backport them, if so? Yes, https://github.com/pulp/pulp_ansible/pull/1476/ covers the majority of this BZ. Maybe some extra optimizations on the pulp content api could cover the first part on the katello index failing (they could also use the `fields` option to limit which fields they are bringing in). What version would this need to be backported to? #1476 has a migration in it so it is not fully backportable. It's reported against Satellite 6.13, so I believe the backport would be needed to pulp_ansible 0.15 and 0.16. Pavel mentioned that cherrypicking a small portion of the patch helps some queries and hurts others - hopefully it is possible to get the speedup without doing any harm? I'm not sure, maybe that relies on the migration. (In reply to Daniel Alley from comment #9) > It's reported against Satellite 6.13, so I believe the backport would be > needed to pulp_ansible 0.15 and 0.16. > > Pavel mentioned that cherrypicking a small portion of the patch helps some > queries and hurts others - hopefully it is possible to get the speedup > without doing any harm? I'm not sure, maybe that relies on the migration. Cherrypickying the "queryset = queryset.only( .." bit might hurt elsewhere, if the current downstream code relies on some other fields outside the projection. I expect it is a false suspicion, but somebody more knowledgable of the code should bless it - I have seen pulp ansible code last week for the first time :). I'm marking this as POST because it ought to be fixed in Stream / future-6.15 currently. However we will likely still want to backport, and it might be a somewhat labor intensive backport as we can't backport the migration. So @pmoravec I'm not sure if we want a separate BZ filed for that, or if we can just do standard clones? (In reply to Daniel Alley from comment #11) > I'm marking this as POST because it ought to be fixed in Stream / > future-6.15 currently. However we will likely still want to backport, and > it might be a somewhat labor intensive backport as we can't backport the > migration. So @pmoravec I'm not sure if we want a separate BZ > filed for that, or if we can just do standard clones? I would vote for regular z-stream clones and limited backport there. I.e. backport just some part(s) of the big upstream fix, without the migration. Maybe just the https://bugzilla.redhat.com/show_bug.cgi?id=2251200#c3 is sufficient to backport, *if* they are really safe. Verified. Tested on Satellite 6.15.0 Snap 9.0 Version: rubygem-pulp_ansible_client-0.20.2-1.el8sat.noarch Steps: 1. Sync repo of type ansible collection, with upstream URL https://galaxy.ansible.com/api/ or https://old-galaxy.ansible.com/api/ 2. Monitor sidekiq + pulp's gunicorn memory usage using ps command below and check monitor_ps.log # nohup bash -c 'while true; do date; ps aux | sort -nk6 | tail; sleep 10; done' > monitor_ps.log 2>&1 & Observation: To reproduce and verify difference with 6.14.2, syncing task with 6.14 took 4+ hours and in 6.15 it took ~1 hour to complete, and for memory usage, sidekiq process took >~5GB (5662096) memory and pulp gunicorn process took ~1.3G (1337144) memory, where as for 6.15.0 sidekiq process took ~0.78G (781636) and pulp gunicorn isn't captured in monitor_ps.log guessing due to sort -nk6, which looks excellent optimization and reduction in memory usage. And, tested the mentioned endpoints to check the response time for pulp_ansible with fix and it also show great reduction in response time, # time curl -L -k 'https://localhost/pulp_ansible/galaxy/Default_Organization/Library/custom/test_ansible_product/test_ansible_repo/api/v3/collections/community/general/versions/?limit=100&offset=0' {"meta":{"count":0},"links":{"first":"/pulp_ansible/galaxy/Default_Organization/Library/custom/test_ansible_product/test_ansible_repo/api/v3/plugin/ansible/content/Default_Organization/Library/custom/test_ansible_product/test_ansible_repo/collections/index/community/general/versions/?limit=100&offset=0","previous":null,"next":null,"last":"/pulp_ansible/galaxy/Default_Organization/Library/custom/test_ansible_product/test_ansible_repo/api/v3/plugin/ansible/content/Default_Organization/Library/custom/test_ansible_product/test_ansible_repo/collections/index/community/general/versions/?limit=100&offset=0"},"data":[]} real 0m0.103s user 0m0.011s sys 0m0.005s # time curl -L -k 'https://localhost/pulp_ansible/galaxy/Default_Organization/Library/custom/test_ansible_product/test_ansible_repo/api/v3/collections/community/general/versions/?limit=1' {"meta":{"count":0},"links":{"first":"/pulp_ansible/galaxy/Default_Organization/Library/custom/test_ansible_product/test_ansible_repo/api/v3/plugin/ansible/content/Default_Organization/Library/custom/test_ansible_product/test_ansible_repo/collections/index/community/general/versions/?limit=1&offset=0","previous":null,"next":null,"last":"/pulp_ansible/galaxy/Default_Organization/Library/custom/test_ansible_product/test_ansible_repo/api/v3/plugin/ansible/content/Default_Organization/Library/custom/test_ansible_product/test_ansible_repo/collections/index/community/general/versions/?limit=1&offset=0"},"data":[]} real 0m0.147s user 0m0.007s sys 0m0.009s # time curl -L -k 'https://localhost/pulp_ansible/galaxy/Default_Organization/Library/custom/test_ansible_product/test_ansible_repo/api/v3/collections/community/general/versions/?limit=1&ordering=is_highest' {"meta":{"count":0},"links":{"first":"/pulp_ansible/galaxy/Default_Organization/Library/custom/test_ansible_product/test_ansible_repo/api/v3/plugin/ansible/content/Default_Organization/Library/custom/test_ansible_product/test_ansible_repo/collections/index/community/general/versions/?limit=1&offset=0&ordering=is_highest","previous":null,"next":null,"last":"/pulp_ansible/galaxy/Default_Organization/Library/custom/test_ansible_product/test_ansible_repo/api/v3/plugin/ansible/content/Default_Organization/Library/custom/test_ansible_product/test_ansible_repo/collections/index/community/general/versions/?limit=1&offset=0&ordering=is_highest"},"data":[]} real 0m0.083s user 0m0.010s sys 0m0.007s # time curl -L -k 'https://localhost/pulp_ansible/galaxy/Default_Organization/Library/custom/test_ansible_product/test_ansible_repo/api/v3/collections/community/general/versions/?limit=1&is_highest=true' {"meta":{"count":0},"links":{"first":"/pulp_ansible/galaxy/Default_Organization/Library/custom/test_ansible_product/test_ansible_repo/api/v3/plugin/ansible/content/Default_Organization/Library/custom/test_ansible_product/test_ansible_repo/collections/index/community/general/versions/?is_highest=true&limit=1&offset=0","previous":null,"next":null,"last":"/pulp_ansible/galaxy/Default_Organization/Library/custom/test_ansible_product/test_ansible_repo/api/v3/plugin/ansible/content/Default_Organization/Library/custom/test_ansible_product/test_ansible_repo/collections/index/community/general/versions/?is_highest=true&limit=1&offset=0"},"data":[]} real 0m0.073s user 0m0.010s sys 0m0. Also, error mentioned by Imaan, is related to upstream issue https://issues.redhat.com/browse/AAH-2836, and here it suggests to use old-galaxy https://old-galaxy.ansible.com/api/ as workaround, so we've used old-galaxy for above test. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Satellite 6.15.0 release), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:2010 The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days |