Bug 1812031
Summary: | Improve regenerate applicability tasks performance by querying NEVRA only data from repo_content_units | |||
---|---|---|---|---|
Product: | Red Hat Satellite | Reporter: | Pavel Moravec <pmoravec> | |
Component: | Pulp | Assignee: | satellite6-bugs <satellite6-bugs> | |
Status: | CLOSED ERRATA | QA Contact: | Lai <ltran> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | 6.7.0 | CC: | ahumbe, bmbouter, daviddavis, dkliban, egolov, ehelms, ggainey, hyu, ipanova, ktordeur, ltran, mmccune, phess, rchan, ttereshc, wclark | |
Target Milestone: | 6.8.0 | Keywords: | Patch, Performance, Triaged | |
Target Release: | Unused | |||
Hardware: | x86_64 | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | pulp-rpm-2.21.2 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1832572 (view as bug list) | Environment: | ||
Last Closed: | 2020-10-27 13:00:31 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Attachments: |
Description
Pavel Moravec
2020-03-10 12:07:00 UTC
Tested a few approaches on the customer data:

1) original pulp + mongo content
2) Hao's patch + index added
3) additionally, applied the improved (and working) patch "query NEVRA+modular info only" (patch follows)
4) additionally, removed modularity profiles for non-RHEL8 systems (roughly speaking)

2) is in #c4 + #c6

3) is a patch based on my original idea, but a less hacky and less intrusive approach that actually computes applicability (the orig. patch didn't):

diff -rup a/usr/lib/python2.7/site-packages/pulp/plugins/conduits/profiler.py b/usr/lib/python2.7/site-packages/pulp/plugins/conduits/profiler.py
--- a/usr/lib/python2.7/site-packages/pulp/plugins/conduits/profiler.py 2020-03-11 08:58:31.212958350 +0100
+++ b/usr/lib/python2.7/site-packages/pulp/plugins/conduits/profiler.py 2020-03-11 08:57:53.226133703 +0100
@@ -34,7 +34,7 @@ class ProfilerConduit(MultipleRepoUnitsM
         bindings = manager.find_by_consumer(consumer_id)
         return [b['repo_id'] for b in bindings]
 
-    def get_repo_units(self, repo_id, content_type_id, additional_unit_fields=None):
+    def get_repo_units(self, repo_id, content_type_id, additional_unit_fields=None, only_unit_fields=None):
         """
         Searches for units in the given repository with given content type
         and returns a plugin unit containing unit id, unit key and any additional
@@ -55,7 +55,10 @@ class ProfilerConduit(MultipleRepoUnitsM
         """
         additional_unit_fields = additional_unit_fields or []
         try:
-            unit_key_fields = units_controller.get_unit_key_fields_for_type(content_type_id)
+            if only_unit_fields is None:
+                unit_key_fields = units_controller.get_unit_key_fields_for_type(content_type_id)
+            else:
+                unit_key_fields = only_unit_fields
             serializer = units_controller.get_model_serializer_for_type(content_type_id)
 
             # Query repo association manager to get all units of given type
diff -rup a/usr/lib/python2.7/site-packages/pulp_rpm/plugins/profilers/yum.py b/usr/lib/python2.7/site-packages/pulp_rpm/plugins/profilers/yum.py
--- a/usr/lib/python2.7/site-packages/pulp_rpm/plugins/profilers/yum.py 2020-03-11 08:58:25.396985197 +0100
+++ b/usr/lib/python2.7/site-packages/pulp_rpm/plugins/profilers/yum.py 2020-03-11 08:58:06.792071081 +0100
@@ -288,7 +288,7 @@ class YumProfiler(Profiler):
             # Create lookup table of available RPMs for errata applicability, find applicable RPMs
             # and modules.
             additional_unit_fields = ['is_modular']
-            rpms = conduit.get_repo_units(bound_repo_id, TYPE_ID_RPM, additional_unit_fields)
+            rpms = conduit.get_repo_units(bound_repo_id, TYPE_ID_RPM, additional_unit_fields, NVREA_KEYS)
             available_rpm_nevras = {'modular': set(), 'non-modular': set()}
             for rpm in rpms:

4) there are modularity consumer profiles for e.g. RHEL6 or RHEL7 systems that are empty but are still taken into account, like:

consumer_unit_profiles:
{
    "_id" : ObjectId("5e66464fb6dd526718ac1d61"),
    "profile" : [ ],
    "_ns" : "consumer_unit_profiles",
    "profile_hash" : "4f53cda18c2baa0c0354bb5f9a3ecbe5ed12ab4d8e11ba873c2f11161202b945",
    "consumer_id" : "c3ad7949-834b-4dc5-9839-95f39c85924c",
    "content_type" : "modulemd",
    "id" : "5e66464fb6dd526718ac1d61"
}

Note the empty profile and content_type = modulemd. My trick was to remove those profiles via:

db.consumer_unit_profiles.remove({'profile': []})

Additionally, consumer_unit_profiles for nonexistent consumers (consumer_id not present in the consumers collection) can be deleted - the attached case has the clean_orphaned_consumer_profiles.sh script for that.

In 4), I cleaned the consumer unit profiles of those two types of orphans.

Testbed used on the customer data:
- regenerate applicability (reg.app.) of the "RHEL7 software collections" repo (1243 batch applicability tasks invoked)
- concurrently, run reg.app. of 200 consumers

Results from this testbed:

                                                                   #tasks  sum_time  avg_time
===============================================================
orig:regenerate_applicability_for_consumers                           200   2954.68   14.7734
orig:batch_regenerate_applicability                                  1243   29841.8   24.0079
===============================================================
hao:regenerate_applicability_for_consumers                            200   1947.58   9.73789
hao:batch_regenerate_applicability                                   1243   26499.1   21.3187
===============================================================
hao+pmoravec-nevra:regenerate_applicability_for_consumers             200   1682.96    8.4148
hao+pmoravec-nevra:batch_regenerate_applicability                    1234   25435.2    20.612
===============================================================
hao+pmoravec-NEVRA+orphans:regenerate_applicability_for_consumers     200   1634.87   8.17437
hao+pmoravec-NEVRA+orphans:batch_regenerate_applicability            1234   25196.9   20.4189

So the overall improvement (by sum_time, orig vs. the last variant):
- reg.app. of consumers improved by 44%
- reg.app. of the repo improved by 15.5%

Created attachment 1669696 [details]
clean_orphaned_consumer_profiles.sh
The bash script to clean profiles for non-existing consumers (cf. point 4).
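For readers without access to the attached script, here is a minimal Python sketch of the two orphan criteria from point 4. It is not the content of clean_orphaned_consumer_profiles.sh; it works on plain in-memory dicts instead of MongoDB, the function name find_orphaned_profiles is hypothetical, and it assumes that consumers documents carry their identifier in an "id" field.

```python
def find_orphaned_profiles(profiles, consumers):
    """Return profiles that are empty or that reference an unknown consumer.

    `profiles` and `consumers` are lists of dicts standing in for the
    consumer_unit_profiles and consumers collections (an assumption for
    illustration; the real data lives in MongoDB).
    """
    # Assumption: each consumer document exposes its identifier as "id".
    known_ids = {c["id"] for c in consumers}
    orphans = []
    for p in profiles:
        is_empty = not p.get("profile")              # e.g. "profile" : [ ]
        no_consumer = p.get("consumer_id") not in known_ids
        if is_empty or no_consumer:
            orphans.append(p)
    return orphans


if __name__ == "__main__":
    sample_consumers = [{"id": "c1"}]
    sample_profiles = [
        {"id": "p1", "consumer_id": "c1", "content_type": "rpm",
         "profile": [{"name": "bash"}]},
        {"id": "p2", "consumer_id": "c1", "content_type": "modulemd",
         "profile": []},
        {"id": "p3", "consumer_id": "gone", "content_type": "rpm",
         "profile": [{"name": "zsh"}]},
    ]
    for orphan in find_orphaned_profiles(sample_profiles, sample_consumers):
        print(orphan["id"])
```

Listing the candidates first, rather than deleting straight away, makes it possible to review what would be removed before touching the database.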
The Pulp upstream bug status is at NEW. Updating the external tracker on this bug.

The Pulp upstream bug priority is at Normal. Updating the external tracker on this bug.

Continuing the tests from #c7 against Hao's improved patch per https://github.com/pulp/pulp_rpm/pull/1640 :

cumulative results (my patch + 2 cleanups + Hao's improved patch):

hao-improved:regenerate_applicability_for_consumers    200   1589.71   7.94857
hao-improved:batch_regenerate_applicability           1232   15331.8   12.4446

So the cumulative improvement is almost 50%, largely due to Hao's patch. Kudos!

The patches have been merged upstream.

The Pulp upstream bug status is at MODIFIED. Updating the external tracker on this bug.

To make the HF efficient, it was confirmed that one index needs to be added to MongoDB, as per PR https://github.com/pulp/pulp_rpm/pull/1659 :

mongo pulp_database --eval "db.erratum_pkglists.createIndex( { repo_id: 1 } )"

Comparing the performance of reg.app. tasks with the index, then without it, and after adding it again (on the HF packages over 6.5.3):

===============================================================
index:regenerate_applicability_for_consumers           200   1430.48   7.15242
index:batch_regenerate_applicability                  1237   12530.3   10.1296
===============================================================
NoIndex:regenerate_applicability_for_consumers         200   1729.54   8.64772
NoIndex:batch_regenerate_applicability                1243   12700.3   10.2175
===============================================================
indexAgain:regenerate_applicability_for_consumers      200   1255.74    6.2787
indexAgain:batch_regenerate_applicability             1237   12319.4   9.95908

Just by using the index:
- repo reg.app. (batch reg.app.) is improved by 1-3%
- consumer reg.app. is improved by 17% - 27%

Hotfix is available for Satellite 6.6.2

INSTALLATION INSTRUCTIONS:

1. Make a backup or snapshot of the Satellite server

2. Add the following index to MongoDB to improve performance of some queries
   # mongo pulp_database --eval "db.erratum_pkglists.createIndex( { repo_id: 1 } )"

3. Download the attached files and copy them to the Satellite server:
   pulp-server-2.19.1.1-2.HOTFIXRHBZ1812031.el7sat.noarch.rpm
   pulp-rpm-plugins-2.19.1.1-3.HOTFIXRHBZ1812031.el7sat.noarch.rpm

4. Install the packages
   # yum update pulp-server-2.19.1.1-2.HOTFIXRHBZ1812031.el7sat.noarch.rpm pulp-rpm-plugins-2.19.1.1-3.HOTFIXRHBZ1812031.el7sat.noarch.rpm --disableplugin=foreman-protector

5. Restart pulp services (ideally when no pulp task is in progress)
   # for i in pulp_celerybeat pulp_resource_manager pulp_streamer pulp_workers; do service $i restart; done

Created attachment 1674130 [details]
pulp-server hotfix RPM for Satellite 6.6.2
Created attachment 1674131 [details]
pulp-rpm-plugins hotfix RPM for Satellite 6.6.2
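The percentages quoted for the index comparison can be reproduced from the sum_time figures in the tables. A small Python sketch of that arithmetic follows; the helper name improvement_pct is hypothetical, and the only inputs are numbers copied from this bug.

```python
def improvement_pct(before, after):
    """Relative reduction of `after` compared to `before`, in percent."""
    return (before - after) / before * 100.0


# sum_time values (seconds) copied from the tables in this bug.
consumers = {"NoIndex": 1729.54, "index": 1430.48, "indexAgain": 1255.74}
batch = {"NoIndex": 12700.3, "index": 12530.3, "indexAgain": 12319.4}

if __name__ == "__main__":
    for run in ("index", "indexAgain"):
        # Consumer reg.app.: roughly 17% and 27% faster than without the index.
        print("consumers", run,
              round(improvement_pct(consumers["NoIndex"], consumers[run]), 1))
        # Repo (batch) reg.app.: roughly 1% and 3% faster.
        print("batch", run,
              round(improvement_pct(batch["NoIndex"], batch[run]), 1))
    # Cumulative batch improvement, orig vs. hao-improved: a bit under 50%.
    print("cumulative batch", round(improvement_pct(29841.8, 15331.8), 1))
```

Using sum_time rather than avg_time keeps the comparison meaningful even though the number of batch tasks differs slightly between runs (1237 vs. 1243).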
Hotfix is available for Satellite 6.5.3

INSTALLATION INSTRUCTIONS:

1. Make a backup or snapshot of the Satellite server

2. Add the following index to MongoDB to improve performance of some queries
   # mongo pulp_database --eval "db.erratum_pkglists.createIndex( { repo_id: 1 } )"

3. Download the attached files and copy them to the Satellite server:
   pulp-server-2.18.1.1-2.HOTFIXRHBZ1812031.el7sat.noarch.rpm
   pulp-rpm-plugins-2.18.1.6-2.HOTFIXRHBZ1812031.el7sat.noarch.rpm

4. Install the packages
   # yum update pulp-server-2.18.1.1-2.HOTFIXRHBZ1812031.el7sat.noarch.rpm pulp-rpm-plugins-2.18.1.6-2.HOTFIXRHBZ1812031.el7sat.noarch.rpm --disableplugin=foreman-protector

5. Restart pulp services (ideally when no pulp task is in progress)
   # for i in pulp_celerybeat pulp_resource_manager pulp_streamer pulp_workers; do service $i restart; done

Created attachment 1674133 [details]
pulp-server hotfix RPM for Satellite 6.5.3
Created attachment 1674134 [details]
pulp-rpm-plugins hotfix RPM for Satellite 6.5.3
Created attachment 1679713 [details]
pulp-rpm-plugins hotfix RPM for Satellite 6.7.0
Hotfix is available for Satellite 6.7.0

This replaces a previously published hotfix with an updated (.3) version of pulp-server, as well as an additional index added to the database:

pulp-server-2.21.0-3.HOTFIXRHBZ1812031.el7sat.noarch.rpm

INSTALLATION INSTRUCTIONS:

1. Make a backup or snapshot of the Satellite server

2. Add the following indexes to MongoDB to improve performance of some queries
   # mongo pulp_database --eval "db.erratum_pkglists.createIndex( { repo_id: 1 } )"
   # mongo pulp_database --eval "db.consumer_unit_profiles.createIndex( { id: 1 } )"

3. Download the attached files and copy them to the Satellite server:
   pulp-server-2.21.0-3.HOTFIXRHBZ1812031.el7sat.noarch.rpm
   pulp-rpm-plugins-2.21.0.4-2.HOTFIXRHBZ1812031.el7sat.noarch.rpm

4. Install the packages
   # yum update pulp-server-2.21.0-3.HOTFIXRHBZ1812031.el7sat.noarch.rpm pulp-rpm-plugins-2.21.0.4-2.HOTFIXRHBZ1812031.el7sat.noarch.rpm --disableplugin=foreman-protector

5. Restart pulp services (ideally when no pulp task is in progress)
   # for i in pulp_celerybeat pulp_resource_manager pulp_streamer pulp_workers; do service $i restart; done

Created attachment 1682553 [details]
pulp-server hotfix RPM for Satellite 6.7.0
The Pulp upstream bug status is at CLOSED - CURRENTRELEASE. Updating the external tracker on this bug.

The Pulp upstream bug status is at MODIFIED. Updating the external tracker on this bug.

The Pulp upstream bug status is at ON_QA. Updating the external tracker on this bug.

The Pulp upstream bug status is at CLOSED - CURRENTRELEASE. Updating the external tracker on this bug.

NOTE: The hotfix to this bug delivered in 6.6.2 is also applicable to 6.6.3, as the version of pulp did not change in the delivery of 6.6.3. To apply the hotfix to this bug on a Satellite 6.6.3 system, follow the instructions here: https://bugzilla.redhat.com/show_bug.cgi?id=1812031#c17

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: Satellite 6.8 release), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:4366