Bug 995528
Summary: | mongo AutoReference SON manipulator reduces performance on large result sets | |
---|---|---|---
Product: | [Retired] Pulp | Reporter: | Jeff Ortel <jortel>
Component: | z_other | Assignee: | Barnaby Court <bcourt>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Preethi Thomas <pthomas>
Severity: | high | Priority: | high
Version: | 2.2 Beta | Target Release: | 2.3.0
Keywords: | Triaged | CC: | bcourt
Hardware: | Unspecified | OS: | Unspecified
Type: | Bug | Doc Type: | Bug Fix
Last Closed: | 2013-12-09 14:31:50 UTC | |
Just learned that Katello folks routinely work with repositories containing 30k+ packages (yes, the CDN has repositories that big). So, to put this in perspective, just to query the units in a repository that big will take ~92 seconds vs ~16 seconds (using really lazy math). This affects every distributor and heavily impacts node sync.

Pull request: https://github.com/pulp/pulp/pull/619

build: 2.3.0-0.14.alpha
verified

    [root@hp-sl2x160zg6-01 ~]# pulp-admin repo list
    +----------------------------------------------------------------------+
                                  Repositories
    +----------------------------------------------------------------------+

    Id:                   rhel6-4
    Display Name:         rhel6-4
    Description:          None
    Content Unit Counts:
      Distribution:            1
      Erratum:                 1956
      Package Category:        10
      Package Group:           202
      Rpm:                     11003
      Yum Repo Metadata File:  1

    [root@hp-sl2x160zg6-01 ~]# python benchmark.py
    duration: 9.683 (seconds)
    [root@hp-sl2x160zg6-01 ~]#

<jortel> I only had 3178 and it took my little VM 2.2 seconds. on my same VM, it would have taken 7.616 seconds to fetch 11,003 rpms. seems about right.
<jortel> without the fix, it would have taken ~90 seconds.

Pulp 2.3 released.
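The "really lazy math" above looks like straight linear extrapolation from the 3178-document benchmark in the bug description (9.2 s through Pulp's connection, 1.6 s with a hand-created one): 30k is roughly 10 × 3178, so 9.2 s × 10 ≈ 92 s and 1.6 s × 10 ≈ 16 s. jortel's 7.616 s figure is the same scaling applied to his 2.2 s run. A minimal Python sketch of that arithmetic, assuming iteration time grows linearly with document count (`extrapolate` is a hypothetical helper, not part of Pulp):

```python
def extrapolate(measured_seconds: float, measured_docs: int, target_docs: int) -> float:
    """Scale a measured cursor-iteration time linearly to a larger result set.

    Assumes a constant per-document cost, which is only a rough model.
    """
    return measured_seconds * target_docs / measured_docs


SAMPLE_DOCS = 3178  # documents in the original units_rpm benchmark

# jortel's estimate: 2.2 s for 3178 docs, scaled to 11,003 RPMs.
print(f"{extrapolate(2.2, SAMPLE_DOCS, 11003):.3f}")  # 7.617 (the comment above truncates to 7.616)

# "Really lazy math": treat 30k as roughly 10 x 3178.
print(f"{extrapolate(9.2, SAMPLE_DOCS, 10 * SAMPLE_DOCS):.1f}")  # 92.0
print(f"{extrapolate(1.6, SAMPLE_DOCS, 10 * SAMPLE_DOCS):.1f}")  # 16.0

# Scaling to exactly 30,000 documents lands slightly lower.
print(f"{extrapolate(9.2, SAMPLE_DOCS, 30000):.1f}")  # 86.8
print(f"{extrapolate(1.6, SAMPLE_DOCS, 30000):.1f}")  # 15.1
```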
Created attachment 784902: benchmarking script

Description of problem:

While benchmarking the retrieval of all content units associated with a repository, I discovered that cursor iteration was very slow. Simply fetching ALL of the units from the units_rpm collection took 9.2 seconds just to iterate the cursor, which had a result set of 3178 documents. Using a hand-created connection (not using pulp.server.db.connection), the same thing took ~1.6 seconds, and using the mongo CLI, ~900 ms.

The performance decrease seems to be related to the AutoReference mongo SON manipulator. When commenting out:

    _DATABASE.add_son_manipulator(AutoReference(_DATABASE))

the iteration time dropped to 2.2 seconds.

I don't know what Pulp uses AutoReference for, though I suspect it has to do with the REST API. But for plugins fetching large result sets, this represents a pretty significant performance impact. For example, for node operations such as publishing and syncing, it means a difference of 30 seconds vs 4 seconds to iterate the cursor(s) when fetching the content units associated with a repository.

I attached the script used to do the benchmark.
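A plausible explanation for the gap, based on reading the pymongo 2.x source rather than anything measured in this bug: once a manipulator is registered with `add_son_manipulator`, every document a cursor returns is by default passed through its `transform_outgoing`, and AutoReference's version copies each document into a new `SON` and walks every nested dict and list looking for `DBRef` values to dereference. That per-document copy grows with both result-set size and document size, and RPM unit documents carry sizeable nested lists. The sketch below is not the attached benchmarking script and does not touch MongoDB; it mimics the copy-and-walk step in plain Python over synthetic documents so raw and transformed iteration can be timed side by side. `fake_rpm_units`, `copy_and_walk`, `time_iteration` and the document shape are all hypothetical.

```python
import time
from collections.abc import Mapping
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple


def fake_rpm_units(count: int) -> List[Dict[str, Any]]:
    """Build synthetic documents loosely shaped like RPM units (hypothetical shape)."""
    return [
        {
            "_id": str(i),
            "name": f"pkg-{i}",
            "version": "1.0",
            "requires": [{"name": f"dep-{j}", "flags": "GE"} for j in range(20)],
            "files": {"file": [f"/usr/share/pkg-{i}/f{j}" for j in range(50)]},
        }
        for i in range(count)
    ]


def copy_and_walk(value: Any) -> Any:
    """Recursively rebuild dicts and lists, mimicking the shape of
    AutoReference.transform_outgoing. The real manipulator also dereferences
    DBRefs; that step is omitted because these documents contain none."""
    if isinstance(value, Mapping):
        return {key: copy_and_walk(item) for key, item in value.items()}
    if isinstance(value, list):
        return [copy_and_walk(item) for item in value]
    return value


def time_iteration(
    docs: Iterable[Any], transform: Optional[Callable[[Any], Any]] = None
) -> Tuple[int, float]:
    """Iterate over docs, optionally transforming each one; return (count, seconds)."""
    start = time.perf_counter()
    count = 0
    for doc in docs:
        if transform is not None:
            transform(doc)  # result discarded; only the cost of producing it matters
        count += 1
    return count, time.perf_counter() - start


if __name__ == "__main__":
    units = fake_rpm_units(3178)  # same document count as the original benchmark
    n_raw, raw = time_iteration(units)
    n_walked, walked = time_iteration(units, copy_and_walk)
    print(f"raw iteration: {n_raw} docs in {raw:.3f} s")
    print(f"copy-and-walk: {n_walked} docs in {walked:.3f} s")
```

To compare against a real database, one could pass a pymongo cursor (for example `db.units_rpm.find()`, where `db` is a hypothetical database handle) as `docs`, once with the AutoReference manipulator registered and once without. The synthetic timings printed here say nothing about Pulp's actual numbers.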