Bug 673535 - Performance issue with repo.py::_get_existing_repo()
Summary: Performance issue with repo.py::_get_existing_repo()
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Pulp
Classification: Retired
Component: z_other
Version: unspecified
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
: ---
Assignee: John Matthews
QA Contact: Preethi Thomas
URL:
Whiteboard:
Depends On:
Blocks: verified-to-close
TreeView+ depends on / blocked
 
Reported: 2011-01-28 15:50 UTC by John Matthews
Modified: 2011-08-16 12:22 UTC (History)
2 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2011-08-16 12:10:51 UTC
Embargoed:


Attachments (Terms of Use)

Description John Matthews 2011-01-28 15:50:03 UTC
While debugging some performance issues I saw that refreshing a repo object takes a significant amount of time, the more packages in the repo document the longer it takes.

For some rough numbers look at:
pulp/playpen/perf/time_repos.py

This will run "_get_existing_repo()" for each repo id available in mongo.

Below are some examples, notice that for a rhel-i386-server-5 repo it takes 12 seconds to run this:

Packages: 0 Length: 707 _get_existing_repo(clone_repo500) took 0.000874042510986 seconds
Packages: 500 Length: 456182 _get_existing_repo(repo500) took 0.266913175583 seconds
Packages: 1118 Length: 3195775 _get_existing_repo(f14_testing) took 0.954846858978 seconds
Packages: 5736 Length: 8749039 _get_existing_repo(f14_updates) took 8.28494310379 seconds
Packages: 7184 Length: 10369577 _get_existing_repo(local1) took 12.5831038952 seconds

When we fetch the repo document we are fetching all the fields.  The packages field is the largest, contributing almost the entire size to the repo.  

One possible fix could be we explore running without the SON Manipulator for AutoReference.  The repo document is store in mongo with a series of package document id's, when we fetch the Repo object pymongo is automatically dereferencing each package id to the package object, so we get back a full repo document with package objects populated.

Comment 1 John Matthews 2011-01-28 15:53:55 UTC
Another note, this call is relied on heavily through out pulp.  It's often called in loops that are processing packages.

Comment 2 John Matthews 2011-02-01 20:31:49 UTC
The change chosen was to convert repo["packages"] to a list of package ids, it is no longer a dictionary of package objects.

http://git.fedorahosted.org/git/?p=pulp.git;a=commitdiff;h=f2e1506eff24394206a191d39eb7f6edd859d45d

QE: There is a script under pulp's git checkout, "playpen/perf/time_repos.py" which can be run to generate the timings in comment #0.   I would recommend syncing a large repo and running it.  You should see the _get_available_repos() call takes under a second to complete.

Comment 3 Jay Dobies 2011-02-02 20:07:09 UTC
Fixed in 0.134.

Comment 4 Preethi Thomas 2011-02-03 15:12:02 UTC
verified

[root@preethi perf]# ./time_repos.py 
time_get_available_repoids() @ time_repos.py:71 - 2 ids query for repositories took 0.00137710571289 seconds
time_get_available_repoids() @ time_repos.py:76 - 2 ids query for repositories custom took 0.0001380443573 seconds
time_get_existing_repo() @ time_repos.py:85 - All fields.  Packages: 0 Length: 554 _get_existing_repo(foo) took 0.000941038131714 seconds
time_get_existing_repo() @ time_repos.py:85 - All fields.  Packages: 5932 Length: 2239867 _get_existing_repo(f14-update) took 0.328789949417 seconds
time_get_existing_repo() @ time_repos.py:94 - Limited fields: Shortened fetch<no packages>: Length: 71 _get_existing_repo(foo) took 0.00431704521179 seconds
time_get_existing_repo() @ time_repos.py:94 - Limited fields: Shortened fetch<no packages>: Length: 120 _get_existing_repo(f14-update) took 0.000724792480469 seconds
time_package_lookups_batched() @ time_repos.py:119 - Full retrieval <batched> of Repo<554> foo and 0 packages<2> took 0.0267741680145 seconds.
time_package_lookups_batched() @ time_repos.py:119 - Full retrieval <batched> of Repo<2239867> f14-update and 5932 packages<8252543> took 2.4161260128 seconds.
[root@preethi perf]# 

[root@preethi perf]# rpm -q pulp
pulp-0.0.134-1.fc14.noarch

Comment 5 Preethi Thomas 2011-08-16 12:10:51 UTC
Closing with Community Release 15

pulp-0.0.223-4.

Comment 6 Preethi Thomas 2011-08-16 12:22:37 UTC
Closing with Community Release 15

pulp-0.0.223-4.


Note You need to log in before you can comment on or make changes to this bug.