Bug 670532 - Performance Issue with re-sync of f14-updates
Summary: Performance Issue with re-sync of f14-updates
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Pulp
Classification: Retired
Component: z_other
Version: unspecified
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: ---
Assignee: John Matthews
QA Contact: Preethi Thomas
URL:
Whiteboard:
Depends On:
Blocks: verified-to-close
TreeView+ depends on / blocked
 
Reported: 2011-01-18 15:42 UTC by John Matthews
Modified: 2011-08-16 12:21 UTC (History)
1 user (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-08-16 12:08:25 UTC
Embargoed:


Attachments (Terms of Use)
/var/log/pulp/pulp.log (566.12 KB, text/plain)
2011-01-18 15:43 UTC, John Matthews
no flags Details

Description John Matthews 2011-01-18 15:42:56 UTC
I ran a re-sync of f14-updates:
yum:http://ftp.linux.ncsu.edu/pub/fedora/linux//updates/14/x86_64/

My original sync was approximately 3 days old.  The re-sync download portion finished within a few minutes, yet the remove of old packages/errata and adding new errata has taken over 80 minutes and it is still on-going.

Version:
From git, close to QE version 0.126



Attaching log output with time stamps to see performance issue.

2011-01-18 09:15:09,100 [INFO][Thread-5] sync() @ repo_sync.py:387 - YumSynchronizer reported 10425 successes, 369 downloads, 0 errors
2011-01-18 09:15:17,072 [INFO][Thread-5] _process_repo_images() @ repo_sync.py:232 - No image files to import to repo..
2011-01-18 09:15:17,347 [INFO][Thread-5] add_packages_from_dir() @ repo_sync.py:205 - Loaded group info from /var/lib/pulp/repos/pub/fedora/linux//updates/14/x86_64/repodata/4bec67e5c783d031f34db64006a63322b53db4c00b45aaee975b40ba8ac4aa1c-comps-f14.xml
2011-01-18 09:15:17,350 [INFO][Thread-5] add_packages_from_dir() @ repo_sync.py:220 - updateinfo is found in repomd.xml, it's path is /var/lib/pulp/repos/pub/fedora/linux//updates/14/x86_64/repodata/updateinfo.xml.gz
2011-01-18 09:15:20,390 [INFO][Thread-5] _sync() @ repo.py:1172 - Sync returned 5533 packages, 1397 errata
2011-01-18 09:15:34,159 [INFO][Thread-5] _sync() @ repo.py:1183 - Removing package <SON([(u'vendor', None), (u'name', u'selinux-policy'), (u'repo_defined', True), (u'checksum', SON([(u'sha256', u'bb8902c6574c1f27e58447771a62368b745bd0f4b95f727a953851f13b5e13c3')])), (u'download_url', u'https://localhost//pulp/repos/f14_updates/selinux-policy-3.9.7-19.fc14.noarch.rpm'), (u'filename', u'selinux-policy-3.9.7-19.fc14.noarch.rpm'), (u'epoch', u'0'), (u'version', u'3.9.7'), (u'arch', u'noarch'), (u'provides', [u'selinux-policy', u'selinux-policy-devel', u'config(selinux-policy)']), (u'_ns', u'packages'), (u'release', u'19.fc14'), (u'_id', u'2510d8fe-51ee-47ac-94ec-f5b91996c6d6'), (u'requires', [u'config(selinux-policy)', u'/bin/sh', u'/usr/bin/bunzip2', u'policycoreutils', u'/bin/awk', u'm4', u'checkpolicy', u'libsemanage', u'/bin/sh', u'/bin/mktemp']), (u'id', u'2510d8fe-51ee-47ac-94ec-f5b91996c6d6'), (u'description', u'SELinux Reference Policy - modular.\nBased off of reference policy: Checked out revision  2.20091117')])> from repo <f14_updates>

.....
<snip  More in attachment>
.....
2011-01-18 10:34:50,797 [INFO][Thread-5] _sync() @ repo.py:1196 - Removing errata FEDORA-2010-18864 from repo f14_updates
2011-01-18 10:35:15,308 [INFO][Thread-5] _update_errata_packages() @ repo.py:740 - No Erratum with id: FEDORA-2010-18864 found
2011-01-18 10:35:17,954 [INFO][Thread-5] generate_updateinfo() @ updateinfo.py:141 - updateinfo.xml generated and written to file /var/lib/pulp/repos/pub/fedora/linux//updates/14/x86_64//updateinfo.xml
2011-01-18 10:35:18,213 [INFO][Thread-5] modify_repo() @ upload.py:111 - modifyrepo with /var/lib/pulp/repos/pub/fedora/linux//updates/14/x86_64//updateinfo.xml on /var/lib/pulp/repos/pub/fedora/linux//updates/14/x86_64/repodata finished
2011-01-18 10:35:18,347 [INFO][Thread-5] _sync() @ repo.py:1196 - Removing errata FEDORA-2010-16349 from repo f14_updates
2011-01-18 10:35:43,048 [INFO][Thread-5] _update_errata_packages() @ repo.py:740 - No Erratum with id: FEDORA-2010-16349 found
2011-01-18 10:35:45,689 [INFO][Thread-5] generate_updateinfo() @ updateinfo.py:141 - updateinfo.xml generated and written to file /var/lib/pulp/repos/pub/fedora/linux//updates/14/x86_64//updateinfo.xml
2011-01-18 10:35:45,948 [INFO][Thread-5] modify_repo() @ upload.py:111 - modifyrepo with /var/lib/pulp/repos/pub/fedora/linux//updates/14/x86_64//updateinfo.xml on /var/lib/pulp/repos/pub/fedora/linux//updates/14/x86_64/repodata finished
2011-01-18 10:35:46,089 [INFO][Thread-5] _sync() @ repo.py:1196 - Removing errata FEDORA-2010-17310 from repo f14_updates
2011-01-18 10:36:10,932 [INFO][Thread-5] _update_errata_packages() @ repo.py:740 - No Erratum with id: FEDORA-2010-17310 found
2011-01-18 10:36:13,626 [INFO][Thread-5] generate_updateinfo() @ updateinfo.py:141 - updateinfo.xml generated and written to file /var/lib/pulp/repos/pub/fedora/linux//updates/14/x86_64//updateinfo.xml
2011-01-18 10:36:13,885 [INFO][Thread-5] modify_repo() @ upload.py:111 - modifyrepo with /var/lib/pulp/repos/pub/fedora/linux//updates/14/x86_64//updateinfo.xml on /var/lib/pulp/repos/pub/fedora/linux//updates/14/x86_64/repodata finished
2011-01-18 10:36:14,015 [INFO][Thread-5] _sync() @ repo.py:1196 - Removing errata FEDORA-2010-18910 from repo f14_updates
2011-01-18 10:36:38,489 [INFO][Thread-5] _update_errata_packages() @ repo.py:740 - No Erratum with id: FEDORA-2010-18910 found
2011-01-18 10:36:41,139 [INFO][Thread-5] generate_updateinfo() @ updateinfo.py:141 - updateinfo.xml generated and written to file /var/lib/pulp/repos/pub/fedora/linux//updates/14/x86_64//updateinfo.xml
2011-01-18 10:36:41,400 [INFO][Thread-5] modify_repo() @ upload.py:111 - modifyrepo with /var/lib/pulp/repos/pub/fedora/linux//updates/14/x86_64//updateinfo.xml on /var/lib/pulp/repos/pub/fedora/linux//updates/14/x86_64/repodata finished
2011-01-18 10:36:41,531 [INFO][Thread-5] _sync() @ repo.py:1196 - Removing errata FEDORA-2010-18858 from repo f14_updates
2011-01-18 10:37:06,071 [INFO][Thread-5] _update_errata_packages() @ repo.py:740 - No Erratum with id: FEDORA-2010-18858 found
2011-01-18 10:37:08,715 [INFO][Thread-5] generate_updateinfo() @ updateinfo.py:141 - updateinfo.xml generated and written to file /var/lib/pulp/repos/pub/fedora/linux//updates/14/x86_64//updateinfo.xml
2011-01-18 10:37:08,977 [INFO][Thread-5] modify_repo() @ upload.py:111 - modifyrepo with /var/lib/pulp/repos/pub/fedora/linux//updates/14/x86_64//updateinfo.xml on /var/lib/pulp/repos/pub/fedora/linux//updates/14/x86_64/repodata finished
2011-01-18 10:37:09,107 [INFO][Thread-5] _sync() @ repo.py:1196 - Removing errata FEDORA-2010-18453 from repo f14_updates
2011-01-18 10:37:33,548 [INFO][Thread-5] _update_errata_packages() @ repo.py:740 - No Erratum with id: FEDORA-2010-18453 found
2011-01-18 10:37:36,168 [INFO][Thread-5] generate_updateinfo() @ updateinfo.py:141 - updateinfo.xml generated and written to file /var/lib/pulp/repos/pub/fedora/linux//updates/14/x86_64//updateinfo.xml
2011-01-18 10:37:36,428 [INFO][Thread-5] modify_repo() @ upload.py:111 - modifyrepo with /var/lib/pulp/repos/pub/fedora/linux//updates/14/x86_64//updateinfo.xml on /var/lib/pulp/repos/pub/fedora/linux//updates/14/x86_64/repodata finished
2011-01-18 10:37:36,556 [INFO][Thread-5] _sync() @ repo.py:1196 - Removing errata FEDORA-2010-17268 from repo f14_updates
2011-01-18 10:38:00,859 [INFO][Thread-5] _update_errata_packages() @ repo.py:740 - No Erratum with id: FEDORA-2010-17268 found
2011-01-18 10:38:03,472 [INFO][Thread-5] generate_updateinfo() @ updateinfo.py:141 - updateinfo.xml generated and written to file /var/lib/pulp/repos/pub/fedora/linux//updates/14/x86_64//updateinfo.xml
2011-01-18 10:38:03,731 [INFO][Thread-5] modify_repo() @ upload.py:111 - modifyrepo with /var/lib/pulp/repos/pub/fedora/linux//updates/14/x86_64//updateinfo.xml on /var/lib/pulp/repos/pub/fedora/linux//updates/14/x86_64/repodata finished
2011-01-18 10:38:03,861 [INFO][Thread-5] _sync() @ repo.py:1196 - Removing errata FEDORA-2010-19179 from repo f14_updates
2011-01-18 10:38:28,393 [INFO][Thread-5] _update_errata_packages() @ repo.py:740 - No Erratum with id: FEDORA-2010-19179 found
2011-01-18 10:38:31,048 [INFO][Thread-5] generate_updateinfo() @ updateinfo.py:141 - updateinfo.xml generated and written to file /var/lib/pulp/repos/pub/fedora/linux//updates/14/x86_64//updateinfo.xml
2011-01-18 10:38:31,308 [INFO][Thread-5] modify_repo() @ upload.py:111 - modifyrepo with /var/lib/pulp/repos/pub/fedora/linux//updates/14/x86_64//updateinfo.xml on /var/lib/pulp/repos/pub/fedora/linux//updates/14/x86_64/repodata finished
2011-01-18 10:38:31,451 [INFO][Thread-5] _sync() @ repo.py:1196 - Removing errata FEDORA-2010-18292 from repo f14_updates
2011-01-18 10:38:55,881 [INFO][Thread-5] _update_errata_packages() @ repo.py:740 - No Erratum with id: FEDORA-2010-18292 found
2011-01-18 10:38:58,506 [INFO][Thread-5] generate_updateinfo() @ updateinfo.py:141 - updateinfo.xml generated and written to file /var/lib/pulp/repos/pub/fedora/linux//updates/14/x86_64//updateinfo.xml
2011-01-18 10:38:58,765 [INFO][Thread-5] modify_repo() @ upload.py:111 - modifyrepo with /var/lib/pulp/repos/pub/fedora/linux//updates/14/x86_64//updateinfo.xml on /var/lib/pulp/repos/pub/fedora/linux//updates/14/x86_64/repodata finished
2011-01-18 10:38:58,924 [INFO][Thread-5] _sync() @ repo.py:1196 - Removing errata FEDORA-2010-17539 from repo f14_updates

Comment 1 John Matthews 2011-01-18 15:43:44 UTC
Created attachment 474081 [details]
/var/log/pulp/pulp.log

Comment 2 John Matthews 2011-01-18 16:49:29 UTC
For sync in comment #0, after download packages was completed, it took 90 minutes to finish sync operations.

Comment 3 John Matthews 2011-01-25 22:01:55 UTC
Looking at resyncs of local channels
Example  rhel-i386-server-5 resync takes ~20 mins.

20 mins local sync rhel-i386-server-5
 - 10 mins md5sum check of ~7100 packages
 - 5 mins spent determining old packages to remove
 - 3 mins spent errata processing

Comment 4 John Matthews 2011-01-28 14:57:39 UTC
Same resync as of comment #3 now completes in under 13 minutes.

Changes made were to reduce the time for:
 - Determining old packages and old errata


Sync of f14-testing as noted in comment #0 completed in under 7 minutes.

QE: 
When testing this bug I would recommend you sync Fedora 14 Testing, then wait two days and re-sync it.  You should see several packages/errata have been updated.  

Previously it would take roughly 30 seconds to process each package to remove.  If you look at the logs you can see how many packages/errata are old and being removed.  This is also communicated back to the user through the foreground sync status as text saying: "Removing 95 old packages".

[INFO][Thread-5] _sync() @ repo.py:1195 - Sync returned 1118 packages, 397 errata in 346.108876944 seconds

[INFO][Thread-5] _sync() @ repo.py:1206 - 95 old packages to process, 92 new packages to process
...
...
...
 [INFO][Thread-5] _sync() @ repo.py:1220 - Examining 392 errata from repo f14_testing
 [INFO][Thread-5] _sync() @ repo.py:1224 - Removing 20 old errata from repo f14_testing

Comment 5 Jay Dobies 2011-02-02 20:07:16 UTC
Fixed in 0.134.

Comment 6 Preethi Thomas 2011-03-21 13:34:08 UTC
2011-03-21 09:17:08,314 [INFO][Thread-6] _sync() @ repo.py:1423 - Sync returned 7038 packages, 1997 errata in 346.846409082 seconds
2011-03-21 09:17:10,342 [INFO][Thread-6] _sync() @ repo.py:1434 - 209 old packages to process, 410 new packages to process


2011-03-21 09:25:11,362 [INFO][Thread-6] _sync() @ repo.py:1448 - Examining 1935 errata from repo f14-update
2011-03-21 09:25:11,772 [INFO][Thread-6] _sync() @ repo.py:1452 - Removing 47 old errata from repo f14-update

2011-03-21 09:25:14,153 [DEBUG][Thread-6] remove_packages() @ repo.py:907 - remove_packages invoked on f14-update with no packages

Comment 7 Preethi Thomas 2011-03-21 13:43:56 UTC
verified

[root@preethi ~]# rpm -q pulp
pulp-0.0.151-1.fc14.noarch

sync seems to take around 15 minutes


Start time
2011-03-21 09:11:21,466 [INFO][Thread-6] _sync() @ repo.py:1412 - Sync of f14-update starting, skip_dict = {}


End time

2011-03-21 09:25:18,567 [INFO][Thread-6] modify_repo() @ util.py:362 - modifyrepo with /var/lib/pulp//repos/pub/fedora/linux//updates/14/x86_64//updateinfo.xml on /var/lib/pulp//repos/pub/fedora/linux//updates/14/x86_64/repodata finished
2011-03-21 09:25:18,987 [INFO][Thread-6] _sync() @ repo.py:1456 - Adding 109 new errata to repo f14-update

Comment 8 Preethi Thomas 2011-08-16 12:08:25 UTC
Closing with Community Release 15

pulp-0.0.223-4.

Comment 9 Preethi Thomas 2011-08-16 12:21:27 UTC
Closing with Community Release 15

pulp-0.0.223-4.


Note You need to log in before you can comment on or make changes to this bug.