Bug 953665 - copying large repo uses tons of RAM and takes too long
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Pulp
Classification: Retired
Component: rpm-support
Version: 2.1 Beta
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: unspecified
Target Milestone: ---
Target Release: 2.1.1
Assignee: Michael Hrivnak
QA Contact: Preethi Thomas
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-04-18 20:48 UTC by Michael Hrivnak
Modified: 2013-05-08 14:08 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-05-08 14:08:20 UTC
Embargoed:



Description Michael Hrivnak 2013-04-18 20:48:42 UTC
Copying a large repo, such as a RHEL6 repo, can easily consume gigabytes of RAM. As of today, copying just the RPMs from the latest RHEL6 repo into a new repo required 4.2GB of RAM. This happens because the platform does not limit the fields when it loads the list of units to copy, so we end up with the entire repo's metadata in RAM. The yum importer was also loading the same information for dependency solving.
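
The field-limiting idea can be illustrated with a minimal sketch. It uses plain dictionaries rather than Pulp's platform code or a real database, and the field names (`UNIT_KEY_FIELDS`, `changelog`, `filelist`) and helper names are assumptions made for illustration only, not the code that went into the fix.

```python
# Hypothetical set of fields a copy operation needs; the real set depends
# on the unit type and is an assumption here.
UNIT_KEY_FIELDS = ("name", "epoch", "version", "release", "arch")


def project(unit, fields=UNIT_KEY_FIELDS):
    """Return a copy of `unit` containing only the requested fields."""
    return {f: unit[f] for f in fields if f in unit}


def iter_projected(units, fields=UNIT_KEY_FIELDS):
    """Yield trimmed units one at a time instead of building a full list."""
    for unit in units:
        yield project(unit, fields)


if __name__ == "__main__":
    full = {
        "name": "bash", "epoch": "0", "version": "4.1.2",
        "release": "15.el6", "arch": "x86_64",
        # Bulky metadata that a copy does not need (illustrative values).
        "changelog": ["entry"] * 1000,
        "filelist": ["/usr/bin/file%d" % i for i in range(1000)],
    }
    slim = next(iter_projected([full]))
    print(sorted(slim))
```

With a real database the same effect would normally come from asking the server to return only the needed fields, so the bulky metadata never leaves the database at all.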

Furthermore, copying groups can take longer than the original sync. Copying only the groups from a RHEL6 repo took about 35 minutes. For each group, the copy makes new database queries to associate that group's member packages, and this process is slow.
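
To show why per-group queries add up, here is a rough sketch that contrasts one lookup per group with a single batched lookup. `FakeCollection`, `resolve_per_group`, `resolve_batched`, and the `package_names` field are hypothetical stand-ins, not Pulp or database APIs, and this is not necessarily how the fix was implemented.

```python
class FakeCollection:
    """In-memory stand-in for a database collection that counts queries."""

    def __init__(self, docs):
        self.docs = docs
        self.query_count = 0

    def find_by_names(self, names):
        """Return all docs whose name is in `names` (one 'query' per call)."""
        self.query_count += 1
        wanted = set(names)
        return [d for d in self.docs if d["name"] in wanted]


def resolve_per_group(collection, groups):
    """Slow pattern: issue one query for every group."""
    return {g["id"]: collection.find_by_names(g["package_names"]) for g in groups}


def resolve_batched(collection, groups):
    """Batched pattern: one query for all groups, then split in memory."""
    all_names = {n for g in groups for n in g["package_names"]}
    # Assumes package names are unique within the collection.
    by_name = {d["name"]: d for d in collection.find_by_names(all_names)}
    return {
        g["id"]: [by_name[n] for n in g["package_names"] if n in by_name]
        for g in groups
    }
```

Both functions resolve the same member packages for each group, but the batched version issues a single query regardless of how many groups the repo has.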

Comment 2 Jeff Ortel 2013-04-19 21:25:17 UTC
build: 2.1.1-0.5.beta

Comment 3 Preethi Thomas 2013-04-23 14:01:24 UTC
verified



[root@preethi ~]# rpm -q pulp-server
pulp-server-2.1.1-0.5.beta.fc17.noarch
[root@preethi ~]# 


[root@preethi ~]# pulp-admin rpm repo copy category --copy-children false --from-repo-id copy-rhel --to-repo-id copy-category
Progress on this task can be viewed using the commands under "repo tasks".


2013-04-23 09:16:11,057 pulp.plugins.yum_importer.importer:INFO: Importing 10 units from copy-rhel to copy-category
2013-04-23 09:16:11,119 pulp.server.dispatch.task:INFO: SUCCESS: Task a9e32625-8703-4509-98e8-107341755aae: CallRequest: RepoUnitAssociationManager.associate_from_repo('copy-rhel', u'copy-category', import_config_override={'copy_children': False}, criteria={'unit_sort': None, 'association_filters': {}, '_id': ObjectId('5176899a758cc92f5f009ce2'), 'remove_duplicates': False, 'skip': None, 'association_fields': None, 'unit_filters': {'_id': {'$in': [u'00ade3d9-6c43-4cf6-8f39-69a6cbe69757', u'5f0d54fe-d9c6-4e92-ad8f-3d27ff318c29', u'fd158eff-f5cf-4407-a027-8862d2b9eb4b', u'03aff6e1-0395-4ef9-a5d6-f679f1e90d4a', u'9162f223-09eb-4d33-a543-dd43ce392b6d', u'df5f23ff-3e75-41e1-a634-f946d7a5545a', u'5124b5cb-75a1-464e-ab20-b70b2e588806', u'2047bc12-97db-4934-bbb3-dbeb9ae61454', u'ae9a0a59-f81b-4475-ae46-bff158101f4a', u'd3fc5b4e-80f5-4ccb-b9f3-ccc55f1f28c4']}}, 'association_sort': None, 'unit_fields': None, 'limit': None, 'type_ids': ['package_category'], 'id': '5176899a758cc92f5f009ce2'})


[root@preethi ~]# pulp-admin rpm repo content errata --repo-id copy-errata
[root@preethi ~]# 
[root@preethi ~]# time pulp-admin rpm repo copy errata --copy-children false --from-repo-id copy-rhel --to-repo-id copy-errata
Progress on this task can be viewed using the commands under "repo tasks".


real	0m0.503s
user	0m0.339s
sys	0m0.057s
[root@preethi ~]# pulp-admin repo tasks
Usage: pulp-admin [SUB_SECTION, ..] COMMAND
Description: list and cancel tasks related to a specific repository

Available Commands:
  cancel  - cancel one or more tasks
  details - displays more detailed information about a specific task
  list    - lists tasks queued or running in the server
[root@preethi ~]# pulp-admin repo tasks list
Command: list
Description: lists tasks queued or running in the server

Available Arguments:

  --repo-id - (required) identifies the repository to display
The following options are required but were not specified:
  --repo-id
[root@preethi ~]# pulp-admin repo tasks list --repo-id copy-errata
+----------------------------------------------------------------------+
                                 Tasks
+----------------------------------------------------------------------+

Operations:  associate
Resources:   copy-errata (repository), copy-rhel (repository)
State:       Successful
Start Time:  2013-04-23T13:23:31Z
Finish Time: 2013-04-23T13:23:40Z
Result:      N/A
Task Id:     5796cca3-1ffb-480e-bdbe-26e7450af2d7


[root@preethi ~]# time pulp-admin rpm repo copy group --copy-children false --from-repo-id copy-rhel --to-repo-id copy-group 
Progress on this task can be viewed using the commands under "repo tasks".


real	0m0.507s
user	0m0.338s
sys	0m0.063s
[root@preethi ~]#

Comment 4 Preethi Thomas 2013-05-08 14:08:20 UTC
2.1.1 released

