Bug 1158545 - Multiple unit_types in association call causes to fetch everything from source repo
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Pulp
Classification: Retired
Component: API/integration
Version: 2.3
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: pulp-bugs
QA Contact: pulp-qe-list
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-10-29 15:40 UTC by Tomas Kopecek
Modified: 2015-02-28 22:41 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-02-28 22:41:54 UTC
Embargoed:


Attachments


Links
System        ID   Private  Priority  Status  Summary  Last Updated
Pulp Redmine  596  0        None      None    None     Never

Description Tomas Kopecek 2014-10-29 15:40:12 UTC
When I run two nearly identical copy calls, they have very different performance impacts. The first (a) creates a correct query on the all-content repo, i.e. one that includes the filters. When I run the second (b), Pulp somehow leaves out the filters and queries everything in all-content: it fetches all units and then issues per-file queries. Because my repo contains a large number of files, Apache soon consumes all the memory (30GB) and crashes. The only difference is that the second call queries more than one unit type. Of course, there is a workaround of splitting it into several calls, but this is not expected behaviour in the first place.

This behaviour is present in the 2.3 release; I'm not sure whether it is fixed in newer releases.

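from pulp_rpm.common.ids import UNIT_KEY_RPM

# repo_unit_api: a Pulp bindings object for repository unit calls (set up elsewhere)
filters = {'checksum': {'$in': ['abc', 'def', 'ghi']}, 'checksumtype': 'sha256'}
fields = list(UNIT_KEY_RPM) + ['filename', 'signature']

# (a) one unit type: the filters are applied in the database query
a = repo_unit_api.copy('all-content', 'test-rpm', type_ids=['rpm'], filters=filters, fields=fields).response_body.task_id
# (b) two unit types: the whole source repo is fetched, then units are queried one by one
b = repo_unit_api.copy('all-content', 'test-rpm', type_ids=['rpm', 'iso'], filters=filters, fields=fields).response_body.task_id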

Comment 1 Chris Duryee 2015-01-06 22:02:32 UTC
This appears to work OK for me in Pulp 2.4.

My test was to create a RHEL7 repo and sync it down, then create a new "copyrepo" repo. "pulp-admin rpm repo list" had the following output:

[devel@245erratatest-net0 ~]$ pulp-admin rpm repo list  
+----------------------------------------------------------------------+
                            RPM Repositories
+----------------------------------------------------------------------+

Id:                  rhel7
Display Name:        rhel7
Description:         None
Content Unit Counts: 
  Distribution:           1
  Erratum:                167
  Package Category:       9
  Package Environment:    6
  Package Group:          70
  Rpm:                    9591
  Yum Repo Metadata File: 1

Id:                  copyrepo
Display Name:        copyrepo
Description:         None
Content Unit Counts: 



I then created a JSON file (copy.json) with the following contents:

{"source_repo_id": "rhel7",
 "override_config": {},
 "criteria": {"type_ids": ["erratum", "distribution"],
              "filters": {"unit": {}}}}

I ran: curl -k -X POST -d @./copy.json "https://admin:admin@localhost/pulp/api/v2/repositories/copyrepo/actions/associate/"

This appears to have copied the correct units over:

$ pulp-admin rpm repo list
+----------------------------------------------------------------------+
                            RPM Repositories
+----------------------------------------------------------------------+

Id:                  rhel7
Display Name:        rhel7
Description:         None
Content Unit Counts: 
  Distribution:           1
  Erratum:                167
  Package Category:       9
  Package Environment:    6
  Package Group:          70
  Rpm:                    9591
  Yum Repo Metadata File: 1

Id:                  copyrepo
Display Name:        copyrepo
Description:         None
Content Unit Counts: 
  Distribution: 1
  Erratum:      167
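For anyone scripting this test instead of using curl, the same associate request could be built with Python's standard library. This is a sketch under assumptions: the helper name build_associate_request is hypothetical, the host, admin/admin credentials and repo IDs are simply the ones from the test above, and the request is only constructed, not sent. Unlike the curl command, it sets an explicit JSON Content-Type.

```python
import base64
import json
import urllib.request


def build_associate_request(base_url, dest_repo, source_repo, type_ids,
                            username, password):
    """Hypothetical helper: build (but do not send) a POST to the v2
    associate endpoint, mirroring the copy.json body above (all units of
    the given type_ids, empty unit filter)."""
    body = {
        "source_repo_id": source_repo,
        "override_config": {},
        "criteria": {"type_ids": list(type_ids), "filters": {"unit": {}}},
    }
    url = f"{base_url}/pulp/api/v2/repositories/{dest_repo}/actions/associate/"
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    # curl's https://admin:admin@host/... form is HTTP Basic auth.
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return req
```

To actually send it, urllib.request.urlopen(req, context=ctx) would work, where ctx is an ssl.SSLContext with check_hostname=False and verify_mode=ssl.CERT_NONE to match curl's -k.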

Comment 2 Chris Duryee 2015-01-06 22:03:34 UTC
Marking as CLOSED/WORKSFORME but feel free to re-open if you hit this issue in Pulp 2.4.

Comment 3 Tomas Kopecek 2015-01-08 08:32:30 UTC
The difference is not in the result (that is indeed correct) but in performance. Running associate once per unit_type performs as expected. Running your JSON, with both type_ids in one call, drastically affects performance. My worst case is importing a few new units into a repo that already contains 150k units. That shouldn't take more than a few kilobytes of memory, but it uses tens of gigabytes because it fetches everything from that repo into memory (probably to check whether each unit_key is already present). So the problem is somewhere at the ORM level: in this case it constructs many queries (and then filters on the Pulp side instead of the Mongo side) instead of the single query used in the first case.
Have you checked for this difference? I'm not sure it will be clearly visible on a 10,000-RPM repo, but it should still cause a (smaller) peak in memory consumption.
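The diagnosis above (filtering in Pulp instead of in MongoDB) can be illustrated with a toy model. Everything in it is hypothetical: FakeCollection is not the pymongo API and only counts how many documents a query hands back, and filter_in_db / filter_in_app are illustrative names, not Pulp functions.

```python
from typing import Callable, Iterator, List


class FakeCollection:
    """Hypothetical stand-in for a MongoDB collection (not the pymongo API).

    It only counts how many documents each query hands back to the caller.
    """

    def __init__(self, docs: List[dict]) -> None:
        self.docs = docs
        self.returned = 0

    def find(self, predicate: Callable[[dict], bool] = lambda d: True) -> Iterator[dict]:
        # The predicate is evaluated "inside the database"; only matches are returned.
        for doc in self.docs:
            if predicate(doc):
                self.returned += 1
                yield doc


def checksum_filter(checksums: List[str]) -> Callable[[dict], bool]:
    wanted = set(checksums)
    return lambda doc: doc["checksum"] in wanted


def filter_in_db(coll: FakeCollection, checksums: List[str]) -> List[dict]:
    # Case (a): one query with the filter pushed down to the database.
    return list(coll.find(checksum_filter(checksums)))


def filter_in_app(coll: FakeCollection, checksums: List[str]) -> List[dict]:
    # The pattern suspected for case (b): fetch everything, then filter in Python.
    everything = list(coll.find())
    keep = checksum_filter(checksums)
    return [doc for doc in everything if keep(doc)]
```

Both functions return the same units, but in this toy model the second one pulls every document out of the collection first, so its cost grows with the size of the source repo rather than with the number of units being copied.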

Comment 4 Chris Duryee 2015-01-09 02:13:38 UTC
I was able to repro this behavior. I am working on a fix now.

Comment 5 Chris Duryee 2015-01-09 15:01:03 UTC
I am not sure if the way you set the fields is going to work as expected. It looks like you are setting "fields = list(UNIT_KEY_RPM) + ['filename', 'signature']" but then trying to associate both RPMs and ISOs.

Unfortunately, I think that if you are specifying unit fields (which is needed for memory considerations) you will need to copy items one type_id at a time.
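The one-type-at-a-time workaround, with each type getting its own field list, could look like the sketch below. It assumes the copy callable behaves like the bindings call in the description (source, destination, keyword criteria, returns a task ID); it is injected as a parameter so the sketch runs without a server, and copy_one_type_at_a_time is a hypothetical name.

```python
from typing import Callable, Dict, List, Sequence


def copy_one_type_at_a_time(
    copy: Callable[..., str],
    source_repo: str,
    dest_repo: str,
    fields_by_type: Dict[str, Sequence[str]],
    filters: dict,
) -> List[str]:
    """Issue one copy call per unit type, each restricted to that type's fields.

    `copy` is a stand-in for the bindings copy call (assumption: it accepts
    type_ids/filters/fields keywords and returns a task ID).
    """
    task_ids = []
    for type_id, fields in fields_by_type.items():
        task_ids.append(
            copy(source_repo, dest_repo, type_ids=[type_id],
                 filters=filters, fields=list(fields))
        )
    return task_ids
```

The same filters are passed to every call here for simplicity; in practice they may also need to differ per type, since not every unit type has the same fields to filter on.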

Comment 6 Tomas Kopecek 2015-01-09 15:13:50 UTC
Doesn't it just ignore non-existent fields? If that is the root of the problem, let's close it as NOTABUG. I'm already using the per-unit-type approach, but I was thinking we could get a lower number of queries this way.

Comment 7 Chris Duryee 2015-01-09 15:59:09 UTC
Sounds good. I will close this as CLOSED/NOTABUG since there is a workaround.

However, your point is still valid :) I have created a Redmine task at https://pulp.plan.io/issues/105 to track the OOM issue I saw.

Thanks for the bug report!

Comment 8 Chris Duryee 2015-01-26 22:29:34 UTC
Closing the Redmine issue and re-opening the BZ for now.

Triage team: there are a few places in unit copying that use lists instead of generators, which is the cause of this bug.
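Why a list instead of a generator matters here can be shown with a small sketch. It is not Pulp's code: CountingSource is a hypothetical stand-in for a database cursor, and pulled_before_first just reports how many units had already been pulled from the source when the first one was handled.

```python
from typing import Iterable, Iterator


class CountingSource:
    """Hypothetical unit source (stand-in for a database cursor) that
    records how many units it has produced so far."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.produced = 0

    def units(self) -> Iterator[dict]:
        for i in range(self.n):
            self.produced += 1
            yield {"id": i}


def pulled_before_first(units: Iterable[dict], source: CountingSource) -> int:
    """Walk the units one by one and report how many had already been
    pulled from the source when the first one was handled."""
    pulled = -1
    for _unit in units:
        if pulled < 0:
            pulled = source.produced
        # ... the real code would associate _unit with the destination here ...
    return pulled
```

With eager = CountingSource(150_000), pulled_before_first(list(eager.units()), eager) returns 150000, because list() drains the whole source before processing starts; with lazy = CountingSource(150_000), pulled_before_first(lazy.units(), lazy) returns 1, because the generator yields units one at a time.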

Comment 9 Brian Bouterse 2015-02-28 22:41:54 UTC
Moved to https://pulp.plan.io/issues/596

