Bug 1158545
Summary: | Multiple unit_types in an association call cause everything to be fetched from the source repo | ||
---|---|---|---|
Product: | [Retired] Pulp | Reporter: | Tomas Kopecek <tkopecek> |
Component: | API/integration | Assignee: | pulp-bugs |
Status: | CLOSED UPSTREAM | QA Contact: | pulp-qe-list |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 2.3 | CC: | cduryee, jortel, mhrivnak, tkopecek |
Target Milestone: | --- | Keywords: | Reopened, Triaged |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2015-02-28 22:41:54 UTC | Type: | Bug |
Description
Tomas Kopecek
2014-10-29 15:40:12 UTC
This appears to work OK for me in Pulp 2.4. My test was to create a RHEL7 repo and sync it down, then create a new "copyrepo" repo. "pulp-admin rpm repo list" had the following output:

    [devel@245erratatest-net0 ~]$ pulp-admin rpm repo list
    +----------------------------------------------------------------------+
                              RPM Repositories
    +----------------------------------------------------------------------+

    Id:                  rhel7
    Display Name:        rhel7
    Description:         None
    Content Unit Counts:
      Distribution:           1
      Erratum:                167
      Package Category:       9
      Package Environment:    6
      Package Group:          70
      Rpm:                    9591
      Yum Repo Metadata File: 1

    Id:                  copyrepo
    Display Name:        copyrepo
    Description:         None
    Content Unit Counts:

I then created a JSON file with the following contents:

    {"source_repo_id": "rhel7", "override_config": {}, "criteria": {"type_ids": ["erratum", "distribution"], "filters": {"unit": {}}}}

I ran:

    curl -k -X POST -d @./copy.json "https://admin:admin@localhost/pulp/api/v2/repositories/copyrepo/actions/associate/"

This appears to have copied the correct units over:

    $ pulp-admin rpm repo list
    +----------------------------------------------------------------------+
                              RPM Repositories
    +----------------------------------------------------------------------+

    Id:                  rhel7
    Display Name:        rhel7
    Description:         None
    Content Unit Counts:
      Distribution:           1
      Erratum:                167
      Package Category:       9
      Package Environment:    6
      Package Group:          70
      Rpm:                    9591
      Yum Repo Metadata File: 1

    Id:                  copyrepo
    Display Name:        copyrepo
    Description:         None
    Content Unit Counts:
      Distribution:           1
      Erratum:                167

Marking as CLOSED/WORKSFORME, but feel free to re-open if you hit this issue in Pulp 2.4.

The difference is not in the result - that really is correct - but in performance. Running associate once per unit_type works appropriately. Running it with your JSON (several type_ids in one call) drastically affects performance. My worst case is importing a few new units into a repo that already holds 150k units. It shouldn't eat more than a few kilobytes of memory, but it uses tens of gigabytes, because it fetches everything from that repo into memory (probably to check whether each unit_key is already present). So the problem is somewhere at the ORM level, which in this case constructs many queries (and later filters the results on the Pulp side instead of the Mongo side) instead of the single query used in the first case. Have you checked this difference? I'm not sure it will be clearly visible on a repo with 10000 RPMs, but it should still produce a (smaller) peak in memory consumption.

I was able to repro this behavior. I am working on a fix now.

I am not sure if the way you set the fields is going to work as expected. It looks like you are setting "fields = list(UNIT_KEY_RPM) + ['filename', 'signature']" but then trying to associate both RPMs and ISOs. Unfortunately, I think that if you are specifying unit fields (which is needed for memory considerations) you will need to copy items one type_id at a time.

Doesn't it just ignore non-existent fields? If that is the root of the problem, let's close it as NOTABUG. I'm already using the per-unit-type approach, but was thinking that we could get to a lower number of queries with this.

Sounds good, I will close as CLOSED/NOTABUG since there is a workaround. However, your point is still valid :) I have created a Redmine task at https://pulp.plan.io/issues/105 to track the OOM issue I saw. Thanks for the bug report!

Closing the Redmine issue and re-opening the BZ for now.

Triage team: there are a few places in unit copying that use lists instead of generators, which is the cause of this bug.

Moved to https://pulp.plan.io/issues/596
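
For anyone applying the per-unit-type workaround discussed in this thread, here is a minimal sketch of how the single multi-type request body could be split into one body per type. The function name `per_type_payloads` is hypothetical and not part of Pulp; the body layout is simply copied from the JSON posted earlier in this thread, with "type_ids" narrowed to one entry per request. It is written as a generator, so the bodies are produced one at a time rather than collected in a list first.

```python
import json


def per_type_payloads(source_repo_id, type_ids):
    """Yield one associate request body per unit type.

    Hypothetical helper: the layout mirrors the JSON body shown in this
    thread; only "type_ids" is narrowed to a single entry per request.
    """
    for type_id in type_ids:
        yield {
            "source_repo_id": source_repo_id,
            "override_config": {},
            "criteria": {
                "type_ids": [type_id],
                "filters": {"unit": {}},
            },
        }


if __name__ == "__main__":
    # Print one JSON document per unit type.
    for body in per_type_payloads("rhel7", ["erratum", "distribution"]):
        print(json.dumps(body, sort_keys=True))
```

Each printed document could be saved to a file and sent with the same curl command used above, one request per unit type. This only reshapes the request; it does not address the list-versus-generator issue in unit copying that the triage comment identifies.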