Description of problem:
Satellite CV publish and promote wait hours for the pulp task unassociate_by_criteria on docker repos when more than 1M docker tag units are stored.

The cause is that the unassociate_by_criteria task queries or deletes units from the units_docker_tag mongo collection using a filter on repo_id and manifest_digest, like:

Oct 18 07:20:38 satellite007 mongod.27017[2985]: [conn80] command pulp_database.$cmd command: delete { delete: "units_docker_tag", ordered: true, deletes: [ { q: { repo_id: "1-mycv-library-f2183c89-ff33-4d1f-a80f-545aecf105e9", manifest_digest: "sha256:ab3a77ab3a77ab3a77ab3a77ab3aa0604477deab3a773d7f81a060440cfb81d3" }, limit: 0 } ] } numYields:0 reslen:44 locks:{ Global: { acquireCount: { r: 11753, w: 11753 } }, Database: { acquireCount: { w: 11753 }, acquireWaitCount: { w: 8 }, timeAcquiringMicros: { w: 901 } }, Collection: { acquireCount: { w: 11753 } } } protocol:op_query 2150ms

But mongo has only the default indexes on this collection, neither of which covers a filter on repo_id and manifest_digest:

> db.units_docker_tag.getIndices()
[
        {
                "v" : 1,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "pulp_database.units_docker_tag"
        },
        {
                "v" : 1,
                "unique" : true,
                "key" : {
                        "name" : 1,
                        "repo_id" : 1,
                        "schema_version" : 1,
                        "manifest_type" : 1
                },
                "name" : "name_1_repo_id_1_schema_version_1_manifest_type_1",
                "ns" : "pulp_database.units_docker_tag",
                "background" : false
        }
]
>

Therefore, to improve the performance of those pulp tasks, and consequently of publish/promote for CVs with bigger docker repos, I suggest adding a relevant index.

Version-Release number of selected component (if applicable):
Sat6.5 (applicable to 6.6 as well)

How reproducible:
100%

Steps to Reproduce:
1. Populate your Satellite with big docker repo(s) holding hundreds of thousands of docker tags, such that the units_docker_tag collection has roughly 1M records
2. Add a docker repo to a CV and publish/promote the CV
3. Wait..

Actual results:
CV publish/promote takes a long time; meanwhile /var/log/messages logs long-running mongo queries like the one above

Expected results:
CV publish/promote completes in a reasonable time

Additional info:
(please ensure the index is present in mongo for pulp2 and in postgres for pulp3, to prevent future performance regressions)
It would be great if pulp engineering could provide workaround commands for adding an index to the units_docker_tag collection on the pair of attributes repo_id and manifest_digest.
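A minimal sketch of what such a workaround might look like in the mongo shell, assuming direct access to pulp_database on the Satellite and that a manually created index is acceptable in your environment. This is not an official Pulp or Satellite procedure. The helper name addTagLookupIndex is hypothetical; createIndex and its background option are standard mongo shell features. The key order (repo_id, then manifest_digest) simply mirrors the filter seen in the logged delete.

```javascript
// Hypothetical helper, not an official Pulp or Satellite command.
// Creates a compound index matching the filter used by the slow delete:
// { repo_id: ..., manifest_digest: ... }.
function addTagLookupIndex(database) {
    return database.units_docker_tag.createIndex(
        { repo_id: 1, manifest_digest: 1 },
        // Assumption: on older MongoDB releases a background build lets other
        // operations on the database proceed while the index is being built.
        { background: true }
    );
}

// In the mongo shell, `db` is the current database (run `use pulp_database`
// first). The guard keeps the snippet harmless outside the shell.
if (typeof db !== "undefined") {
    printjson(addTagLookupIndex(db));
}
```

If the call succeeds, db.units_docker_tag.getIndices() should list a third index, named repo_id_1_manifest_digest_1 by default, alongside the two existing ones. Whether later pulp database migrations keep a manually added index is not covered by this sketch.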
(In reply to Pavel Moravec from comment #2)
> It would be great if pulp engineering provides a workaround commands how to
> add index to units_docker_tag collection, based on a pair of attributes
> repo_id and manifest_digest.

Hi Tania,
could you please provide this workaround (or forward the needinfo to someone who knows mongo better)?
As a workaround/improvement:
The "Limit Sync Tags" feature for docker repositories, which was released with [0], would improve the situation greatly, as only the desired tags would be synced.

[0] https://access.redhat.com/errata/RHSA-2019:1222
Indeed, this seems to be a dup of bug 1690070; performance is much improved after an upgrade to a version with 1690070 fixed.

*** This bug has been marked as a duplicate of bug 1690070 ***