Bug 1169503

Summary: RPM unit types have redundant search indices specified
Product: [Retired] Pulp   Reporter: Randy Barlow <rbarlow>
Component: rpm-support   Assignee: pulp-bugs
Status: CLOSED UPSTREAM   QA Contact: pulp-qe-list
Severity: low   Docs Contact:
Priority: medium
Version: 2.4.0   CC: skarmark
Target Milestone: ---   Keywords: Triaged
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:   Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:   Environment:
Last Closed: 2015-02-28 22:45:49 UTC   Type: Bug
Regression: ---   Mount Type: ---
Documentation: ---   CRM:
Verified Versions:   Category: ---
oVirt Team: ---   RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---   Target Upstream Version:
Embargoed:

Description Randy Barlow 2014-12-01 20:56:05 UTC
Description of problem:
The RPM unit type JSON file lists the first field of the unit key as a search index on multiple types [0]. For example, the RPM type lists "name" both as a search index and as the first element of its unit key. This is redundant: the unit key creates a unique compound index on the collection, and because MongoDB can use the leading field of a compound index to serve queries on that field alone, we effectively get a "name" index for free. The extra index costs us in two ways: MongoDB needs more RAM to hold it, and every write is slower because it has to update two indexes where one would do.
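The redundancy rule above can be checked mechanically against a type-definition file. The sketch below assumes the Pulp 2 type-definition layout (a top-level "types" list whose entries carry "id", "unit_key" and "search_indexes", where a search index is either one field name or a list of field names); that schema and the sample data are assumptions for illustration, not the actual contents of rpm_support.json. It flags any search index that matches a leading prefix of the unit key.

```python
"""Flag search indexes that duplicate a leading prefix of the unit key.

Assumes a Pulp 2 style type definition: {"types": [{"id", "unit_key",
"search_indexes"}, ...]}, where each search index is a field name or a list
of field names (a compound index).
"""
import json


def redundant_search_indexes(type_defs):
    """Return {type_id: [redundant index, ...]} for each type definition."""
    report = {}
    for type_def in type_defs["types"]:
        unit_key = list(type_def.get("unit_key", []))
        redundant = []
        for index in type_def.get("search_indexes", []):
            # Normalize a single field name to a one-element compound index.
            fields = [index] if isinstance(index, str) else list(index)
            # The unique unit-key index already serves queries on its leading
            # fields, so a search index equal to such a prefix adds nothing.
            if fields and fields == unit_key[: len(fields)]:
                redundant.append(index)
        if redundant:
            report[type_def["id"]] = redundant
    return report


if __name__ == "__main__":
    # Hypothetical, trimmed-down definition for illustration only.
    sample = json.loads("""
    {"types": [
      {"id": "rpm",
       "unit_key": ["name", "epoch", "version", "release", "arch"],
       "search_indexes": ["name", "filename", ["name", "epoch"]]}
    ]}
    """)
    print(redundant_search_indexes(sample))
```

Only exact leading prefixes are flagged; a search index on a later unit-key field such as "version" is kept, because a compound index cannot efficiently serve queries that skip its first field.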

Version-Release number of selected component (if applicable):
2.4.3-1

How reproducible:
Every time.

Steps to Reproduce:
1. Look at /usr/lib/pulp/plugins/types/rpm_support.json

-or-

1. Use Mongo's shell to inspect the installed indices on the various unit collections.
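The second reproduction path can also be scripted. The sketch below works on the dict returned by pymongo's `Collection.index_information()`, which maps each index name to a spec whose "key" is a list of (field, direction) pairs plus options such as "unique" or "sparse". The function name and the redundancy policy are illustrative assumptions: it reports a non-unique index whose key pattern is a strict prefix of a plain (non-sparse, non-partial) longer index, and it compares directions exactly, so it is deliberately conservative.

```python
"""Find installed indexes made redundant by a longer index with the same prefix.

Input: the dict returned by pymongo's Collection.index_information().
"""


def _can_cover(spec):
    # A sparse or partial index omits some documents, so it cannot stand in
    # for a plain index over all of them.
    return not spec.get("sparse") and "partialFilterExpression" not in spec


def prefix_redundant_indexes(index_info):
    """Return sorted (redundant_index, covering_index) name pairs."""
    pairs = []
    for name, spec in index_info.items():
        key = list(spec["key"])
        # _id_ cannot be dropped, and a unique index enforces a constraint,
        # so neither is merely a performance duplicate.
        if name == "_id_" or spec.get("unique"):
            continue
        for other, other_spec in index_info.items():
            other_key = list(other_spec["key"])
            # Directions are compared exactly (conservative check).
            if (
                other != name
                and len(key) < len(other_key)
                and other_key[: len(key)] == key
                and _can_cover(other_spec)
            ):
                pairs.append((name, other))
                break
    return sorted(pairs)
```

Against a live server this could be called as `prefix_redundant_indexes(MongoClient()["pulp_database"]["units_rpm"].index_information())` after `from pymongo import MongoClient`; the database and collection names there are assumptions about a default Pulp 2 install and should be checked with `list_collection_names()` first.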

Actual results:
There are redundant search indices.

Expected results:
There should not be any redundant search indices.

Additional info:
[0] https://github.com/pulp/pulp_rpm/blob/2.4-release/plugins/types/rpm_support.json

Comment 1 Randy Barlow 2014-12-01 21:00:48 UTC
It may also be worth reconsidering all the fields we index for searching while we fix this bug. For example, some of the unit types index "arch", which has only a handful of distinct values, so an index on it is unlikely to be useful and will waste resources. Deciding what to index can be tricky, but try to weigh how useful each index would be to end users against how much RAM and write time it costs.
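One rough way to quantify the "will likely not be useful" side of that trade-off is selectivity: distinct values divided by documents. The sketch below is one possible heuristic, not Pulp's indexing policy, and the function names and the 0.01 default threshold are hypothetical. It only estimates how much an index on a field would narrow a query; it says nothing about how often end users actually search on that field, which the comment also asks us to weigh.

```python
"""Estimate how selective each candidate search-index field is.

Heuristic only: selectivity = distinct values / documents. A field with very
few distinct values barely narrows a query, so an index on it mostly adds
RAM use and write cost.
"""


def field_selectivity(docs, fields):
    """Return {field: distinct values / number of documents} for a sample."""
    total = len(docs)
    result = {}
    for field in fields:
        # repr() makes list or dict values hashable; a missing field counts
        # as the single value None.
        distinct = len({repr(doc.get(field)) for doc in docs})
        result[field] = distinct / total if total else 0.0
    return result


def low_selectivity_fields(docs, fields, threshold=0.01):
    """Return fields whose selectivity is below the (arbitrary) threshold."""
    scores = field_selectivity(docs, fields)
    return [field for field in fields if scores[field] < threshold]
```

Fed a sample such as `list(collection.find().limit(10000))` from pymongo, a field like "arch" would typically score far below "name"; a small sample can understate the cardinality of high-variety fields, so the numbers are only a starting point for the judgment call.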

Comment 2 Brian Bouterse 2015-02-28 22:45:49 UTC
Moved to https://pulp.plan.io/issues/630