Description of problem: This is an enhancement request, not a bug - where do enhancement requests go in bugzilla ? I'm sure there used to be a "enhancement" severity ? ... sqlite + berkeley DB are great tools, but the way you've deployed them for RPM they are VERY slow . If I want to find all packages with "office" in the name, for example : $ time rpm -qa '*office*' openoffice.org-writer-core-3.0.1-15.3.fc10.x86_64 ... openoffice.org-draw-core-3.0.1-15.3.fc10.x86_64 real 0m3.49s user 0m1.78s sys 0m0.19s It takes around 3.5 seconds on a 2.2GHZx2 x86_64 with 2GB RAM under FC-10 (userspace CPU frequency governor set to max cpufreq (2.2Ghz)) . Using a database system that constructs good indexes, such as AT&T Daytona ( http://www.research.att.com/~daytona/ ) : $ time ./rpmByName office openoffice.org-writer-core3.0.115.3.fc10x86_6411236669384sApplications/Productivityhttp://www.openoffice.org/Word Processor libraries for openoffice.org ... openoffice.org-draw-core3.0.115.3.fc10x86_6411236669384sApplications/Productivityhttp://www.openoffice.org/Drawing libraries for openoffice.org real 0m0.09s user 0m0.00s sys 0m0.02s That's 3.5/0.09 = 38.8 times faster ! On FC10, /var/lib/rpm is @ 83MB in size : $ du -ks /var/lib/rpm 83776 /var/lib/rpm The Daytona database ASCII text file is @ 250KB : $ ls -l RPM.db -rw-rw-r-- 1 root daytona 248930 2009-05-12 01:25 RPM.db which contains all data necessary to identify an RPM (name,version,release,epoch,arch,buildtime,source,spec,url,group,summary) plus an integer identifier that can be used as a "KEY" by which a separate eg. "RPM_File.db" can reference an RPM , for all RPMs on the system. But on my machine, one could copy all the data under /var/lib/rpm into memory 24 times over, and still have 6GB of swap left over ; as crude a method as mapping all the files under /var/lib/rpm into memory and then doing a "grep" of the process memory image could be done on my machine in less than 2 seconds. Surely there must be a better way improve RPM's lookup performance ? Even compiling a query from scratch with Daytona is quicker than running 'rpm -q' : $ time ksh -c "echo 'select ^Name^, ^Version^, ^Release^ from ^RPM^ where ( ^Name^ Matches \"rpm\" )' | DS QQ -" %msg01)delim)| %msg02) %msg03)Query File: /tmp/DS.QQ.16826 %msg04) %msg05) %msg06)recls)DS_QQ_16826 %msg07)flds)Name|Version|Release %msg08)types)STR(*)|STR(*)|STR(*) %msg09) rpm-python|4.6.0|2.fc10 redhat-rpm-config|9.0.3|3.fc10 rpm|4.6.0|2.fc10 rpm-devel|4.6.0|2.fc10 rpm-libs|4.6.0|2.fc10 rpm-build|4.6.0|2.fc10 real 0m1.51s user 0m0.81s sys 0m0.45s Of which @ 99% is compilation time : $ echo 'select ^Name^, ^Version^, ^Release^ from ^RPM^ where ( ^Name^ Matches "rpm" )' > rpm.rpms.S $ DS_Mk rpm.rpms.S rpm.rpms ... $ time ./rpm.rpms %msg01)delim)| %msg02) %msg03)Query File: /home/root/rpms.S %msg04) %msg05) %msg06)recls)RPMS %msg07)flds)Name|Version|Release %msg08)types)STR(*)|STR(*)|STR(*) %msg09) rpm-python|4.6.0|2.fc10 redhat-rpm-config|9.0.3|3.fc10 rpm|4.6.0|2.fc10 rpm-devel|4.6.0|2.fc10 rpm-libs|4.6.0|2.fc10 rpm-build|4.6.0|2.fc10 real 0m0.08s user 0m0.00s sys 0m0.02s The "rpmByName" query demonstrates Daytona's Cymbal, into which SQL can be compiled : $ cat rpmByName.cy local: STR .str RE .re; set [ .str ] = read( from _cmd_line_ but_if_absent [ ".*" ] ) ; set .re = (RE).str; for_each_time there_is_a RPM where ( Name Matches .re ) { where ( this_is_a RPM ) with_format _data_ do Describe; } Now, I'm not suggesting using a proprietary database such as AT&T's Daytona for RPM - but Daytona's RPM.db is only an ASCII text file, which is indexed using algorithms in the public domain - indeed, which have been there for over twenty years ! The "Name" index file is used for the above query automatically by Daytona - and Index files could easily be constructed for the sqlite or Berkeley DB files used by RPM - I could make some suggestions as to how this might be done if you like. The query time of @ 0.08 seconds above includes the time necessary to verify that the index files are in-sync with the data files . Consider how much CPU time globally is spent by scripts doing an 'rpm -q' ! Version-Release number of selected component (if applicable): all How reproducible: 100% Steps to Reproduce: see above Actual results: takes @ 3.5 seconds on a 2.2ghz x 2 CPU Expected results: could take < 0.1 seconds Additional info: I suggest constructing indexes for "RPM By Name", "RPM By Name-Version-Release", "RPM By Name-Version-Release-Arch", "RPM By Arch", "RPM By Release", ... etc.
Daytona & Cymbal is hardly a fair comparison. Go ahead, install software using Daytona if you wish. The usual performance complaint (your RFE comes along regular as clockwork for almost a decade), is that signatures/digests are verified when headers are read. Adding --nosignature and --nodigest will likely speed your rpm -qa by an order of magnitude. Disabling the signature check can be configured per-user/per-system/per-vendor persistently as well. You are correct that generating additional tables would improve performance, and one could generate all the variants of the tuple {N,V,R,E,A,O} N==name etc if there were need. The ability to generate additional indices is also persistently configurable (although there are limitations on the tag data boundaries). But there just aren't that many "rpm -q" scripts around (as you claim), nor does the data in an rpmdb change so rapidly that one _MUST_ query an rpmdb (with the performance penalities of signature/digest checking and lack of optimized table index optimizations). One can easily query once and do whatever optimizations one wishes, with Daytona/Cymbal or perl/awk or ... on the data retrieved.
*** This bug has been marked as a duplicate of bug 438625 ***