This small patch makes yum-metadata-parser analyze and vacuum the databases it creates. Analyzing the db should make index usage more efficient, and vacuuming it saves space. Analyzing takes very little time and causes a small space increase, vacuuming takes longer than analyzing. Some numbers: Current Fedora devel x86_64 sqlite db's: 11M filelists.sqlite.bz2 17M other.sqlite.bz2 7.0M primary.sqlite.bz2 34M total After analyze+vacuum: 9.8M filelists.sqlite.bz2 16M other.sqlite.bz2 6.0M primary.sqlite.bz2 31M total Creating a small repo from a typical Fedora devel mock build package set takes slightly more time than currently (around 5% on estimate) and the end result is slightly smaller (and should be more efficient wrt indexes, but I have no numbers about that). http://www.sqlite.org/lang_analyze.html http://www.sqlite.org/lang_vacuum.html
Created attachment 290186 [details] Analyze and vacuum created databases
Any comments? This seems like a no-brainer to me.
Don't believe the zodbot, lmacken is da man.
Ok, BZ thinks I own createrepo too
okay - the only question is really how much of a deal this is for a 5% time increase to createrepo runs on the server side. Adding jkeating for his opinion.
Yeah, I'd like to see what this actually does, other than saves a meg or two on download. Right now creation of the database files is by far the longest part of creating repodata in the builders, so much that I've considered not making database repodata anymore. I'm not thrilled to see it get even longer.
Not producing the database is a NON starter. Offloading all of that time onto each and every user just to save ourselves a little time is not a good idea.
just running some quick tests with the fedora primary and filelists sqlite dbs (not updates) analyzed and vacuumed gives me pretty identical times for ops like: remove glibc install Terminal search devel
One possible speedup for database creation could be to create various indexes after populating tables with data instead of creating them beforehand and running inserts to tables that have indexes enabled. With my C skills that'd take more time than I have available right now, but I suppose for someone more knowledgeable it would be a pretty quick thing to try out.
(In reply to comment #7) > Not producing the database is a NON starter. Offloading all of that time onto > each and every user just to save ourselves a little time is not a good idea. > I'm not talking about the repos we point people to for Fedora and updates and such, I'm talking about the repos used internally by the buildsystem. The biggest hurdles to getting large sets of interrelated builds through the buildsystem is the amount of time newRepo tasks take, and a good chunk of that is the createrepo time. We've made great strides to make this time faster, I just don't want to take steps backward. And yes, I realize this means that the builds themselves may take slightly longer as they end up having to do some of this, but that's a bit more spread out and easier to consume. Anyway I haven't turned it off yet, just idle thinking.
Changing version to '9' as part of upcoming Fedora 9 GA. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
So, it looks like we need to make a decision here. Do we take a 5% createrepo slowdown on the serverside to yield a more efficient db, and thus a better client experience? Sounds like a good idea to me.
Luke: We've yet to determine the second part of the claim.
Well, smaller files does equate to a better client experience, but I do think knowing exactly how much of a performance gain this gives us from a sqlite perspective is also important. According to jbowes in comment #8, there were no obvious speedups, but having some actual numbers would be nice as well.
Apologies, I should have posted this earlier: The patch from bug 461403 already provides a part of the space savings - if nothing else, the tests I made earlier should be done again to see what the numbers are now. The patch was applied upstream and although AFAIK there's no official upstream release with it out yet I gather it is also included as a patch in F-10+'s yum-metadata-parser. I'll do some tests to see what the numbers are after the "delayed indexes" change and will also see if I can come up with some numbers for the analysis part. Until then, please hold this.
This message is a reminder that Fedora 9 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 9. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '9'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 9's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 9 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Still on my TODO list...
ping?
Sorry, nothing new to report, I haven't found time to revisit this yet, but will get to it eventually. Feel free to close this bug if you want, I have this on my TODO list somewhere else.
I finally got around to testing vacuum again. With the F-11 i386 repo, vacuuming decreased bzipped db file sizes (primary+other+filelists total) about 300k (of 12M), and a bit more (but surprisingly little) for uncompressed db files (of total 59M). Vacuuming slowed down the whole (cached) createrepo -d process by about 3% (2:45 vanilla, 2:50 w/vacuum) on my box. I got similar results of magnitude from testing the current F-11 updates-testing repo. So, as suspected, there's not much space savings available after the index-after-data-load changes from bug 461403 were applied. 3% slowdown isn't that bad, but the 2.5% space savings gained with it aren't anything to get too excited about either. Feel free to close this bug if you agree. I'll re-check ANALYZE maybe sometime later, and let you know about numbers if I find scenarios where it improves things.