Red Hat Bugzilla – Bug 426406
[PATCH] analyze and vacuum sqlite databases
Last modified: 2016-09-19 22:38:20 EDT
This small patch makes yum-metadata-parser analyze and vacuum the databases it
creates. Analyzing the db should make index usage more efficient, and vacuuming
it saves space. Analyzing takes very little time and causes a small space
increase, vacuuming takes longer than analyzing.
Current Fedora devel x86_64 sqlite db's:
Creating a small repo from a typical Fedora devel mock build package set takes
slightly more time than currently (around 5% on estimate) and the end result is
slightly smaller (and should be more efficient wrt indexes, but I have no
numbers about that).
Created attachment 290186 [details]
Analyze and vacuum created databases
Any comments? This seems like a no-brainer to me.
Don't believe the zodbot, lmacken is da man.
Ok, BZ thinks I own createrepo too
okay - the only question is really how much of a deal this is for a 5% time
increase to createrepo runs on the server side. Adding jkeating for his opinion.
Yeah, I'd like to see what this actually does, other than saves a meg or two on
download. Right now creation of the database files is by far the longest part
of creating repodata in the builders, so much that I've considered not making
database repodata anymore. I'm not thrilled to see it get even longer.
Not producing the database is a NON starter. Offloading all of that time onto
each and every user just to save ourselves a little time is not a good idea.
just running some quick tests with the fedora primary and filelists sqlite dbs
(not updates) analyzed and vacuumed gives me pretty identical times for ops like:
One possible speedup for database creation could be to create various indexes
after populating tables with data instead of creating them beforehand and
running inserts to tables that have indexes enabled.
With my C skills that'd take more time than I have available right now, but I
suppose for someone more knowledgeable it would be a pretty quick thing to try out.
(In reply to comment #7)
> Not producing the database is a NON starter. Offloading all of that time onto
> each and every user just to save ourselves a little time is not a good idea.
I'm not talking about the repos we point people to for Fedora and updates and
such, I'm talking about the repos used internally by the buildsystem. The
biggest hurdles to getting large sets of interrelated builds through the
buildsystem is the amount of time newRepo tasks take, and a good chunk of that
is the createrepo time. We've made great strides to make this time faster, I
just don't want to take steps backward. And yes, I realize this means that the
builds themselves may take slightly longer as they end up having to do some of
this, but that's a bit more spread out and easier to consume. Anyway I haven't
turned it off yet, just idle thinking.
Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:
So, it looks like we need to make a decision here.
Do we take a 5% createrepo slowdown on the serverside to yield a more efficient db, and thus a better client experience? Sounds like a good idea to me.
Luke: We've yet to determine the second part of the claim.
Well, smaller files does equate to a better client experience, but I do think knowing exactly how much of a performance gain this gives us from a sqlite perspective is also important. According to jbowes in comment #8, there were no obvious speedups, but having some actual numbers would be nice as well.
Apologies, I should have posted this earlier:
The patch from bug 461403 already provides a part of the space savings - if nothing else, the tests I made earlier should be done again to see what the numbers are now. The patch was applied upstream and although AFAIK there's no official upstream release with it out yet I gather it is also included as a patch in F-10+'s yum-metadata-parser.
I'll do some tests to see what the numbers are after the "delayed indexes" change and will also see if I can come up with some numbers for the analysis part. Until then, please hold this.
This message is a reminder that Fedora 9 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 9. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '9'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 9's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 9 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora please change the 'version' of this
bug to the applicable version. If you are unable to change the version,
please add a comment here and someone will do it for you.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
The process we are following is described here:
Still on my TODO list...
Sorry, nothing new to report, I haven't found time to revisit this yet, but will get to it eventually. Feel free to close this bug if you want, I have this on my TODO list somewhere else.
I finally got around to testing vacuum again.
With the F-11 i386 repo, vacuuming decreased bzipped db file sizes (primary+other+filelists total) about 300k (of 12M), and a bit more (but surprisingly little) for uncompressed db files (of total 59M). Vacuuming slowed down the whole (cached) createrepo -d process by about 3% (2:45 vanilla, 2:50 w/vacuum) on my box. I got similar results of magnitude from testing the current F-11 updates-testing repo.
So, as suspected, there's not much space savings available after the index-after-data-load changes from bug 461403 were applied. 3% slowdown isn't that bad, but the 2.5% space savings gained with it aren't anything to get too excited about either.
Feel free to close this bug if you agree. I'll re-check ANALYZE maybe sometime later, and let you know about numbers if I find scenarios where it improves things.