Bug 426406 - [PATCH] analyze and vacuum sqlite databases
[PATCH] analyze and vacuum sqlite databases
Product: Fedora
Classification: Fedora
Component: createrepo (Show other bugs)
All Linux
low Severity low
: ---
: ---
Assigned To: Luke Macken
Fedora Extras Quality Assurance
: Patch
Depends On:
  Show dependency treegraph
Reported: 2007-12-20 15:36 EST by Ville Skyttä
Modified: 2016-09-19 22:38 EDT (History)
5 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2009-10-06 09:47:01 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)
Analyze and vacuum created databases (518 bytes, patch)
2007-12-20 15:36 EST, Ville Skyttä
no flags Details | Diff

  None (edit)
Description Ville Skyttä 2007-12-20 15:36:27 EST
This small patch makes yum-metadata-parser analyze and vacuum the databases it
creates.  Analyzing the db should make index usage more efficient, and vacuuming
it saves space.  Analyzing takes very little time and causes a small space
increase, vacuuming takes longer than analyzing.

Some numbers:

Current Fedora devel x86_64 sqlite db's:
11M     filelists.sqlite.bz2
17M     other.sqlite.bz2
7.0M    primary.sqlite.bz2
34M     total

After analyze+vacuum:
9.8M    filelists.sqlite.bz2
16M     other.sqlite.bz2
6.0M    primary.sqlite.bz2
31M     total

Creating a small repo from a typical Fedora devel mock build package set takes
slightly more time than currently (around 5% on estimate) and the end result is
slightly smaller (and should be more efficient wrt indexes, but I have no
numbers about that).

Comment 1 Ville Skyttä 2007-12-20 15:36:27 EST
Created attachment 290186 [details]
Analyze and vacuum created databases
Comment 2 Ville Skyttä 2008-01-23 12:15:36 EST
Any comments?  This seems like a no-brainer to me.
Comment 3 James Antill 2008-01-23 15:05:56 EST
 Don't believe the zodbot, lmacken is da man.
Comment 4 James Antill 2008-01-23 15:06:57 EST
 Ok, BZ thinks I own createrepo too
Comment 5 Seth Vidal 2008-01-23 15:12:16 EST
okay - the only question is really how much of a deal this is for a 5% time
increase to createrepo runs on the server side. Adding jkeating for his opinion.
Comment 6 Jesse Keating 2008-01-23 15:30:46 EST
Yeah, I'd like to see what this actually does, other than saves a meg or two on
download.  Right now creation of the database files is by far the longest part
of creating repodata in the builders, so much that I've considered not making
database repodata anymore.  I'm not thrilled to see it get even longer.
Comment 7 Seth Vidal 2008-01-23 15:40:00 EST
Not producing the database is a NON starter. Offloading all of that time onto
each and every user just to save ourselves a little time is not a good idea.
Comment 8 James Bowes 2008-01-23 15:49:28 EST
just running some quick tests with the fedora primary and filelists sqlite dbs
(not updates) analyzed and vacuumed gives me pretty identical times for ops like:

remove glibc
install Terminal
search devel
Comment 9 Ville Skyttä 2008-01-23 16:09:10 EST
One possible speedup for database creation could be to create various indexes
after populating tables with data instead of creating them beforehand and
running inserts to tables that have indexes enabled.

With my C skills that'd take more time than I have available right now, but I
suppose for someone more knowledgeable it would be a pretty quick thing to try out.
Comment 10 Jesse Keating 2008-01-23 16:38:52 EST
(In reply to comment #7)
> Not producing the database is a NON starter. Offloading all of that time onto
> each and every user just to save ourselves a little time is not a good idea.

I'm not talking about the repos we point people to for Fedora and updates and
such, I'm talking about the repos used internally by the buildsystem.  The
biggest hurdles to getting large sets of interrelated builds through the
buildsystem is the amount of time newRepo tasks take, and a good chunk of that
is the createrepo time.  We've made great strides to make this time faster, I
just don't want to take steps backward.  And yes, I realize this means that the
builds themselves may take slightly longer as they end up having to do some of
this, but that's a bit more spread out and easier to consume.  Anyway I haven't
turned it off yet, just idle thinking.
Comment 11 Bug Zapper 2008-05-14 00:13:28 EDT
Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:
Comment 12 Luke Macken 2008-11-21 12:21:08 EST
So, it looks like we need to make a decision here.

Do we take a 5% createrepo slowdown on the serverside to yield a more efficient db, and thus a better client experience?  Sounds like a good idea to me.
Comment 13 seth vidal 2008-11-21 12:32:27 EST
Luke: We've yet to determine the second part of the claim.
Comment 14 Luke Macken 2008-11-21 13:56:07 EST
Well, smaller files does equate to a better client experience, but I do think knowing exactly how much of a performance gain this gives us from a sqlite perspective is also important.  According to jbowes in comment #8, there were no obvious speedups, but having some actual numbers would be nice as well.
Comment 15 Ville Skyttä 2008-11-21 14:32:35 EST
Apologies, I should have posted this earlier:

The patch from bug 461403 already provides a part of the space savings - if nothing else, the tests I made earlier should be done again to see what the numbers are now.  The patch was applied upstream and although AFAIK there's no official upstream release with it out yet I gather it is also included as a patch in F-10+'s yum-metadata-parser.

I'll do some tests to see what the numbers are after the "delayed indexes" change and will also see if I can come up with some numbers for the analysis part.  Until then, please hold this.
Comment 16 Bug Zapper 2009-06-09 19:18:25 EDT
This message is a reminder that Fedora 9 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 9.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '9'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 9's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 9 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
Comment 17 Ville Skyttä 2009-06-16 15:47:53 EDT
Still on my TODO list...
Comment 18 seth vidal 2009-09-24 16:16:21 EDT
Comment 19 Ville Skyttä 2009-09-24 17:01:42 EDT
Sorry, nothing new to report, I haven't found time to revisit this yet, but will get to it eventually.  Feel free to close this bug if you want, I have this on my TODO list somewhere else.
Comment 20 Ville Skyttä 2009-10-05 17:19:47 EDT
I finally got around to testing vacuum again.

With the F-11 i386 repo, vacuuming decreased bzipped db file sizes (primary+other+filelists total) about 300k (of 12M), and a bit more (but surprisingly little) for uncompressed db files (of total 59M).  Vacuuming slowed down the whole (cached) createrepo -d process by about 3% (2:45 vanilla, 2:50 w/vacuum) on my box.  I got similar results of magnitude from testing the current F-11 updates-testing repo.

So, as suspected, there's not much space savings available after the index-after-data-load changes from bug 461403 were applied.  3% slowdown isn't that bad, but the 2.5% space savings gained with it aren't anything to get too excited about either.

Feel free to close this bug if you agree.  I'll re-check ANALYZE maybe sometime later, and let you know about numbers if I find scenarios where it improves things.

Note You need to log in before you can comment on or make changes to this bug.