Bug 426406 - [PATCH] analyze and vacuum sqlite databases
Summary: [PATCH] analyze and vacuum sqlite databases
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: createrepo
Version: rawhide
Hardware: All
OS: Linux
low
low
Target Milestone: ---
Assignee: Luke Macken
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2007-12-20 20:36 UTC by Ville Skyttä
Modified: 2016-09-20 02:38 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-10-06 13:47:01 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
Analyze and vacuum created databases (518 bytes, patch)
2007-12-20 20:36 UTC, Ville Skyttä
no flags Details | Diff

Description Ville Skyttä 2007-12-20 20:36:27 UTC
This small patch makes yum-metadata-parser analyze and vacuum the databases it
creates.  Analyzing the db should make index usage more efficient, and vacuuming
it saves space.  Analyzing takes very little time and causes a small space
increase, vacuuming takes longer than analyzing.

Some numbers:

Current Fedora devel x86_64 sqlite db's:
11M     filelists.sqlite.bz2
17M     other.sqlite.bz2
7.0M    primary.sqlite.bz2
34M     total

After analyze+vacuum:
9.8M    filelists.sqlite.bz2
16M     other.sqlite.bz2
6.0M    primary.sqlite.bz2
31M     total

Creating a small repo from a typical Fedora devel mock build package set takes
slightly more time than currently (around 5% on estimate) and the end result is
slightly smaller (and should be more efficient wrt indexes, but I have no
numbers about that).

http://www.sqlite.org/lang_analyze.html
http://www.sqlite.org/lang_vacuum.html

Comment 1 Ville Skyttä 2007-12-20 20:36:27 UTC
Created attachment 290186 [details]
Analyze and vacuum created databases

Comment 2 Ville Skyttä 2008-01-23 17:15:36 UTC
Any comments?  This seems like a no-brainer to me.

Comment 3 James Antill 2008-01-23 20:05:56 UTC
 Don't believe the zodbot, lmacken is da man.

Comment 4 James Antill 2008-01-23 20:06:57 UTC
 Ok, BZ thinks I own createrepo too

Comment 5 Seth Vidal 2008-01-23 20:12:16 UTC
okay - the only question is really how much of a deal this is for a 5% time
increase to createrepo runs on the server side. Adding jkeating for his opinion.

Comment 6 Jesse Keating 2008-01-23 20:30:46 UTC
Yeah, I'd like to see what this actually does, other than saves a meg or two on
download.  Right now creation of the database files is by far the longest part
of creating repodata in the builders, so much that I've considered not making
database repodata anymore.  I'm not thrilled to see it get even longer.

Comment 7 Seth Vidal 2008-01-23 20:40:00 UTC
Not producing the database is a NON starter. Offloading all of that time onto
each and every user just to save ourselves a little time is not a good idea.


Comment 8 James Bowes 2008-01-23 20:49:28 UTC
just running some quick tests with the fedora primary and filelists sqlite dbs
(not updates) analyzed and vacuumed gives me pretty identical times for ops like:

remove glibc
install Terminal
search devel

Comment 9 Ville Skyttä 2008-01-23 21:09:10 UTC
One possible speedup for database creation could be to create various indexes
after populating tables with data instead of creating them beforehand and
running inserts to tables that have indexes enabled.

With my C skills that'd take more time than I have available right now, but I
suppose for someone more knowledgeable it would be a pretty quick thing to try out.

Comment 10 Jesse Keating 2008-01-23 21:38:52 UTC
(In reply to comment #7)
> Not producing the database is a NON starter. Offloading all of that time onto
> each and every user just to save ourselves a little time is not a good idea.
> 

I'm not talking about the repos we point people to for Fedora and updates and
such, I'm talking about the repos used internally by the buildsystem.  The
biggest hurdles to getting large sets of interrelated builds through the
buildsystem is the amount of time newRepo tasks take, and a good chunk of that
is the createrepo time.  We've made great strides to make this time faster, I
just don't want to take steps backward.  And yes, I realize this means that the
builds themselves may take slightly longer as they end up having to do some of
this, but that's a bit more spread out and easier to consume.  Anyway I haven't
turned it off yet, just idle thinking.

Comment 11 Bug Zapper 2008-05-14 04:13:28 UTC
Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 12 Luke Macken 2008-11-21 17:21:08 UTC
So, it looks like we need to make a decision here.

Do we take a 5% createrepo slowdown on the serverside to yield a more efficient db, and thus a better client experience?  Sounds like a good idea to me.

Comment 13 seth vidal 2008-11-21 17:32:27 UTC
Luke: We've yet to determine the second part of the claim.

Comment 14 Luke Macken 2008-11-21 18:56:07 UTC
Well, smaller files does equate to a better client experience, but I do think knowing exactly how much of a performance gain this gives us from a sqlite perspective is also important.  According to jbowes in comment #8, there were no obvious speedups, but having some actual numbers would be nice as well.

Comment 15 Ville Skyttä 2008-11-21 19:32:35 UTC
Apologies, I should have posted this earlier:

The patch from bug 461403 already provides a part of the space savings - if nothing else, the tests I made earlier should be done again to see what the numbers are now.  The patch was applied upstream and although AFAIK there's no official upstream release with it out yet I gather it is also included as a patch in F-10+'s yum-metadata-parser.

I'll do some tests to see what the numbers are after the "delayed indexes" change and will also see if I can come up with some numbers for the analysis part.  Until then, please hold this.

Comment 16 Bug Zapper 2009-06-09 23:18:25 UTC
This message is a reminder that Fedora 9 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 9.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '9'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 9's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 9 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 17 Ville Skyttä 2009-06-16 19:47:53 UTC
Still on my TODO list...

Comment 18 seth vidal 2009-09-24 20:16:21 UTC
ping?

Comment 19 Ville Skyttä 2009-09-24 21:01:42 UTC
Sorry, nothing new to report, I haven't found time to revisit this yet, but will get to it eventually.  Feel free to close this bug if you want, I have this on my TODO list somewhere else.

Comment 20 Ville Skyttä 2009-10-05 21:19:47 UTC
I finally got around to testing vacuum again.

With the F-11 i386 repo, vacuuming decreased bzipped db file sizes (primary+other+filelists total) about 300k (of 12M), and a bit more (but surprisingly little) for uncompressed db files (of total 59M).  Vacuuming slowed down the whole (cached) createrepo -d process by about 3% (2:45 vanilla, 2:50 w/vacuum) on my box.  I got similar results of magnitude from testing the current F-11 updates-testing repo.

So, as suspected, there's not much space savings available after the index-after-data-load changes from bug 461403 were applied.  3% slowdown isn't that bad, but the 2.5% space savings gained with it aren't anything to get too excited about either.

Feel free to close this bug if you agree.  I'll re-check ANALYZE maybe sometime later, and let you know about numbers if I find scenarios where it improves things.


Note You need to log in before you can comment on or make changes to this bug.