Bug 461403 - Index tables after loading data
Status: CLOSED UPSTREAM
Product: Fedora
Classification: Fedora
Component: yum-metadata-parser
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Seth Vidal
QA Contact: Fedora Extras Quality Assurance
Keywords: Patch
Reported: 2008-09-07 06:51 EDT by Ville Skyttä
Modified: 2014-01-21 18:06 EST
CC List: 2 users

Doc Type: Bug Fix
Last Closed: 2008-09-10 16:29:08 EDT


Attachments
Index tables after loading data (12.63 KB, patch)
2008-09-07 06:51 EDT, Ville Skyttä
Create indexes with IF NOT EXISTS (3.78 KB, patch)
2008-09-07 06:55 EDT, Ville Skyttä

Description Ville Skyttä 2008-09-07 06:51:47 EDT
Created attachment 315962
Index tables after loading data

The attached patch makes yum-metadata-parser create the indexes on its SQLite tables after loading the data, because inserting into indexed tables is slower than inserting into unindexed ones and indexing them afterwards. The diff is a bit hard to read, but it just moves index creation out of the table-creation functions into functions of their own; there are no other changes.
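The reordering described above can be sketched with Python's stdlib sqlite3 module. This is a minimal illustration, not the patch itself (yum-metadata-parser is a C extension): the `filelist` table, its columns, and the index names `keyfile` and `dirnames` are hypothetical stand-ins, and only the ordering matters here (create tables, bulk-load rows, then create indexes in a separate step).

```python
import sqlite3

# Hypothetical, simplified schema loosely modeled on a "filelists" table;
# it is NOT the exact schema used by yum-metadata-parser.
CREATE_TABLE = "CREATE TABLE filelist (pkgKey INTEGER, dirname TEXT, filenames TEXT)"
CREATE_INDEXES = [
    "CREATE INDEX keyfile ON filelist (pkgKey)",
    "CREATE INDEX dirnames ON filelist (dirname)",
]


def create_tables(db):
    # Table creation only: no indexes yet, so the bulk load below does not
    # have to update index B-trees row by row.
    db.execute(CREATE_TABLE)


def create_indexes(db):
    # Separate step, run once after all rows are in place.
    for stmt in CREATE_INDEXES:
        db.execute(stmt)


def build_db(rows):
    db = sqlite3.connect(":memory:")
    create_tables(db)
    with db:  # load all rows inside a single transaction, committed on exit
        db.executemany("INSERT INTO filelist VALUES (?, ?, ?)", rows)
    create_indexes(db)
    return db


if __name__ == "__main__":
    rows = [(i, f"/usr/share/pkg{i % 50}", f"file{i}") for i in range(1000)]
    db = build_db(rows)
    print(db.execute("SELECT count(*) FROM filelist").fetchone()[0])
```

Keeping `create_tables` and `create_indexes` as separate functions mirrors the split the patch description mentions, so a caller decides when indexing happens.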

This provides a nice speedup. Below are results from my x86_64 box (AMD64 3200+, 2 GB RAM, F-9) running createrepo on a Rawhide i386 repo that was a couple of days old. The timings come from createrepo -v, customized to additionally output timestamps for the SQLite db operations alone (excluding bzip2 and the other steps done after the db is created). Values below are Xs / Ys, where X includes bzip2 and the other post-db steps of createrepo, and Y covers only the db operations.

Unpatched:

- other:      8s /  2s
- filelists: 48s / 32s
- primary:   17s /  9s
= total:     73s / 43s

Patched:

- other:      8s /  2s (no changes)
- filelists: 29s / 12s (much faster)
- primary:   14s /  6s (somewhat faster)
= total:     51s / 20s (quite a bit faster)
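A rough way to reproduce the db-only (Y) side of such a comparison is to time the same bulk load with indexes created before versus after the inserts. This sketch uses a hypothetical `packages` table, hypothetical index names, and synthetic rows, so its numbers will not match the createrepo figures above; they depend on the data, SQLite version, and hardware.

```python
import sqlite3
import time

# Hypothetical schema and synthetic data, for illustration only.
SCHEMA = "CREATE TABLE packages (pkgKey INTEGER, name TEXT, arch TEXT)"
INDEXES = [
    "CREATE INDEX packagename ON packages (name)",
    "CREATE INDEX packagearch ON packages (arch)",
]


def load(rows, index_first):
    """Build an in-memory db and return the elapsed seconds for the db work."""
    db = sqlite3.connect(":memory:")
    start = time.perf_counter()
    db.execute(SCHEMA)
    if index_first:  # old order: indexes exist while rows are inserted
        for stmt in INDEXES:
            db.execute(stmt)
    with db:
        db.executemany("INSERT INTO packages VALUES (?, ?, ?)", rows)
    if not index_first:  # new order: index once, after the load
        for stmt in INDEXES:
            db.execute(stmt)
    elapsed = time.perf_counter() - start
    db.close()
    return elapsed


def make_rows(n):
    # 7919 is coprime with n for the sizes used here, so the names form a
    # scrambled permutation and index inserts are not simple appends.
    return [(i, f"pkg-{(i * 7919) % n:06d}", "i386" if i % 3 else "noarch")
            for i in range(n)]


if __name__ == "__main__":
    rows = make_rows(200_000)
    print(f"index before load: {load(rows, True):.2f}s")
    print(f"index after load:  {load(rows, False):.2f}s")
```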

As an additional bonus, the patched version creates somewhat smaller (bzip2-compressed) databases:

Unpatched:

11M     repodata/filelists.sqlite.bz2
3.7M    repodata/other.sqlite.bz2
7.0M    repodata/primary.sqlite.bz2

Patched:

11M     repodata/filelists.sqlite.bz2
3.5M    repodata/other.sqlite.bz2
6.4M    repodata/primary.sqlite.bz2
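One way to look at the size effect is to compare raw SQLite page usage for the two build orders. This is a sketch with a hypothetical table and index; note that the sizes listed above are of the bzip2-compressed files, whereas this only measures uncompressed pages. A plausible explanation (an assumption here, not something stated in this report) is that CREATE INDEX on an already populated table sorts the keys first and can pack index pages more densely than incremental inserts, which leave pages partly empty after splits.

```python
import sqlite3

# Hypothetical schema; compares raw page usage only, not .bz2 file sizes.
SCHEMA = "CREATE TABLE packages (pkgKey INTEGER, name TEXT)"
INDEX = "CREATE INDEX packagename ON packages (name)"


def db_size(rows, index_first):
    """Return the database size in bytes as page_count * page_size."""
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    if index_first:
        db.execute(INDEX)
    with db:
        db.executemany("INSERT INTO packages VALUES (?, ?)", rows)
    if not index_first:
        db.execute(INDEX)
    pages = db.execute("PRAGMA page_count").fetchone()[0]
    page_size = db.execute("PRAGMA page_size").fetchone()[0]
    db.close()
    return pages * page_size


if __name__ == "__main__":
    n = 100_000
    # Scrambled name order, so incremental index inserts land all over the B-tree.
    rows = [(i, f"pkg-{(i * 7919) % n:06d}") for i in range(n)]
    print("index before load:", db_size(rows, True), "bytes")
    print("index after load: ", db_size(rows, False), "bytes")
```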
Comment 1 Ville Skyttä 2008-09-07 06:55:26 EDT
Created attachment 315963
Create indexes with IF NOT EXISTS

Companion patch to be applied on top of the previous one: I'm not sure whether there's a scenario in which the indexes would already exist after applying the previous patch, but I suppose creating them with IF NOT EXISTS does not hurt in any case.
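The effect of IF NOT EXISTS can be shown in a few lines: running the index-creation step twice is harmless, whereas a plain CREATE INDEX for an existing name fails. The table and the index name `packagename` are hypothetical; sqlite3 reports the failure as `sqlite3.OperationalError`.

```python
import sqlite3

# Hypothetical index statement; IF NOT EXISTS turns it into a no-op when an
# index with this name already exists, instead of raising an error.
INDEXES = [
    "CREATE INDEX IF NOT EXISTS packagename ON packages (name)",
]


def create_indexes(db):
    for stmt in INDEXES:
        db.execute(stmt)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE packages (pkgKey INTEGER, name TEXT)")
create_indexes(db)
create_indexes(db)  # second run succeeds; the index is not created twice

try:
    # Without IF NOT EXISTS, re-creating the same index name is an error.
    db.execute("CREATE INDEX packagename ON packages (name)")
except sqlite3.OperationalError as exc:
    print("plain CREATE INDEX failed:", exc)
```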
Comment 2 seth vidal 2008-09-08 11:45:18 EDT
Just to make sure I understand this: does this mean the indexes will get created on the client machine? Doesn't that mean we're offloading all the indexing time onto the user?
Comment 3 seth vidal 2008-09-08 11:45:53 EDT
Or are you just making the indexes in the repo-side AFTER all the other data has been inserted first?
Comment 4 Ville Skyttä 2008-09-08 16:53:20 EDT
(In reply to comment #3)
> Or are you just making the indexes in the repo-side AFTER all the other data
> has been inserted first?

Yes.
Comment 5 seth vidal 2008-09-10 16:29:08 EDT
Committed to upstream, thanks.
