453356 – PackageKit unbearably slow

Summary: PackageKit unbearably slow

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	PackageKit
Sub Component:
Version:	9
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Robin Norwood
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-06-30 03:27 UTC by Christopher Curran
Modified:	2014-01-21 23:03 UTC
CC List:	5 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2008-09-17 08:44:18 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
test patch (7.58 KB, patch) 2008-07-02 15:58 UTC, Richard Hughes	no flags

Description Christopher Curran 2008-06-30 03:27:21 UTC

Description of problem:
PackageKit suffers from system-cripplingly bad performance. It barely runs, or
fails to run at all, on older hardware, embedded systems, or low-power devices. It
takes several orders of magnitude longer than yum or pup on the same hardware.

Version-Release number of selected component (if applicable):
9 and rawhide

How reproducible:
Every time PackageKit is activated.

Steps to Reproduce:
1. Perform any task in PackageKit or using PackageKit.
2. The system grinds to a halt and PackageKit takes up to an hour to load (VIA C7 1.5GHz).

  
Actual results:
PackageKit is unusably slow in any task involving updating, installing or
removing packages.

Expected results:
PackageKit is usable and does not take an hour to select a package from the list
and then remove it.

Additional info:
This could be fixed with a local database file of the repositories. This file
could be updated by yum or packagekit when the user needs to upgrade or load a
package instead of being dynamically generated every single time the user
performs any task. Just having a local database cache would be an improvement.

Comment 1 Richard Hughes 2008-06-30 10:01:23 UTC

I don't see why it would add any other delays compared to pup or yum -- at the
end of the day packagekitd is a very thin wrapper that doesn't do much apart
from proxy requests.

>This could be fixed with a local database file of the repositories

Seriously? You want to layer on a packagekit database, on top of a yum database,
on top of a rpm database? The problem is probably yum not using the cache or
doing crazy depsolving.

Could you provide the process output when the system is loaded using packagekit
and tell me what it's doing? Is it disk bound? Network bound? Thanks.

Comment 2 Christopher Curran 2008-07-01 00:25:41 UTC

A local database was probably the wrong wording on my part. What I mean is:
1. open a section of the Add/Remove Software gui
2. open another section
3. reopen the first section and it spends time rescanning everything.

Why? Surely it is possible to cache the data and then only rescan if it is required?

Comment 3 Richard Hughes 2008-07-02 11:04:25 UTC

No, it's not possible to cache this in PackageKit, as we have no notification
when it has changed and we need to invalidate the cache.

I've done some profiling:

list installed packages using rpms 1.04830384254 s
list available packages using yum 14.5752980709 s
filtering 9.60826873779e-05 s
sending to daemon 0.0483319759369 s

So, 93% of the time is spent in the few lines of yum matching code. Breaking
this down even further:

groupdict has key 0.0007600784 s
groupdict  0.0000259876 s
haskey has key 0.0000219345 s
groupmap  0.0000219345 s
filter  0.0000262260 s

This is done for each package in all repos, which for me is 17391 packages. So
this is 0.000856161 s (0.86 ms) per package multiplied by 17391, which is 14.9
seconds (higher than before as I added the profiling hooks).

So, for listings under a second (which is what users expect) we need to get the
per-package calculations down to 57us. Now do you see the problem? This is
python, and really not designed for this sort of thing. We're using two dicts to
do the lookups, but even reducing this to one dict is still 812us per package,
over by an order of magnitude.
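The arithmetic above can be re-checked with a short script. This is only a sketch: the per-step timings and the package count are copied from this comment, while the variable names are made up for illustration.

```python
# Per-package timings in seconds, copied from the breakdown in this comment.
per_step = {
    "groupdict has key": 0.0007600784,
    "groupdict": 0.0000259876,
    "haskey has key": 0.0000219345,
    "groupmap": 0.0000219345,
    "filter": 0.0000262260,
}
packages = 17391  # packages across all enabled repos, as reported above

per_package = sum(per_step.values())  # cost of matching one package
total = per_package * packages        # cost of matching every package
budget = 1.0 / packages               # per-package budget for a 1 s listing

print("per package: %.1f us" % (per_package * 1e6))
print("total: %.2f s" % total)
print("budget for a 1 s listing: %.1f us" % (budget * 1e6))
print("over budget by a factor of %.1f" % (per_package / budget))
```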

The only way of doing this is to cheat like yum does, and just rely on the comps
groups to have included the package. We should probably be using the rpm 'Group'
as well, but this would make the match even slower still.

I'll see what I can do by inverting the groupMap and using that inside of yum
and see what performance gains I can get. I'll work on that this afternoon.
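As a rough illustration of what inverting the map buys, here is a minimal sketch. It assumes the groupMap is a dict from comps group id to PackageKit group name, which may not match the real backend; GROUP_MAP, COMPS_PACKAGES, invert and package_names_for are hypothetical names with made-up sample data.

```python
from collections import defaultdict

# Hypothetical sample data: comps group id -> PackageKit group name.
GROUP_MAP = {
    "games": "games",
    "gnome-desktop": "desktop-gnome",
    "gnome-software-development": "desktop-gnome",
}

# Hypothetical sample data: comps group id -> package names in that group.
COMPS_PACKAGES = {
    "games": ["gnome-games", "supertux"],
    "gnome-desktop": ["gnome-power-manager", "nautilus"],
    "gnome-software-development": ["glade3", "nautilus"],
}


def invert(mapping):
    """Turn {key: value} into {value: [keys]} in a single pass."""
    inverted = defaultdict(list)
    for key, value in mapping.items():
        inverted[value].append(key)
    return dict(inverted)


def package_names_for(pk_group, inverted, comps_packages):
    """Collect the package names of every comps group mapped to pk_group."""
    names = set()
    for comps_id in inverted.get(pk_group, []):
        names.update(comps_packages.get(comps_id, []))
    return sorted(names)


inverted = invert(GROUP_MAP)
print(package_names_for("desktop-gnome", inverted, COMPS_PACKAGES))
```

The point of the inversion is that the work is done once per group rather than once per package, so the cost no longer scales with the 17391 packages in the repos.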

Comment 4 Richard Hughes 2008-07-02 13:03:13 UTC

Using the yum method:

Inverting the matrix takes 0.014s, creating the group lists takes 0.61s and
outputting the package list as pkgs takes 8.83s (for a group of 196 package names).

Again, taking apart the 8.83s:

Looking up in the rpmdb: 0.0001080036s
Searching in the yum database: 0.0419199467s

When we are searching, all we are doing is asking yum to return all the pkg
objects when the pkg.name matches the package name. This takes 0.04 of a second
per package name, so for 196 names that's 7.84 seconds just in the bowels of yum.

I really can't find a much quicker way to do this, short of querying the sql
database of yum directly, which is _really_ hacky.

Comment 5 Richard Hughes 2008-07-02 14:32:11 UTC

Doing a single database access takes 0.71 ms. Searching in all enabled repos
takes 3.22ms, so a lot quicker than using yum.

Some sample output:

[hughsie@hughsie-work helpers]$ ./search-group.py none games
allow-cancel	true
status	query
init:  0.00214576721191 ms
searchNevra:  375.646829605 ms
searchNevra:  42.9711341858 ms
searchNevra:  45.3989505768 ms
init:  0.827074050903 ms
SQL resolve:  3.25989723206 ms
SQL resolve:  0.318050384521 ms
SQL resolve:  0.277042388916 ms
package	available	gnome-power-manager;2.22.1-1.fc9;i386;fedora	GNOME Power Manager
emit:  0.142812728882 ms
close:  0.407218933105 ms

So we take 0.8ms to init the SQL databases, and 0.4ms to close them. Each
resolve then takes about 0.3ms. This means we can process the group list in
approx 60ms, rather than the 8800ms yum functionality.
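Labelled millisecond timings like the ones in the sample output could be collected with a small helper along these lines. This is a sketch, not the code in search-group.py; the `timed` helper and the labels passed to it are assumptions.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Time the enclosed block, print it in ms, and record (label, ms)."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        results.append((label, elapsed_ms))
        print("%s:  %s ms" % (label, elapsed_ms))


results = []
with timed("init", results):
    data = list(range(1000))  # stand-in for opening the databases
with timed("resolve", results):
    found = 500 in data       # stand-in for one lookup
```

Summing the recorded values per label gives the same kind of estimate as above: one init, one close, and one resolve per package name in the group.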

Let me get someone to sanity check using the SQL directly before I continue
working on this, as it's a rather hacky way of accessing the data.
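For readers unfamiliar with the approach, querying the metadata by name with sqlite directly looks roughly like this. It is a minimal sketch against an in-memory database: the `packages` table and its columns are an assumption modelled on yum's primary metadata, the epoch value is made up, and `resolve_names` is a hypothetical helper, not the code in the attached patch.

```python
import sqlite3


def resolve_names(conn, names):
    """Return (name, epoch, version, release, arch) rows for each name."""
    cur = conn.cursor()
    rows = []
    for name in names:
        # Parameterised query; with an index on name this is a cheap lookup.
        cur.execute(
            "SELECT name, epoch, version, release, arch "
            "FROM packages WHERE name = ?",
            (name,),
        )
        rows.extend(cur.fetchall())
    return rows


# In-memory stand-in for a repository database (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE packages "
    "(name TEXT, epoch TEXT, version TEXT, release TEXT, arch TEXT)"
)
conn.execute("CREATE INDEX packagename ON packages (name)")
conn.execute(
    "INSERT INTO packages VALUES (?, ?, ?, ?, ?)",
    ("gnome-power-manager", "0", "2.22.1", "1.fc9", "i386"),
)

rows = resolve_names(conn, ["gnome-power-manager", "not-a-package"])
for name, epoch, version, release, arch in rows:
    print("%s;%s-%s;%s" % (name, version, release, arch))
conn.close()
```

The trade-off is the one raised in this comment: the lookup is fast because it skips yum's object layer, but it depends on the on-disk schema staying stable.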

Comment 6 Richard Hughes 2008-07-02 15:28:12 UTC

Also, I've been told I should try doPackageLists:

for a single item:

doPackageLists:  603.208065033 ms

For three items:

doPackageLists:  1137.35294342 ms

So really, not worth considering.

Comment 7 Richard Hughes 2008-07-02 15:58:50 UTC

Created attachment 310814 [details]
test patch

The old:

[hughsie@hughsie-work helpers]$ time ./search-group.py none games
real	0m14.899s
user	0m14.520s
sys	0m0.277s

The new:

[hughsie@hughsie-work helpers]$ time ./search-group.py none games
real	0m1.050s
user	0m0.867s
sys	0m0.114s

This is using sqlite directly rather than using yum. It needs some more work,
but I'm basically happy with it.

Comment 8 Bastien Nocera 2008-07-02 16:20:26 UTC

Shouldn't that be fixed in yum directly instead?

Comment 9 Seth Vidal 2008-07-02 20:52:57 UTC

Where is search-group.py? I want to see what it is doing so I can figure out
where these numbers are coming from. In the backend code it seems like you're
querying for package objects FROM package objects - so you already have what you
need.

But I might be missing what that is seeing.

thanks

Comment 10 James Antill 2008-07-15 14:04:26 UTC

Note that searchNames() is in 3.2.17-2, as part of the SQLitesack ... and so
should solve the group-package-names => package-objects conversion quickly.

Comment 11 Richard Hughes 2008-09-17 08:44:18 UTC

Tim has switched us to using the new code and along with the new dispatcher PackageKit is much snappier in the GUI. I'll close this now, as rawhide seems pretty quick to me.
