Description of problem:
PackageKit suffers from crippling performance problems. It barely runs, or fails to run at all, on older hardware, embedded systems or low-power devices. It takes several orders of magnitude longer than yum or pup on the same hardware.

Version-Release number of selected component (if applicable):
9 and rawhide

How reproducible:
Every time PackageKit is activated.

Steps to Reproduce:
1. Perform any task in PackageKit or using PackageKit.
2. The system grinds to a halt and PackageKit takes up to an hour to load (on a 1.5 GHz VIA C7).

Actual results:
PackageKit is unusably slow in any task involving updating, installing or removing packages.

Expected results:
PackageKit is usable and does not take an hour to select a package from the list and then remove it.

Additional info:
This could be fixed with a local database file of the repositories. This file could be updated by yum or PackageKit when the user needs to upgrade or load a package, instead of being regenerated every single time the user performs any task. Just having a local database cache would be an improvement.
I don't see why it would add any delays compared to pup or yum; at the end of the day, packagekitd is a very thin wrapper that does little more than proxy requests.

> This could be fixed with a local database file of the repositories

Seriously? You want to layer a PackageKit database on top of a yum database, on top of an rpm database? The problem is probably yum not using its cache, or doing crazy depsolving. Could you provide the process output while the system is loaded by PackageKit and tell me what it's doing? Is it disk-bound? Network-bound? Thanks.
A local database was probably the wrong wording on my part. What I mean is:
1. Open a section of the Add/Remove Software GUI.
2. Open another section.
3. Reopen the first section, and it spends time rescanning everything.
Why? Surely it is possible to cache the data and only rescan when it is required?
No, it's not possible to cache this in PackageKit, as we get no notification when the data has changed, so we cannot invalidate the cache. I've done some profiling:

list installed packages using rpm: 1.04830384254 s
list available packages using yum: 14.5752980709 s
filtering: 9.60826873779e-05 s
sending to daemon: 0.0483319759369 s

So 93% of the time is spent in the few lines of yum matching code. Breaking this down even further:

groupdict has key: 0.0007600784 s
groupdict: 0.0000259876 s
haskey has key: 0.0000219345 s
groupmap: 0.0000219345 s
filter: 0.0000262260 s

This is done for each package in all repos, which for me is 17391 packages. That is 0.000856161 s (about 0.86 ms) per package, multiplied by 17391, which is about 14.9 seconds (higher than before, as I added the profiling hooks). So for listings under a second (which is what users expect), we need to get the per-package calculation down to about 57 µs. Now do you see the problem? This is Python, and it's really not designed for this sort of thing. We're using two dicts to do the lookups, but even reducing this to one dict still costs 812 µs per package, over budget by an order of magnitude. The only way of doing this is to cheat like yum does, and just rely on the comps groups having included the package. We should probably be using the rpm 'Group' tag as well, but this would make the match slower still. I'll see what I can do by inverting the groupMap and using that inside yum, and what performance gains I can get. I'll work on that this afternoon.
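The budget above is 1 s / 17391 packages ≈ 57.5 µs per package. The inversion idea can be sketched as below, assuming the group map is a plain dict of group name to a list of package names; `invert_group_map`, `packages_in_group` and the sample data are hypothetical and are not PackageKit's or yum's actual structures. The inverted map is built once, so each per-package check becomes a single dict lookup instead of probing every group.

```python
from collections import defaultdict


def invert_group_map(group_map):
    """Turn {group: [package names]} into {package name: set of groups}."""
    pkg_to_groups = defaultdict(set)
    for group, names in group_map.items():
        for name in names:
            pkg_to_groups[name].add(group)
    return dict(pkg_to_groups)


def packages_in_group(package_names, pkg_to_groups, wanted_group):
    """Keep only names whose inverted entry lists wanted_group (one lookup each)."""
    return [name for name in package_names
            if wanted_group in pkg_to_groups.get(name, ())]


# Hypothetical data standing in for a comps-derived groupMap.
group_map = {
    "games": ["gnome-games", "frozen-bubble"],
    "admin-tools": ["gnome-power-manager", "system-config-users"],
}
inverted = invert_group_map(group_map)
print(packages_in_group(["gnome-games", "bash", "frozen-bubble"], inverted, "games"))
# ['gnome-games', 'frozen-bubble']
```

The one-off inversion costs time proportional to the total number of group entries, which is why it can be cheap compared with repeating dict probes for every one of the ~17k packages.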
Using the yum method: inverting the matrix takes 0.014 s, creating the group lists takes 0.61 s, and outputting the package list as package objects takes 8.83 s (for a group of 196 package names). Again, taking apart the 8.83 s, per package name:

Looking up in the rpmdb: 0.0001080036 s
Searching in the yum database: 0.0419199467 s

When we search, all we are doing is asking yum to return all the package objects whose pkg.name matches the package name. This takes about 0.04 s, so that's 196 × 0.04 s ≈ 7.84 seconds just in the bowels of yum. I really can't find a much quicker way to do this, short of querying yum's SQL database directly, which is _really_ hacky.
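One reason a per-name search can be slow is if each of the 196 lookups walks the whole package set. A sketch of the contrast, assuming packages are simple objects with a `name` attribute; the `Pkg` tuple, both functions and the sample rows are hypothetical, and this is not how yum's sack is implemented internally. Building a name index once turns each later lookup into a dict access.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for a yum package object; only .name is used for matching.
Pkg = namedtuple("Pkg", "name version release arch repo")


def linear_lookup(all_pkgs, names):
    """One full scan of all_pkgs per requested name: O(len(names) * len(all_pkgs))."""
    return {n: [p for p in all_pkgs if p.name == n] for n in names}


def indexed_lookup(all_pkgs, names):
    """One pass to build a name index, then an average O(1) lookup per name."""
    by_name = defaultdict(list)
    for p in all_pkgs:
        by_name[p.name].append(p)
    return {n: by_name.get(n, []) for n in names}


# Made-up sample packages.
pkgs = [
    Pkg("gnome-power-manager", "2.22.1", "1.fc9", "i386", "fedora"),
    Pkg("gnome-games", "2.22.1", "1.fc9", "i386", "fedora"),
    Pkg("gnome-games", "2.22.2", "1.fc9", "i386", "updates"),
]
wanted = ["gnome-games", "not-a-package"]
assert linear_lookup(pkgs, wanted) == indexed_lookup(pkgs, wanted)
print(len(indexed_lookup(pkgs, wanted)["gnome-games"]))  # 2
```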
A single database access takes 0.71 ms. Searching in all enabled repos takes 3.22 ms, so it's a lot quicker than going through yum. Some sample output:

    [hughsie@hughsie-work helpers]$ ./search-group.py none games
    allow-cancel true
    status query
    init: 0.00214576721191 ms
    searchNevra: 375.646829605 ms
    searchNevra: 42.9711341858 ms
    searchNevra: 45.3989505768 ms
    init: 0.827074050903 ms
    SQL resolve: 3.25989723206 ms
    SQL resolve: 0.318050384521 ms
    SQL resolve: 0.277042388916 ms
    package available gnome-power-manager;2.22.1-1.fc9;i386;fedora GNOME Power Manager
    emit: 0.142812728882 ms
    close: 0.407218933105 ms

So we take 0.8 ms to initialize the SQL databases and 0.4 ms to close them. Each resolve then takes about 0.3 ms. This means we can process the group list in approximately 60 ms (196 × 0.3 ms, plus init and close), rather than the 8800 ms the yum functionality takes. Let me get someone to sanity-check using the SQL directly before I continue working on this, as it's a rather hacky way of accessing the data.
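A rough sketch of what an "SQL resolve" could look like with Python's `sqlite3` module. It assumes a simplified `packages` table (name, epoch, version, release, arch, summary) loosely modeled on a repository's primary.sqlite; the real schema has many more columns, and the helpers `open_primary` and `sql_resolve` are hypothetical, not the code in the attached patch. The output mimics the `name;version-release;arch;repo` package id seen above.

```python
import sqlite3
import time


def open_primary(path=":memory:"):
    """Open (or create) a database with an assumed, simplified packages table."""
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS packages (
        name TEXT, epoch TEXT, version TEXT, release TEXT, arch TEXT, summary TEXT)""")
    # An index on name keeps each resolve from scanning the whole table.
    conn.execute("CREATE INDEX IF NOT EXISTS pkg_name_idx ON packages (name)")
    return conn


def sql_resolve(conn, repo_id, name):
    """Return (package_id, summary) pairs for rows whose name matches exactly."""
    rows = conn.execute(
        "SELECT name, version, release, arch, summary FROM packages WHERE name = ?",
        (name,))
    return [(f"{n};{v}-{r};{a};{repo_id}", s) for n, v, r, a, s in rows]


conn = open_primary()
conn.execute("INSERT INTO packages VALUES (?, ?, ?, ?, ?, ?)",
             ("gnome-power-manager", "0", "2.22.1", "1.fc9", "i386", "GNOME Power Manager"))
start = time.perf_counter()
hits = sql_resolve(conn, "fedora", "gnome-power-manager")
print(f"SQL resolve: {(time.perf_counter() - start) * 1000:.3f} ms")
for package_id, summary in hits:
    print("package", "available", package_id, summary)
conn.close()
```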
Also, I've been told I should try doPackageLists:

For a single item: doPackageLists: 603.208065033 ms
For three items: doPackageLists: 1137.35294342 ms

So it's really not worth considering.
Created attachment 310814: test patch

The old:

    [hughsie@hughsie-work helpers]$ time ./search-group.py none games
    real 0m14.899s
    user 0m14.520s
    sys 0m0.277s

The new:

    [hughsie@hughsie-work helpers]$ time ./search-group.py none games
    real 0m1.050s
    user 0m0.867s
    sys 0m0.114s

This uses sqlite directly rather than going through yum. It needs some more work, but I'm basically happy with it.
Shouldn't that be fixed in yum directly instead?
Where is search-group.py? I want to see what it is doing so I can figure out where these numbers are coming from. In the backend code, it looks like you're querying for package objects FROM package objects, so you already have what you need. But I might be missing something. Thanks.
Note that searchNames() is in yum 3.2.17-2, as part of the SQLitesack ... and so should handle the group-package-names => package-objects conversion quickly.
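The gain from a name-based search comes from converting a whole group's package names in batches rather than one query per name. A hedged sketch against an assumed, simplified `packages` table; `search_names` here is a hypothetical helper, not yum's sqlitesack implementation, and the rows are made up. Names are chunked because SQLite caps the number of bound parameters per statement (999 by default in older releases).

```python
import sqlite3


def search_names(conn, names, chunk=500):
    """Fetch all rows whose name is in `names`, issuing one query per chunk."""
    names = list(dict.fromkeys(names))  # de-duplicate while keeping order
    results = []
    for i in range(0, len(names), chunk):
        batch = names[i:i + chunk]
        placeholders = ",".join("?" * len(batch))
        results.extend(conn.execute(
            f"SELECT name, version, release, arch FROM packages WHERE name IN ({placeholders})",
            batch))
    return results


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packages (name TEXT, version TEXT, release TEXT, arch TEXT)")
conn.executemany("INSERT INTO packages VALUES (?, ?, ?, ?)", [
    ("gnome-games", "2.22.1", "1.fc9", "i386"),
    ("gnome-power-manager", "2.22.1", "1.fc9", "i386"),
    ("bash", "3.2", "22.fc9", "i386"),
])
rows = search_names(conn, ["gnome-games", "gnome-power-manager", "missing"], chunk=2)
print(sorted(r[0] for r in rows))  # ['gnome-games', 'gnome-power-manager']
```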
Tim has switched us to the new code, and together with the new dispatcher, PackageKit is much snappier in the GUI. I'll close this now, as rawhide seems pretty quick to me.