Bug 846199

Summary: recipes page is slow
Product: [Retired] Beaker Reporter: Raymond Mancy <rmancy>
Component: web UIAssignee: matt jia <mjia>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 0.9CC: bpeck, dcallagh, dowang, ebaak, mjia, rjoost, stl, tools-bugs
Target Milestone: 23.0Keywords: EasyFix, Patch
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: ImplementationQuality
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-07-07 23:12:41 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Raymond Mancy 2012-08-07 06:15:39 UTC
The recipes page is a bit slow, due to the rountrips needed to get util.js. Even though the server returns 304s.

Comment 1 Dan Callaghan 2012-11-02 00:44:25 UTC
I guess the real problem here is that we are referencing util.js many times unnecessarily on the recipe page, due to a widget.

Comment 5 Dan Callaghan 2016-05-16 02:57:30 UTC
Not sure if the point in comment 1 (about util.js referenced many times) is still true. I think we might have fixed that up a long time ago.

However, visiting /recipes/ in our production environemnt, or with a production database dump, is still very slow. In some circumstances I also see it fail with a MySQL error indicating /tmp was exhausted. I suspect it's because that page is issuing a query that produces a large temp table when there are many recipes.

Needs more investigation.

Comment 6 matt jia 2016-05-25 06:27:18 UTC
(In reply to Dan Callaghan from comment #5)
> Not sure if the point in comment 1 (about util.js referenced many times) is
> still true. I think we might have fixed that up a long time ago.
Yes, I think we have fixed this already as I have not seen this problem right now.

Comment 7 matt jia 2016-05-25 06:31:35 UTC
It seems like the slowness is caused by the query used for counting rows.

MariaDB [beaker]> SELECT count(*) AS count_1 
    -> FROM (SELECT recipe.id AS recipe_id, recipe.recipe_set_id AS recipe_recipe_set_id, recipe.distro_tree_id AS recipe_distro_tree_id, recipe.result AS recipe_result, recipe.status AS recipe_status, recipe.start_time AS recipe_start_time, recipe.finish_time AS recipe_finish_time, recipe._host_requires AS recipe__host_requires, recipe._distro_requires AS recipe__distro_requires, recipe.kickstart AS recipe_kickstart, recipe.type AS recipe_type, recipe.ttasks AS recipe_ttasks, recipe.ntasks AS recipe_ntasks, recipe.ptasks AS recipe_ptasks, recipe.wtasks AS recipe_wtasks, recipe.ftasks AS recipe_ftasks, recipe.ktasks AS recipe_ktasks, recipe.whiteboard AS recipe_whiteboard, recipe.ks_meta AS recipe_ks_meta, recipe.kernel_options AS recipe_kernel_options, recipe.kernel_options_post AS recipe_kernel_options_post, recipe.role AS recipe_role, recipe.panic AS recipe_panic, recipe._partitions AS recipe__partitions, recipe.autopick_random AS recipe_autopick_random, recipe.log_server AS recipe_log_server, recipe.virt_status AS recipe_virt_status 
    -> FROM recipe 
    -> WHERE recipe.type = 'machine_recipe' AND (EXISTS (SELECT 1 
    -> FROM recipe_set 
    -> WHERE recipe_set.id = recipe.recipe_set_id AND (EXISTS (SELECT 1 
    -> FROM job 
    -> WHERE job.id = recipe_set.job_id AND job.deleted IS NULL AND job.to_delete IS NULL))))) AS anon_1;
+---------+
| count_1 |
+---------+
|  584783 |
+---------+
1 row in set (9.95 sec)


After fixing up this query, I get a better performace.


MariaDB [beaker]> SELECT count(*) AS count_1 
    -> FROM (SELECT recipe.id AS recipe_id, recipe.recipe_set_id AS recipe_recipe_set_id, recipe.distro_tree_id AS recipe_distro_tree_id, recipe.result AS recipe_result, recipe.status AS recipe_status, recipe.start_time AS recipe_start_time, recipe.finish_time AS recipe_finish_time, recipe._host_requires AS recipe__host_requires, recipe._distro_requires AS recipe__distro_requires, recipe.kickstart AS recipe_kickstart, recipe.type AS recipe_type, recipe.ttasks AS recipe_ttasks, recipe.ntasks AS recipe_ntasks, recipe.ptasks AS recipe_ptasks, recipe.wtasks AS recipe_wtasks, recipe.ftasks AS recipe_ftasks, recipe.ktasks AS recipe_ktasks, recipe.whiteboard AS recipe_whiteboard, recipe.ks_meta AS recipe_ks_meta, recipe.kernel_options AS recipe_kernel_options, recipe.kernel_options_post AS recipe_kernel_options_post, recipe.role AS recipe_role, recipe.panic AS recipe_panic, recipe._partitions AS recipe__partitions, recipe.autopick_random AS recipe_autopick_random, recipe.log_server AS recipe_log_server, recipe.virt_status AS recipe_virt_status 
    -> FROM recipe INNER JOIN recipe_set ON recipe_set.id = recipe.recipe_set_id INNER JOIN job ON job.id = recipe_set.job_id 
    -> WHERE recipe.type = 'machine_recipe' AND job.deleted IS NULL AND job.to_delete IS NULL) AS anon_1
    -> ;
+---------+
| count_1 |
+---------+
|  584783 |
+---------+
1 row in set (2.42 sec)

Comment 8 matt jia 2016-05-25 06:36:31 UTC
Ideally we should not count rows for the tables with large rows. Once we point this page to Backbone, the performance will be much better as we have already implemented “pageable JSON collections” which does not use a query for counting rows.

Comment 9 matt jia 2016-05-25 06:47:32 UTC
On Gerrit:
 
    https://gerrit.beaker-project.org/#/c/4933/

Comment 10 Dan Callaghan 2016-05-25 16:25:00 UTC
Yeah so there are two parts to the problem here... for very large tables (like activity) we are not showing the total number of results (or pages) in order to avoid the COUNT() entirely. That was bug 1101817.

The other problem here looks like the same one we have hit in many places -- SQLAlchemy changed its behaviour to wrap all queries inside a subquery when doing .count(): bug 1297603, bug 1183239, <http://stackoverflow.com/a/14758255/120202>.

Comment 11 Dan Callaghan 2016-05-25 18:57:27 UTC
Actually I guess there is a *third* problem which is that the query was unnecessarily doing the EXISTS stuff, when it could just be JOIN'ing instead. That's what Matt's patch is fixing.

Matt, can you please just paste some timing information using a production db dump for (a) original bad COUNT query, (b) your fixed version with JOIN instead of EXISTS, and (c) with the subquery wrapping removed, by using an explicit func.count() as described in http://stackoverflow.com/a/14758255/120202 ?

Comment 12 matt jia 2016-05-30 01:44:56 UTC
> Matt, can you please just paste some timing information using a production
> db dump for (a) original bad COUNT query, (b) your fixed version with JOIN
> instead of EXISTS, and (c) with the subquery wrapping removed, by using an
> explicit func.count() as described in
> http://stackoverflow.com/a/14758255/120202 ?

a:

 MariaDB [beaker]> SELECT count(*) AS count_1 
    -> FROM (SELECT recipe.id AS recipe_id, recipe.recipe_set_id AS recipe_recipe_set_id, recipe.distro_tree_id AS recipe_distro_tree_id, recipe.result AS recipe_result, recipe.status AS recipe_status, recipe.start_time AS recipe_start_time, recipe.finish_time AS recipe_finish_time, recipe._host_requires AS recipe__host_requires, recipe._distro_requires AS recipe__distro_requires, recipe.kickstart AS recipe_kickstart, recipe.type AS recipe_type, recipe.ttasks AS recipe_ttasks, recipe.ntasks AS recipe_ntasks, recipe.ptasks AS recipe_ptasks, recipe.wtasks AS recipe_wtasks, recipe.ftasks AS recipe_ftasks, recipe.ktasks AS recipe_ktasks, recipe.whiteboard AS recipe_whiteboard, recipe.ks_meta AS recipe_ks_meta, recipe.kernel_options AS recipe_kernel_options, recipe.kernel_options_post AS recipe_kernel_options_post, recipe.role AS recipe_role, recipe.panic AS recipe_panic, recipe._partitions AS recipe__partitions, recipe.autopick_random AS recipe_autopick_random, recipe.log_server AS recipe_log_server, recipe.virt_status AS recipe_virt_status 
    -> FROM recipe 
    -> WHERE recipe.type = 'machine_recipe' AND (EXISTS (SELECT 1 
    -> FROM recipe_set 
    -> WHERE recipe_set.id = recipe.recipe_set_id AND (EXISTS (SELECT 1 
    -> FROM job 
    -> WHERE job.id = recipe_set.job_id AND job.deleted IS NULL AND job.to_delete IS NULL))))) AS anon_1;
+---------+
| count_1 |
+---------+
|  584783 |
+---------+
1 row in set (10.12 sec)

b:


MariaDB [beaker]> SELECT count(*) AS count_1 
    -> FROM (SELECT recipe.id AS recipe_id, recipe.recipe_set_id AS recipe_recipe_set_id, recipe.distro_tree_id AS recipe_distro_tree_id, recipe.result AS recipe_result, recipe.status AS recipe_status, recipe.start_time AS recipe_start_time, recipe.finish_time AS recipe_finish_time, recipe._host_requires AS recipe__host_requires, recipe._distro_requires AS recipe__distro_requires, recipe.kickstart AS recipe_kickstart, recipe.type AS recipe_type, recipe.ttasks AS recipe_ttasks, recipe.ntasks AS recipe_ntasks, recipe.ptasks AS recipe_ptasks, recipe.wtasks AS recipe_wtasks, recipe.ftasks AS recipe_ftasks, recipe.ktasks AS recipe_ktasks, recipe.whiteboard AS recipe_whiteboard, recipe.ks_meta AS recipe_ks_meta, recipe.kernel_options AS recipe_kernel_options, recipe.kernel_options_post AS recipe_kernel_options_post, recipe.role AS recipe_role, recipe.panic AS recipe_panic, recipe._partitions AS recipe__partitions, recipe.autopick_random AS recipe_autopick_random, recipe.log_server AS recipe_log_server, recipe.virt_status AS recipe_virt_status 
    -> FROM recipe INNER JOIN recipe_set ON recipe_set.id = recipe.recipe_set_id INNER JOIN job ON job.id = recipe_set.job_id 
    -> WHERE recipe.type = 'machine_recipe' AND job.deleted IS NULL AND job.to_delete IS NULL) AS anon_1
    -> ;
+---------+
| count_1 |
+---------+
|  584783 |
+---------+
1 row in set (2.52 sec)

c: with the subquery wrapping removed on the current version

MariaDB [beaker]> SELECT count(recipe.id) AS count_1
    -> FROM recipe 
    -> WHERE recipe.type = 'machine_recipe' AND (EXISTS (SELECT 1 
    -> FROM recipe_set 
    -> WHERE recipe_set.id = recipe.recipe_set_id AND (EXISTS (SELECT 1 
    -> FROM job 
    -> WHERE job.id = recipe_set.job_id AND job.deleted IS NULL AND job.to_delete IS NULL))));
+---------+
| count_1 |
+---------+
|  584783 |
+---------+
1 row in set (9.64 sec)


d: with the subquery wrapping removed on my fixed version

MariaDB [beaker]> SELECT count(recipe.id) AS count_1
    -> FROM recipe INNER JOIN recipe_set ON recipe_set.id = recipe.recipe_set_id INNER JOIN job ON job.id = recipe_set.job_id 
    -> WHERE recipe.type = 'machine_recipe' AND job.deleted IS NULL AND job.to_delete IS NULL
    -> ;
+---------+
| count_1 |
+---------+
|  584783 |
+---------+

Comment 13 matt jia 2016-05-30 02:02:19 UTC
(In reply to matt jia from comment #12) 
> d: with the subquery wrapping removed on my fixed version
> 
> MariaDB [beaker]> SELECT count(recipe.id) AS count_1
>     -> FROM recipe INNER JOIN recipe_set ON recipe_set.id =
> recipe.recipe_set_id INNER JOIN job ON job.id = recipe_set.job_id 
>     -> WHERE recipe.type = 'machine_recipe' AND job.deleted IS NULL AND
> job.to_delete IS NULL
>     -> ;
> +---------+
> | count_1 |
> +---------+
> |  584783 |
> +---------+

Opps, the result is missing.

MariaDB [beaker]> SELECT count(recipe.id) AS count_1
    -> FROM recipe INNER JOIN recipe_set ON recipe_set.id =
    -> recipe.recipe_set_id INNER JOIN job ON job.id = recipe_set.job_id 
    -> WHERE recipe.type = 'machine_recipe' AND job.deleted IS NULL AND
    -> job.to_delete IS NULL;
+---------+
| count_1 |
+---------+
|  584783 |
+---------+
1 row in set (2.43 sec)

Comment 14 Dan Callaghan 2016-05-31 13:21:22 UTC
Okay, so I guess this is showing that the subquery wrapping done by .count() has no negative impact on performance here, so we don't need to bother changing it. Your fix for the JOINs is all we need.

Comment 18 Dan Callaghan 2016-07-07 23:12:41 UTC
Beaker 23.0 has been released.

Comment 19 Dan Callaghan 2016-07-29 00:36:05 UTC
I think I remember the reason why we had the weird nested EXISTS instead of just doing a JOIN... with optimizer_search_depth=6 set on the server (which we had in our previous setup on db01, and which we still have on dbdevel.bne) the query breaks due to excessively large temp table, I guess because it has more than 6 joins and so MySQL picks a bad query plan. Same thing as in bug 590723.