Bug 1297603 - [SLOW QUERY] beakerd counting dirty jobs
[SLOW QUERY] beakerd counting dirty jobs
Status: ASSIGNED
Product: Beaker
Classification: Community
Component: scheduler (Show other bugs)
21
Unspecified Unspecified
unspecified Severity unspecified (vote)
: ---
: ---
Assigned To: Dan Callaghan
tools-bugs
: Triaged
Depends On:
Blocks: 1097861
  Show dependency treegraph
 
Reported: 2016-01-11 19:53 EST by Dan Callaghan
Modified: 2018-06-04 03:38 EDT (History)
3 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Dan Callaghan 2016-01-11 19:53:40 EST
The update_dirty_jobs thread in beakerd is structured as a polling loop which regularly checks for dirty jobs and then processes any which are found. Checking for dirty jobs means querying the job table for dirty_version != clean_version.

The dirty_version != clean_version condition does use an index according to the EXPLAIN output, although it is not a constant lookup so MySQL still has to examine many rows. However it is still able to do this quickly. The query to actually grab the dirty job ids:

  SELECT job.id AS job_id 
  FROM job 
  WHERE job.dirty_version != job.clean_version

does show up in our slow query log (presumably because it's not using a constant index lookup) but it averages 0.32 seconds running time which is acceptable.

  MySQL [beaker_dcallagh]> explain SELECT job.id AS job_id    FROM job    WHERE job.dirty_version != job.clean_version;
  +----+-------------+-------+-------+---------------+----------------------------+---------+------+--------+--------------------------+
  | id | select_type | table | type  | possible_keys | key                        | key_len | ref  | rows   | Extra                    |
  +----+-------------+-------+-------+---------------+----------------------------+---------+------+--------+--------------------------+
  |  1 | SIMPLE      | job   | index | NULL          | ix_job_dirty_clean_version | 32      | NULL | 846116 | Using where; Using index |
  +----+-------------+-------+-------+---------------+----------------------------+---------+------+--------+--------------------------+
  1 row in set (0.01 sec)

However beakerd's polling loop actually calls .count() on this first, to determine if any dirty job rows exist, and then actually looks up their ids if the .count() call returned greater than zero. The current query generated by .count() is:

  SELECT count(*) AS count_1 
  FROM (SELECT job.id AS job_id, job.dirty_version AS job_dirty_version, job.clean_version AS job_clean_version, job.owner_id AS job_owner_id, job.submitter_id AS job_submitter_id, job.group_id AS job_group_id, job.whiteboard AS job_whiteboard, job.retention_tag_id AS job_retention_tag_id, job.product_id AS job_product_id, job.result AS job_result, job.status AS job_status, job.deleted AS job_deleted, job.to_delete AS job_to_delete, job.ttasks AS job_ttasks, job.ntasks AS job_ntasks, job.ptasks AS job_ptasks, job.wtasks AS job_wtasks, job.ftasks AS job_ftasks, job.ktasks AS job_ktasks 
  FROM job 
  WHERE job.dirty_version != job.clean_version) AS anon_1

It seems that the subquery wrapping defeats MySQL's optimizer and produces a much slower query. The EXPLAIN output indicates that this version does not use any index at all:

  MySQL [beaker_dcallagh]> explain SELECT count(*) AS count_1    FROM (SELECT job.id AS job_id, job.dirty_version AS job_dirty_version, job.clean_version AS job_clean_version, job.owner_id AS job_owner_id, job.submitter_id AS job_submitter_id, job.group_id AS job_group_id, job.whiteboard AS job_whiteboard, job.retention_tag_id AS job_retention_tag_id, job.product_id AS job_product_id, job.result AS job_result, job.status AS job_status, job.deleted AS job_deleted, job.to_delete AS job_to_delete, job.ttasks AS job_ttasks, job.ntasks AS job_ntasks, job.ptasks AS job_ptasks, job.wtasks AS job_wtasks, job.ftasks AS job_ftasks, job.ktasks AS job_ktasks    FROM job    WHERE job.dirty_version != job.clean_version) AS anon_1;
  +----+-------------+-------+------+---------------+------+---------+------+--------+------------------------------+
  | id | select_type | table | type | possible_keys | key  | key_len | ref  | rows   | Extra                        |
  +----+-------------+-------+------+---------------+------+---------+------+--------+------------------------------+
  |  1 | PRIMARY     | NULL  | NULL | NULL          | NULL | NULL    | NULL |   NULL | Select tables optimized away |
  |  2 | DERIVED     | job   | ALL  | NULL          | NULL | NULL    | NULL | 846116 | Using where                  |
  +----+-------------+-------+------+---------------+------+---------+------+--------+------------------------------+
  2 rows in set (1.52 sec)

Compared with a simpler COUNT(*) using no subquery:

  MySQL [beaker_dcallagh]> explain SELECT COUNT(*)    FROM job    WHERE job.dirty_version != job.clean_version;
  +----+-------------+-------+-------+---------------+----------------------------+---------+------+--------+--------------------------+
  | id | select_type | table | type  | possible_keys | key                        | key_len | ref  | rows   | Extra                    |
  +----+-------------+-------+-------+---------------+----------------------------+---------+------+--------+--------------------------+
  |  1 | SIMPLE      | job   | index | NULL          | ix_job_dirty_clean_version | 32      | NULL | 846116 | Using where; Using index |
  +----+-------------+-------+-------+---------------+----------------------------+---------+------+--------+--------------------------+
  1 row in set (0.00 sec)

The .count() subquery wrapping, and its performance problems on MySQL, is a known issue <http://stackoverflow.com/a/14758255/120202> and one we have actually hit before (e.g. bug 1183239). In this particular case we know that the subquery wrapping is superfluous so we can either force SQLAlchemy to generate the simpler COUNT(*) query, as suggested in the Stack Overflow answer, or even better we could skip the .count() entirely and always just fetch the list of dirty job ids.
Comment 1 Dan Callaghan 2018-06-04 03:38:33 EDT
As of https://git.beaker-project.org/cgit/beaker/commit/?id=fa50dca70e604118c25158eeba7abc8ee5084690 we no longer issue this problematic COUNT() query.

However the corresponding SELECT query itself, that is:

  SELECT job.id AS job_id 
  FROM job 
  WHERE job.dirty_version != job.clean_version

came up in bug 1097861 as something which might be worth optimizing:

https://bugzilla.redhat.com/show_bug.cgi?id=1097861#c36

Here is a patch to switch to the "naive" dirty flag approach:

https://gerrit.beaker-project.org/6121

as described in:

https://bugzilla.redhat.com/show_bug.cgi?id=1097861#c41

although it is not at all clear to me right now whether this change will make things better or worse overall. It might be worth seeking more data before deciding to merge that patch.

Note You need to log in before you can comment on or make changes to this bug.