The update_dirty_jobs thread in beakerd is structured as a polling loop which periodically checks for dirty jobs and then processes any that are found. Checking for dirty jobs means querying the job table for rows where dirty_version != clean_version.

According to the EXPLAIN output, the dirty_version != clean_version condition does use an index, although it is not a constant lookup so MySQL still has to examine many rows. Even so, it runs quickly. The query which actually grabs the dirty job ids:

SELECT job.id AS job_id
FROM job
WHERE job.dirty_version != job.clean_version

does show up in our slow query log (presumably because it is not a constant index lookup), but it averages 0.32 seconds of running time, which is acceptable.

MySQL [beaker_dcallagh]> explain SELECT job.id AS job_id FROM job WHERE job.dirty_version != job.clean_version;
+----+-------------+-------+-------+---------------+----------------------------+---------+------+--------+--------------------------+
| id | select_type | table | type  | possible_keys | key                        | key_len | ref  | rows   | Extra                    |
+----+-------------+-------+-------+---------------+----------------------------+---------+------+--------+--------------------------+
|  1 | SIMPLE      | job   | index | NULL          | ix_job_dirty_clean_version | 32      | NULL | 846116 | Using where; Using index |
+----+-------------+-------+-------+---------------+----------------------------+---------+------+--------+--------------------------+
1 row in set (0.01 sec)

However, beakerd's polling loop actually calls .count() on this query first, to determine whether any dirty job rows exist, and then looks up their ids only if the .count() call returned greater than zero.
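For illustration, here is a minimal sketch of that count-then-fetch pattern, using a hypothetical stripped-down Job model against in-memory SQLite (this is not the actual Beaker source, just an approximation of the shape of the code):

```python
# Sketch of beakerd's count-then-fetch polling pattern (hypothetical Job
# model, not the real Beaker schema or code).
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Job(Base):
    __tablename__ = 'job'
    id = Column(Integer, primary_key=True)
    dirty_version = Column(Integer, nullable=False, default=0)
    clean_version = Column(Integer, nullable=False, default=0)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([Job(dirty_version=1, clean_version=0),   # dirty
                     Job(dirty_version=2, clean_version=2)])  # clean
    session.commit()

    dirty = session.query(Job.id).filter(
        Job.dirty_version != Job.clean_version)

    # The redundant pattern: a COUNT query first, then a second query
    # for the ids only if the count was non-zero.
    if dirty.count() > 0:
        ids = [row.id for row in dirty]

    # The simpler alternative: always fetch the ids; an empty list
    # already tells us there is nothing to do.
    ids = [row.id for row in dirty]
```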
The current query generated by .count() is:

SELECT count(*) AS count_1
FROM (SELECT job.id AS job_id, job.dirty_version AS job_dirty_version,
        job.clean_version AS job_clean_version, job.owner_id AS job_owner_id,
        job.submitter_id AS job_submitter_id, job.group_id AS job_group_id,
        job.whiteboard AS job_whiteboard,
        job.retention_tag_id AS job_retention_tag_id,
        job.product_id AS job_product_id, job.result AS job_result,
        job.status AS job_status, job.deleted AS job_deleted,
        job.to_delete AS job_to_delete, job.ttasks AS job_ttasks,
        job.ntasks AS job_ntasks, job.ptasks AS job_ptasks,
        job.wtasks AS job_wtasks, job.ftasks AS job_ftasks,
        job.ktasks AS job_ktasks
      FROM job
      WHERE job.dirty_version != job.clean_version) AS anon_1

It seems that the subquery wrapping defeats MySQL's optimizer and produces a much slower query. The EXPLAIN output indicates that this version does not use any index at all:

MySQL [beaker_dcallagh]> explain SELECT count(*) AS count_1 FROM (SELECT job.id AS job_id, job.dirty_version AS job_dirty_version, job.clean_version AS job_clean_version, job.owner_id AS job_owner_id, job.submitter_id AS job_submitter_id, job.group_id AS job_group_id, job.whiteboard AS job_whiteboard, job.retention_tag_id AS job_retention_tag_id, job.product_id AS job_product_id, job.result AS job_result, job.status AS job_status, job.deleted AS job_deleted, job.to_delete AS job_to_delete, job.ttasks AS job_ttasks, job.ntasks AS job_ntasks, job.ptasks AS job_ptasks, job.wtasks AS job_wtasks, job.ftasks AS job_ftasks, job.ktasks AS job_ktasks FROM job WHERE job.dirty_version != job.clean_version) AS anon_1;
+----+-------------+-------+------+---------------+------+---------+------+--------+------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows   | Extra                        |
+----+-------------+-------+------+---------------+------+---------+------+--------+------------------------------+
|  1 | PRIMARY     | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL   | Select tables optimized away |
|  2 | DERIVED     | job   | ALL  | NULL          | NULL | NULL    | NULL | 846116 | Using where                  |
+----+-------------+-------+------+---------------+------+---------+------+--------+------------------------------+
2 rows in set (1.52 sec)

Compared with a simpler COUNT(*) using no subquery:

MySQL [beaker_dcallagh]> explain SELECT COUNT(*) FROM job WHERE job.dirty_version != job.clean_version;
+----+-------------+-------+-------+---------------+----------------------------+---------+------+--------+--------------------------+
| id | select_type | table | type  | possible_keys | key                        | key_len | ref  | rows   | Extra                    |
+----+-------------+-------+-------+---------------+----------------------------+---------+------+--------+--------------------------+
|  1 | SIMPLE      | job   | index | NULL          | ix_job_dirty_clean_version | 32      | NULL | 846116 | Using where; Using index |
+----+-------------+-------+-------+---------------+----------------------------+---------+------+--------+--------------------------+
1 row in set (0.00 sec)

The .count() subquery wrapping, and its performance problems on MySQL, is a known issue <http://stackoverflow.com/a/14758255/120202> and one we have actually hit before (e.g. bug 1183239). In this particular case we know the subquery wrapping is superfluous, so we can either force SQLAlchemy to generate the simpler COUNT(*) query, as suggested in the Stack Overflow answer, or, better still, skip the .count() entirely and always just fetch the list of dirty job ids.
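For reference, here is a sketch of the difference in SQLAlchemy terms, again using a hypothetical cut-down Job model on in-memory SQLite (the func.count() form is the approach suggested in the Stack Overflow answer; the real Beaker model has many more columns):

```python
# Query.count() wraps the SELECT in a derived table, whereas selecting
# func.count() directly compiles to a flat COUNT with no subquery.
from sqlalchemy import Column, Integer, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Job(Base):
    __tablename__ = 'job'
    id = Column(Integer, primary_key=True)
    dirty_version = Column(Integer, nullable=False, default=0)
    clean_version = Column(Integer, nullable=False, default=0)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([Job(dirty_version=1, clean_version=0),   # dirty
                     Job(dirty_version=0, clean_version=0)])  # clean
    session.commit()

    condition = Job.dirty_version != Job.clean_version

    # Emits: SELECT count(*) FROM (SELECT ... FROM job WHERE ...) AS anon_1
    wrapped = session.query(Job).filter(condition).count()

    # Emits: SELECT count(job.id) FROM job WHERE ... -- no subquery,
    # so MySQL can satisfy it from the index.
    flat = session.query(func.count(Job.id)).filter(condition).scalar()
```

Both forms return the same number; only the generated SQL (and hence the MySQL query plan) differs.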
As of https://git.beaker-project.org/cgit/beaker/commit/?id=fa50dca70e604118c25158eeba7abc8ee5084690 we no longer issue this problematic COUNT() query.

However the corresponding SELECT query itself, that is:

SELECT job.id AS job_id
FROM job
WHERE job.dirty_version != job.clean_version

came up in bug 1097861 as something which might be worth optimizing: https://bugzilla.redhat.com/show_bug.cgi?id=1097861#c36

Here is a patch to switch to the "naive" dirty flag approach, as described in https://bugzilla.redhat.com/show_bug.cgi?id=1097861#c41: https://gerrit.beaker-project.org/6121

Although it is not at all clear to me right now whether this change will make things better or worse overall. It might be worth gathering more data before deciding to merge that patch.
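For illustration only, here is a rough sketch of what a "naive" dirty flag schema could look like, assuming a single boolean column that is set when a job changes and cleared once it has been processed. This is my reading of the idea, not the actual Gerrit patch; see the patch itself for the real change:

```python
# Sketch of the "naive" dirty flag approach: one indexed boolean column
# instead of the dirty_version/clean_version pair. Hypothetical model,
# not the real Beaker schema.
from sqlalchemy import Boolean, Column, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Job(Base):
    __tablename__ = 'job'
    id = Column(Integer, primary_key=True)
    is_dirty = Column(Boolean, nullable=False, default=False, index=True)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([Job(is_dirty=True), Job(is_dirty=False)])
    session.commit()

    # The polling query becomes a plain indexed lookup:
    # SELECT job.id FROM job WHERE job.is_dirty
    dirty_ids = [row.id for row in
                 session.query(Job.id).filter(Job.is_dirty)]
```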
(In reply to Dan Callaghan from comment #1)
> although it is not at all clear to me right now whether this change will
> make things better or worse overall. It might be worth seeking more data
> before deciding to merge that patch.

We don't have any real data proving this approach will be better or worse, and I cannot come up with any ideas for how to get it. We don't have any kind of realistic load testing mechanism that would allow us to reproduce production-like load in a production-like environment, to compare the behaviour of the two schema design choices.

But knowing that:

* the current version does occasionally hit the lock wait issues
* the proposed version is a simpler schema, without index pollution
* the corresponding SELECT query is also simpler, and the EXPLAIN looks nicer

... I think we should assume it *probably* won't be any worse, and thus we should roll it out. If it turns out we are wrong, well... we can always go back to the previous approach by reverting this patch.
Beaker 26.0 has been released.