Bug 1426764 - beakerd repeatedly runs update_dirty_jobs unnecessarily
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Beaker
Classification: Community
Component: general
Version: 24
Hardware: Unspecified
OS: Unspecified
Target Milestone: 24.2
Assignee: Dan Callaghan
QA Contact: tools-bugs
Keywords: Patch
Reported: 2017-02-24 19:21 UTC by Dan Callaghan
Modified: 2017-03-30 03:23 UTC (History)
Last Closed: 2017-03-30 03:23:04 UTC


Description Dan Callaghan 2017-02-24 19:21:05 UTC
The main loop of beakerd currently looks like this:

def _main_recipes():
    work_done = update_dirty_jobs()
    work_done |= abort_dead_recipes()
    work_done |= update_dirty_jobs()
    work_done |= process_new_recipes()
    work_done |= update_dirty_jobs()
    work_done |= queue_processed_recipesets()
    if _virt_enabled():
        work_done |= update_dirty_jobs()
        work_done |= provision_virt_recipes()
    work_done |= update_dirty_jobs()
    work_done |= schedule_queued_recipes()
    work_done |= update_dirty_jobs()
    work_done |= provision_scheduled_recipesets()
    if _outstanding_data_migrations:
        work_done |= run_data_migrations()
    return work_done

update_dirty_jobs is interleaved with each call to the scheduling steps. (Before this version, update_dirty_jobs actually ran concurrently with the scheduling steps, but that was prone to deadlocks.) The reason for the interleaving is that each scheduling step is expected to produce some dirty jobs, *if* it actually did some work.

However, there is a quite common scenario where the scheduler has finished doing a bunch of work and there are no more recipes that need scheduling, *but* a large number of new ones are then submitted. In this case, every scheduling function apart from process_new_recipes() has no work to do. But each update_dirty_jobs() call still runs anyway, potentially up to 6 times in a row before the loop comes back around to process_new_recipes().

This is somewhat wasteful (the repeated calls to update_dirty_jobs() spend some time querying for dirty jobs but generally find none, or only one or two), and it introduces extra latency in handling newly submitted recipes: easily several minutes on a heavily loaded Beaker instance like our production one.

Users have reported that this latency in handling new recipes is actually noticeable, so we should minimise it as much as possible.

Comment 1 Dan Callaghan 2017-02-24 19:21:41 UTC
I think we can just adjust the conditionals so that each scheduling step is followed by update_dirty_jobs() only if the scheduling step did some work.
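
The idea above could be sketched roughly as follows. This is only an illustration, not the patch posted to Gerrit: run_step and main_recipes are hypothetical helper names, the scheduling steps are replaced by stubs, the virt and data-migration branches are left out, and keeping one unconditional update_dirty_jobs() call at the top of the loop is an assumption.

```python
def run_step(step, update_dirty_jobs):
    """Run one scheduling step; refresh dirty jobs only if it did work.

    Hypothetical helper, not part of beakerd.
    """
    if step():
        update_dirty_jobs()
        return True
    return False


def main_recipes(steps, update_dirty_jobs):
    """Hypothetical stand-in for _main_recipes() with conditional updates."""
    # Assumption: one unconditional pass at the start, to pick up jobs
    # that were dirtied outside the scheduling steps.
    work_done = update_dirty_jobs()
    for step in steps:
        work_done |= run_step(step, update_dirty_jobs)
    return work_done


# Stubs modelling the scenario from the description: only
# process_new_recipes() has anything to do.
calls = {"update_dirty_jobs": 0}


def update_dirty_jobs():
    calls["update_dirty_jobs"] += 1
    return False  # no dirty jobs found in this simulation


def idle_step():
    return False  # stands in for a scheduling step with no work


def process_new_recipes():
    return True  # stands in for a step that processed new recipes


steps = [idle_step, process_new_recipes, idle_step, idle_step, idle_step]
result = main_recipes(steps, update_dirty_jobs)
```

In this simulation update_dirty_jobs() runs twice per pass (once at the start and once after process_new_recipes()), where the original loop would call it before every scheduling step regardless of whether the previous step did anything.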

Comment 2 Dan Callaghan 2017-02-24 21:28:50 UTC
https://gerrit.beaker-project.org/5650

Comment 3 Roman Joost 2017-03-06 23:56:53 UTC
We'll have to do QE on the items before we can tag a release.

Comment 4 Roman Joost 2017-03-20 07:09:35 UTC
I kicked off 3 system scans, waited a while, and cancelled one job while keeping the others running. I'm pasting the parts of the log in which I think the changes have taken effect:

Mar 20 07:38:13 beaker-server-lc beakerd[32477]: bkr.server.tools.beakerd DEBUG Provisioning recipe 5 in RS:5
Mar 20 07:38:14 beaker-server-lc beakerd[32477]: bkr.server.model.activity DEBUG Tentative SystemActivity: object_id=3L, service=u'Scheduler', field=u'Distro Tree', action=u'Provision', old=u'', new=u'CentOS-6.8 x86_64', user=admin
Mar 20 07:38:14 beaker-server-lc beakerd[32477]: bkr.server.tools.beakerd DEBUG Provisioning recipe 6 in RS:6
Mar 20 07:38:14 beaker-server-lc beakerd[32477]: bkr.server.model.activity DEBUG Tentative SystemActivity: object_id=2L, service=u'Scheduler', field=u'Distro Tree', action=u'Provision', old=u'', new=u'CentOS-6.8 x86_64', user=admin
Mar 20 07:38:14 beaker-server-lc beakerd[32477]: bkr.server.tools.beakerd DEBUG Provisioning recipe 7 in RS:7
Mar 20 07:38:14 beaker-server-lc beakerd[32477]: bkr.server.model.activity DEBUG Tentative SystemActivity: object_id=1L, service=u'Scheduler', field=u'Distro Tree', action=u'Provision', old=u'', new=u'CentOS-6.8 x86_64', user=admin
Mar 20 07:38:14 beaker-server-lc beakerd[32477]: bkr.server.tools.beakerd DEBUG Updating dirty job 5
Mar 20 07:38:14 beaker-server-lc beakerd[32477]: bkr.server.tools.beakerd DEBUG Updating dirty job 7
Mar 20 07:38:14 beaker-server-lc beakerd[32477]: bkr.server.tools.beakerd DEBUG Updating dirty job 6
Mar 20 07:38:31 beaker-server-lc beakerd[32477]: bkr.server.tools.beakerd DEBUG Updating dirty job 6
Mar 20 07:38:31 beaker-server-lc beakerd[32477]: bkr.server.model.scheduler DEBUG Releasing system beaker-test-vm2 for recipe 6
Mar 20 07:38:31 beaker-server-lc beakerd[32477]: bkr.server.model.activity DEBUG Tentative SystemActivity: object_id=2L, service=u'Scheduler', field=u'User', action=u'Returned', old=u'admin', new=u'', user=admin

Verified on my own Beaker installation, running 24.2.git.9.d4c983d

Comment 5 Dan Callaghan 2017-03-30 03:23:04 UTC
Beaker 24.2 has been released.

