Bug 1426764

Summary: beakerd repeatedly runs update_dirty_jobs unnecessarily
Product: [Retired] Beaker
Reporter: Dan Callaghan <dcallagh>
Component: general
Assignee: Dan Callaghan <dcallagh>
Status: CLOSED CURRENTRELEASE
QA Contact: tools-bugs <tools-bugs>
Severity: unspecified
Priority: unspecified
Version: 24
CC: dcallagh, mjia, rjoost
Target Milestone: 24.2
Keywords: Patch
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2017-03-30 03:23:04 UTC
Type: Bug

Description Dan Callaghan 2017-02-24 19:21:05 UTC
The main loop of beakerd currently looks like this:

def _main_recipes():
    work_done = update_dirty_jobs()
    work_done |= abort_dead_recipes()
    work_done |= update_dirty_jobs()
    work_done |= process_new_recipes()
    work_done |= update_dirty_jobs()
    work_done |= queue_processed_recipesets()
    if _virt_enabled():
        work_done |= update_dirty_jobs()
        work_done |= provision_virt_recipes()
    work_done |= update_dirty_jobs()
    work_done |= schedule_queued_recipes()
    work_done |= update_dirty_jobs()
    work_done |= provision_scheduled_recipesets()
    if _outstanding_data_migrations:
        work_done |= run_data_migrations()
    return work_done

update_dirty_jobs is interleaved with each call to the scheduling steps. (Before this version, update_dirty_jobs actually ran concurrently with the scheduling steps, but that was prone to deadlocks.) The reason for the interleaving is that each scheduling step is expected to produce some dirty jobs, *if* it actually did some work.

However, there is a quite common scenario where the scheduler has finished doing a bunch of work and there are no more recipes that need scheduling, *but* a large number of new ones are then submitted. In this case, every scheduling function apart from process_new_recipes() has no work to do. But each update_dirty_jobs() call still runs anyway, potentially up to 6 times in a row before the loop comes back around to process_new_recipes().

This is somewhat wasteful (the repeated calls to update_dirty_jobs() will spend some time querying for dirty jobs but will generally not find any, or only one or two) and it introduces extra latency in handling newly submitted recipes: easily several minutes on a heavily loaded Beaker instance like our production one.

Users have reported that this latency in handling new recipes is actually noticeable, so we should minimise it as much as possible.
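
To make the "up to 6 times" figure concrete, here is a small stand-alone sketch that replays the loop above with stub functions and counts the update_dirty_jobs() calls between two consecutive process_new_recipes() calls. The stubs, the record() helper, and dirty_updates_between_new_recipe_passes() are hypothetical and exist only for this illustration. It assumes virt is enabled and that only process_new_recipes() has work to do, and it leaves out the data migration step.

```python
calls = []  # ordered log of every stub invocation

def record(name, did_work):
    """Build a stub step that logs its name and reports whether it did work."""
    def step():
        calls.append(name)
        return did_work
    return step

# Hypothetical stand-ins: only process_new_recipes has anything to do.
update_dirty_jobs = record('update_dirty_jobs', False)
abort_dead_recipes = record('abort_dead_recipes', False)
process_new_recipes = record('process_new_recipes', True)
queue_processed_recipesets = record('queue_processed_recipesets', False)
provision_virt_recipes = record('provision_virt_recipes', False)
schedule_queued_recipes = record('schedule_queued_recipes', False)
provision_scheduled_recipesets = record('provision_scheduled_recipesets', False)

def _virt_enabled():
    return True  # assumption: virt is enabled, which gives the worst case

def _main_recipes():
    # Same call order as the loop quoted above, minus data migrations.
    work_done = update_dirty_jobs()
    work_done |= abort_dead_recipes()
    work_done |= update_dirty_jobs()
    work_done |= process_new_recipes()
    work_done |= update_dirty_jobs()
    work_done |= queue_processed_recipesets()
    if _virt_enabled():
        work_done |= update_dirty_jobs()
        work_done |= provision_virt_recipes()
    work_done |= update_dirty_jobs()
    work_done |= schedule_queued_recipes()
    work_done |= update_dirty_jobs()
    work_done |= provision_scheduled_recipesets()
    return work_done

def dirty_updates_between_new_recipe_passes():
    """Count update_dirty_jobs calls between two process_new_recipes calls."""
    del calls[:]
    _main_recipes()
    _main_recipes()
    first = calls.index('process_new_recipes')
    second = calls.index('process_new_recipes', first + 1)
    return calls[first + 1:second].count('update_dirty_jobs')

print(dirty_updates_between_new_recipe_passes())
```

Four of the counted calls come from the remainder of the first pass and two from the start of the next one. With virt disabled the count would be one lower.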

Comment 1 Dan Callaghan 2017-02-24 19:21:41 UTC
I think we can just adjust the conditionals so that each scheduling step is followed by update_dirty_jobs() only if the scheduling step did some work.
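
A minimal sketch of that idea follows. It is not the actual patch, and main_recipes_sketch(), its parameters, and the stubs below are hypothetical. It assumes one unconditional update_dirty_jobs() call is kept at the top of the loop to pick up jobs dirtied outside the scheduler, and it omits the _virt_enabled() check and the data migration step for brevity.

```python
def main_recipes_sketch(steps, update_dirty_jobs):
    """Run each scheduling step, refreshing dirty jobs only after real work."""
    # Assumption: one unconditional pass per loop for jobs dirtied elsewhere.
    work_done = update_dirty_jobs()
    for step in steps:
        if step():
            work_done = True
            # Only a step that did work can have produced new dirty jobs.
            update_dirty_jobs()
    return work_done

# Usage with stubs: only the second step reports that it did any work.
dirty_calls = []

def update_dirty_jobs():
    dirty_calls.append(1)
    return False  # stub: pretend no dirty jobs were found

steps = [lambda: False, lambda: True, lambda: False, lambda: False]
result = main_recipes_sketch(steps, update_dirty_jobs)
print(result, len(dirty_calls))
```

In this run update_dirty_jobs() is called twice (once up front and once after the step that did work) instead of once per step.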

Comment 2 Dan Callaghan 2017-02-24 21:28:50 UTC
https://gerrit.beaker-project.org/5650

Comment 3 Roman Joost 2017-03-06 23:56:53 UTC
We'll have to do QE on the items before we can tag a release.

Comment 4 Roman Joost 2017-03-20 07:09:35 UTC
I've kicked off 3 system scans, waited a while, and cancelled one job while I kept the others running. I'm pasting the parts of the log in which I think the changes have taken effect:

Mar 20 07:38:13 beaker-server-lc beakerd[32477]: bkr.server.tools.beakerd DEBUG Provisioning recipe 5 in RS:5
Mar 20 07:38:14 beaker-server-lc beakerd[32477]: bkr.server.model.activity DEBUG Tentative SystemActivity: object_id=3L, service=u'Scheduler', field=u'Distro Tree', action=u'Provision', old=u'', new=u'CentOS-6.8 x86_64', user=admin
Mar 20 07:38:14 beaker-server-lc beakerd[32477]: bkr.server.tools.beakerd DEBUG Provisioning recipe 6 in RS:6
Mar 20 07:38:14 beaker-server-lc beakerd[32477]: bkr.server.model.activity DEBUG Tentative SystemActivity: object_id=2L, service=u'Scheduler', field=u'Distro Tree', action=u'Provision', old=u'', new=u'CentOS-6.8 x86_64', user=admin
Mar 20 07:38:14 beaker-server-lc beakerd[32477]: bkr.server.tools.beakerd DEBUG Provisioning recipe 7 in RS:7
Mar 20 07:38:14 beaker-server-lc beakerd[32477]: bkr.server.model.activity DEBUG Tentative SystemActivity: object_id=1L, service=u'Scheduler', field=u'Distro Tree', action=u'Provision', old=u'', new=u'CentOS-6.8 x86_64', user=admin
Mar 20 07:38:14 beaker-server-lc beakerd[32477]: bkr.server.tools.beakerd DEBUG Updating dirty job 5
Mar 20 07:38:14 beaker-server-lc beakerd[32477]: bkr.server.tools.beakerd DEBUG Updating dirty job 7
Mar 20 07:38:14 beaker-server-lc beakerd[32477]: bkr.server.tools.beakerd DEBUG Updating dirty job 6
Mar 20 07:38:31 beaker-server-lc beakerd[32477]: bkr.server.tools.beakerd DEBUG Updating dirty job 6
Mar 20 07:38:31 beaker-server-lc beakerd[32477]: bkr.server.model.scheduler DEBUG Releasing system beaker-test-vm2 for recipe 6
Mar 20 07:38:31 beaker-server-lc beakerd[32477]: bkr.server.model.activity DEBUG Tentative SystemActivity: object_id=2L, service=u'Scheduler', field=u'User', action=u'Returned', old=u'admin', new=u'', user=admin

Verified on my own Beaker installation, running 24.2.git.9.d4c983d

Comment 5 Dan Callaghan 2017-03-30 03:23:04 UTC
Beaker 24.2 has been released.