Bug 668473 - Jobs left in queued state forever
Summary: Jobs left in queued state forever
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Beaker
Classification: Retired
Component: scheduler
Version: 0.6
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Bill Peck
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-01-10 15:46 UTC by Bill Peck
Modified: 2019-05-22 13:38 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-03-10 05:21:26 UTC
Embargoed:


Description Bill Peck 2011-01-10 15:46:03 UTC
Description of problem:

Currently the scheduler only aborts a recipe when it is first processed, and only if it doesn't match a distro or system. But recipes can stay queued for a long time if systems are in use or the user doesn't have access to many systems. In that time the requested distro could be removed, and then the job will never run.

Distro.labController.remove() should abort any queued recipes that have been assigned to it.
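
A minimal sketch of such an abort-on-removal hook, assuming a simplified in-memory model. `Distro`, `Recipe`, `remove_distro_from_lab` and the status strings are hypothetical stand-ins, not Beaker's real classes or API. It also assumes that "assigned to it" means queued recipes whose distro is left with no lab controller at all once the removal is done.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model; these are NOT Beaker's real classes.
@dataclass
class Distro:
    name: str
    lab_controllers: set = field(default_factory=set)

@dataclass
class Recipe:
    id: int
    distro: Distro
    status: str = "Queued"

def remove_distro_from_lab(distro, lab, queued_recipes):
    """Drop `distro` from `lab`, then abort queued recipes that can no longer run.

    Returns the ids of the recipes that were aborted.
    """
    distro.lab_controllers.discard(lab)
    aborted = []
    if distro.lab_controllers:
        # Another lab controller still carries the distro, so recipes can still run.
        return aborted
    for recipe in queued_recipes:
        if recipe.status == "Queued" and recipe.distro is distro:
            recipe.status = "Aborted"
            aborted.append(recipe.id)
    return aborted

rhel = Distro("example-distro", {"lab1", "lab2"})
recipes = [Recipe(1, rhel), Recipe(2, rhel, status="Running")]
print(remove_distro_from_lab(rhel, "lab1", recipes))  # [] because lab2 still has it
print(remove_distro_from_lab(rhel, "lab2", recipes))  # [1]
```

Running recipes are left alone; only recipes still waiting in the queue are aborted.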

We should also add a routine to abort recipes that have no matching systems.
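
One way to picture that routine is a periodic sweep over the queue. This is a hedged sketch assuming each queued recipe exposes its candidate systems and the lab controllers that still carry its distro as plain lists; `QueuedRecipe` and `sweep_unrunnable` are hypothetical names. Checking both conditions in the same pass also catches recipes whose distro was removed while they waited.

```python
from dataclasses import dataclass, field

@dataclass
class QueuedRecipe:
    # Hypothetical record; field names are illustrative only.
    id: int
    candidate_systems: list = field(default_factory=list)  # systems that could run it
    distro_labs: list = field(default_factory=list)        # labs that still carry its distro
    status: str = "Queued"

def sweep_unrunnable(recipes):
    """Abort every queued recipe that can never be scheduled.

    A recipe is unrunnable if no system matches it, or if its distro is no
    longer present in any lab controller. Returns {recipe_id: reason}.
    """
    aborted = {}
    for recipe in recipes:
        if recipe.status != "Queued":
            continue
        if not recipe.candidate_systems:
            reason = "no matching systems"
        elif not recipe.distro_labs:
            reason = "distro not in any lab controller"
        else:
            continue  # still schedulable, leave it queued
        recipe.status = "Aborted"
        aborted[recipe.id] = reason
    return aborted

queue = [
    QueuedRecipe(10, ["sys-a"], ["lab1"]),  # runnable
    QueuedRecipe(11, [], ["lab1"]),         # no systems
    QueuedRecipe(12, ["sys-b"], []),        # distro removed everywhere
]
print(sweep_unrunnable(queue))
# {11: 'no matching systems', 12: 'distro not in any lab controller'}
```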

Comment 1 Bill Peck 2011-02-22 15:36:21 UTC
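This should be doable with one routine: a single query for queued recipes that either have no candidate systems or whose distro is no longer in any lab controller.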

>>> print Recipe.query().join('status').outerjoin(['systems']) \
...     .outerjoin(['distro', 'lab_controller_assocs', 'lab_controller']) \
...     .filter(or_(and_(Recipe.status == TaskStatus.by_name(u'Queued'), System.id == None),
...                 and_(Recipe.status == TaskStatus.by_name(u'Queued'), LabController.id == None)))


SELECT recipe.id AS recipe_id, recipe.recipe_set_id AS recipe_recipe_set_id, recipe.distro_id AS recipe_distro_id, recipe.system_id AS recipe_system_id, recipe.result_id AS recipe_result_id, recipe.status_id AS recipe_status_id, recipe.start_time AS recipe_start_time, recipe.finish_time AS recipe_finish_time, recipe._host_requires AS recipe__host_requires, recipe._distro_requires AS recipe__distro_requires, recipe.kickstart AS recipe_kickstart, recipe.type AS recipe_type, recipe.ttasks AS recipe_ttasks, recipe.ptasks AS recipe_ptasks, recipe.wtasks AS recipe_wtasks, recipe.ftasks AS recipe_ftasks, recipe.ktasks AS recipe_ktasks, recipe.whiteboard AS recipe_whiteboard, recipe.ks_meta AS recipe_ks_meta, recipe.kernel_options AS recipe_kernel_options, recipe.kernel_options_post AS recipe_kernel_options_post, recipe.role AS recipe_role, recipe.panic AS recipe_panic, recipe._partitions AS recipe__partitions, recipe.autopick_random AS recipe_autopick_random 
FROM recipe INNER JOIN task_status ON task_status.id = recipe.status_id LEFT OUTER JOIN system_recipe_map ON recipe.id = system_recipe_map.recipe_id LEFT OUTER JOIN system ON system.id = system_recipe_map.system_id LEFT OUTER JOIN distro ON distro.id = recipe.distro_id LEFT OUTER JOIN distro_lab_controller_map ON distro.id = distro_lab_controller_map.distro_id LEFT OUTER JOIN lab_controller ON lab_controller.id = distro_lab_controller_map.lab_controller_id 
WHERE %s = recipe.status_id AND system.id IS NULL OR %s = recipe.status_id AND lab_controller.id IS NULL ORDER BY recipe.id
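
The generated SQL relies on LEFT OUTER JOINs: a queued recipe with no row in `system_recipe_map` comes back with `system.id` NULL, and a distro missing from `distro_lab_controller_map` comes back with `lab_controller.id` NULL. Because AND binds tighter than OR, the WHERE clause is equivalent to `status = Queued AND (system.id IS NULL OR lab_controller.id IS NULL)`. The sketch below reproduces the pattern in SQLite using a cut-down, hypothetical schema (only the join columns are kept, and status is a plain text column rather than a `task_status` lookup). `DISTINCT` is added here because the outer joins yield one row per system and lab controller pair.

```python
import sqlite3

# Hypothetical, cut-down schema: only the columns the joins need.
SCHEMA = """
CREATE TABLE recipe (id INTEGER PRIMARY KEY, distro_id INTEGER, status TEXT);
CREATE TABLE system (id INTEGER PRIMARY KEY);
CREATE TABLE system_recipe_map (system_id INTEGER, recipe_id INTEGER);
CREATE TABLE distro (id INTEGER PRIMARY KEY);
CREATE TABLE lab_controller (id INTEGER PRIMARY KEY);
CREATE TABLE distro_lab_controller_map (distro_id INTEGER, lab_controller_id INTEGER);
"""

QUERY = """
SELECT DISTINCT recipe.id
FROM recipe
LEFT OUTER JOIN system_recipe_map ON recipe.id = system_recipe_map.recipe_id
LEFT OUTER JOIN system ON system.id = system_recipe_map.system_id
LEFT OUTER JOIN distro ON distro.id = recipe.distro_id
LEFT OUTER JOIN distro_lab_controller_map ON distro.id = distro_lab_controller_map.distro_id
LEFT OUTER JOIN lab_controller ON lab_controller.id = distro_lab_controller_map.lab_controller_id
WHERE recipe.status = ? AND (system.id IS NULL OR lab_controller.id IS NULL)
ORDER BY recipe.id
"""

def unrunnable_queued_recipes(conn):
    """Return ids of queued recipes with no system or whose distro is in no lab."""
    return [row[0] for row in conn.execute(QUERY, ("Queued",))]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executescript("""
INSERT INTO distro VALUES (1), (2);              -- distro 2 is in no lab
INSERT INTO lab_controller VALUES (1);
INSERT INTO distro_lab_controller_map VALUES (1, 1);
INSERT INTO system VALUES (1), (2);
INSERT INTO recipe VALUES (100, 1, 'Queued'), (101, 1, 'Queued'),
                          (102, 2, 'Queued'), (103, 2, 'Running');
INSERT INTO system_recipe_map VALUES (1, 100), (1, 102), (2, 102);
""")
print(unrunnable_queued_recipes(conn))  # [101, 102]
```

Recipe 101 has no systems, recipe 102 has systems but its distro is in no lab, and recipe 103 is skipped because it is not queued.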


>>> for recipe in Recipe.query().join('status').outerjoin(['systems']) \
...         .outerjoin(['distro', 'lab_controller_assocs', 'lab_controller']) \
...         .filter(or_(and_(Recipe.status == TaskStatus.by_name(u'Queued'), System.id == None),
...                     and_(Recipe.status == TaskStatus.by_name(u'Queued'), LabController.id == None))):
...     print recipe.id, len(recipe.systems), len(recipe.distro.lab_controller_assocs)
... 
6689 1 0
12289 1 0
14307 1 0
14374 1 0
14378 1 0
14476 1 0
14521 3 0
24853 0 0
24869 0 0
26510 1 0
34933 1 0
41921 0 0
42218 0 1
42229 0 0
42230 0 0
42282 1 0
42346 1 0
42361 0 5
42372 1 0
42395 1 0
42437 0 1
42438 0 1

Comment 2 Bill Peck 2011-03-02 22:12:07 UTC
Pushed to Gerrit for review. It has a test case now.
