Bug 668473

Summary: Jobs left in queued state forever
Product: [Retired] Beaker
Reporter: Bill Peck <bpeck>
Component: scheduler
Assignee: Bill Peck <bpeck>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: low
Version: 0.6
CC: bpeck, dcallagh, mcsontos, pbunyan, rmancy
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2011-03-10 05:21:26 UTC

Description Bill Peck 2011-01-10 15:46:03 UTC
Description of problem:

Currently the scheduler aborts a recipe that doesn't match a distro or system only when the recipe is first entered.  But recipes can stay queued for a while if systems are in use or the user has access to only a few systems.  If the requested distro is removed in that time, the job will never run.

Distro.labController.remove() should abort any queued recipes that have been assigned to it.

We should also add a routine to abort recipes that have no matching systems.
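
The two checks proposed above could be folded into one pass over the queued recipes. What follows is a minimal, framework-free sketch of that idea, not Beaker's actual code: the `Recipe` and `Distro` dataclasses, their fields, and the `abort_unschedulable` function are hypothetical stand-ins for the real model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Distro:
    name: str
    # Hypothetical: names of the lab controllers that still carry this distro.
    lab_controllers: List[str] = field(default_factory=list)


@dataclass
class Recipe:
    id: int
    distro: Distro
    status: str = "Queued"
    # Hypothetical: names of the systems this recipe could still run on.
    systems: List[str] = field(default_factory=list)


def abort_unschedulable(recipes):
    """Abort queued recipes that can never run; return the ids that were aborted."""
    aborted = []
    for recipe in recipes:
        if recipe.status != "Queued":
            continue  # only queued recipes are candidates
        # A recipe is stuck if no system matches it, or if its distro
        # is no longer present on any lab controller.
        if not recipe.systems or not recipe.distro.lab_controllers:
            recipe.status = "Aborted"
            aborted.append(recipe.id)
    return aborted
```

In this sketch the same function could be called both from a periodic scheduler pass and right after a distro is removed from a lab controller, so both cases share one code path.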

Comment 1 Bill Peck 2011-02-22 15:36:21 UTC
This should be doable with one routine.

>>> print Recipe.query().join('status').outerjoin(['systems']).outerjoin(
...     ['distro', 'lab_controller_assocs', 'lab_controller']).filter(or_(
...         and_(Recipe.status == TaskStatus.by_name(u'Queued'), System.id == None),
...         and_(Recipe.status == TaskStatus.by_name(u'Queued'), LabController.id == None),
...     ))


SELECT recipe.id AS recipe_id, recipe.recipe_set_id AS recipe_recipe_set_id, recipe.distro_id AS recipe_distro_id, recipe.system_id AS recipe_system_id, recipe.result_id AS recipe_result_id, recipe.status_id AS recipe_status_id, recipe.start_time AS recipe_start_time, recipe.finish_time AS recipe_finish_time, recipe._host_requires AS recipe__host_requires, recipe._distro_requires AS recipe__distro_requires, recipe.kickstart AS recipe_kickstart, recipe.type AS recipe_type, recipe.ttasks AS recipe_ttasks, recipe.ptasks AS recipe_ptasks, recipe.wtasks AS recipe_wtasks, recipe.ftasks AS recipe_ftasks, recipe.ktasks AS recipe_ktasks, recipe.whiteboard AS recipe_whiteboard, recipe.ks_meta AS recipe_ks_meta, recipe.kernel_options AS recipe_kernel_options, recipe.kernel_options_post AS recipe_kernel_options_post, recipe.role AS recipe_role, recipe.panic AS recipe_panic, recipe._partitions AS recipe__partitions, recipe.autopick_random AS recipe_autopick_random 
FROM recipe INNER JOIN task_status ON task_status.id = recipe.status_id LEFT OUTER JOIN system_recipe_map ON recipe.id = system_recipe_map.recipe_id LEFT OUTER JOIN system ON system.id = system_recipe_map.system_id LEFT OUTER JOIN distro ON distro.id = recipe.distro_id LEFT OUTER JOIN distro_lab_controller_map ON distro.id = distro_lab_controller_map.distro_id LEFT OUTER JOIN lab_controller ON lab_controller.id = distro_lab_controller_map.lab_controller_id 
WHERE %s = recipe.status_id AND system.id IS NULL OR %s = recipe.status_id AND lab_controller.id IS NULL ORDER BY recipe.id


>>> for recipe in Recipe.query().join('status').outerjoin(['systems']).outerjoin(
...         ['distro', 'lab_controller_assocs', 'lab_controller']).filter(or_(
...             and_(Recipe.status == TaskStatus.by_name(u'Queued'), System.id == None),
...             and_(Recipe.status == TaskStatus.by_name(u'Queued'), LabController.id == None),
...         )):
...     print recipe.id, len(recipe.systems), len(recipe.distro.lab_controller_assocs)
... 
6689 1 0
12289 1 0
14307 1 0
14374 1 0
14378 1 0
14476 1 0
14521 3 0
24853 0 0
24869 0 0
26510 1 0
34933 1 0
41921 0 0
42218 0 1
42229 0 0
42230 0 0
42282 1 0
42346 1 0
42361 0 5
42372 1 0
42395 1 0
42437 0 1
42438 0 1
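
The query above is an outer-join "anti-join": a LEFT OUTER JOIN keeps recipes that have no matching row, and the IS NULL test then selects exactly those. Because AND binds tighter than OR in SQL, the generated WHERE clause is equivalent to "status is Queued AND (no system OR no lab controller)". The following self-contained sketch reproduces the pattern with Python's `sqlite3` module. The schema is a simplified assumption (a text `status` column, and only the two mapping tables rather than the full `system` and `lab_controller` tables), and `stuck_recipe_ids` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE recipe (id INTEGER PRIMARY KEY, status TEXT, distro_id INTEGER);
CREATE TABLE system_recipe_map (system_id INTEGER, recipe_id INTEGER);
CREATE TABLE distro_lab_controller_map (distro_id INTEGER, lab_controller_id INTEGER);

-- Recipe 1: queued, has a system and a lab controller (schedulable).
-- Recipe 2: queued, two systems, but distro 11 is on no lab controller.
-- Recipe 3: queued, distro is available, but no matching system.
-- Recipe 4: running, so it is ignored even though distro 11 is gone.
INSERT INTO recipe VALUES (1, 'Queued', 10), (2, 'Queued', 11),
                          (3, 'Queued', 10), (4, 'Running', 11);
INSERT INTO system_recipe_map VALUES (100, 1), (100, 2), (101, 2), (100, 4);
INSERT INTO distro_lab_controller_map VALUES (10, 1000);
""")


def stuck_recipe_ids(conn):
    """Return ids of queued recipes with no system or no lab controller."""
    rows = conn.execute("""
        SELECT DISTINCT recipe.id
        FROM recipe
        LEFT OUTER JOIN system_recipe_map AS srm
            ON srm.recipe_id = recipe.id
        LEFT OUTER JOIN distro_lab_controller_map AS dlm
            ON dlm.distro_id = recipe.distro_id
        WHERE recipe.status = 'Queued'
          AND (srm.system_id IS NULL OR dlm.lab_controller_id IS NULL)
        ORDER BY recipe.id
    """)
    return [row[0] for row in rows]


print(stuck_recipe_ids(conn))  # [2, 3]
```

The DISTINCT matters in this sketch because a recipe with several candidate systems produces one joined row per system.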

Comment 2 Bill Peck 2011-03-02 22:12:07 UTC
Pushed to Gerrit for review.  It now has a test case.