Bug 1270649

Summary: broken system detection logic fires if *any* task is Aborted, rather than *all* tasks Aborted
Product: [Retired] Beaker
Component: scheduler
Version: 21
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Reporter: Dan Callaghan <dcallagh>
Assignee: Roman Joost <rjoost>
QA Contact: tools-bugs <tools-bugs>
CC: alemay, bpeck, dcallagh, dowang, draeuftl, jburke, jstancek, mjia, rjoost
Target Milestone: 21.1
Keywords: NeedsTestCase, Patch, Regression
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-10-21 03:25:28 UTC

Description Dan Callaghan 2015-10-12 04:05:15 UTC
Description of problem:
Due to the fix for bug 714937 in 21.0, a recipe is now Aborted if any task in the recipe is Aborted. Previously it was only Aborted if all tasks in the recipe were Aborted.

As a consequence, the broken system detection logic (which is currently triggered based on the recipe status) will consider a recipe to be a "suspicious abort" if any task in the recipe is Aborted. It should only consider recipes where every task is Aborted. 
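
The difference between the two conditions can be sketched as below. This is only an illustration of the any/all distinction described above, not Beaker's actual code: the function names, the status strings used in the sample data, and the handling of a recipe with no tasks are all assumptions.

```python
from typing import Sequence

ABORTED = "Aborted"


def recipe_status_is_aborted(task_statuses: Sequence[str]) -> bool:
    """Recipe-level status as of 21.0: Aborted if ANY task is Aborted.

    Hypothetical helper; keying broken system detection on this value
    is what produces the false positives.
    """
    return any(status == ABORTED for status in task_statuses)


def is_suspicious_abort(task_statuses: Sequence[str]) -> bool:
    """Desired check: suspicious only if EVERY task is Aborted.

    Assumption: a recipe with no tasks is not treated as suspicious.
    """
    return bool(task_statuses) and all(
        status == ABORTED for status in task_statuses
    )


# /distribution/install completed, /distribution/reservesys hit the watchdog.
reservation_expired = ["Completed", "Aborted"]
# Nothing ran to completion, which hints at a genuinely broken system.
install_failed = ["Aborted", "Aborted"]

print(recipe_status_is_aborted(reservation_expired))  # recipe status check
print(is_suspicious_abort(reservation_expired))       # all-tasks check
print(is_suspicious_abort(install_failed))
```

With the sample data, the recipe-level check flags the expired reservation while the all-tasks check does not, which matches the expected behaviour in this report.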

Version-Release number of selected component (if applicable):
21.0

How reproducible:
somewhat easily

Steps to Reproduce:
1. Schedule a recipe for a particular system, using a released distro, with /distribution/install and /distribution/reservesys (use a small value for the RESERVETIME parameter to make testing easier)
2. Schedule another one so that they run consecutively
3. Wait for each recipe to start and then for its watchdog timer to expire

Actual results:
System is marked as broken due to two consecutive Aborted recipes.

Expected results:
System should not be marked broken because the /distribution/install task completes successfully.

Additional info:
This has a high impact because it's quite common for /distribution/reservesys to be Aborted when the job owner does not explicitly return the system before the reservation time runs out.

Comment 1 Roman Joost 2015-10-14 06:03:22 UTC
Patch available on gerrit:

https://gerrit.beaker-project.org/#/c/4432/

Comment 4 Dan Callaghan 2015-10-21 03:25:28 UTC
Beaker 21.1 has been released.