Bug 745960

Summary: [RFE] When task grows over limit (time, size...), reserve machine as-is for user to investigate
Summary: [RFE] When task grows over limit (time, size...), reserve machine as-is for user to investigate
Product: [Retired] Beaker
Reporter: David Kutálek <dkutalek>
Component: beah
Assignee: Nick Coghlan <ncoghlan>
Status: CLOSED DUPLICATE
Severity: unspecified
Priority: unspecified
Version: 0.7
CC: bpeck, dcallagh, llim, mcsontos, psplicha, rmancy, stl
Target Milestone: ---
Keywords: FutureFeature, Reopened
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: Misc
Doc Type: Enhancement
Story Points: ---
Last Closed: 2013-04-15 05:17:08 UTC
Type: ---
Regression: ---

Description David Kutálek 2011-10-13 14:13:57 UTC
Description of problem:

When a test fails in a way that it loops infinitely and/or grows its log files over the limits, it is handled by watchdogs. I do not know exactly how these watchdogs work, but often this means the end of the whole job in a warning state. In the better case the rest is processed, but the system may be left in an unexpected state.

I propose a new (optional) behaviour of the watchdog(s):
 - stop such a problematic task 
 - hold the system as is and run reservesys

This way I would be able to immediately catch bugs in my tasks and save Beaker machine resources by not having to run the whole job again.


Comment 1 Nick Coghlan 2012-10-17 04:34:37 UTC
Bulk reassignment of issues as Bill has moved to another team.

Comment 2 Min Shin 2012-11-07 07:22:38 UTC
This bug is closed as it is either not in the current Beaker scope, or we could not find sufficient data in the bug report for consideration.
Please feel free to reopen the bug with additional information and/or business cases behind it.

Comment 3 David Kutálek 2012-11-07 10:00:03 UTC
Either out of scope or insufficient data?

Please tell me more:
 - which one applies? 
 - if scope, why?
 - if data, what more data do you need?

David

(In reply to comment #2)
> This bugs is closed as it is either not in the current Beaker scope or we
> could not find sufficient data in the bug report for consideration.
> Please feel free to reopen the bug with additional information and/or
> business cases behind it.

Comment 4 Dan Callaghan 2012-11-07 23:04:07 UTC
(In reply to comment #3)

This bug might have been miscategorized. Your suggestion sounds reasonable; the only problem is that it is not possible to change the tasks in a recipe after it has been scheduled. So it would have to be a feature of the harness: when the local watchdog is triggered, the current task is suspended and its run time is extended for some amount of time (24 hours?). The only problem then is how the user will be notified. The reservation e-mail is sent by /distribution/reservesys. The answer to this might be bug 639938: treating reservation differently from other tasks.

We would definitely also want this behaviour to be opt-in, since we wouldn't want every local watchdog to hold onto the machine for 24 hours. That would create a huge amount of waste.

Comment 5 David Kutálek 2012-11-08 13:21:56 UTC
Thank you for the response. Yes, it should most probably be implemented in the harness, and it should be configurable: what to do when the local watchdog expires?

a) recipe is cancelled
b) task is cancelled and recipe execution continues
c) machine is reserved by harness and e-mail is sent

The reservation time should also be configurable; usually something like 2 hours may be sufficient.
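To make the proposal concrete, the three options above could be sketched as a configurable expiry policy inside the harness. This is purely illustrative: all names here (WatchdogPolicy, handle_watchdog_expiry, the action strings) are hypothetical and are not part of any existing Beaker or beah API.

```python
# Hypothetical sketch of configurable local-watchdog expiry handling.
# None of these identifiers exist in Beaker/beah; they only illustrate
# the three options (a), (b), (c) proposed in this comment.
from enum import Enum


class WatchdogPolicy(Enum):
    CANCEL_RECIPE = "cancel_recipe"    # a) recipe is cancelled
    SKIP_TASK = "skip_task"            # b) task cancelled, recipe continues
    RESERVE_SYSTEM = "reserve_system"  # c) machine reserved, e-mail sent


# Per the comment above, ~2 hours is usually a sufficient reservation.
DEFAULT_RESERVE_HOURS = 2


def handle_watchdog_expiry(policy, reserve_hours=DEFAULT_RESERVE_HOURS):
    """Return the ordered actions a harness would take for each policy."""
    if policy is WatchdogPolicy.CANCEL_RECIPE:
        return ["stop_task", "cancel_recipe"]
    if policy is WatchdogPolicy.SKIP_TASK:
        return ["stop_task", "continue_recipe"]
    if policy is WatchdogPolicy.RESERVE_SYSTEM:
        # Extend the watchdog so the machine is held for the user,
        # then notify them by e-mail.
        return ["stop_task", f"extend_watchdog:{reserve_hours}h", "send_mail"]
    raise ValueError(f"unknown policy: {policy}")
```

The point of the sketch is that the policy (and the reservation length) would be a per-recipe or per-task setting read by the harness, defaulting to today's behaviour so that holding the machine stays strictly opt-in.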

Comment 6 Nick Coghlan 2013-04-15 05:17:08 UTC
Closing this as a duplicate of #639938.

We won't be adding any implicit reservation behaviour, but we will be adding the capability to request post-execution reservation of the system independent of the execution of the tasks.

*** This bug has been marked as a duplicate of bug 639938 ***