Bug 1008509

Summary: beaker-provision does not kill power script child processes
Product: [Retired] Beaker Reporter: Raymond Mancy <rmancy>
Component: lab controller Assignee: Raymond Mancy <rmancy>
Status: CLOSED CURRENTRELEASE QA Contact: tools-bugs <tools-bugs>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 0.14 CC: aigao, asaha, dcallagh, ebaak, jwalters, llim, qwan, rmancy, xjia
Target Milestone: 0.14.2 Keywords: Reopened
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-11-07 01:47:49 UTC Type: Bug

Description Raymond Mancy 2013-09-16 13:51:50 UTC
Description of problem:

Systems that share the same management interface can sometimes stop each other from running jobs if a blocking process goes awol.


Version-Release number of selected component (if applicable):

0.14.1

How reproducible:


Steps to Reproduce:
1. ?
2.
3.

Actual results:

Job was not running

Expected results:

beaker-provision should have timed out the 'Waiting' command.

Additional info:

The configure_netboot entries etc. were still in the Queued state (and had been for a couple of days). beaker-provision et al. were running, so there was no problem with Beaker per se.

Comment 4 Dan Callaghan 2013-09-16 22:42:50 UTC
Beaker-provision already enforces a timeout on power commands and kills the script if the timeout is exceeded. It sounds like the problem here is that the power script spawned a child process (telnet) which wasn't cleaned up.

Beaker-provision should make sure each power command is run in its own progress group and then kill the entire process group on timeout.
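
A minimal sketch of that idea, assuming Python 3 and the standard subprocess module. This is an illustration only, not the actual beaker-provision code or the eventual patch, and the function name run_with_timeout is hypothetical.

```python
import os
import signal
import subprocess


def run_with_timeout(argv, timeout):
    """Run argv in its own process group; kill the whole group on timeout.

    Hypothetical helper for illustration, not part of Beaker.
    """
    # start_new_session=True makes the child call setsid(), so it becomes
    # the leader of a new process group whose ID equals its own PID.
    proc = subprocess.Popen(argv, start_new_session=True)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Signal every process in the group, which includes children the
        # script spawned (for example telnet), not just the script itself.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()  # reap the script so it does not linger as a zombie
        raise
```

Killing only proc.pid here would leave a spawned telnet running, which is the behaviour described above.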

Comment 5 Dan Callaghan 2013-09-17 01:18:45 UTC
(In reply to Dan Callaghan from comment #4)
> Beaker-provision should make sure each power command is run in its own
> progress group

process group

Comment 6 Nick Coghlan 2013-09-18 00:23:14 UTC
This is the kind of provisioning reliability fix I'd like us to focus on in 0.16 :)

Comment 8 Dan Callaghan 2013-09-23 23:41:03 UTC
The other case beaker-provision should handle better is when the power script crashes or is killed, and leaves behind child processes. So really it should kill the process group in all cases, not just when timeouts occur.
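
One way to get "kill the process group in all cases" is to put the cleanup in a finally block. The following is a sketch under the same assumptions (Python 3, standard library only); the name process_group is hypothetical and this is not the actual Beaker change.

```python
import contextlib
import os
import signal
import subprocess


@contextlib.contextmanager
def process_group(argv):
    """Start argv in its own process group and always clean the group up.

    Hypothetical helper for illustration, not part of Beaker.
    """
    proc = subprocess.Popen(argv, start_new_session=True)
    try:
        yield proc
    finally:
        # Runs whether the script finished, crashed, was killed, or the
        # caller gave up waiting, so leftover children are never kept.
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # no process is left in the group
        proc.wait()
```

For example, a script that backgrounds a child and then exits with an error still has that child killed when the with block ends:

```python
with process_group(["sh", "-c", "sleep 30 & exit 3"]) as proc:
    returncode = proc.wait(timeout=5)
```

A production version would also need to consider PID reuse once the group is empty.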

Comment 10 Raymond Mancy 2013-10-02 00:39:38 UTC
http://gerrit.beaker-project.org/#/c/2322/

Comment 14 Raymond Mancy 2013-10-23 07:00:19 UTC
beaker 0.15.1 has been released.

Comment 15 Raymond Mancy 2013-10-23 07:02:50 UTC
This change has been nominated to be back ported to the 0.14 branch, to be released as part of the next maintenance release 0.14.2.

Comment 16 Nick Coghlan 2013-10-25 06:36:38 UTC
Adjusting target milestone to make the changes backported to 0.14.2 easier to identify. 0.15.0 has enough significant regressions that it shouldn't be used, so the change means that 0.15.1 can be effectively reidentified as the union of that tag and the 0.14.2 target milestone.

Comment 18 Raymond Mancy 2013-10-29 05:55:27 UTC
Verified as per the original instructions in comment #13.

Comment 19 Nick Coghlan 2013-11-07 01:47:49 UTC
Closing as addressed in Beaker 0.14.2.