Bug 832250

Summary: beaker-provision has a race between creating and reaping child processes
Product: [Retired] Beaker Reporter: Dan Callaghan <dcallagh>
Component: lab controllerAssignee: Dan Callaghan <dcallagh>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 0.9CC: bpeck, dcallagh, rmancy, stl
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-06-26 06:40:55 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Dan Callaghan 2012-06-15 01:41:30 UTC
The following exceptions were observed:

2012-06-14 14:07:05 [ERROR   ] { 9879} bkr.labcontroller.provision:85 Error processing command 3323
Traceback (most recent call last):
   File "/usr/lib/python2.6/site-packages/bkr/labcontroller/provision.py", line
73, in handle
     handle_power(dict(command.items() + [('action', u'off')]))
   File "/usr/lib/python2.6/site-packages/bkr/labcontroller/provision.py", line 168, in handle_power
     timeout=300)
   File "/usr/lib/python2.6/site-packages/bkr/labcontroller/async.py", line 78, in __init__
     super(MonitoredSubprocess, self).__init__(*args, **kwargs)
   File "/usr/lib64/python2.6/subprocess.py", line 639, in __init__
     errread, errwrite)
   File "/usr/lib64/python2.6/subprocess.py", line 1138, in _execute_child
     self.pid = os.fork()
   File "/usr/lib64/python2.6/site-packages/gevent/hub.py", line 162, in fork
     result = _original_fork()
   File "/usr/lib/python2.6/site-packages/bkr/labcontroller/async.py", line 90, in _sigchld_handler
     if child.poll() is not None:
   File "/usr/lib64/python2.6/subprocess.py", line 755, in poll
     return self._internal_poll()
   File "/usr/lib64/python2.6/subprocess.py", line 1256, in _internal_poll
     pid, sts = _waitpid(self.pid, _WNOHANG)
TypeError: an integer is required

2012-06-14 11:22:21 [ERROR   ] {10593} bkr.labcontroller.provision:85 Error processing command 3346
Traceback (most recent call last):
   File "/usr/lib/python2.6/site-packages/bkr/labcontroller/provision.py", line
74, in handle
     handle_power(dict(command.items() + [('action', u'on')]))
   File "/usr/lib/python2.6/site-packages/bkr/labcontroller/provision.py", line
168, in handle_power
     timeout=300)
   File "/usr/lib/python2.6/site-packages/bkr/labcontroller/async.py", line 78, in __init__
     super(MonitoredSubprocess, self).__init__(*args, **kwargs)
   File "/usr/lib/python2.6/site-packages/bkr/labcontroller/async.py", line 90, in _sigchld_handler
     if child.poll() is not None:
   File "/usr/lib64/python2.6/subprocess.py", line 755, in poll
     return self._internal_poll()
   File "/usr/lib64/python2.6/subprocess.py", line 1254, in _internal_poll
     if self.returncode is None:
AttributeError: 'MonitoredSubprocess' object has no attribute 'returncode'

In both cases it is because we are calling .poll() on a MonitoredSubprocess instance which has not been fully initialised yet.

Comment 1 Dan Callaghan 2012-06-15 02:48:02 UTC
I have a test which reliably reproduces both of those exceptions (you get one or the other at random).

I think the best fix is to move the actual work out of the SIGCHLD handler and into a separate greenlet that is scheduled as normal. Then it does not race with or interfere with anything else. And if it does raise for some reason, it won't screw up the whole gevent hub. In retrospect, I should have done it that way originally.

http://gerrit.beaker-project.org/1138

Ultimately (when gevent 1.0 is released) we should get away from installing our own SIGCHLD handler at all, and use gevent's child handling which is properly tied in to the libev event loop etc.

Comment 3 Dan Callaghan 2012-06-26 06:40:55 UTC
Beaker 0.9.0 has been released.