Bug 632374 - beaker-watchdog doesn't clean up zombies
Summary: beaker-watchdog doesn't clean up zombies
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Beaker
Classification: Retired
Component: lab controller
Version: 0.5
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Raymond Mancy
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 632609
Reported: 2010-09-09 19:13 UTC by Bill Peck
Modified: 2019-05-22 13:34 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-09-30 04:57:03 UTC
Embargoed:



Description Bill Peck 2010-09-09 19:13:49 UTC
Description of problem:

If a monitor process spawned by beaker-watchdog dies for any reason, it is left as a zombie and is not reaped until the watchdog expires.


This patch fixes that:

diff --git a/LabController/proxy/src/bkr/labcontroller/proxy.py b/LabController/proxy/src/bkr/labcontr
index 7446376..295063f 100644
--- a/LabController/proxy/src/bkr/labcontroller/proxy.py
+++ b/LabController/proxy/src/bkr/labcontroller/proxy.py
@@ -272,6 +272,11 @@ class Watchdog(ProxyHelper):
     def active_watchdogs(self):
         """Monitor active watchdog entries"""
 
+        # Look for zombies
+        for watchdog_system in self.watchdogs.copy():
+            if self.is_finished(watchdog_system):
+                self.logger.info("Monitor for %s died" % watchdog_system)
+                del self.watchdogs[watchdog_system]
         active_watchdogs = []
         for watchdog in self.hub.recipes.tasks.watchdogs('active'):
             active_watchdogs.append(watchdog['system'])
@@ -306,6 +311,29 @@ class Watchdog(ProxyHelper):
                          'abort', 
                          'External Watchdog Expired')
 
+    def is_finished(self, system):
+        """Determine if monitor has died.
+        Calling os.waitpid removes finished child process zombies.
+        """
+
+        pid = self.watchdogs[system]
+
+        try:
+            (childpid, status) = os.waitpid(pid, os.WNOHANG)
+        except OSError, ex:
+            if ex.errno != errno.ECHILD:
+                # should not happen
+                self.logger.error("Monitor hasn't exited with errno.ECHILD: %s" % system)
+                raise
+
+            # the process is already gone
+            return False
+
+        if childpid != 0:
+            return True
+
+        return False
+
     def monitor(self, watchdog):
         """ Upload console log if present to Scheduler
              and look for panic/bug/etc..
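
For reference, the non-blocking reaping pattern the patch relies on can be sketched as a standalone Python 3 script. This is only an illustration, not the Beaker code: `reap_finished` and `spawn_monitor` are hypothetical names, and the dictionary stands in for `self.watchdogs`. One deliberate difference, which is an assumption on my part: on ECHILD this sketch reports the monitor as finished so the stale entry is dropped, whereas the patch above returns False in that case and keeps the entry.

```python
import os
import time


def is_finished(pid):
    """Return True if child `pid` has exited, reaping it as a side effect."""
    try:
        # WNOHANG makes waitpid return (0, 0) immediately if the child is still running.
        childpid, _status = os.waitpid(pid, os.WNOHANG)
    except ChildProcessError:
        # ECHILD: no such child (already reaped, or never ours).
        # Assumption: treat it as finished so the caller drops the entry.
        return True
    return childpid != 0


def reap_finished(monitors):
    """Remove entries whose monitor process has exited.

    `monitors` maps a system name to the pid of its monitor process.
    Returns the list of system names that were removed.
    """
    died = []
    for system in list(monitors):  # iterate over a copy of the keys, since we delete
        if is_finished(monitors[system]):
            died.append(system)
            del monitors[system]
    return died


def spawn_monitor(seconds):
    """Hypothetical stand-in for a monitor: a child that sleeps, then exits."""
    pid = os.fork()
    if pid == 0:
        time.sleep(seconds)
        os._exit(0)  # leave the child without running the parent's cleanup handlers
    return pid


if __name__ == "__main__":
    monitors = {"host1.example.com": spawn_monitor(0)}
    time.sleep(0.2)  # give the child time to exit
    print("Monitors that died:", reap_finished(monitors))
```

Calling `reap_finished` at the top of each polling pass, as the patch does in `active_watchdogs`, means a dead monitor is collected on the next pass rather than lingering until the watchdog expires.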

