Bug 1643139

Summary: ERROR Failed to poll for watchdogs: RuntimeError: dictionary changed size during iteration
Product: [Retired] Beaker
Reporter: Dan Callaghan <dcallagh>
Component: lab controller
Assignee: Christopher Beer <cbeer>
Status: CLOSED CURRENTRELEASE
QA Contact: tools-bugs <tools-bugs>
Severity: unspecified
Priority: unspecified
Version: 26
CC: dcallagh
Target Milestone: 26.1
Keywords: Patch, Regression
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2018-10-31 17:34:32 UTC
Type: Bug

Description Dan Callaghan 2018-10-25 14:57:23 UTC
Description of problem:
The polling loop in beaker-watchdog can sometimes fail with an exception.

Version-Release number of selected component (if applicable):
26.0

How reproducible:
Unsure. It happens intermittently on a busy lab controller (LC).

Steps to Reproduce:
1. Look at the logs as beaker-watchdog polls

Actual results:
bkr.labcontroller.watchdog ERROR Failed to poll for watchdogs
 Traceback (most recent call last):
   File "/usr/lib/python2.6/site-packages/bkr/labcontroller/watchdog.py", line 174, in main_loop
     watchdog.poll()
   File "/usr/lib/python2.6/site-packages/bkr/labcontroller/watchdog.py", line 158, in poll
     for recipe_id, greenlet in self.monitor_greenlets.iteritems():
 RuntimeError: dictionary changed size during iteration
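
For illustration, here is a minimal sketch of the failure mode in the traceback above and one common way to avoid it. It is not taken from the Beaker source and is not necessarily what the eventual patch does. The dict contents and the `poll_unsafe`/`poll_safe` names are hypothetical, and the assumption is that entries are added to or removed from `monitor_greenlets` while the loop in `poll()` is still iterating over it.

```python
# Hypothetical stand-in for Watchdog.monitor_greenlets: recipe_id -> greenlet.
monitor_greenlets = {1: "greenlet-1", 2: "greenlet-2", 3: "greenlet-3"}


def poll_unsafe(greenlets):
    """Iterate the live dict while it shrinks; raises RuntimeError."""
    # On Python 2 the original code used iteritems(), which behaves the same way.
    for recipe_id, greenlet in greenlets.items():
        # Simulates a monitor finishing and being removed mid-iteration.
        greenlets.pop(recipe_id)


def poll_safe(greenlets):
    """Iterate over a snapshot, so the dict may change size during the loop."""
    seen = []
    for recipe_id, greenlet in list(greenlets.items()):
        greenlets.pop(recipe_id, None)
        seen.append(recipe_id)
    return seen
```

Taking a snapshot with `list(...)` costs one extra copy of the key/value pairs per poll. An entry added during the loop is simply picked up on the next poll, and the loop body must tolerate an entry that was removed after the snapshot was taken.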

Expected results:
The polling loop should complete without raising an exception.

Additional info:
Probably a regression in 26.0 as that particular code is new in 26.0 for bug 991269.

The log also shows this, which might be related:

Oct 25 14:51:56 lab-02 beaker-watchdog[31762]: bkr.labcontroller.watchdog ERROR Monitor greenlet <Greenlet at 0x2381f50: run_monitor(<bkr.labcontroller.proxy.Monitor object at 0x27d5f)> had unhandled exception: <Fault 1: "<type 'exceptions.ValueError'>:Cannot record result for finished task T:82797280">

Comment 2 Dan Callaghan 2018-10-26 14:52:23 UTC
https://gerrit.beaker-project.org/c/beaker/+/6320

Comment 3 Christopher Beer 2018-10-30 17:35:32 UTC
Submitted changes

Comment 4 Christopher Beer 2018-10-30 19:37:44 UTC
Updated the beaker-devel and both lab controllers, ready for testing

Comment 5 Christopher Beer 2018-10-30 19:39:58 UTC
Started a bunch of recipes, verified that the logs on lab controller 2 correctly identified that the monitoring started and failed.

Comment 6 Christopher Beer 2018-10-31 17:34:32 UTC
Beaker 26.1 has been released.