991269 – beaker-watchdog dies if it cannot connect to the server

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Beaker
Classification:	Retired
Component:	lab controller
Sub Component:
Version:	0.13
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	26.0
Assignee:	Dan Callaghan
QA Contact:	tools-bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2013-08-02 03:26 UTC by Dan Callaghan
Modified:	2018-10-08 02:16 UTC
CC List:	5 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2018-10-08 02:16:46 UTC
Embargoed:

Description Dan Callaghan 2013-08-02 03:26:39 UTC

Beaker-watchdog is structured as a polling loop (of sorts), like beaker-provision. But if the server goes down or becomes unreachable while beaker-watchdog is running, the daemon dies completely instead of retrying each loop iteration until the server comes back.

Steps to reproduce:
1. Set up beaker-watchdog to run happily
2. Isolate beaker-watchdog from the server (e.g. stop httpd on the server, or use iptables)
3. Wait for the retry period to expire

Eventually the daemon dies like this:

Traceback (most recent call last):
  File "src/bkr/labcontroller/watchdog.py", line 127, in <module>
    main()
  File "src/bkr/labcontroller/watchdog.py", line 115, in main
    main_loop(watchdog, conf)
  File "src/bkr/labcontroller/watchdog.py", line 39, in main_loop
    watchdog.hub._login()
  File "/usr/lib/python2.6/site-packages/kobo/client/__init__.py", line 206, in _login
    if force or self._hub.auth.renew_session():
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1199, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1489, in __request
    verbose=self.__verbose
  File "/home/dcallagh/work/beaker/LabController/src/bkr/labcontroller/proxy.py", line 54, in request
    result = transport_class.request(self, *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/kobo/xmlrpc.py", line 234, in _request
    self.send_content(h, request_body)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1349, in send_content
    connection.endheaders()
  File "/usr/lib64/python2.6/httplib.py", line 908, in endheaders
    self._send_output()
  File "/usr/lib64/python2.6/httplib.py", line 780, in _send_output
    self.send(msg)
  File "/usr/lib64/python2.6/httplib.py", line 739, in send
    self.connect()
  File "/usr/lib/python2.6/site-packages/kobo/xmlrpc.py", line 41, in connect
    httplib.HTTPConnection.connect(self)
  File "/usr/lib64/python2.6/httplib.py", line 720, in connect
    self.timeout)
  File "/usr/lib64/python2.6/socket.py", line 567, in create_connection
    raise error, msg
error: [Errno 111] Connection refused

See also bug 734850 and
http://git.beaker-project.org/cgit/beaker/commit/?id=e92b6e9951e37db1c1e7d9c5c721edd4305291e4

A simple fix is to expand the exception types caught in main_loop at the bottom of the while loop, so that connection errors like the one above are handled and retried instead of killing the daemon. It might also be worth porting beaker-watchdog to use gevent, which makes it easy to write these kinds of polling loops with suitable error handling (as in beaker-provision).
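
To illustrate the simple fix, here is a minimal sketch of a polling loop that treats connection failures as transient and retries on the next iteration. It is not the actual watchdog.py code: the names poll_once, should_stop, retry_delay and TRANSIENT_ERRORS are hypothetical, the choice of exception types is an assumption, and it is written for Python 3 (the traceback above is from Python 2.6, where the equivalents would be socket.error and xmlrpclib.ProtocolError).

```python
import socket
import time
import xmlrpc.client

# Errors that suggest the server is temporarily unreachable rather than
# that the daemon itself is broken. This tuple is an assumption made for
# illustration; the real fix may catch a different set of exceptions.
TRANSIENT_ERRORS = (socket.error, xmlrpc.client.ProtocolError)


def main_loop(poll_once, should_stop, sleep=time.sleep, retry_delay=20):
    """Call poll_once() repeatedly until should_stop() returns True.

    Transient connection errors are logged and the loop carries on;
    any other exception still propagates to the caller.
    Returns the number of iterations that failed with a transient error.
    """
    failures = 0
    while not should_stop():
        try:
            poll_once()
        except TRANSIENT_ERRORS as exc:
            failures += 1
            print("Server unreachable, will retry: %r" % (exc,))
        # Sleep after every iteration, whether it succeeded or failed.
        sleep(retry_delay)
    return failures
```

In Python 3, socket.error is an alias of OSError, so "Connection refused" (ConnectionRefusedError) and socket timeouts are both covered. Anything outside that tuple is deliberately left uncaught, so genuine programming errors are not silently swallowed.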

Comment 3 matt jia 2016-03-30 00:05:05 UTC

On Gerrit:

   http://gerrit.beaker-project.org/#/c/4764/

Comment 4 Dan Callaghan 2017-10-19 23:15:57 UTC

This is getting urgent... Now that we are using the OpenStack integration more heavily on beaker-devel, beaker-watchdog is dying regularly. I suspect the cause might be a failure to fetch OpenStack console logs through the server. Unfortunately, since we are on RHEL6 (without systemd), the traceback on stderr is lost, and we can't do any automatic restart logic either. Sigh.

Comment 5 Dan Callaghan 2017-10-20 06:15:42 UTC

Found the cause of the current crashes: bug 1504527. But we should really get this fixed too, to make beaker-watchdog more resilient instead of dying with an error on stderr that we will never see.

Comment 6 Dan Callaghan 2018-08-02 07:41:41 UTC

https://gerrit.beaker-project.org/#/c/beaker/+/6240

Comment 8 Dan Callaghan 2018-10-08 02:16:46 UTC

Beaker 26.0 has been released.
