Bug 681966

Summary: [RFE] remove polling - bkr.labcontroller.watchdog
Product: [Retired] Beaker
Component: web UI
Reporter: Bill Peck <bpeck>
Assignee: Raymond Mancy <rmancy>
Status: CLOSED WONTFIX
Version: 0.7
CC: bpeck, dcallagh, ebaak, mcsontos, rmancy, stl
Doc Type: Bug Fix
Last Closed: 2012-09-26 00:36:34 UTC
Bug Blocks: 681964

Description Bill Peck 2011-03-03 18:20:53 UTC
Description of problem:

Currently the bkr.labcontroller.watchdog process checks for active and expired watchdogs every 60 seconds.

Investigate whether this would be better served by an event trigger from the message bus.
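As a point of reference, the current behaviour can be sketched roughly as follows. This is a minimal illustration, not the actual bkr.labcontroller.watchdog code; `fetch_watchdogs` is a hypothetical stand-in for the RPC call the lab controller makes to the scheduler.

```python
import time

POLL_INTERVAL = 60  # seconds; the interval mentioned in the description

def fetch_watchdogs():
    # Hypothetical stand-in for the RPC call the lab controller makes
    # to the scheduler for active and expired watchdogs.
    return {"active": [], "expired": []}

def poll_loop(iterations, interval=POLL_INTERVAL, sleep=time.sleep):
    # The current design: wake up every `interval` seconds and ask the
    # scheduler for watchdog state, whether or not anything has changed.
    results = []
    for _ in range(iterations):
        results.append(fetch_watchdogs())
        sleep(interval)
    return results
```

Every iteration costs a round trip even when the answer is empty, which is the waste this RFE is about.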

Comment 1 Raymond Mancy 2011-03-15 12:18:53 UTC
If I understand it correctly, there isn't actually a way to trigger when a watchdog expires/activates.

So we have the option of either broadcasting from the server the same data that the watchdog currently gathers via RPC, or having the watchdog behave the same way it does now, except getting its data via the bus. Either way, I think it would be the same number of cycles being used.

Originally I made it work like the former, but because it's not really a proper event, I didn't think it was suitable, so I changed it to the latter.

What do you think?

Comment 2 Bill Peck 2011-03-15 19:43:18 UTC
I would implement this in beakerd.  The current query can be run on the scheduler, and an event only goes out to the lab controllers when there is a hit.  Any time we avoid sending/requesting data over the network is a win.

Just think about the extra paths involved in the network version:
  httpd
  mod_wsgi
  xmlrpc setup, encode, teardown

Comment 3 Raymond Mancy 2011-03-15 21:07:39 UTC
Oh, I've still implemented it via the bus, but it works like this:

The LC sends a message on the bus requesting active and expired watchdogs.
The scheduler receives the request forwarded from the broker.
The scheduler sends the results back to the LC via the bus.

I don't think there is a 'hit' or a 'miss' in this case, is there?
I mean, even if the data returned is empty, the LC may still need to know about it, doesn't it?

Although perhaps you could keep a stateful representation of the data server-side, and then only send it when it changes?
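The "only send it when it changes" idea could be sketched like this. This is an illustrative sketch only: `queue.Queue` stands in for the real message bus, and `publish_if_changed` is a hypothetical name for the server-side check, not anything in the Beaker codebase.

```python
import queue

bus = queue.Queue()  # stands in for the bus topic the LC listens on
_last_state = None

def publish_if_changed(active, expired):
    # Sketch of the server side: run the existing watchdog query as
    # before, but only push a message onto the bus when the result
    # differs from the previous run.
    global _last_state
    new_state = {"active": frozenset(active), "expired": frozenset(expired)}
    if new_state != _last_state:
        _last_state = new_state
        bus.put(new_state)
        return True   # something changed; the LC gets notified
    return False      # no change; nothing crosses the network
```

With this shape, an empty-and-unchanged result costs the LC nothing, which addresses the "even if the data returned is empty" concern: the LC only needs to hear about transitions.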

Comment 4 Bill Peck 2011-03-16 21:32:21 UTC
So the lab controller still polls the scheduler, then.  How about this workflow:

On first startup, beaker-watchdog requests active and expired watchdogs just as it currently does, but after the initial request it no longer polls.

The beakerd process sends new active and expired watchdog entries, which beaker-watchdog listens for.


Am I making this too complicated?  Just trying to remove unneeded polling.
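The workflow above could look roughly like this. Again a sketch under assumptions: `queue.Queue` stands in for the bus subscription, and `initial_fetch` / `run_watchdog` are hypothetical names for the one-off startup request and the listener loop.

```python
import queue

bus = queue.Queue()  # stands in for the bus topic beaker-watchdog subscribes to

def initial_fetch():
    # Hypothetical stand-in for the one-off request beaker-watchdog
    # makes at startup, exactly as the current polling code does once.
    return [{"recipe": "r1", "state": "active"}]

def run_watchdog(max_events):
    # After the initial request there is no further polling: the process
    # just blocks on the bus waiting for beakerd to publish new entries.
    known = list(initial_fetch())
    for _ in range(max_events):
        known.append(bus.get())  # blocks until an entry arrives
    return known
```

The key difference from the current design is that the wait is event-driven (blocking on the bus) rather than a timed wake-up, so idle periods cost no network traffic.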

Comment 5 Raymond Mancy 2011-03-21 03:33:36 UTC
That almost sounds right, although we shouldn't need to do the original request.
We can put the messages onto a durable queue so they won't disappear.

Also, at the moment it seems we send a message to the LC informing it of an expired watchdog, and then the LC calls the server to tell it to abort the task. I think we could skip the call back to stop the task and have the server change the status itself.
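The durable-queue point is what makes the initial request unnecessary: messages published while beaker-watchdog is down are still there when it comes back. A minimal sketch, using an in-process `queue.Queue` as a stand-in for a durable broker queue (the hypothetical `beakerd_send` / `watchdog_drain` names are illustrative only):

```python
import queue

durable = queue.Queue()  # stands in for a durable queue on the broker

def beakerd_send(event):
    # beakerd keeps publishing even while beaker-watchdog is down;
    # a durable queue retains the messages until they are consumed.
    durable.put(event)

def watchdog_drain():
    # On (re)start, beaker-watchdog simply drains whatever accumulated,
    # so the initial RPC request becomes unnecessary.
    events = []
    while True:
        try:
            events.append(durable.get_nowait())
        except queue.Empty:
            return events
```

In a real deployment the durability would come from the broker persisting the queue, not from the consumer process; the sketch only shows the delivery ordering.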

Comment 6 Bill Peck 2011-03-21 13:27:44 UTC
(In reply to comment #5)
> That almost sounds right, although we shouldn't need to do the original
> request.
> We can put the messages onto a durable queue so they won't disappear.

Nice.

> 
> Also, at the moment it seems we send a message to the LC informing it of an
> expired watchdog, and then the LC calls the server to tell it to abort the
> task. I think we could skip the call back to stop the task and have the
> server change the status itself.

We need to be careful here.  We don't want to make a system available for another test before the lab controller finishes copying the console log.  If we get this wrong we could truncate the previous recipe's console log.

Comment 7 Raymond Mancy 2011-03-21 21:16:07 UTC
(In reply to comment #6)

> 
> We need to be careful here.  We don't want to make a system available for
> another test before the lab controller finishes copying the console log.  If
> we get this wrong we could truncate the previous recipe's console log.

I knew there must've been a good reason for that.

Comment 8 Raymond Mancy 2012-09-26 00:36:34 UTC
This was implemented, and then unimplemented.
Another BZ can be opened at a later time if we want to reimplement the unimplemented implementation (perhaps with a different message bus).