Bug 632624 - [RFE] use messaging for tests to upload data
Summary: [RFE] use messaging for tests to upload data
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Beaker
Classification: Retired
Component: lab controller
Version: 0.5
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: future_maint
Assignee: Bill Peck
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 632609
Reported: 2010-09-10 15:29 UTC by Bill Peck
Modified: 2012-09-28 00:53 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-09-28 00:53:59 UTC
Embargoed:


Description Bill Peck 2010-09-10 15:29:30 UTC
Description of problem:
Currently we use xmlrpc from the test machines and from the lab controller to the scheduler.

This bug is to investigate using messaging (qpid/qmf/etc..) from the test machines and lab controller.

Comment 1 Kevin Baker 2010-09-10 17:42:50 UTC
We need to break down the information that is being sent from the test hosts through the Lab
Controllers back to the Scheduler. In my simple view this is two things:

 1) metadata about the state of execution, small in size
 2) multiple log files, small to huge in size

I recommend we start examining them separately to see if we can build a more scalable architecture. If we end up building a storage pool per lab controller then we don't need to strain the pipes by uploading (2). Just store it locally and send metadata up to the Scheduler so it can track where everything is and provide pointers to it. This makes more sense when you think about coordinating the partner labs as well.
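
A minimal sketch of the "store locally, send pointers" idea, assuming the lab controller writes each log into a local directory and reports only a small record to the Scheduler. The base URL, the function name and the record fields are hypothetical and are not part of Beaker.

```python
import hashlib
from pathlib import Path

# Hypothetical base URL under which a lab controller serves its stored logs.
LC_BASE_URL = "http://lab-controller.example.com/logs"


def store_log_locally(storage_root, task_id, name, data):
    """Write a log into the lab controller's local pool and return the
    small metadata record that would be sent up to the Scheduler."""
    target_dir = Path(storage_root) / str(task_id)
    target_dir.mkdir(parents=True, exist_ok=True)
    (target_dir / name).write_bytes(data)
    # Only this record travels to the Scheduler; the log bytes stay local.
    return {
        "task_id": task_id,
        "name": name,
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "url": f"{LC_BASE_URL}/{task_id}/{name}",
    }
```

The record stays a few hundred bytes regardless of how large the log is, which is the point of separating (1) from (2).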

Using messaging for (1) makes sense since the test hosts can fire-and-forget their status messages. The Scheduler can then process them at its leisure. Separately, the Scheduler can publish a message to a message queue about the status of a job. For example, TCMS could subscribe to that queue for information on jobs it submitted to Beaker.
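
The fire-and-forget pattern for status messages could look roughly like the sketch below. It uses an in-process queue from the Python standard library as a stand-in for a real broker (qpid or similar), and the function names and message fields are assumptions made for illustration only.

```python
import json
import queue

# In-process stand-in for a broker queue (an assumption for illustration;
# a real deployment would publish to a message broker instead).
status_queue = queue.Queue()


def send_status(task_id, state):
    """Test-host side: enqueue a small status message and return at once,
    without waiting for the Scheduler to handle it."""
    status_queue.put_nowait(json.dumps({"task_id": task_id, "state": state}))


def drain_statuses(job_states):
    """Scheduler side: process whatever has accumulated, keeping the
    latest state seen for each task. Returns the number handled."""
    handled = 0
    while True:
        try:
            raw = status_queue.get_nowait()
        except queue.Empty:
            return handled
        message = json.loads(raw)
        job_states[message["task_id"]] = message["state"]
        handled += 1
```

The sender never blocks on the consumer, so a slow Scheduler delays processing but does not stall the test hosts.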

Comment 2 Bill Peck 2010-09-10 17:50:24 UTC
I don't see how having a separate machine for the logs helps us at all.  We're still transferring the logs from the test machines to one machine in the lab.  You've just replaced the lab controller with another machine.

Comment 3 Kevin Baker 2010-09-10 20:55:50 UTC
(In reply to comment #2)
> I don't see how having a separate machine for the logs helps us at all.  We're
> still transferring the logs from the test machines to one machine in the lab. 
> You've just replaced the lab controller with another machine.

You don't think it reduces the load of sending megabytes of data over the wire to the scheduler? I would think that both the LC & Scheduler would be less loaded and more capable of serving more clients.

And how will it work in the case of the partner labs? Do we expect them to upload logs over the internet? Effectively they would store it locally and we'd just get the results uploaded, no?

Comment 4 Bill Peck 2010-09-10 21:06:17 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > I don't see how having a separate machine for the logs helps us at all.  We're
> > still transferring the logs from the test machines to one machine in the lab. 
> > You've just replaced the lab controller with another machine.
> 
> You don't think it reduces the load of sending megabytes of data over the wire
> to the scheduler? I would think that both the LC & Scheduler would be less
> loaded and more capable of serving more clients.

To the Scheduler, yes.

> 
> And, how will it work in the case of the partner labs? Do we expect them to
> upload logs over the internet? Effectively they would store it locally and we'd
> just get the results uploaded no?

If the lab controller hosted the data then I would be more open to this idea.  I think what is throwing me off is adding another machine in here.

Comment 5 Kevin Baker 2010-09-10 22:22:47 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > (In reply to comment #2)
> > > I don't see how having a separate machine for the logs helps us at all.  We're
> > > still transferring the logs from the test machines to one machine in the lab. 
> > > You've just replaced the lab controller with another machine.
> > 
> > You don't think it reduces the load of sending megabytes of data over the wire
> > to the scheduler? I would think that both the LC & Scheduler would be less
> > loaded and more capable of serving more clients.
> 
> To the Scheduler yes.

But the LC proxies them on so it still needs to move those bits through some process, correct? If we have over 1000 machines sending bundles of data up surely that becomes an issue at some point?

> > And, how will it work in the case of the partner labs? Do we expect them to
> > upload logs over the internet? Effectively they would store it locally and we'd
> > just get the results uploaded no?
> 
> If the lab controller hosted the data then I would be more open to this idea. 
> I think what is throwing me off is adding another machine in here.

I wasn't really thinking about whether it should be a separate machine or not. I guess the answer depends. It would make sense if the logs were served out from the LC. But I guess it's a question of how much load it can take if you want it to be responsible for writes too.

Comment 6 Dan Callaghan 2012-09-28 00:53:59 UTC
I think we have agreed that using AMQP for communication inside Beaker is not a good choice. We could investigate doing something similar to this using ZeroMQ in future if the need arises.

