Bug 655009 - Getting right FQDNs for guests
Summary: Getting right FQDNs for guests
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Beaker
Classification: Retired
Component: lab controller
Version: 0.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: 0.10.0
Assignee: Dan Callaghan
QA Contact: Amit Saha
URL:
Whiteboard: Misc
Duplicates: 694400 701254 722908 730710
Depends On:
Blocks: 615785 862518
 
Reported: 2010-11-19 10:53 UTC by Gurhan Ozen
Modified: 2018-02-06 00:41 UTC
CC List: 15 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2012-11-22 06:44:06 UTC
Embargoed:



Description Gurhan Ozen 2010-11-19 10:53:49 UTC
Description of problem:
This bug is spawned from RT #77381. There is a long-standing issue of guests not getting the right FQDN because the guest domains are on a different subnet than the bare-metal host machines. Not having a resolvable FQDN in guests makes multi-host testing between guests impossible. We should have a way for guests to get resolvable domain names.

Comment 1 Bill Peck 2010-11-19 14:52:29 UTC
The main changes would be:

1 - We would no longer need the guest-*.domain.com machines registered in beaker.  This in itself would be great since it would remove confusion for users who rightfully think they can reserve and provision them. 

2 - To keep multi-host working we would need to pass the recipe id to the kickstart so that systems can request their recipe XML via the recipe id and not the system name.  This also fixes another bug where, if we fail to provision a system, we run the recipe on the wrong distro!

3 - After the system receives its recipe it would need to report back the fqdn of the system that the recipe is running on.

4 - The recipe record would need to support ad-hoc systems since we would be referencing systems that are not registered in beaker (dhcp-36-201.domain.com). 

5 - We would need some method to update the recipe roles and taskexecution roles.  It may just work if we force the client to pull the recipe xml again after everyone has checked in.
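
For item 3 above, a rough guest-side sketch of the check-in, assuming the recipe id arrives as a kernel/kickstart parameter; the endpoint URL and XML-RPC method names are hypothetical, not Beaker's actual API:

# Hypothetical guest-side check-in, e.g. from Anaconda %post -- method names
# and URL are illustrative only, not Beaker's real XML-RPC API.
import socket
import xmlrpclib  # Python 2, as on RHEL-era guests

LAB_CONTROLLER = 'http://lab.example.com:8000/RPC2'  # assumed endpoint
RECIPE_ID = 1234  # passed down to the kickstart, per item 2

server = xmlrpclib.ServerProxy(LAB_CONTROLLER)
recipe_xml = server.get_recipe_xml(RECIPE_ID)       # item 2: fetch by recipe id
server.report_fqdn(RECIPE_ID, socket.getfqdn())     # item 3: report actual FQDN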

Comment 2 Marian Csontos 2011-04-07 09:14:07 UTC
(In reply to comment #1)

> 3 - After the system receives its recipe it would need to report back the fqdn
> of the system that the recipe is running on.

Any chance the FQDN will change over time, e.g. on reboot? DHCP may recycle IP addresses and IIUC the FQDN is a function of the IP address. I also wonder whether the guest will keep the same MAC address when stopped and started, which would not help either.

Comment 3 Marian Csontos 2011-09-16 14:36:36 UTC
*** Bug 694400 has been marked as a duplicate of this bug. ***

Comment 4 Dan Callaghan 2012-09-04 22:29:59 UTC
This sounds like it might be tricky, though I have a feeling Steve might have solved this already for his RHEV-M patches.

Comment 5 Dan Callaghan 2012-09-11 04:54:45 UTC
The RHEV patches go most of the way towards achieving this, though in a very conservative way (conservative in terms of code and DB changes). There is an extra "check-in" step at the end of Anaconda %post where the system for the recipe is updated (and potentially created if it doesn't exist). It doesn't solve any of the issues Marian raises in comment 2 though. In particular, it won't handle the case where the system gets a different IP address/hostname after Anaconda reboots, which seems quite likely to me.

I think the real solution is to decouple Beaker's record of "what is running this recipe?" -- which might be a Beaker system, or a RHEV guest, or an EC2 instance, or something else -- from Beaker's record of an actual system, which is something discrete that can be reserved, loaned, owned by groups, etc.

All we really need to know about the former is how to terminate it if its watchdog expires. It might also be nice to record some inventory info about it, though that's not necessary.

Comment 7 Dan Callaghan 2012-09-18 12:10:16 UTC
So currently every Recipe is assigned a System when it's scheduled. My current thinking is that we remove that association, and introduce a new object. Call it Slave (terminology from Jenkins, I'm open to better suggestions). This represents "the thing that is running this recipe" and how to grab the thing and make it start running. We create a new one of these for every recipe when we schedule it.

Initially we would have two types of Slave:

* BeakerSlave or SystemSlave or MachineSlave or something: this is for recipes that are running on a machine reserved and provisioned through Beaker. It has an association to a System row. To start this slave we reserve it in Beaker and provision it. Each MachineRecipe would get one of these.

* GuestSlave: this is for GuestRecipes, the guests are created by /distribution/virtinstall so all we track for this is a FQDN.

Then we would add a RHEVSlave (or better, OvirtSlave) for recipes that are run on RHEV guests. There is no associated System in Beaker, instead we track which hypervisor and the id of the guest to pass to the ovirt API.

Later on we could have other types like EC2Slave or something like that as well.

Comment 8 Bill Peck 2012-09-18 12:33:33 UTC
(In reply to comment #7)
> So currently every Recipe is assigned a System when it's scheduled. My
> current thinking is that we remove that association, and introduce a new
> object. Call it Slave (terminology from Jenkins, I'm open to better
> suggestions). This represents "the thing that is running this recipe" and
> how to grab the thing and make it start running. We create a new one of
> these for every recipe when we schedule it.
> 
> Initially we would have two types of Slave:
> 
> * BeakerSlave or SystemSlave or MachineSlave or something: this is for
> recipes that are running on a machine reserved and provisioned through
> Beaker. It has an association to a System row. To start this slave we
> reserve it in Beaker and provision it. Each MachineRecipe would get one of
> these.
> 
> * GuestSlave: this is for GuestRecipes, the guests are created by
> /distribution/virtinstall so all we track for this is a FQDN.

Would we be able to configure how many GuestSlaves there are?  I can imagine that if we didn't, we could exhaust the DHCP server.

But the blocking would need to be done at schedule time, right?  Every guestrecipe would get assigned a GuestSlave, but it wouldn't be able to be scheduled until $total_guest_slaves < MAX_GUEST_SLAVES.

> 
> Then we would add a RHEVSlave (or better, OvirtSlave) for recipes that are
> run on RHEV guests. There is no associated System in Beaker, instead we
> track which hypervisor and the id of the guest to pass to the ovirt API.

Maybe the RHEVSlave could keep track of the resources used internally or query the RHEVM system periodically to know if it's exhausted?

> 
> Later on we could have other types like EC2Slave or something like that as
> well.

The GuestSlave seems fairly straightforward.  But how would the scheduling between MachineSlave and RHEVSlave work?

I still worry about efficiently scheduling against RHEVM hosts.  I don't like the idea of constantly hitting the RHEVM machine to see if it can schedule a recipe.  I'd like to see it done in a way where only when the state changes on the RHEVM machine do we try to schedule another recipe.  If RHEVM supported some kind of callback it would make this easier.

Comment 9 Dan Callaghan 2012-09-18 23:55:07 UTC
(In reply to comment #8)

The way I was thinking of it, there is *not* a fixed pool of Slaves. Instead we create a new one of the appropriate type for every recipe. So we don't schedule against the Slave objects at all -- they are just a way of tracking what is running each recipe, and how to start/stop it (if appropriate).

So for the GuestSlaves, when we start a GuestRecipe we create a new GuestSlave record with NULL fqdn, then once the guest picks up the recipe and starts executing it we update the GuestSlave record with the hostname assigned by DHCP. Nothing more. The whole point here is that we *can't* assign a particular hostname/address to GuestRecipes because we have no control over DHCP.
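
A minimal sketch of that lifecycle, with placeholder names (not the actual schema or code):

# Placeholder sketch of the GuestSlave lifecycle described above.
class GuestSlave(object):
    """Tracks whatever is running a guest recipe; there is no fixed pool."""

    def __init__(self, recipe):
        self.recipe = recipe
        self.fqdn = None  # unknown until DHCP hands the guest an address

def start_guest_recipe(recipe):
    # A fresh GuestSlave is created per recipe when it starts.
    recipe.slave = GuestSlave(recipe)

def guest_checked_in(recipe, reported_fqdn):
    # The guest picks up its recipe and reports the hostname DHCP gave it.
    recipe.slave.fqdn = reported_fqdn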

I think exhaustion of the DHCP pool is a problem for the administrator, for the same reason. We could build a feature to check and warn about this, but it would be quite tricky because Beaker has no knowledge of how the DHCP pools have been configured -- the right place to catch this is really in the DHCP server (it should log something if it can't satisfy DHCP requests because of pool exhaustion).

I agree with your concerns about the efficiency of RHEV scheduling but I think that is a separate issue. Again, the idea of the RHEVSlave/OvirtSlave is only to record a hypervisor + guest ID combination, so that we can kill off the guest at the end of the recipe (and also a hostname to show the user). The question of whether/how to schedule a recipe on RHEV vs. bare metal should be made by the scheduler *before* it creates a Slave object for the recipe.

Comment 11 Raymond Mancy 2012-09-19 07:29:00 UTC
This sounds like the right direction.

If you create the Slave object only as we are scheduling, how do we go about determining what resources are available to us (or even what kind of resources we are interested in)?  

Would we not need some kind of intermediary object attached to the recipe which would encapsulate the different options open to us (say MachineSlave, RHEVSlave), and then issue a command that would return to us what resource is currently available (if any) from the queued loop? This would then determine what our Slave object will be.

Comment 12 Dan Callaghan 2012-09-19 10:24:12 UTC
(In reply to comment #11)

No, I don't want to tie this to the scheduling at all. Decisions about resources and where/how to run can be made by the scheduler. This is purely about keeping track of what is running the recipe and how we clean it up when the recipe ends.

I was thinking about this some more. I think a simpler approach is just to introduce new Watchdog types: SystemWatchdog and GuestWatchdog (and later OvirtWatchdog or whatever). I realised that's pretty much what these "Slave" objects were in my head -- an expansion of Watchdog to track the different things that can be running a recipe. So this would do the same thing, but without the confusing "Slave" objects.

Steve's patches removed the Watchdog->System association, and I started building on that today, but I think what we need is actually the opposite: we keep an association from SystemWatchdog->System and just remove the association from Recipe->System.

Comment 14 Dan Callaghan 2012-09-28 02:08:41 UTC
*** Bug 722908 has been marked as a duplicate of this bug. ***

Comment 15 Dan Callaghan 2012-09-28 02:58:43 UTC
This is not ready in time for 0.9.4.

Comment 16 Dan Callaghan 2012-10-10 05:00:27 UTC
*** Bug 701254 has been marked as a duplicate of this bug. ***

Comment 17 Dan Callaghan 2012-10-12 05:55:32 UTC
This is almost done now, I think. First are a few preparatory patches:

http://gerrit.beaker-project.org/1070
http://gerrit.beaker-project.org/1382
http://gerrit.beaker-project.org/1383
http://gerrit.beaker-project.org/1418

Then there is this big one:

http://gerrit.beaker-project.org/1384

which removes the 'Virtual' system rows. Each recipe has a RecipeResource attached, meaning "the thing where this recipe is running". SystemResource means the recipe ran on a System from Beaker's inventory, and so we have a FK to the relevant system row. GuestResource means it ran on a virtual guest, so we only have a hostname from DHCP. GuestRecipes are also no longer scheduled, which lets us avoid some annoying problems (e.g. bug 701254).
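
For reference, a rough SQLAlchemy-flavoured sketch of how such a resource hierarchy could look; the table, column and class names here are guesses based on this comment, not the actual migration:

# Rough sketch of the resource model described above -- names are assumptions,
# not the real Beaker schema.
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class RecipeResource(Base):
    """The thing where this recipe is running."""
    __tablename__ = 'recipe_resource'
    id = Column(Integer, primary_key=True)
    recipe_id = Column(Integer, nullable=False)   # FK to recipe in the real schema
    fqdn = Column(String(255), nullable=True)     # NULL until known
    type = Column(String(20), nullable=False)
    __mapper_args__ = {'polymorphic_on': type}

class SystemResource(RecipeResource):
    """Recipe ran on a System from Beaker's inventory."""
    __tablename__ = 'system_resource'
    id = Column(Integer, ForeignKey('recipe_resource.id'), primary_key=True)
    system_id = Column(Integer, nullable=False)   # FK to system in the real schema
    __mapper_args__ = {'polymorphic_identity': 'system'}

class GuestResource(RecipeResource):
    """Recipe ran on a virtual guest; only the DHCP-assigned hostname is kept."""
    __tablename__ = 'guest_resource'
    id = Column(Integer, ForeignKey('recipe_resource.id'), primary_key=True)
    __mapper_args__ = {'polymorphic_identity': 'guest'}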

Then we have this patch:

http://gerrit.beaker-project.org/1419

which adds the call to update the hostname for guests after installation. This will all dovetail nicely with the upcoming RHEV support, which has to do exactly the same thing.

Still left to do: update /distribution/virt/install to cope with these changes. The GUESTS env var won't go away but it will become mostly useless since there will be no hostnames in it. /distribution/virt/install will need to parse guest details directly from the host's recipe XML instead of using GUESTS. This will also be an opportunity to tidy up the code in that task a bit...

The only outstanding issue is recipe roles. Right now beah reads the recipe roles from the recipe XML once at the very start, and sets the role env vars from that. But the guests' hostnames won't be known until after they have been installed. Probably beah will need to be patched to re-check recipe roles at the start of every task and update the env vars in case they have changed.
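
To illustrate the idea only (this is not beah's actual code), re-reading roles before each task might look roughly like this; the recipe URL, XML layout and env var handling are assumptions:

# Illustrative only -- not beah's real implementation or API.
import os
import urllib2
from xml.etree import ElementTree

RECIPE_URL = 'http://lab.example.com:8000/recipes/1234/'  # assumed

def refresh_role_env_vars():
    """Re-fetch the recipe XML and re-export role -> hostnames env vars."""
    recipe = ElementTree.parse(urllib2.urlopen(RECIPE_URL)).getroot()
    roles = {}
    for role in recipe.iter('role'):
        name = role.get('value')
        hosts = [s.get('value') for s in role.iter('system') if s.get('value')]
        roles.setdefault(name, []).extend(hosts)
    for name, hosts in roles.items():
        # e.g. SERVERS / CLIENTS, updated with whatever FQDNs are known by now
        os.environ[name] = ' '.join(sorted(set(hosts)))

# Called at the start of every task, in case guest hostnames have since appeared.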

Comment 19 Gurhan Ozen 2012-10-12 13:32:50 UTC
(In reply to comment #17)
> 
> Still left to do: update /distribution/virt/install to cope with these
> changes. The GUESTS env var won't go away but it will become mostly useless
> since there will be no hostnames in it. /distribution/virt/install will need
> to parse guest details directly from the host's recipe XML instead of using
> GUESTS. This will also be an opportunity to tidy up the code in that task a
> bit...
> 

  I don't understand this part. /distribution/virt/install doesn't rely on the hostnames passed by the $GUESTS variable. The only time it uses them is if a guestname is not passed into the guestrecipe; in that case it uses the hostname as the guestname.

Comment 20 Dan Callaghan 2012-10-14 23:24:08 UTC
(In reply to comment #19)
>   I don't understand this part. /distribution/virt/install doesn't rely on
> the hostnames passed by $GUESTS variable. The only time it uses that if
> guestname is not passed into to guestrecipe, in that case it uses the
> hostname as the guestname.

It also looks up the kickstart URL for each guest using its hostname, which will no longer work.

Comment 21 Gurhan Ozen 2012-10-15 03:02:25 UTC
(In reply to comment #20)
> (In reply to comment #19)
> >   I don't understand this part. /distribution/virt/install doesn't rely on
> > the hostnames passed by $GUESTS variable. The only time it uses that if
> > guestname is not passed into to guestrecipe, in that case it uses the
> > hostname as the guestname.
> 
> It also looks up the kickstart URL for each guest using its hostname, which
> will no longer work.

Oh, yeah, that's right, every kickstart file is named after the guest's FQDN.

Thanks for the correction.

Comment 22 Dan Callaghan 2012-10-17 04:17:07 UTC
So this appears to be working pretty nicely now, with this additional patch to the harness to make it not use the hostname:

http://gerrit.beaker-project.org/1423

Then this patch makes the kickstart URL available in the results XML for /distribution/virt/install:

http://gerrit.beaker-project.org/1422

and then I have two patches for /distribution/virt/install which I can't put on Gerrit right now. This one fixes the looking-up of guest info:

http://dpaste.com/814576/

and this one reports in with Beaker to mark each guestrecipe as started right before starting virt-install. I think it fixes bug 615785 too:

http://dpaste.com/814577/

Comment 24 Dan Callaghan 2012-10-23 05:53:05 UTC
One more patch, to make Beaker manage MAC address allocation:

http://gerrit.beaker-project.org/1433

Comment 26 Dan Callaghan 2012-10-24 02:39:05 UTC
I still need to do something about recipe roles. Everything else is working nicely now.

Comment 27 Dan Callaghan 2012-10-25 22:37:28 UTC
(In reply to comment #26)

Scratch that, guest roles work fine because all guests are installed first (with /distribution/virt/install) and *then* all guests are started (by /distribution/virt/start) so when the harness starts up on each guest we already have the right hostnames. So multihost testing between guests will work fine.

Multihost testing between guests and hosts is still not supported with this bug fix though. It will need more work.

Comment 29 Dan Callaghan 2012-11-01 04:12:11 UTC
*** Bug 730710 has been marked as a duplicate of this bug. ***

Comment 31 Raymond Mancy 2012-11-21 06:28:49 UTC
I've also verified this on my beaker env.

Comment 32 Raymond Mancy 2012-11-22 06:44:06 UTC
This has now been released.

