Bug 828927 - [Bugfix] provide link to last kickstart / distro_tree
Summary: [Bugfix] provide link to last kickstart / distro_tree
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Beaker
Classification: Retired
Component: scheduler
Version: 0.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Dan Callaghan
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 834462
Reported: 2012-06-05 15:10 UTC by Bill Peck
Modified: 2019-05-22 13:42 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-06-26 06:41:32 UTC
Embargoed:



Description Bill Peck 2012-06-05 15:10:53 UTC
Description of problem:

After talking some more with Arlinton we came to the conclusion that we really do want a link to the last kickstart and the last distro that was provisioned to a system.

Having this information will allow a system to query beaker and re-provision itself if need be.

http://beaker.example.com/kickstarts/FQDN

and a proxy method for retrieving the last distro modeled after get_my_recipe.

proxy.get_my_distro(dict())
Accepts a dict with either system_name or recipe_id
    if system_name is defined return distro_tree for that system
    if recipe_id is defined return distro_tree for that id

Sanity checking should be done: only return the distro_tree if that distro_tree is still on the system's lab controller.
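
The lookup proposed above could be sketched roughly as follows. This is only an illustration in Python: the in-memory dictionaries stand in for Beaker's database, every table name and sample value is hypothetical, and returning a bare distro tree ID is an assumption, since the proposal does not say what form the result should take.

```python
from typing import Optional

# Hypothetical stand-ins for Beaker's database tables (illustrative only).
LAST_TREE_BY_SYSTEM = {"host1.example.com": 101}
TREE_BY_RECIPE = {5001: 102}
SYSTEM_BY_RECIPE = {5001: "host1.example.com"}
LAB_CONTROLLER_BY_SYSTEM = {"host1.example.com": "lab1.example.com"}
TREES_BY_LAB_CONTROLLER = {"lab1.example.com": {101}}


def get_my_distro(query: dict) -> Optional[int]:
    """Return a distro tree ID for system_name or recipe_id, or None."""
    if "system_name" in query:
        system = query["system_name"]
        tree_id = LAST_TREE_BY_SYSTEM.get(system)
    elif "recipe_id" in query:
        system = SYSTEM_BY_RECIPE.get(query["recipe_id"])
        tree_id = TREE_BY_RECIPE.get(query["recipe_id"])
    else:
        raise ValueError("expected system_name or recipe_id")

    # Sanity check: the tree must still be on the system's lab controller.
    lab_controller = LAB_CONTROLLER_BY_SYSTEM.get(system)
    available = TREES_BY_LAB_CONTROLLER.get(lab_controller, set())
    return tree_id if tree_id in available else None
```

In this sketch a tree that has been removed from the lab controller yields None rather than an error, so the caller can tell "nothing usable" apart from a malformed request.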

Version-Release number of selected component (if applicable):
0.9.0

Comment 1 Dan Callaghan 2012-06-08 02:55:57 UTC
What details for the distro tree would we need to return here? I assume just the Beaker distro tree ID is not of much use. If the system is going to reprovision itself then I suppose it needs the URL for the kernel and initrd from the distro tree. And what about kernel options? If there were any kernel options set when it was originally provisioned, it probably needs those again too?

As for the kickstart... I would be very reluctant to support something like

http://beaker.example.com/kickstarts/FQDN

There's not really any nice way of achieving that with our current code, just because right now we generate a kickstart at provisioning time and then essentially forget about it. To do this properly I would want to add the installation lifecycle tracking that we have discussed -- but that's quite a big change for 0.9.0.

The only other place we keep track of the kickstart is attached to the recipe. So we can easily do something like this:

http://beaker.example.com/kickstarts/by-recipe/RECIPEID

Otherwise, if the system wants to keep track of the kickstart URL then I think it should do that itself. What about a %pre or %post scriptlet which grabs the ks= argument from /proc/cmdline and writes it to /root/ks-url.txt on the installed system?
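
The parsing step of such a scriptlet might look like the sketch below. It is written in Python purely for illustration (a real scriptlet would more likely be a few lines of shell), the function names are made up, and it assumes the kickstart URL is passed as a single whitespace-delimited ks=... argument.

```python
from typing import Optional


def extract_ks_url(cmdline: str) -> Optional[str]:
    """Return the value of the first ks= argument, or None if absent."""
    for arg in cmdline.split():
        if arg.startswith("ks="):
            return arg[len("ks="):]
    return None


def save_ks_url(cmdline_path: str = "/proc/cmdline",
                out_path: str = "/root/ks-url.txt") -> bool:
    """Write the kickstart URL to out_path; return False if none was found."""
    with open(cmdline_path) as f:
        url = extract_ks_url(f.read())
    if url is None:
        return False
    with open(out_path, "w") as f:
        f.write(url + "\n")
    return True
```

Both paths are parameters so the same code can be pointed at whichever root the scriptlet happens to be running in.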

Comment 2 Arlinton Bourne 2012-06-08 04:37:01 UTC
Forgetting about the kickstart is a bad idea and a regression (for the last install). This makes troubleshooting (adding new machines into beaker) a pain for the administrators. Making the assumption that anaconda actually works is usually bad, especially for those testing nightlies and other un-released distros.

Keeping as much information about the last provision and making it easily accessible make it easier to troubleshoot obscure errors on obscure hardware and allows for information reuse for the stable system provision model. 

Another way you can think of querying info could be by:
http://beaker.example.com/s/FQDN/ks
http://beaker.example.com/s/FQDN/cmdline

This way you can provide the latest relevant information easily to users/scripts etc.

Also does provisioning via web UI generate a RECIPEID?

Comment 3 John W. Lockhart 2012-06-08 22:55:57 UTC
I agree fully with Arlinton: the items used for provisioning a system should be preserved for debugging/audit/reproducing/inspection.  There are a ton of things that can prevent getting a copy from the (partially?) provisioned system after the fact.

They needn't be preserved for all eternity, but the most recent ones are highly likely to be of interest on a fairly regular basis.  There should be an easy and reliable way of getting copies.

Comment 4 Dan Callaghan 2012-06-12 02:23:53 UTC
(In reply to comment #2)
> Forgetting about the kickstart is a bad idea and a regression (for the last
> install). This makes troubleshooting (adding new machines into beaker) a
> pain for the administrators. Making the assumption that anaconda actually
> works is usually bad especially for those testing nightlies and other
> un-released distros.

Beaker 0.9 has actually improved over the current situation here. For every provision we generate a kickstart which will have a unique URL like

https://beaker.example.com/kickstart/ID

These live forever so you can refer to any previous kickstart at any time. Unlike Cobbler, where all you get is

https://labcontroller.example.com/cblr/svc/op/ks/system/FQDN

which changes every time the system is re-provisioned.

In Beaker 0.9 we are storing an association from recipe to generated kickstart, so that we always have a record of what kickstart was used for every recipe.

> Keeping as much information about the last provision and making it easily
> accessible make it easier to troubleshoot obscure errors on obscure hardware
> and allows for information reuse for the stable system provision model. 
> 
> Another way you can think of querying info could be by:
> http://beaker.example.com/s/FQDN/ks
> http://beaker.example.com/s/FQDN/cmdline
> 
> This way you can provide the latest relevant information easily to
> users/scripts etc.

We could do this, by looking at the most recent configure_netboot command for the system. But the command line would be pretty useless on its own. You would still need to fetch the kernel and initrd from the distro tree. We are not keeping those in /var/lib/tftpboot permanently like Cobbler currently does.

> Also does provisioning via web UI generate a RECIPEID?

No. That is the real problem here. "Manual provisioning" (Beaker's phrase for provisioning directly in the web UI without going through the scheduler) is not really a first-class citizen in Beaker like scheduled recipes are. That's the reason why we suggested stable systems be provisioned by running a Beaker job.

(In reply to comment #3)
> I agree fully with Arlinton: the items used for provisioning a system should
> be preserved for debugging/audit/reproducing/inspection.  There are a ton of
> things that can prevent getting a copy from the (partially?) provisioned
> system after the fact.
> 
> They needn't be preserved for all eternity, but the most recent ones are
> highly likely to be of interest on a fairly regular basis.  There should be
> an easy and reliable way of getting copies.

John, I agree. As I mentioned above, for scheduled recipes Beaker 0.9 will already store the generated kickstart with the recipe (even if the installation doesn't succeed) so we have improved over the current situation there.

This bug is about stable systems provisioning which uses "manual" provisioning rather than going through Beaker's scheduler.

Comment 5 Bill Peck 2012-06-15 15:09:29 UTC
Trying to get a grasp on the issues here:

- I don't believe this bug is a blocker for 0.9.0 because we can still manually provision stable systems.  The only thing we are missing is a stable system being able to re-provision itself (aka koan)
- We are going to put a proper lifecycle management in beaker but that won't be in 0.9.0, when we do that I believe that will satisfy this bug.


Arlinton, can you confirm that this isn't a blocker for 0.9?

Thanks

Comment 6 Arlinton Bourne 2012-06-15 15:34:52 UTC
(In reply to comment #5)
> Trying to get a grasp on the issues here:
> 
> - I don't believe this bug is a blocker for 0.9.0 because we can still
> manually provision stable systems.  The only thing we are missing is a
> stable system being able to re-provision itself (aka koan)
> - We are going to put a proper lifecycle management in beaker but that won't
> be in 0.9.0, when we do that I believe that will satisfy this bug.
> 
> 
> Arlinton, can you confirm that this isn't a blocker for 0.9?
> 
> Thanks

This really isn't an RFE, and I'm not sure what it has to do with lifecycle management. What it certainly is, is a regression. Using the test systems as a data store (what was last used, etc.) is unacceptable. This should be queryable via the infra that provisioned the system.

So yeah, I'd say this is a blocker.

Regards,
~Arlinton

Comment 7 John W. Lockhart 2012-06-15 17:44:56 UTC
Agreed, this is a blocker.

The stable systems have been re-provisioning themselves for years, once every two weeks, or on demand from the command line on the stable system.

This BZ affects that functionality.  Reinstalling dozens of boxes through the GUI isn't a very good workaround.

Comment 8 Bill Peck 2012-06-15 18:16:27 UTC
I could be wrong but I think Dan wanted lifecycle management because requesting a provision and one actually succeeding can be different.  But we don't need that for this right now.

Are we arguing over terminology?  I believe Arlinton and John are looking for Last Requested Distro, they will worry about if the system actually succeeded in installing it or not.

Comment 9 Arlinton Bourne 2012-06-15 19:24:29 UTC
(In reply to comment #8)
> I could be wrong but I think Dan wanted lifecycle management because
> requesting a provision and one actually succeeding can be different.  But we
> don't need that for this right now.
> 
> Are we arguing over terminology?  I believe Arlinton and John are looking
> for Last Requested Distro, they will worry about if the system actually
> succeeded in installing it or not.

Nope we want the last kickstart and all of its contents.

Comment 10 Arlinton Bourne 2012-06-15 19:54:50 UTC
(In reply to comment #8)
> I could be wrong but I think Dan wanted lifecycle management because
> requesting a provision and one actually succeeding can be different.  But we
> don't need that for this right now.
> 
> Are we arguing over terminology?  I believe Arlinton and John are looking
> for Last Requested Distro, they will worry about if the system actually
> succeeded in installing it or not.

Nope we want the last kickstart used and all of its contents accessible by the hostname.

Comment 11 Steven Lawrance 2012-06-15 20:01:39 UTC
(In reply to comment #10)
> Nope we want the last kickstart used and all of its contents accessible by
> the hostname.

Does that ever include distros which may have expired?

Comment 12 Arlinton Bourne 2012-06-15 20:09:56 UTC
(In reply to comment #11)
> (In reply to comment #10)
> > Nope we want the last kickstart used and all of its contents accessible by
> > the hostname.
> 
> Does that ever include distros which may have expired?

It should not. In fact stable systems should only use distros that are released and supported. Released and supported distros should never expire.

Comment 13 Steven Lawrance 2012-06-15 20:44:44 UTC
(In reply to comment #12)
> (In reply to comment #11)
> > Does that ever include distros which may have expired?
> 
> It should not. In fact stable systems should only use distros that are
> released and supported. Released and supported distros should never expire.

So what pieces are missing for being able to do this in a job?  IMHO that would be the ideal solution.

All I can really think of is perhaps some protections/controls to prevent other jobs from being scheduled on stable systems, in addition to group permissions.  It could be nice to have a long-running task that could keep the machine reserved, but that might not even be necessary.

Comment 14 Arlinton Bourne 2012-06-15 21:06:35 UTC
(In reply to comment #13)
> (In reply to comment #12)
> > (In reply to comment #11)
> > > Does that ever include distros which may have expired?
> > 
> > It should not. In fact stable systems should only use distros that are
> > released and supported. Released and supported distros should never expire.
> 
> So what pieces are missing for being able to do this in a job?  IMHO that
> would be the ideal solution.
> 
> All I can really think of is perhaps some protections/controls to prevent
> other jobs from being scheduled on stable systems, in addition to group
> permissions.  It could be nice to have a long-running task that could keep
> the machine reserved, but that might not even be necessary.

This keeps coming up and really is outside the scope of this regression. Re-engineering a solution simply because of a regression is out of the question.

Comment 15 Dan Callaghan 2012-06-15 21:30:05 UTC
Arlinton, it seems this problem has arisen because stable systems are relying on Cobbler features in a way that I wasn't aware of. "Regression" implies that the change is accidental, but removing Cobbler has been our definite goal :-)

If provisioning stable systems by using Beaker jobs is out of the question for you, then we will try to find a workaround (even if it is hacky, at least on the Beaker side).

Outside of a Beaker job, we don't keep track of the kickstart used for each system. Okay fine, we can add something in to do this. But what you really need is more than that.

You actually want koan, right? A utility which can:
* figure out which distro to re-install, and where it's stored
* grab the kernel and initrd images from it and write them to /boot
* add a grub entry for these images, pointing at a kickstart stored on a server somewhere
* boot the new grub entry
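
As a rough illustration of the third step in the list above, a boot entry could be rendered like this. The sketch assumes a GRUB legacy menu.lst stanza; the function name, file paths, and default root device are all hypothetical, and nothing here reflects how koan or any Beaker tool actually does it.

```python
def render_grub_entry(title: str, kernel_path: str, initrd_path: str,
                      ks_url: str, kernel_options: str = "",
                      root: str = "(hd0,0)") -> str:
    """Build a GRUB legacy stanza that boots an installer with a kickstart."""
    # Keep any original kernel options and append the kickstart location.
    args = " ".join(part for part in (kernel_options, "ks=" + ks_url) if part)
    return (
        f"title {title}\n"
        f"\troot {root}\n"
        f"\tkernel {kernel_path} {args}\n"
        f"\tinitrd {initrd_path}\n"
    )
```

The kernel and initrd would already have to be downloaded to /boot (the second step) before an entry like this could be booted.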

koan is completely tied to Cobbler's API. So using koan itself is out of the question. That leaves only two choices as I see it -- write a koan equivalent for Beaker, or deploy Cobbler for the stable systems.

Or just use Beaker jobs to provision stable systems.

Have I understood the situation here?

Comment 16 Arlinton Bourne 2012-06-15 21:50:44 UTC
(In reply to comment #15)
> Arlinton, it seems this problem has arisen because stable systems are
> relying on Cobbler features in a way that I wasn't aware of. "Regression"
> implies that the change is accidental, but removing Cobbler has been our
> definite goal :-)
Which is why it's a regression. If you're replacing Cobbler with something else and we're losing functionality that is being used, it's a regression.

> 
> If provisioning stable systems by using Beaker jobs is out of the question
> for you, then we will try to find a workaround (even if it is hacky, at
> least on the Beaker side).
> 
> Outside of a Beaker job, we don't keep track of the kickstart used for each
> system. Okay fine, we can add something in to do this. But what you really
> need is more than that.

Providing the last used kickstart in an easily accessible manner (à la hostname) is sufficient. John Lockhart can shed some light on any extra bits that would make provisioning easier.


> You actually want koan, right? A utility which can:
> * figure out which distro to re-install, and where it's stored
> * grab the kernel and initrd images from it and write them to /boot
> * add a grub entry for these images, pointing at a kickstart stored on a
> server somewhere
> * boot the new grub entry
> 
> koan is completely tied to Cobbler's API. So using koan itself is out of the
> question. That leaves only two choices as I see it -- write a koan
> equivalent for Beaker, or deploy Cobbler for the stable systems.
We are not using koan; we're using our own solution (that actually works), named OATS, which spoke to Cobbler. John Lockhart can speak to the details on that end.


> Or just use Beaker jobs to provision stable systems.
Using the scheduler or having these machines in any 'automated' state is unacceptable at this point in time. Let's not change things because something was overlooked.

> Have I understood the situation here?

For the most part. 

John, if you could, can you outline any other possible requirements needed for OATS to provision the system?

Comment 22 Dan Callaghan 2012-06-20 05:52:28 UTC
Patch is on Gerrit: http://gerrit.beaker-project.org/1153

Unfortunately it's XMLRPC and not a RESTful interface, because we can't serve ordinary HTTP requests from beaker-proxy.

The proposed API is:

get_installation_for_system(fqdn) ->
{'kernel_url': ..., 'initrd_url': ..., 'kernel_options': ...}
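
A client call against the proposed method might look like the sketch below, using Python's standard xmlrpc.client. The proxy URL is a placeholder supplied by the caller, the helper names are made up, and only the three keys named in the proposal are checked, so a later version of the patch may well return more.

```python
import xmlrpc.client

# Keys named in the proposed API; later revisions may add others.
EXPECTED_KEYS = ("kernel_url", "initrd_url", "kernel_options")


def missing_keys(installation: dict) -> list:
    """List the expected keys that are absent from the returned struct."""
    return [key for key in EXPECTED_KEYS if key not in installation]


def fetch_installation(proxy_url: str, fqdn: str) -> dict:
    """Ask the XMLRPC proxy for the last installation details of a system."""
    with xmlrpc.client.ServerProxy(proxy_url) as proxy:
        installation = proxy.get_installation_for_system(fqdn)
    absent = missing_keys(installation)
    if absent:
        raise ValueError("response is missing: " + ", ".join(absent))
    return installation
```

A stable system could call fetch_installation with its own FQDN and then fetch the kernel and initrd from the returned URLs.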

Comment 27 Dan Callaghan 2012-06-21 03:24:03 UTC
Patch updated to return distro tree URLs.

Comment 29 Dan Callaghan 2012-06-21 22:29:03 UTC
This bug has been addressed in Beaker 0.9.0-3 which is currently running on stage:

https://beaker-stage.engineering.redhat.com/

Comment 34 Dan Callaghan 2012-06-26 06:41:32 UTC
Beaker 0.9.0 has been released.

