Bug 1071364 - The difference between Available and Free systems is confusing
Summary: The difference between Available and Free systems is confusing
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Beaker
Classification: Retired
Component: scheduler
Version: 0.15
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: beaker-dev-list
QA Contact: tools-bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-02-28 15:32 UTC by Alexander Todorov
Modified: 2020-10-21 14:19 UTC (History)

Fixed In Version:
Clone Of:
Clones: 1071372
Environment:
Last Closed: 2020-10-21 14:14:30 UTC
Embargoed:



Description Alexander Todorov 2014-02-28 15:32:25 UTC
Description of problem:

So I have a job to provision the latest Rawhide snapshot which tells me there are 360 possible systems to run my job:
https://beaker.engineering.redhat.com/recipes/systems?recipe_id=1247850

while at the same time it has been sitting idle for quite some time. 

So I start looking at the systems list and notice several problems:

* I see systems in many locations: RDU and BOS to name a few, while the distro selected for provisioning is only available at BRQ. 

* I see systems which are owned by Kernel QE and I can't use them due to group ACLs; however, they appear in the list above.

* I see at least 3 systems which are Condition: Automated, Type: Machine, in BRQ BUT LOANED to another user (while User: is blank).


So out of 360 possible systems it turns out there's actually NONE which can execute my job so setting this to High Prio.

Comment 2 Bill Peck 2014-02-28 16:13:20 UTC
(In reply to Alexander Todorov from comment #0)
> Description of problem:
> 
> So I have a job to provision the latest Rawhide snapshot which tells me
> there are 360 possible systems to run my job:
> https://beaker.engineering.redhat.com/recipes/systems?recipe_id=1247850
> 
> while at the same time it has been sitting idle for quite some time. 
> 
> So I start looking at the systems list and notice several problems:
> 
> * I see systems in many locations: RDU and BOS to name a few, while the
> distro selected for provisioning is only available at BRQ. 

This is by design.  When a distro is first built it may only be available in one location, but after it is synced to the other locations it could run there.  For this to work properly, though, a distro that is expired/deleted from one lab needs to be expired/deleted from *ALL* labs.  Please open a ticket with eng-ops asking for this.  I'm pretty sure this has been asked for in the past; not sure why it's not happening.

> 
> * I see systems which are owned by Kernel QE and I can't use them due to
> group ACLs; however, they appear in the list above.

Not sure about this one.  I'll let the current beaker devs answer this one.

> 
> * I see at least 3 systems which are Condition: Automated, Type: Machine, in
> BRQ BUT LOANED to another user (while User: is blank).

And if those systems are returned they will then be available for your job.  The list of available systems for this recipe should include a column for loan or some additional info for the user to see that the system really isn't available.

> 
> 
> So out of 360 possible systems it turns out there's actually NONE which can
> execute my job so setting this to High Prio.

Comment 3 Dan Callaghan 2014-03-02 22:35:31 UTC
The list of "possible systems" consists of those systems which could *ever* run the recipe, not ones which are free to run it right now. As Bill mentioned, it includes systems currently loaned to someone else and systems in other labs. So the fact that there are 360 systems listed as "possible" but the recipe still hasn't started is not necessarily a bug.

(In reply to Alexander Todorov from comment #0)
> * I see systems which are owned by Kernel QE and I can't use them due to
> group ACLs; however, they appear in the list above.

Which system(s)?

Note that as of Beaker 0.15 we have access policies, so permissions are no longer controlled by system group membership. That means that a system owned by Kernel QE, or in the Kernel QE group, can still grant everybody permission to reserve it.

Comment 4 Alexander Todorov 2014-03-04 10:37:52 UTC
(In reply to Bill Peck from comment #2)
> (In reply to Alexander Todorov from comment #0)
> > Description of problem:
> > 
> > So I have a job to provision the latest Rawhide snapshot which tells me
> > there are 360 possible systems to run my job:
> > https://beaker.engineering.redhat.com/recipes/systems?recipe_id=1247850
> > 
> > while at the same time it has been sitting idle for quite some time. 
> > 
> > So I start looking at the systems list and notice several problems:
> > 
> > * I see systems in many locations: RDU and BOS to name a few, while the
> > distro selected for provisioning is only available at BRQ. 
> 
> This is by design.  When a distro is first built it may only be available in
> one location, but after it is synced to the other locations it could run
> there.  But, for this to work properly when a distro is expired/deleted from
> one lab then it needs to be expired/deleted from *ALL* labs.  Please open a
> > ticket with eng-ops asking for this.  I'm pretty sure this has been asked
> > for in the past, not sure why it's not happening.
> 

Hi Bill,
this is not the case here. Rawhide at present is only imported in BRQ and will not be available in other labs.

Comment 5 Alexander Todorov 2014-03-04 10:41:39 UTC
(In reply to Dan Callaghan from comment #3)
> The list of "possible systems" consists of those systems which could *ever*
> run the recipe, not ones which are free to run it right now. As Bill
> mentioned, it includes systems currently loaned to someone else and systems
> in other labs. So the fact that there are 360 systems listed as "possible"
> but the recipe still hasn't started is not necessarily a bug.
> 

Hmm, then I guess we need another view to show how many systems can execute the job right now! Seeing 360 possible systems and my job staying idle is at least very confusing.

Comment 6 Dan Callaghan 2014-03-04 22:59:29 UTC
(In reply to Alexander Todorov from comment #5)
> Hmm, then I guess we need another view to show how many systems can execute
> the job right now! Seeing 360 possible systems and my job staying idle is at
> least very confusing.

If your job has stayed Queued for more than 20 seconds (give or take), then by definition there are no systems which can execute it right now. I don't think we need another view to say that :-)

Comment 7 Nick Coghlan 2014-03-05 04:22:24 UTC
I think this is a symptom of an underlying issue that the distinction between "Available", "Possible" and "Free" isn't presented clearly or consistently as part of the user experience (or even in the Architecture Guide), yet is critical to understanding Beaker's operation.

Free = available to run your recipe right now
Possible = potentially available to run your recipe at some point in the future, but not available right now due to some factor that is expected to be transient (in use by someone else, loaned to someone, distro not currently available in lab)
Available = Free or Possible
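
As a rough illustration only, the three terms can be read as sets. The `System` fields and function names below are hypothetical and are not Beaker's data model; the sketch also assumes that any loan is to someone other than the job owner. The one thing taken directly from the definitions above is that Available is the union of Free and Possible.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class System:
    # Hypothetical fields for illustration; not Beaker's actual schema.
    name: str
    lab: str
    user: Optional[str] = None       # whoever is using the system right now
    loaned_to: Optional[str] = None  # loan recipient (assumed not the job owner)


def is_free(system, distro_labs):
    """Free: the system could start the recipe right now."""
    return (
        system.user is None
        and system.loaned_to is None
        and system.lab in distro_labs  # distro is present in the system's lab
    )


def classify(systems, distro_labs):
    """Split candidate systems into (free, possible, available) sets."""
    free = {s for s in systems if is_free(s, distro_labs)}
    possible = set(systems) - free  # blocked by some transient factor
    available = free | possible     # Available = Free or Possible
    return free, possible, available
```

With this reading, a recipe that stays Queued is one whose `free` set is empty even though `available` may be large.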

In this case, when you're inclined to go look at the possible systems list for a recipe, it's almost certainly because it has remained queued for some time. At the moment, the second column in that view is always just the current user of the system, which leaves out the other reasons a system may not be free to run your recipe.

What if, instead of just being "User", the second column was instead "Waiting due to..." with possible entries like:

- Distro <distro> not available in <lab>
- Guest distro <distro> not available in <lab>
- Loaned to <user> since <time>
- Running <recipe> for <user> since <time>
- Reserved by <user> since <time>

This wouldn't be especially *easy* to do, but I think it would significantly improve the usability of that screen.

A potentially simpler multi-column variant of the same idea would be to just add new columns to that view for "Loan Recipient" and "Distro Available".
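
A minimal sketch of how such a "Waiting due to..." cell might be computed, assuming a per-system snapshot with the hypothetical fields shown below (none of these names come from Beaker). The order of the checks is a design choice made for this sketch: distro availability is reported first because returning a loan or finishing a recipe would not help while the distro is missing from the lab.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SystemState:
    # Hypothetical snapshot of one system; field names are illustrative only.
    lab: str
    loaned_to: Optional[str] = None
    running_recipe: Optional[str] = None
    user: Optional[str] = None
    since: Optional[str] = None  # when the loan or use began, preformatted


def waiting_reason(state, distro, distro_labs):
    """Return the "Waiting due to..." text for one row, or None if free."""
    if state.lab not in distro_labs:
        return f"Distro {distro} not available in {state.lab}"
    if state.loaned_to is not None:
        return f"Loaned to {state.loaned_to} since {state.since}"
    if state.running_recipe is not None:
        return (f"Running {state.running_recipe} for {state.user} "
                f"since {state.since}")
    if state.user is not None:
        return f"Reserved by {state.user} since {state.since}"
    return None
```

The simpler multi-column variant would expose the same underlying fields directly instead of folding them into one string.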

Comment 8 Ales Zelinka 2015-01-31 15:00:19 UTC
Another example of "Possible systems" being confusing, or actually misleading:

https://beaker.engineering.redhat.com/jobs/864396

 * Multihost (x86_64 + ppc64) recipe, lab-agnostic (lab not specified when submitting job)
 * 1st machine Beaker picked was an x86_64 machine from the BRQ lab
 * the distro for ppc64 is not present in BRQ => no machine will ever be found => deadlock

Expected results:

"Possible systems" for the ppc64 recipe is 0.
 

Actual results:
 
"Possible systems" for the ppc64 recipe is 120, it includes:
 * brq systems even though those systems can't install the requested distro
 * bos systems even though the recipeset is locked to BRQ because of beaker's choice of the x86_64 system allocation

Comment 9 Dan Callaghan 2015-02-03 11:13:33 UTC
(In reply to Ales Zelinka from comment #8)
> "Possible systems" for the ppc64 recipe is 120, it includes:
>  * brq systems even though those systems can't install the requested distro

As Nick mentioned in the other bug, Beaker assumes that if a distro isn't available in some labs it will become available "soon" once it is synced.

>  * bos systems even though the recipeset is locked to BRQ because of
> beaker's choice of the x86_64 system allocation

This one is surprising to me and seems like a bug.

Comment 10 Ales Zelinka 2015-02-03 18:24:51 UTC
(In reply to Dan Callaghan from comment #9)
> (In reply to Ales Zelinka from comment #8)
> > "Possible systems" for the ppc64 recipe is 120, it includes:
> >  * brq systems even though those systems can't install the requested distro
> 
> As Nick mentioned in the other bug, Beaker assumes that if a distro isn't
> available in some labs it will become available "soon" once it is synced.

Can the "possible systems" list be updated as conditions change?

Comment 11 Nick Coghlan 2015-02-05 06:16:29 UTC
We actually have two definitions of "possible systems". The first one is the possible systems without accounting for distro availability or current system usage, which we calculate for each recipe as soon as the job is received. This is saved on the recipe and used as a base for the second definition, which is the dynamic one (recalculated on each scheduler pass) that accounts for current distro availability and system availability.

The rationale behind only showing the first list in this part of the UI is that if the second list isn't empty, we should have claimed one of those systems for the recipe.

So I agree with Dan that if the recipe UI is listing BOS systems for a recipe set that has already been locked to BRQ, something's gone wrong somewhere.
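
The two definitions can be sketched as a two-stage filter. This is an illustration under stated assumptions, not the scheduler's code: systems are plain dicts with hypothetical `name` and `lab` keys, `matches_recipe` stands in for whatever host filtering the recipe requests, and the optional `locked_lab` argument is an assumption about how a lab lock on the recipe set could be applied.

```python
def static_candidates(systems, matches_recipe):
    """Stage 1: computed once when the job is received, saved on the recipe.

    Ignores distro availability and current system usage.
    """
    return [s for s in systems if matches_recipe(s)]


def dynamic_candidates(saved, distro_labs, busy, locked_lab=None):
    """Stage 2: recomputed on every scheduler pass from the saved list."""
    result = []
    for s in saved:
        if s["lab"] not in distro_labs:
            continue  # distro not currently available in this lab
        if s["name"] in busy:
            continue  # system is in use, reserved or loaned right now
        if locked_lab is not None and s["lab"] != locked_lab:
            continue  # hypothetical: recipe set already tied to another lab
        result.append(s)
    return result
```

If `dynamic_candidates` returns anything, the scheduler would be expected to claim one of those systems on that pass, which is why only the first list is shown in the recipe UI.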

