Bug 824534 - [Beaker] RFE Default for scheduler to prioritise systems > 1 CPU
Status: CLOSED CURRENTRELEASE
Product: Beaker
Classification: Community
Component: web UI
Version: 0.9
Hardware: Unspecified Unspecified
Priority: urgent Severity: high
Target Milestone: 0.12
Target Release: ---
Assigned To: Raymond Mancy
QA Contact: Amit Saha
Whiteboard: Scheduler
Keywords: FutureFeature
Depends On:
Blocks:
Reported: 2012-05-23 13:23 EDT by Jeff Burke
Modified: 2014-12-07 20:12 EST
CC List: 10 users

See Also:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-04-11 00:57:35 EDT
Type: Bug


Description Jeff Burke 2012-05-23 13:23:50 EDT
Description of problem:
When folks are using the WebUI to schedule reserve jobs, we should default to selecting systems with more than 1 CPU. Single-CPU systems are rare, and the majority of the time associates who get a single-CPU system do not need one.

Regards,
Jeff
Comment 1 Gurhan Ozen 2012-08-21 10:55:11 EDT
Is it possible to get this done rather quickly? This is turning into more of a functional requirement than an RFE, because we have very few single-CPU x86 systems in the Beaker pool and it is taking a long time to schedule them for our kerneltier1 jobs. Unfortunately we have a strict time limit on reviewing tier1 results and this is starting to be an obstacle.

And I would add that it shouldn't apply just to WebUI reserve jobs. It should apply to any job: unless a single-CPU system or a specific hostname is explicitly requested, it should default to multiple-CPU systems.

Thanks!
Comment 2 Dan Callaghan 2012-08-21 20:34:36 EDT
I think what we really want here is for Beaker to order systems by "rarity" when it is picking them, and then pick the least rare system that still satisfies the job's requirements.

We could hardcode the logic that "systems with 1 CPU are rarer than systems with >1 CPU", which is what you are asking for here. But Beaker actually knows how many CPUs each system has, so it can determine for itself what is rare. (For example, systems with 128 CPUs are probably just as rare as systems with 1 CPU.)

We need to be careful about adding more ORDER BY clauses to our queries for picking systems though. These queries are already huge and complicated and are prone to causing problems for MySQL. Maybe we could add a column for "rarity factor", just an int from 0-1000 which is computed offline (either in a cron job, or triggered every time a system's inventory info changes, or something like that). Then the queries for picking systems could just order by that column.

There are probably many other axes of rarity which could be factored in too (certain devices are much rarer than others, multiple NUMA nodes are rarer than a single one, large amounts of RAM are rarer than smaller amounts, etc). But to start with, we could just base it on number of CPUs.
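
A minimal sketch of the offline computation described above, based only on CPU count. The function names and the dict-shaped inventory are hypothetical and are not Beaker's data model; the scaling to 0-1000 is one possible choice among many.

```python
from collections import Counter

def rarity_factors(cpu_counts):
    """Map each system name to an int from 0 (most common) towards 1000 (rarest).

    cpu_counts: dict of system name -> number of CPUs (hypothetical input).
    """
    total = len(cpu_counts)
    if total == 0:
        return {}
    # How many systems share each CPU count.
    population = Counter(cpu_counts.values())
    # Integer arithmetic keeps the result an int, as suggested for the column.
    return {
        name: 1000 * (total - population[cpus]) // total
        for name, cpus in cpu_counts.items()
    }

def pick_order(cpu_counts):
    """Return system names least rare first, so rare systems stay free longer."""
    factors = rarity_factors(cpu_counts)
    return sorted(cpu_counts, key=lambda name: (factors[name], name))

inventory = {"a": 4, "b": 4, "c": 4, "d": 1, "e": 128}
print(rarity_factors(inventory))
print(pick_order(inventory))
```

In this toy inventory the single-CPU system and the 128-CPU system get the same factor, which matches the point that both ends of the range can be equally rare.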
Comment 3 Jeff Burke 2012-10-23 06:51:05 EDT
Minjung,
 Can you please evaluate this BZ? If possible, can we have it resolved in 0.10 or the next release after that? Currently we are manually policing the single-CPU machines. For each kernel build we send personal emails asking users to release their reservations on single-CPU systems.
Comment 5 Dan Callaghan 2012-10-23 22:54:57 EDT
(In reply to comment #4)

As a short term fix we can hardcode some logic to make single-CPU systems least preferred in the scheduler. But it will be defeated by anyone who uses <autopick random="True" /> in their job, and it will need re-working for the Beaker 1.0 groups features as well. It will just be a terrible hack. Estimate: 5

Estimate for the proper fix (order by rarity): 25
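
The short-term idea could look roughly like the sketch below. It assumes the candidate list has already been filtered by the job's requirements, and the tuple shape and function name are hypothetical rather than taken from the Beaker scheduler.

```python
def order_candidates(systems):
    """Sort candidate systems so single-CPU ones are tried last.

    systems: list of (name, cpu_count) tuples that already satisfy the
    job's requirements (hypothetical shape, not Beaker's data model).
    """
    # False sorts before True, so multi-CPU systems come first.
    # sorted() is stable, so the existing order is kept within each group.
    return sorted(systems, key=lambda s: s[1] == 1)

candidates = [("uni1", 1), ("quad1", 4), ("uni2", 1), ("dual1", 2)]
print(order_candidates(candidates))
```

A job that explicitly asks for one CPU would only have single-CPU candidates, so this ordering would not get in its way.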
Comment 6 Raymond Mancy 2012-10-23 23:36:33 EDT
(In reply to comment #2)
> I think what we really want here is for Beaker to order systems by "rarity"
> when it is picking them, and then pick the least rare system that still
> satisfies the job's requirements.
> 

I'm not sure rarity is exactly what we should be looking at; a demand/supply ratio may be a better measure. That ratio may or may not correlate with how few of a given resource are available.

> We could hardcode the logic that "systems with 1 CPU are rarer than systems
> with >1 CPU", which is what you are asking for here. But Beaker actually
> knows how many CPUs each system has, so it can determine for itself what is
> rare. (For example, systems with 128 CPUs are probably just as rare as
> systems with 1 CPU.)
> 
> We need to be careful about adding more ORDER BY clauses to our queries for
> picking systems though. These queries are already huge and complicated and
> are prone to causing problems for MySQL. Maybe we could add a column for
> "rarity factor", just an int from 0-1000 which is computed offline (either
> in a cron job, or triggered every time a system's inventory info changes, or
> something like that). Then the queries for picking systems could just order
> by that column.
> 

I think Bill mentioned this in a meeting at some point, but a monthly (or some other interval) reassessment of what was in high demand for that period and what was not would perhaps be nice.

> There are probably many other axes of rarity which could be factored in too
> (certain devices are much rarer than others, multiple NUMA nodes are rarer
> than a single one, large amounts of RAM are rarer than smaller amounts,
> etc). But to start with, we could just base it on number of CPUs.
Comment 7 Raymond Mancy 2012-10-24 00:13:54 EDT
Thinking about this some more, I guess the only way we can determine what machines are in high demand is by analysing <hostRequires/> (and perhaps <distroRequires/> although I imagine this plays a much smaller role). The fact that certain hardware is being utilised to a large degree may not have any relationship to what a user's requirements are. 

We could have a cron job that analyses all the <hostRequires/> for recipes submitted within interval X, and then using some formula, come up with a 'rarity' ranking for each system. 

The main complication I can see is that the specific intervals over which statistics are gathered may not be representative of the general use case in Beaker. For example, if there is a deadline coming up for team Y, they may sharply increase their usage of Beaker and have disproportionate requirements for certain hardware; once their deadline is met, that hardware is no longer in demand. Increasing the sample size is the obvious solution to this, I guess.
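
As a rough illustration of the cron-job idea, the sketch below scores each system by how much recent <hostRequires/> demand it could serve relative to supply. Everything here is an assumption made for illustration: it only looks at <cpu_count/> elements, it ignores <and>/<or> nesting and every other constraint, and the function names and input shapes are hypothetical.

```python
import operator
import xml.etree.ElementTree as ET

# Comparison operators that may appear in the op attribute (assumed set).
OPS = {"=": operator.eq, "==": operator.eq, "!=": operator.ne,
       ">": operator.gt, ">=": operator.ge,
       "<": operator.lt, "<=": operator.le}

def matches(host_requires_xml, cpu_count):
    """True if cpu_count satisfies every <cpu_count/> element in the XML."""
    root = ET.fromstring(host_requires_xml)
    return all(
        OPS[el.get("op")](cpu_count, int(el.get("value")))
        for el in root.iter("cpu_count")
    )

def demand_scores(recipes, systems):
    """Give each system a share of every recipe it could satisfy.

    recipes: list of <hostRequires/> XML strings from the sampled interval.
    systems: dict of system name -> number of CPUs.
    A recipe that only few systems can satisfy adds a large share to each.
    """
    scores = dict.fromkeys(systems, 0.0)
    for xml_text in recipes:
        eligible = [n for n, cpus in systems.items() if matches(xml_text, cpus)]
        for name in eligible:
            scores[name] += 1.0 / len(eligible)
    return scores

systems = {"uni": 1, "quad1": 4, "quad2": 4}
recipes = [
    '<hostRequires><cpu_count op="=" value="1"/></hostRequires>',
    '<hostRequires><cpu_count op="=" value="1"/></hostRequires>',
    '<hostRequires/>',
]
print(demand_scores(recipes, systems))
```

Systems with a high score are in demand relative to how many alternatives exist, so a scheduler could hand them out last to jobs that do not specifically need them.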
Comment 8 Jeff Burke 2013-02-06 08:55:11 EST
Ray,
 Any chance we can get this BZ scheduled to be resolved?

Thanks,
Jeff
Comment 9 Min Shin 2013-02-12 01:30:57 EST
(In reply to comment #5)
> (In reply to comment #4)
> 
> As a short term fix we can hardcode some logic to make single-CPU systems
> least preferred in the scheduler. But it will be defeated by anyone who uses
> <autopick random="True" /> in their job, and it will need re-working for the
> Beaker 1.0 groups features as well. It will just be a terrible hack.
> Estimate: 5
> 
> Estimate for the proper fix (order by rarity): 25

Let's try the short term fix in 1.0, please.
Comment 10 Raymond Mancy 2013-02-18 01:47:58 EST
Hi Jeff,

Which are we talking about here: cores, processors, or sockets?
Comment 11 Jeff Burke 2013-02-18 07:54:50 EST
Raymond,
 It is processors <cpu_count op="=" value="1"/>

Thanks,
Jeff
Comment 12 Raymond Mancy 2013-02-20 01:32:34 EST
http://gerrit.beaker-project.org/#/c/1741/
Comment 14 Luigi Toscano 2013-03-13 05:58:31 EDT
I don't know if it is too late, but I would suggest, if possible, to exclude virtual systems from this (i.e. priority to systems > 1 CPU for real systems, no special priority for virtual machines). 

If I understand it correctly, the original requirement comes from the lack of bare metal systems with 1 CPU. On the other hand, I suspect it would not be so strange to have many "small" 1 CPU virtual systems.
Comment 17 Dan Callaghan 2013-04-11 00:57:35 EDT
Beaker 0.12 has been released.