Bug 1519589
| Summary: | schedule_queued_recipes() query is too complicated and fragile | | |
|---|---|---|---|
| Product: | [Retired] Beaker | Reporter: | Dan Callaghan <dcallagh> |
| Component: | general | Assignee: | Dan Callaghan <dcallagh> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | tools-bugs <tools-bugs> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 24 | CC: | achatter, dcallagh, mjia, rjoost |
| Target Milestone: | 25.0 | Keywords: | FutureFeature, Patch |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Enhancement |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-03-19 04:18:20 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 911515 | | |
Description (Dan Callaghan, 2017-12-01 00:36:55 UTC)
Our best bet for solving this is still the old "event-driven scheduler" proposal: https://beaker-project.org/dev/proposals/event-driven-scheduler.html or at least its key idea, which is to have the scheduler look at newly-freed systems and find a recipe which can run on each system, instead of constantly polling for queued recipes which have runnable systems as it does now. This requires us to track a new piece of state on a system: whether the system has become available and there might be a queued recipe which can be scheduled onto it (this is the "pending" state for systems in the design proposal doc). Rather than continuing to abuse the already-overloaded system "status" / "condition", this might require us to also clean up that mess (see bug 1039842).

There is actually another possible type of "event" which the original design proposal didn't consider. When a system becomes free and we look for any queued recipes which can run on it, we need to account for lab controller restrictions (that is, a recipe is only runnable if its distro, and all its guest recipe distros, are available in the lab where the system is located). However, if those restrictions mean that we *don't* find any runnable queued recipe and we put the system back into the free state, then a distro later being imported into that lab could make some queued recipes runnable again. So effectively, importing a distro is also an event which could have scheduling implications. The problem is that the design assumes there is a simple way to poll for things needing action -- either recipes in New state, or systems in Pending state. But there is no nice way to write a loop which polls the database looking for "a tree is now in a lab that it wasn't in before"...

Another "event" to consider is when a system loan is returned. While it's loaned, a system is only eligible to run recipes owned by the loanee.
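The core idea from the proposal can be sketched as a loop over systems in the new "pending" state, rather than over all queued recipes. This is only a minimal illustration, not Beaker's actual code: the `SystemStatus`, `System`, `Recipe`, and `runnable` names here are hypothetical stand-ins.

```python
from dataclasses import dataclass
from enum import Enum

class SystemStatus(Enum):
    idle = "idle"        # free, nothing for the scheduler to do
    pending = "pending"  # freed or changed; scheduler must re-examine
    busy = "busy"        # running a recipe

@dataclass
class System:
    fqdn: str
    lab: str
    status: SystemStatus = SystemStatus.idle

@dataclass
class Recipe:
    id: int
    # Labs in which this recipe's distro tree (and all of its guest
    # recipes' distro trees) are available.
    distro_labs: frozenset

def runnable(recipe: Recipe, system: System) -> bool:
    # Lab controller restriction: the recipe's distros must be
    # available in the lab where the system is located.
    return system.lab in recipe.distro_labs

def schedule_pending_systems(systems, queued_recipes):
    """Instead of polling all queued recipes for runnable systems,
    look only at systems flagged 'pending' and try to find one queued
    recipe for each."""
    assignments = []
    for system in [s for s in systems if s.status is SystemStatus.pending]:
        recipe = next((r for r in queued_recipes if runnable(r, system)), None)
        if recipe is not None:
            system.status = SystemStatus.busy
            queued_recipes.remove(recipe)
            assignments.append((recipe.id, system.fqdn))
        else:
            # Nothing runnable right now: go back to idle. Some later
            # event (distro import, loan return) must flip the system
            # back to pending, or matching queued recipes could wait
            # forever -- which is exactly the corner case discussed below.
            system.status = SystemStatus.idle
    return assignments
```

The `else` branch is where the distro-import problem bites: once the system drops back to idle, nothing re-examines it unless some event flips it to pending again.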
But once the loan is returned, it may be eligible to run other recipes. That one can easily be handled by just flipping the system status back to "pending" when the loan is returned, to force the scheduler to reconsider it.

This implements sort of the bare minimum idea from the "event driven scheduler" proposal: https://gerrit.beaker-project.org/5939 Most of the existing beakerd mechanisms are left as-is (messy as they are). There are opportunities for further cleanup down the road.

I still need to find a solution for the problem in comment 2 (it's broken in my initial version of the patch I just posted). I am thinking to hijack the "dirty" state of a job as a way of flagging beakerd to re-examine queued recipes after their distro tree has been added to a new lab. Not sure how ugly that will be; I'll try a version of it tomorrow.

(In reply to Dan Callaghan from comment #4)
> I still need to find a solution for the problem in comment 2 (it's broken in
> my initial version of the patch I just posted). I am thinking to hijack the
> "dirty" state of a job as a way of flagging beakerd to re-examine queued
> recipes after their distro tree has been added to a new lab. Not sure how
> ugly that will be, I'll try a version of it tomorrow.

Instead of messing around with dirty jobs (which would also require a bunch of special handling in the _update_status methods, just for this case), I realised a simpler approach would be to use the same @event.listens_for trick as I have in the patch for the other cases, like where a system's loan is changed. Flip any potentially affected systems back to "pending" and let the scheduler re-evaluate them to see if there is now any queued recipe for them; otherwise they will go back to "idle".

Looks like we have a viable solution here. The latest version of the patch covers every possible corner case I can think of and seems to work. We will know for sure once it soaks on beaker-devel for a while.

Beaker 25.0 has been released.
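For the loan-return case, the @event.listens_for trick could look roughly like this. This is a minimal sketch using SQLAlchemy attribute events, not Beaker's actual model: the `System` table, its column names, and the string status values are all illustrative assumptions.

```python
from sqlalchemy import Column, Integer, String, event
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class System(Base):
    __tablename__ = 'system'
    id = Column(Integer, primary_key=True)
    fqdn = Column(String)
    status = Column(String)            # 'idle' / 'pending' / 'busy'
    loaned_to = Column(String, nullable=True)

# active_history=True makes SQLAlchemy pass the real previous value to
# the listener instead of a NO_VALUE placeholder for unloaded attributes.
@event.listens_for(System.loaned_to, 'set', active_history=True)
def loan_returned(target, value, oldvalue, initiator):
    # Returning a loan may make the system eligible to run other queued
    # recipes, so flip it back to 'pending' and let the scheduler
    # re-evaluate it; if nothing is runnable it goes back to 'idle'.
    if value is None and oldvalue is not None and target.status == 'idle':
        target.status = 'pending'
```

The same pattern would be registered on whatever attribute changes when a distro tree becomes available in a new lab, flipping the affected lab's idle systems back to "pending" rather than hijacking the job "dirty" flag.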
Release notes are available upstream: https://beaker-project.org/docs/whats-new/release-25.html