Bug 1470959
Summary: | deadlock between multi-host recipe sets with host requirements can sometimes occur if jobs are dirty | | |
---|---|---|---|
Product: | [Retired] Beaker | Reporter: | Dan Callaghan <dcallagh> |
Component: | scheduler | Assignee: | Dan Callaghan <dcallagh> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Dan Callaghan <dcallagh> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | | |
Version: | 24 | CC: | achatter, dcallagh, junichi.nomura, mjia, rjoost |
Target Milestone: | 24.4 | Keywords: | Patch, Triaged |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2017-10-03 03:57:52 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Dan Callaghan
2017-07-14 06:57:31 UTC
The obvious solution is to just remove the criterion which filters out dirty jobs from the scheduler queries. By definition the harness cannot be sending updates for a recipe that is still queued. But I'm not sure if that could break things in some other way.

(In reply to Dan Callaghan from comment #1)
I dug into this option in more detail and I think it is safe. That is, allowing the scheduler queries (for moving recipes from New through to Scheduled) to operate on jobs which are "dirty", instead of filtering down to "clean" jobs. These scheduler passes only select the individual recipes or recipe sets, and then operate on those alone without touching other parts of the job. So there should be no way the scheduler can conflict with harness updates happening on other recipes in the job which are already running. Anyway, we'll find out soon enough if it causes any problems...

https://gerrit.beaker-project.org/5847

Given that this is very timing-sensitive, involving the scheduler *and* harness updates running concurrently in other recipe sets, there is no good way to reproduce this outside the test suite. Therefore I think the best we can do is to mark this VERIFIED based on the fact that the scheduler hasn't shown any odd behaviour in testing.

Beaker 24.4 has been released.
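
For readers unfamiliar with the "dirty" filter discussed in the comments above, here is a rough illustration of the idea in plain Python. It is only a sketch and not Beaker's actual scheduler code: the `Job` and `Recipe` classes, the `dirty` flag, and the `select_recipes` and `schedule_pass` helpers are all hypothetical names made up for this example. It shows a scheduler pass that selects recipes by status, with and without the criterion that skips dirty jobs, and that the pass changes only the recipes it selected.

```python
from dataclasses import dataclass


@dataclass
class Job:
    id: int
    # Hypothetical flag: True while updates for this job are still pending.
    dirty: bool = False


@dataclass
class Recipe:
    id: int
    job: Job
    status: str  # e.g. "Queued", "Scheduled", "Running"


def select_recipes(recipes, status, skip_dirty_jobs):
    """Return recipes in `status`, optionally excluding those in dirty jobs."""
    return [
        r for r in recipes
        if r.status == status and not (skip_dirty_jobs and r.job.dirty)
    ]


def schedule_pass(recipes, skip_dirty_jobs):
    """Move Queued recipes to Scheduled, touching only the selected recipes."""
    selected = select_recipes(recipes, "Queued", skip_dirty_jobs)
    for r in selected:
        r.status = "Scheduled"
    return [r.id for r in selected]


def make_recipes():
    # One job that is dirty because recipe 10 is running and reporting
    # results, while recipe 11 in the same job is still waiting in the queue.
    job = Job(id=1, dirty=True)
    return [Recipe(10, job, "Running"), Recipe(11, job, "Queued")]


# With the dirty-job criterion, the queued recipe is never picked up.
with_filter = make_recipes()
picked_with_filter = schedule_pass(with_filter, skip_dirty_jobs=True)

# Without it, the queued recipe advances and the running one is untouched.
without_filter = make_recipes()
picked_without_filter = schedule_pass(without_filter, skip_dirty_jobs=False)
```

In this toy model, dropping the criterion lets recipe 11 progress even though its job is dirty, and recipe 10 is left alone because the pass only modifies what it selected. That mirrors the reasoning in the comments, under the stated assumptions.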