Bug 1375035
Summary: | Machine is not reserved if a task is finished too quickly | ||
---|---|---|---|
Product: | [Retired] Beaker | Reporter: | Roman Joost <rjoost> |
Component: | scheduler | Assignee: | Roman Joost <rjoost> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | tools-bugs <tools-bugs> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 22 | CC: | dcallagh, dowang, mjia, rjoost |
Target Milestone: | 23.3 | Keywords: | Patch, Triaged |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Last Closed: | 2016-11-07 06:44:30 UTC | Type: | Bug |
Description
Roman Joost
2016-09-12 01:37:35 UTC
(In reply to Roman Joost from comment #0)
> 3. Kill/Pause beaker-watchdog in order to keep the Recipe in a state of
> TaskStatus.waiting
> 4. The system should end up in a state where the recipe status is not
> installing or running. (see Server/bkr/server/model/scheduler.py:2509)

Right, so the reason this is happening for us occasionally in production is that normally:

* while Anaconda is installing, recipe status is Installing
* then, when Anaconda finishes installing and reboots, the next iteration of update_dirty_jobs will set recipe status to Waiting
* then, when the system has rebooted and beah starts the first task, the next iteration of update_dirty_jobs will set recipe status to Running
* finally, when beah finishes the final task in the recipe, the next iteration of update_dirty_jobs will set recipe status to Completed -- or Reserved, if the user requested a reservation

This bug is a regression in 23.0 because the above sequence is new as of 23.0, due to the Installing status. Previously the status would be Running as soon as Anaconda started, and it would stay that way until the end of the recipe.

The problem here is that line of code, which tests the recipe status against the Installing and Running states (but not Waiting). If there is only one task in the recipe and beah finishes it very quickly, there is only a very short space of time between beah starting the first task and beah finishing the last task (in the above example, 24 seconds). If beakerd doesn't finish a complete loop of update_dirty_jobs in that time, meaning that it never sets the recipe to Running, then it will hit this bug.

A workaround for this bug would be to make the tasks take slightly longer -- even 5 minutes should be plenty of time. If the recipe has a single /distribution/command task, then simply putting "; sleep 300" at the end of the command would be enough.

Patch available: https://gerrit.beaker-project.org/#/c/5230/

Beaker 23.3 has been released.
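To make the race described above concrete, here is a minimal, self-contained sketch. It is not Beaker's code: the enum is trimmed to the statuses named in this bug, and `status_after_last_task`, `NARROW` and `WIDE` are hypothetical names used only for illustration. The "wide" check is one plausible way to close the gap and is an assumption; the actual patch in Gerrit may do something different.

```python
from enum import Enum


class TaskStatus(Enum):
    # Only the statuses named in this bug; the real Beaker enum has more.
    installing = "Installing"
    waiting = "Waiting"
    running = "Running"
    completed = "Completed"
    reserved = "Reserved"


def status_after_last_task(current, reservation_requested, accepted):
    """Hypothetical stand-in for the status check discussed in this bug."""
    # The recipe is only moved to Reserved if its current status is one of
    # the accepted statuses; otherwise it simply ends up Completed.
    if reservation_requested and current in accepted:
        return TaskStatus.reserved
    return TaskStatus.completed


# Check as described in the bug: Waiting is not among the accepted statuses.
NARROW = {TaskStatus.installing, TaskStatus.running}
# Illustrative alternative that also accepts Waiting (an assumption, not the patch).
WIDE = NARROW | {TaskStatus.waiting}

# Fast task: beakerd never recorded Running, so the recipe is still Waiting.
print(status_after_last_task(TaskStatus.waiting, True, NARROW).value)  # Completed
print(status_after_last_task(TaskStatus.waiting, True, WIDE).value)    # Reserved
# Slow task: beakerd recorded Running before the last task finished.
print(status_after_last_task(TaskStatus.running, True, NARROW).value)  # Reserved
```

The first call shows the failure mode: a reservation was requested, but because the recipe was still Waiting when the last task finished, the machine is not reserved. The "; sleep 300" workaround avoids this by giving beakerd time to move the recipe to Running first.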