Description of problem:
For a few users, jobs with reserve requests are going straight to finished instead of actually reserving the machine.

Version-Release number of selected component (if applicable):
23.2

How reproducible:
100%

Steps to Reproduce:
(Take this with a grain of salt, since I don't fully understand the big picture here.)
1. Create a job with a task (e.g. /distribution/command).
2. Make sure the command completes within a couple of seconds.
3. Kill/pause beaker-watchdog in order to keep the recipe in the TaskStatus.waiting state.
4. The system should end up in a state where the recipe status is not Installing or Running (see Server/bkr/server/model/scheduler.py:2509).

Actual results:
System goes straight to finished and is not reserved.

Expected results:
System is reserved.

Additional info:
Task completing in 24s:
server-debug.log.3:Sep 8 13:13:55 beaker-server beaker-server[52352]: bkr.server.xmlrpccontroller DEBUG Time: 0:00:00.011501 recipes.tasks.start ('45404156', 0)
server-debug.log.3:Sep 8 13:14:19 beaker-server beaker-server[52510]: bkr.server.xmlrpccontroller DEBUG Time: 0:00:00.008961 recipes.tasks.stop ('45404156', 'stop', 'OK')

On the lab controller:
watchdog.log.1.gz:Sep 8 13:17:13 lab-02 beaker-watchdog[19947]: bkr.labcontroller.proxy INFO Removed Monitor for labcontroller.beaker.example:3049406
(In reply to Roman Joost from comment #0)
> 3. Kill/Pause beaker-watchdog in order to keep the Recipe in a state of
> TaskStatus.waiting
> 4. The system should end up in a state where the recipe status in not
> installing or running. (see Server/bkr/server/model/scheduler.py:2509)

Right, so the reason this happens to us occasionally in production is that normally:

* while Anaconda is installing, the recipe status is Installing
* then, when Anaconda finishes installing and reboots, the next iteration of update_dirty_jobs sets the recipe status to Waiting
* then, when the system has rebooted and beah starts the first task, the next iteration of update_dirty_jobs sets the recipe status to Running
* finally, when beah finishes the final task in the recipe, the next iteration of update_dirty_jobs sets the recipe status to Completed (or Reserved, if the user requested a reservation)

This bug is a regression in 23.0 because the sequence above is new as of 23.0, due to the Installing status. Previously the status became Running as soon as Anaconda started and then stayed that way until the end of the recipe.

The problem is that line of code, which tests the recipe status against the Installing and Running states but not Waiting. If there is only one task in the recipe and beah finishes it very quickly, there is only a very short window between beah starting the first task and beah stopping the last task (24 seconds in the example above). If beakerd doesn't finish a complete loop of update_dirty_jobs in that window, it never sets the recipe to Running, so the recipe is still Waiting when the last task stops and it hits this bug.
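To make the failure mode concrete, here is a minimal sketch of the status check described above. It is not Beaker's actual code or the patch under review; the `RecipeStatus` enum and both `next_status_*` functions are hypothetical names, and the sketch only models the decision "reserve or complete once the last task stops".

```python
from enum import Enum


class RecipeStatus(Enum):
    # Hypothetical subset of the recipe states named in the comment above.
    INSTALLING = "Installing"
    WAITING = "Waiting"
    RUNNING = "Running"
    COMPLETED = "Completed"
    RESERVED = "Reserved"


def next_status_buggy(current, reservation_requested):
    """Status after the last task stops, mirroring the faulty check."""
    # Only Installing and Running count as "active", so a recipe that is
    # still Waiting falls through to Completed even if a reservation was requested.
    if reservation_requested and current in (RecipeStatus.INSTALLING, RecipeStatus.RUNNING):
        return RecipeStatus.RESERVED
    return RecipeStatus.COMPLETED


def next_status_fixed(current, reservation_requested):
    """Same decision, but Waiting is also treated as active."""
    # A recipe whose only task finished before update_dirty_jobs ever marked
    # it Running is still Waiting here, and should still be reserved.
    active = (RecipeStatus.INSTALLING, RecipeStatus.WAITING, RecipeStatus.RUNNING)
    if reservation_requested and current in active:
        return RecipeStatus.RESERVED
    return RecipeStatus.COMPLETED
```

With a reservation requested and the recipe still Waiting, `next_status_buggy` returns `COMPLETED` while `next_status_fixed` returns `RESERVED`; for a recipe that was observed as Running, both return `RESERVED`.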
A workaround for this bug is to make the tasks take slightly longer; even 5 minutes should be plenty of time. If the recipe has a single /distribution/command task, simply appending "; sleep 300" to the command is enough.
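The workaround relies on widening the window between task start and task stop so that at least one update_dirty_jobs pass lands inside it. The sketch below models that under a loud simplifying assumption: beakerd runs update_dirty_jobs at a fixed period starting at `first_loop`, which is not true in practice (loop timing depends on load). The function name and all parameters are hypothetical.

```python
import math


def loop_sees_running(task_start, task_stop, first_loop, loop_period):
    """Return True if some update_dirty_jobs pass lands while the task runs.

    All times are in seconds. Hypothetical model: passes happen at
    first_loop, first_loop + loop_period, first_loop + 2 * loop_period, ...
    """
    # Index of the first pass at or after task_start (never before first_loop).
    k = max(0, math.ceil((task_start - first_loop) / loop_period))
    first_pass_after_start = first_loop + k * loop_period
    # The recipe only becomes Running if that pass happens before the task stops.
    return first_pass_after_start < task_stop
```

For example, with a 24-second task starting at t=0 and passes at t=30, 90, 150, ..., `loop_sees_running(0, 24, 30, 60)` is `False`, so the recipe is still Waiting when the task stops. Adding "; sleep 300" stretches the task to 324 seconds, and `loop_sees_running(0, 324, 30, 60)` is `True`.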
Patch available: https://gerrit.beaker-project.org/#/c/5230/
Beaker 23.3 has been released.