Bug 1375035 - Machine is not reserved if a task is finished too quickly
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Beaker
Classification: Retired
Component: scheduler
Version: 22
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: 23.3
Assignee: Roman Joost
QA Contact: tools-bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-09-12 01:37 UTC by Roman Joost
Modified: 2016-11-07 06:44 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2016-11-07 06:44:30 UTC
Embargoed:



Description Roman Joost 2016-09-12 01:37:35 UTC
Description of problem:

For a few users, jobs with reserve requests are going straight to finished instead of actually reserving the machine.


Version-Release number of selected component (if applicable):

23.2


How reproducible:

100%

Steps to Reproduce:
(Take this with a grain of salt, since I don't fully understand the big picture here.)

1. Create a job with a task (e.g. /distribution/command) 
2. Make sure the command is executed in a couple of seconds
3. Kill/Pause beaker-watchdog in order to keep the Recipe in a state of TaskStatus.waiting
4. The system should end up in a state where the recipe status is not Installing or Running (see Server/bkr/server/model/scheduler.py:2509).


Actual results:
System goes straight to finished and is not reserved.

Expected results:
System is reserved.

Additional info:

Task completing in 24s:

server-debug.log.3:Sep  8 13:13:55 beaker-server beaker-server[52352]: bkr.server.xmlrpccontroller DEBUG Time: 0:00:00.011501 recipes.tasks.start ('45404156', 0)
server-debug.log.3:Sep  8 13:14:19 beaker-server beaker-server[52510]: bkr.server.xmlrpccontroller DEBUG Time: 0:00:00.008961 recipes.tasks.stop ('45404156', 'stop', 'OK')

On the lab controller:
watchdog.log.1.gz:Sep  8 13:17:13 lab-02 beaker-watchdog[19947]: bkr.labcontroller.proxy INFO Removed Monitor for labcontroller.beaker.example:3049406

Comment 2 Dan Callaghan 2016-09-12 05:47:42 UTC
(In reply to Roman Joost from comment #0)
> 3. Kill/Pause beaker-watchdog in order to keep the Recipe in a state of
> TaskStatus.waiting
> 4. The system should end up in a state where the recipe status is not
> installing or running. (see Server/bkr/server/model/scheduler.py:2509)

Right, so the reason this happens for us occasionally in production is that normally:

* while Anaconda is installing, recipe status is Installing
* then, when Anaconda finishes installing and reboots, the next iteration of update_dirty_jobs will set recipe status to Waiting
* then, when the system has rebooted and beah starts the first task, the next iteration of update_dirty_jobs will set recipe status to Running
* finally, when beah finishes the final task in the recipe, the next iteration of update_dirty_jobs will set recipe status to Completed -- or Reserved, if the user requested a reservation

This bug is a regression in 23.0 because the above is new as of 23.0, due to the Installing status. Previously the status would be Running as soon as Anaconda starts and then it stays that way until the end of the recipe.

The problem here is that line of code, which tests whether the recipe status is Installing or Running (but not Waiting). If there is only one task in the recipe and beah finishes it very quickly, there is only a very short window between beah starting the first task and beah stopping the last task (24 seconds in the above example). If beakerd doesn't finish a complete loop of update_dirty_jobs in that window, it never sets the recipe to Running, so the recipe is still Waiting when the last task finishes and it hits this bug.
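
A minimal sketch of the race, to make the sequence above concrete. This is not Beaker's code: the `Status` enum, `status_after_tasks_finish`, and the two status sets are hypothetical names made up for illustration, and the "widened" set is only one plausible way to close the gap, not necessarily what the eventual patch does.

```python
from enum import Enum


class Status(Enum):
    INSTALLING = "Installing"
    WAITING = "Waiting"
    RUNNING = "Running"
    RESERVED = "Reserved"
    COMPLETED = "Completed"


# Hypothetical: statuses from which a finished recipe may move to Reserved.
# The first set mirrors the check described above (Waiting is missing).
NARROW_CHECK = {Status.INSTALLING, Status.RUNNING}
# Assumption for illustration only: also accept Waiting.
WIDENED_CHECK = NARROW_CHECK | {Status.WAITING}


def status_after_tasks_finish(current, tasks_finished, reserve_requested, allowed):
    """Model one scheduler pass that notices whether all tasks have finished."""
    if not tasks_finished:
        # Nothing to do yet; the recipe keeps its current status.
        return current
    if reserve_requested and current in allowed:
        return Status.RESERVED
    return Status.COMPLETED


# Slow task: a scheduler pass ran while the task was running, so the
# recipe was already set to Running before the last task stopped.
slow = status_after_tasks_finish(Status.RUNNING, True, True, NARROW_CHECK)

# Fast task: start and stop both happened between two scheduler passes,
# so the recipe is still Waiting when the pass sees the tasks finished.
fast = status_after_tasks_finish(Status.WAITING, True, True, NARROW_CHECK)

# Same fast task, but with Waiting accepted by the check.
fast_widened = status_after_tasks_finish(Status.WAITING, True, True, WIDENED_CHECK)

print(slow.value, fast.value, fast_widened.value)
```

In this model the slow task ends up Reserved, while the fast task skips the reservation only because the scheduler never observed the Running state.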

Comment 3 Dan Callaghan 2016-09-12 05:48:36 UTC
A workaround for this bug would be to make the tasks take slightly longer -- even 5 minutes should be plenty of time. If the recipe has a single /distribution/command task, then simply putting "; sleep 300" at the end of the command would be enough.

Comment 4 Roman Joost 2016-09-15 04:27:25 UTC
Patch available:

https://gerrit.beaker-project.org/#/c/5230/

Comment 7 Dan Callaghan 2016-11-07 06:44:30 UTC
Beaker 23.3 has been released.

