Description of problem: Create a job, when job's status trying to change from "Updating" to "Running", cancel this job. Beaker show 500 error. 2013-06-08 07:11:59,140 cherrypy.msg INFO HTTP: Page handler: <bound method Jobs.really_cancel of <bkr.server.jobs.Jobs object at 0x7f9a4af67050>> Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/CherryPy-2.3.0-py2.6.egg/cherrypy/_cphttptools.py", line 121, in _run self.main() File "/usr/lib/python2.6/site-packages/CherryPy-2.3.0-py2.6.egg/cherrypy/_cphttptools.py", line 264, in main body = page_handler(*virtual_path, **self.params) File "<string>", line 3, in really_cancel File "/usr/lib/python2.6/site-packages/turbogears/identity/conditions.py", line 249, in require return fn(self, *args, **kwargs) File "<string>", line 3, in really_cancel File "/usr/lib/python2.6/site-packages/turbogears/controllers.py", line 361, in expose *args, **kw) File "<generated code>", line 0, in run_with_transaction File "/usr/lib/python2.6/site-packages/peak/rules/core.py", line 153, in __call__ return self.body(*args, **kw) File "/usr/lib/python2.6/site-packages/turbogears/database.py", line 458, in sa_rwt retval = func(*args, **kw) File "/usr/lib/python2.6/site-packages/turbogears/controllers.py", line 244, in _expose @abstract() File "<generated code>", line 0, in _expose File "/usr/lib/python2.6/site-packages/peak/rules/core.py", line 153, in __call__ return self.body(*args, **kw) File "/usr/lib/python2.6/site-packages/turbogears/controllers.py", line 390, in <lambda> fragment, options, args, kw))) File "/usr/lib/python2.6/site-packages/turbogears/controllers.py", line 244, in _expose @abstract() File "<generated code>", line 0, in _expose File "/usr/lib/python2.6/site-packages/peak/rules/core.py", line 153, in __call__ return self.body(*args, **kw) File "/usr/lib/python2.6/site-packages/turbogears/controllers.py", line 390, in <lambda> fragment, options, args, kw))) File "/usr/lib/python2.6/site-packages/turbogears/controllers.py", line 425, in _execute_func output = errorhandling.try_call(func, *args, **kw) File "/usr/lib/python2.6/site-packages/turbogears/errorhandling.py", line 77, in try_call return func(self, *args, **kw) File "/usr/lib/python2.6/site-packages/bkr/server/jobs.py", line 824, in really_cancel job.cancel(msg) File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 4620, in cancel task._abort_cancel(TaskStatus.cancelled, msg) File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 6128, in _abort_cancel self._change_status(status) File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 4064, in _change_status % self.t_id) StaleTaskStatusException: Status for T:39920 updated in another transaction Version-Release number of selected component (if applicable): 0.13 How reproducible: Randomly Steps to Reproduce: 1.bkr workflow-simple --distro RHEL6.4-20130130.0 --task //kernel/kdump/regression-602033 --prettyxml 2.cancel this job Actual results: Beaker shows 500 error Expected results: No error Additional info:
We should be catching this and issuing a rollback.
On Gerrit: http://gerrit.beaker-project.org/#/c/2022/1 It is to be noted that this isn't a fix by any means. It simply returns a nice error message to the user. The real fix would be to cancel the job, considering that some other DB transaction may be in progress and accounting correctly for it.
Verified: 1. Submitted two jobs at the same time. 2. Watched the beakerd logs until the first recipe was scheduled by schedule_queued_recipes ("Created watchdog for recipe id: ...") 3. Hit the "Yes" button on the cancel page for the first recipe, before beakerd finished schedule_queued_recipes. Saw a pleasant flash message saying: "Could not cancel job id 568. Please try later" instead of a 500 error. The schedule_queued_recipes routine is the only window large enough to ever hit this problem, since it is the only one which does all its work in one big transaction, and takes several seconds for each recipe.
(In reply to Dan Callaghan from comment #4) > The schedule_queued_recipes routine is the only window large enough to ever > hit this problem, since it is the only one which does all its work in one > big transaction, and takes several seconds for each recipe. This ticket was filed against 'Updating' to 'Running', which is not the schedule_queued_recipes routine. The update_dirty_jobs routine would have caused this original condition I would think.
(In reply to Raymond Mancy from comment #5) > This ticket was filed against 'Updating' to 'Running', which is not the > schedule_queued_recipes routine. > > The update_dirty_jobs routine would have caused this original condition I > would think. update_dirty_jobs doesn't change recipe-task status, only recipe, recipe set, and job status (as well as computing results and counts and returning systems, etc). So cancelling cannot conflict with update_dirty_jobs. I'm guessing that in Xuan's original report when he said ‘trying to change from "Updating" to "Running"’ he just meant ‘while beakerd was scheduling the recipe’.
(In reply to Dan Callaghan from comment #6) Actually. I just described what I had done before it happened. Not sure what cased this problem. It's difficult to reproduce it. but I think you could guess or image what had happened from the code stack.
Beaker 0.13.1 has been released.