Bug 972290 - Beaker shows a 500 error when cancelling a job while it is being scheduled
Summary: Beaker shows a 500 error when cancelling a job while it is being scheduled
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Beaker
Classification: Retired
Component: command line
Version: develop
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: 0.13
Assignee: Amit Saha
QA Contact: Dan Callaghan
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2013-06-08 07:27 UTC by xjia
Modified: 2018-02-06 00:41 UTC (History)
8 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-06-25 06:27:05 UTC
Embargoed:


Attachments (Terms of Use)

Description xjia 2013-06-08 07:27:36 UTC
Description of problem:
Create a job, when job's status trying to change from "Updating" to "Running", cancel this job. Beaker show 500 error.

2013-06-08 07:11:59,140 cherrypy.msg INFO HTTP: Page handler: <bound method Jobs.really_cancel of <bkr.server.jobs.Jobs object at 0x7f9a4af67050>>
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/CherryPy-2.3.0-py2.6.egg/cherrypy/_cphttptools.py", line 121, in _run
    self.main()
  File "/usr/lib/python2.6/site-packages/CherryPy-2.3.0-py2.6.egg/cherrypy/_cphttptools.py", line 264, in main
    body = page_handler(*virtual_path, **self.params)
  File "<string>", line 3, in really_cancel
  File "/usr/lib/python2.6/site-packages/turbogears/identity/conditions.py", line 249, in require
    return fn(self, *args, **kwargs)
  File "<string>", line 3, in really_cancel
  File "/usr/lib/python2.6/site-packages/turbogears/controllers.py", line 361, in expose
    *args, **kw)
  File "<generated code>", line 0, in run_with_transaction
  File "/usr/lib/python2.6/site-packages/peak/rules/core.py", line 153, in __call__
    return self.body(*args, **kw)
  File "/usr/lib/python2.6/site-packages/turbogears/database.py", line 458, in sa_rwt
    retval = func(*args, **kw)
  File "/usr/lib/python2.6/site-packages/turbogears/controllers.py", line 244, in _expose
    @abstract()
  File "<generated code>", line 0, in _expose
  File "/usr/lib/python2.6/site-packages/peak/rules/core.py", line 153, in __call__
    return self.body(*args, **kw)
  File "/usr/lib/python2.6/site-packages/turbogears/controllers.py", line 390, in <lambda>
    fragment, options, args, kw)))
File "/usr/lib/python2.6/site-packages/turbogears/controllers.py", line 244, in _expose
    @abstract()
  File "<generated code>", line 0, in _expose
  File "/usr/lib/python2.6/site-packages/peak/rules/core.py", line 153, in __call__
    return self.body(*args, **kw)
  File "/usr/lib/python2.6/site-packages/turbogears/controllers.py", line 390, in <lambda>
    fragment, options, args, kw)))
  File "/usr/lib/python2.6/site-packages/turbogears/controllers.py", line 425, in _execute_func
    output = errorhandling.try_call(func, *args, **kw)
  File "/usr/lib/python2.6/site-packages/turbogears/errorhandling.py", line 77, in try_call
    return func(self, *args, **kw)
  File "/usr/lib/python2.6/site-packages/bkr/server/jobs.py", line 824, in really_cancel
    job.cancel(msg)
  File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 4620, in cancel
    task._abort_cancel(TaskStatus.cancelled, msg)
  File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 6128, in _abort_cancel
    self._change_status(status)
  File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 4064, in _change_status
    % self.t_id)
StaleTaskStatusException: Status for T:39920 updated in another transaction


Version-Release number of selected component (if applicable):
0.13

How reproducible:
Randomly

Steps to Reproduce:
1.bkr workflow-simple --distro RHEL6.4-20130130.0 --task //kernel/kdump/regression-602033 --prettyxml
2.cancel this job


Actual results:
Beaker shows 500 error

Expected results:
No error

Additional info:

Comment 1 Raymond Mancy 2013-06-11 01:00:05 UTC
We should be catching this and issuing a rollback.

Comment 2 Amit Saha 2013-06-12 07:06:03 UTC
On Gerrit: http://gerrit.beaker-project.org/#/c/2022/1 

It is to be noted that this isn't a fix by any means. It simply returns a nice error message to the user. The real fix would be to cancel the job, considering that some other DB transaction may be in progress and accounting correctly for it.

Comment 4 Dan Callaghan 2013-06-13 01:36:39 UTC
Verified:
1. Submitted two jobs at the same time.
2. Watched the beakerd logs until the first recipe was scheduled by schedule_queued_recipes ("Created watchdog for recipe id: ...")
3. Hit the "Yes" button on the cancel page for the first recipe, before beakerd finished schedule_queued_recipes.

Saw a pleasant flash message saying:
"Could not cancel job id 568. Please try later"
instead of a 500 error.

The schedule_queued_recipes routine is the only window large enough to ever hit this problem, since it is the only one which does all its work in one big transaction, and takes several seconds for each recipe.

Comment 5 Raymond Mancy 2013-06-13 22:58:05 UTC
(In reply to Dan Callaghan from comment #4)
> The schedule_queued_recipes routine is the only window large enough to ever
> hit this problem, since it is the only one which does all its work in one
> big transaction, and takes several seconds for each recipe.

This ticket was filed against 'Updating' to 'Running', which is not the schedule_queued_recipes routine.

The update_dirty_jobs routine would have caused this original condition I would think.

Comment 6 Dan Callaghan 2013-06-17 05:16:38 UTC
(In reply to Raymond Mancy from comment #5)
> This ticket was filed against 'Updating' to 'Running', which is not the
> schedule_queued_recipes routine.
> 
> The update_dirty_jobs routine would have caused this original condition I
> would think.

update_dirty_jobs doesn't change recipe-task status, only recipe, recipe set, and job status (as well as computing results and counts and returning systems, etc). So cancelling cannot conflict with update_dirty_jobs.

I'm guessing that in Xuan's original report when he said ‘trying to change from "Updating" to "Running"’ he just meant ‘while beakerd was scheduling the recipe’.

Comment 7 xjia 2013-06-17 05:36:14 UTC
(In reply to Dan Callaghan from comment #6)
Actually. I just described what I had done before it happened. Not sure what cased this problem. It's difficult to reproduce it. but I think you could guess or image what had happened from the code stack.

Comment 9 Amit Saha 2013-06-25 06:27:05 UTC
Beaker 0.13.1 has been released.


Note You need to log in before you can comment on or make changes to this bug.