Bug 972290 - Beaker shows a 500 error when cancelling a job while it is being scheduled
Beaker shows a 500 error when cancelling a job while it is being scheduled
Status: CLOSED CURRENTRELEASE
Product: Beaker
Classification: Community
Component: command line (Show other bugs)
develop
Unspecified Unspecified
medium Severity medium (vote)
: 0.13
: ---
Assigned To: Amit Saha
Dan Callaghan
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2013-06-08 03:27 EDT by xjia
Modified: 2015-07-26 18:15 EDT (History)
9 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-06-25 02:27:05 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description xjia 2013-06-08 03:27:36 EDT
Description of problem:
Create a job, when job's status trying to change from "Updating" to "Running", cancel this job. Beaker show 500 error.

2013-06-08 07:11:59,140 cherrypy.msg INFO HTTP: Page handler: <bound method Jobs.really_cancel of <bkr.server.jobs.Jobs object at 0x7f9a4af67050>>
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/CherryPy-2.3.0-py2.6.egg/cherrypy/_cphttptools.py", line 121, in _run
    self.main()
  File "/usr/lib/python2.6/site-packages/CherryPy-2.3.0-py2.6.egg/cherrypy/_cphttptools.py", line 264, in main
    body = page_handler(*virtual_path, **self.params)
  File "<string>", line 3, in really_cancel
  File "/usr/lib/python2.6/site-packages/turbogears/identity/conditions.py", line 249, in require
    return fn(self, *args, **kwargs)
  File "<string>", line 3, in really_cancel
  File "/usr/lib/python2.6/site-packages/turbogears/controllers.py", line 361, in expose
    *args, **kw)
  File "<generated code>", line 0, in run_with_transaction
  File "/usr/lib/python2.6/site-packages/peak/rules/core.py", line 153, in __call__
    return self.body(*args, **kw)
  File "/usr/lib/python2.6/site-packages/turbogears/database.py", line 458, in sa_rwt
    retval = func(*args, **kw)
  File "/usr/lib/python2.6/site-packages/turbogears/controllers.py", line 244, in _expose
    @abstract()
  File "<generated code>", line 0, in _expose
  File "/usr/lib/python2.6/site-packages/peak/rules/core.py", line 153, in __call__
    return self.body(*args, **kw)
  File "/usr/lib/python2.6/site-packages/turbogears/controllers.py", line 390, in <lambda>
    fragment, options, args, kw)))
File "/usr/lib/python2.6/site-packages/turbogears/controllers.py", line 244, in _expose
    @abstract()
  File "<generated code>", line 0, in _expose
  File "/usr/lib/python2.6/site-packages/peak/rules/core.py", line 153, in __call__
    return self.body(*args, **kw)
  File "/usr/lib/python2.6/site-packages/turbogears/controllers.py", line 390, in <lambda>
    fragment, options, args, kw)))
  File "/usr/lib/python2.6/site-packages/turbogears/controllers.py", line 425, in _execute_func
    output = errorhandling.try_call(func, *args, **kw)
  File "/usr/lib/python2.6/site-packages/turbogears/errorhandling.py", line 77, in try_call
    return func(self, *args, **kw)
  File "/usr/lib/python2.6/site-packages/bkr/server/jobs.py", line 824, in really_cancel
    job.cancel(msg)
  File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 4620, in cancel
    task._abort_cancel(TaskStatus.cancelled, msg)
  File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 6128, in _abort_cancel
    self._change_status(status)
  File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 4064, in _change_status
    % self.t_id)
StaleTaskStatusException: Status for T:39920 updated in another transaction


Version-Release number of selected component (if applicable):
0.13

How reproducible:
Randomly

Steps to Reproduce:
1.bkr workflow-simple --distro RHEL6.4-20130130.0 --task //kernel/kdump/regression-602033 --prettyxml
2.cancel this job


Actual results:
Beaker shows 500 error

Expected results:
No error

Additional info:
Comment 1 Raymond Mancy 2013-06-10 21:00:05 EDT
We should be catching this and issuing a rollback.
Comment 2 Amit Saha 2013-06-12 03:06:03 EDT
On Gerrit: http://gerrit.beaker-project.org/#/c/2022/1 

It is to be noted that this isn't a fix by any means. It simply returns a nice error message to the user. The real fix would be to cancel the job, considering that some other DB transaction may be in progress and accounting correctly for it.
Comment 4 Dan Callaghan 2013-06-12 21:36:39 EDT
Verified:
1. Submitted two jobs at the same time.
2. Watched the beakerd logs until the first recipe was scheduled by schedule_queued_recipes ("Created watchdog for recipe id: ...")
3. Hit the "Yes" button on the cancel page for the first recipe, before beakerd finished schedule_queued_recipes.

Saw a pleasant flash message saying:
"Could not cancel job id 568. Please try later"
instead of a 500 error.

The schedule_queued_recipes routine is the only window large enough to ever hit this problem, since it is the only one which does all its work in one big transaction, and takes several seconds for each recipe.
Comment 5 Raymond Mancy 2013-06-13 18:58:05 EDT
(In reply to Dan Callaghan from comment #4)
> The schedule_queued_recipes routine is the only window large enough to ever
> hit this problem, since it is the only one which does all its work in one
> big transaction, and takes several seconds for each recipe.

This ticket was filed against 'Updating' to 'Running', which is not the schedule_queued_recipes routine.

The update_dirty_jobs routine would have caused this original condition I would think.
Comment 6 Dan Callaghan 2013-06-17 01:16:38 EDT
(In reply to Raymond Mancy from comment #5)
> This ticket was filed against 'Updating' to 'Running', which is not the
> schedule_queued_recipes routine.
> 
> The update_dirty_jobs routine would have caused this original condition I
> would think.

update_dirty_jobs doesn't change recipe-task status, only recipe, recipe set, and job status (as well as computing results and counts and returning systems, etc). So cancelling cannot conflict with update_dirty_jobs.

I'm guessing that in Xuan's original report when he said ‘trying to change from "Updating" to "Running"’ he just meant ‘while beakerd was scheduling the recipe’.
Comment 7 xjia 2013-06-17 01:36:14 EDT
(In reply to Dan Callaghan from comment #6)
Actually. I just described what I had done before it happened. Not sure what cased this problem. It's difficult to reproduce it. but I think you could guess or image what had happened from the code stack.
Comment 9 Amit Saha 2013-06-25 02:27:05 EDT
Beaker 0.13.1 has been released.

Note You need to log in before you can comment on or make changes to this bug.