Bug 972290

Summary: Beaker shows a 500 error when cancelling a job while it is being scheduled
Product: [Retired] Beaker Reporter: xjia <xjia>
Component: command lineAssignee: Amit Saha <asaha>
Status: CLOSED CURRENTRELEASE QA Contact: Dan Callaghan <dcallagh>
Severity: medium Docs Contact:
Priority: medium    
Version: developCC: asaha, dcallagh, ebaak, llim, qwan, rglasz, rmancy, xtian
Target Milestone: 0.13   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-06-25 06:27:05 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description xjia 2013-06-08 07:27:36 UTC
Description of problem:
Create a job, when job's status trying to change from "Updating" to "Running", cancel this job. Beaker show 500 error.

2013-06-08 07:11:59,140 cherrypy.msg INFO HTTP: Page handler: <bound method Jobs.really_cancel of <bkr.server.jobs.Jobs object at 0x7f9a4af67050>>
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/CherryPy-2.3.0-py2.6.egg/cherrypy/_cphttptools.py", line 121, in _run
    self.main()
  File "/usr/lib/python2.6/site-packages/CherryPy-2.3.0-py2.6.egg/cherrypy/_cphttptools.py", line 264, in main
    body = page_handler(*virtual_path, **self.params)
  File "<string>", line 3, in really_cancel
  File "/usr/lib/python2.6/site-packages/turbogears/identity/conditions.py", line 249, in require
    return fn(self, *args, **kwargs)
  File "<string>", line 3, in really_cancel
  File "/usr/lib/python2.6/site-packages/turbogears/controllers.py", line 361, in expose
    *args, **kw)
  File "<generated code>", line 0, in run_with_transaction
  File "/usr/lib/python2.6/site-packages/peak/rules/core.py", line 153, in __call__
    return self.body(*args, **kw)
  File "/usr/lib/python2.6/site-packages/turbogears/database.py", line 458, in sa_rwt
    retval = func(*args, **kw)
  File "/usr/lib/python2.6/site-packages/turbogears/controllers.py", line 244, in _expose
    @abstract()
  File "<generated code>", line 0, in _expose
  File "/usr/lib/python2.6/site-packages/peak/rules/core.py", line 153, in __call__
    return self.body(*args, **kw)
  File "/usr/lib/python2.6/site-packages/turbogears/controllers.py", line 390, in <lambda>
    fragment, options, args, kw)))
File "/usr/lib/python2.6/site-packages/turbogears/controllers.py", line 244, in _expose
    @abstract()
  File "<generated code>", line 0, in _expose
  File "/usr/lib/python2.6/site-packages/peak/rules/core.py", line 153, in __call__
    return self.body(*args, **kw)
  File "/usr/lib/python2.6/site-packages/turbogears/controllers.py", line 390, in <lambda>
    fragment, options, args, kw)))
  File "/usr/lib/python2.6/site-packages/turbogears/controllers.py", line 425, in _execute_func
    output = errorhandling.try_call(func, *args, **kw)
  File "/usr/lib/python2.6/site-packages/turbogears/errorhandling.py", line 77, in try_call
    return func(self, *args, **kw)
  File "/usr/lib/python2.6/site-packages/bkr/server/jobs.py", line 824, in really_cancel
    job.cancel(msg)
  File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 4620, in cancel
    task._abort_cancel(TaskStatus.cancelled, msg)
  File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 6128, in _abort_cancel
    self._change_status(status)
  File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 4064, in _change_status
    % self.t_id)
StaleTaskStatusException: Status for T:39920 updated in another transaction


Version-Release number of selected component (if applicable):
0.13

How reproducible:
Randomly

Steps to Reproduce:
1.bkr workflow-simple --distro RHEL6.4-20130130.0 --task //kernel/kdump/regression-602033 --prettyxml
2.cancel this job


Actual results:
Beaker shows 500 error

Expected results:
No error

Additional info:

Comment 1 Raymond Mancy 2013-06-11 01:00:05 UTC
We should be catching this and issuing a rollback.

Comment 2 Amit Saha 2013-06-12 07:06:03 UTC
On Gerrit: http://gerrit.beaker-project.org/#/c/2022/1 

It is to be noted that this isn't a fix by any means. It simply returns a nice error message to the user. The real fix would be to cancel the job, considering that some other DB transaction may be in progress and accounting correctly for it.

Comment 4 Dan Callaghan 2013-06-13 01:36:39 UTC
Verified:
1. Submitted two jobs at the same time.
2. Watched the beakerd logs until the first recipe was scheduled by schedule_queued_recipes ("Created watchdog for recipe id: ...")
3. Hit the "Yes" button on the cancel page for the first recipe, before beakerd finished schedule_queued_recipes.

Saw a pleasant flash message saying:
"Could not cancel job id 568. Please try later"
instead of a 500 error.

The schedule_queued_recipes routine is the only window large enough to ever hit this problem, since it is the only one which does all its work in one big transaction, and takes several seconds for each recipe.

Comment 5 Raymond Mancy 2013-06-13 22:58:05 UTC
(In reply to Dan Callaghan from comment #4)
> The schedule_queued_recipes routine is the only window large enough to ever
> hit this problem, since it is the only one which does all its work in one
> big transaction, and takes several seconds for each recipe.

This ticket was filed against 'Updating' to 'Running', which is not the schedule_queued_recipes routine.

The update_dirty_jobs routine would have caused this original condition I would think.

Comment 6 Dan Callaghan 2013-06-17 05:16:38 UTC
(In reply to Raymond Mancy from comment #5)
> This ticket was filed against 'Updating' to 'Running', which is not the
> schedule_queued_recipes routine.
> 
> The update_dirty_jobs routine would have caused this original condition I
> would think.

update_dirty_jobs doesn't change recipe-task status, only recipe, recipe set, and job status (as well as computing results and counts and returning systems, etc). So cancelling cannot conflict with update_dirty_jobs.

I'm guessing that in Xuan's original report when he said ‘trying to change from "Updating" to "Running"’ he just meant ‘while beakerd was scheduling the recipe’.

Comment 7 xjia 2013-06-17 05:36:14 UTC
(In reply to Dan Callaghan from comment #6)
Actually. I just described what I had done before it happened. Not sure what cased this problem. It's difficult to reproduce it. but I think you could guess or image what had happened from the code stack.

Comment 9 Amit Saha 2013-06-25 06:27:05 UTC
Beaker 0.13.1 has been released.