Bug 950895 - database error in schedule_queued_recipe causes beakerd to die: SAVEPOINT sa_savepoint_9 does not exist
Summary: database error in schedule_queued_recipe causes beakerd to die: SAVEPOINT sa_...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Beaker
Classification: Retired
Component: scheduler
Version: 0.11
Hardware: Unspecified
OS: Unspecified
high
unspecified
Target Milestone: 0.13
Assignee: Nick Coghlan
QA Contact: tools-bugs
URL:
Whiteboard: Scheduler
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2013-04-11 07:27 UTC by Dan Callaghan
Modified: 2018-02-06 00:41 UTC (History)
7 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2013-06-25 06:27:39 UTC
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 958362 0 unspecified CLOSED database deadlock in beakerd 2021-02-22 00:41:40 UTC

Internal Links: 958362

Description Dan Callaghan 2013-04-11 07:27:46 UTC
Version-Release number of selected component (if applicable):
0.12.0-3

The schedule_queued_recipe function in beakerd failed with a database error:

2013-04-11 03:16:31,772 beakerd ERROR Error in schedule_queued_recipe(846457)
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/bkr/server/tools/beakerd.py", line 397, in schedule_queued_recipes
    schedule_queued_recipe(recipe_id)
  File "/usr/lib/python2.6/site-packages/bkr/server/tools/beakerd.py", line 494, in schedule_queued_recipe
    recipe.resource.allocate()
  File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 6192, in allocate
    reservation_type=u'recipe')
  File "/usr/lib/python2.6/site-packages/bkr/server/model.py", line 2675, in reserve
    user_id=user.user_id).rowcount != 1:
[...]
OperationalError: (OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction') 'UPDATE system SET user_id=%s WHERE system.id = %s AND system.user_id IS NULL' (1480L, 4510L)

The cause of that exception is a separate problem, the bug here is that this exception then caused the following unhandled exception which led to beakerd dying:

2013-04-11 03:16:31,789 beakerd ERROR Uncaught exception in main_recipes_loop
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/bkr/server/util.py", line 159, in decorated
    func(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/bkr/server/tools/beakerd.py", line 732, in main_recipes_loop
    queued = schedule_queued_recipes()
  File "/usr/lib/python2.6/site-packages/bkr/server/tools/beakerd.py", line 426, in schedule_queued_recipes
    session.commit()
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/scoping.py", line 139, in do
    return getattr(self.registry(), name)(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/session.py", line 614, in commit
    self.transaction.commit()
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/session.py", line 389, in commit
    t[1].commit()
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py", line 1535, in commit
    self._do_commit()
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py", line 1578, in _do_commit
    self._savepoint, self._parent)
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py", line 1118, in _release_savepoint_impl
    self.engine.dialect.do_release_savepoint(self, name)
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/engine/default.py", line 293, in do_release_savepoint
    connection.execute(expression.ReleaseSavepointClause(name))
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py", line 1191, in execute
    params)
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py", line 1271, in _execute_clauseelement
    return self.__execute_context(context)
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py", line 1302, in __execute_context
    context.parameters[0], context=context)
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py", line 1401, in _cursor_execute
    context)
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py", line 1394, in _cursor_execute
    context)
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/engine/default.py", line 299, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/lib64/python2.6/site-packages/MySQLdb/cursors.py", line 173, in execute
    self.errorhandler(self, exc, value)
  File "/usr/lib64/python2.6/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
OperationalError: (OperationalError) (1305, 'SAVEPOINT sa_savepoint_9 does not exist') 'RELEASE SAVEPOINT sa_savepoint_9' ()
[...]
2013-04-11 03:16:38,544 beakerd CRITICAL a thread has died, shutting down

It seems like the nested transaction handling in schedule_queued_recipes is not working properly, perhaps due to a sqlalchemy bug?

Comment 2 Nick Coghlan 2013-05-21 04:45:28 UTC
Looking into this, what I suspect may be happening is that because of the deadlock, MySQL has thrown out our entire transaction, but beakerd is blissfully unaware of this and attempts to commit the no longer valid transaction and goes boom.

We can mitigate the consequences by wrapping the commit in a try/except that just logs the error if it dies.

I will also change the code so that we log any exceptions that escape the inner scheduling loop.

Comment 3 Nick Coghlan 2013-05-21 05:37:24 UTC
On Gerrit: http://gerrit.beaker-project.org/#/c/1954/

Comment 8 Amit Saha 2013-06-25 06:27:39 UTC
Beaker 0.13.1 has been released.


Note You need to log in before you can comment on or make changes to this bug.