Bug 1652287 - Deployment may fail because nova_wait_for_db_sync runs simultaneously on all three controllers
Summary: Deployment may fail because nova_wait_for_db_sync runs simultaneously on all ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
urgent
urgent
Target Milestone: rc
: 14.0 (Rocky)
Assignee: Martin Schuppert
QA Contact: Gurenko Alex
URL:
Whiteboard:
: 1636399 1651475 1653816 (view as bug list)
Depends On:
Blocks: 1625732
TreeView+ depends on / blocked
 
Reported: 2018-11-21 19:30 UTC by Marius Cornea
Modified: 2019-01-11 11:54 UTC (History)
10 users (show)

Fixed In Version: openstack-tripleo-heat-templates-9.0.1-0.20181013060895.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-11 11:54:47 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1804652 0 None None None 2018-11-22 14:06:11 UTC
OpenStack gerrit 619586 0 'None' 'MERGED' 'Change step to start nova placement and make compute wait for it' 2019-11-26 14:15:20 UTC
OpenStack gerrit 619607 0 'None' 'MERGED' 'Revert "Verify nova api migration finished before start placement"' 2019-11-26 14:15:22 UTC
OpenStack gerrit 621501 0 'None' 'MERGED' 'Change step to start nova placement and make compute wait for it' 2019-11-26 14:15:20 UTC
Red Hat Product Errata RHEA-2019:0045 0 None None None 2019-01-11 11:54:58 UTC

Description Marius Cornea 2018-11-21 19:30:55 UTC
Description of problem:
Deployment may fail because nova_wait_for_db_sync runs simultaneously on all three controllers, one of the controllers failing with:

        "ERROR:oslo_db.sqlalchemy.exc_filters:DBAPIError exception wrapped from (pymysql.err.InternalError) (1050, u\"Table 'migrate_version' already exists\") [SQL: u'\\nCREATE TABLE migrate_version (\\n\\trepository_id VARCHAR(250) NOT NULL, \\n\\trepository_path TEXT, \\n\\tversion INTEGER, \\n\\tPRIMARY KEY (repository_id)\\n)\\n\\n'] (Background on this error at: http://sqlalche.me/e/2j85)", 
        "Traceback (most recent call last):", 
        "  File \"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py\", line 1193, in _execute_context", 
        "    context)", 
        "  File \"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py\", line 507, in do_execute", 
        "    cursor.execute(statement, parameters)", 
        "  File \"/usr/lib/python2.7/site-packages/pymysql/cursors.py\", line 166, in execute", 
        "    result = self._query(query)", 
        "  File \"/usr/lib/python2.7/site-packages/pymysql/cursors.py\", line 322, in _query", 
        "    conn.query(q)", 
        "  File \"/usr/lib/python2.7/site-packages/pymysql/connections.py\", line 856, in query", 
        "    self._affected_rows = self._read_query_result(unbuffered=unbuffered)", 
        "  File \"/usr/lib/python2.7/site-packages/pymysql/connections.py\", line 1057, in _read_query_result", 
        "    result.read()", 
        "  File \"/usr/lib/python2.7/site-packages/pymysql/connections.py\", line 1340, in read", 
        "    first_packet = self.connection._read_packet()", 
        "  File \"/usr/lib/python2.7/site-packages/pymysql/connections.py\", line 1014, in _read_packet", 
        "    packet.check_error()", 
        "  File \"/usr/lib/python2.7/site-packages/pymysql/connections.py\", line 393, in check_error", 
        "    err.raise_mysql_exception(self._data)", 
        "  File \"/usr/lib/python2.7/site-packages/pymysql/err.py\", line 107, in raise_mysql_exception", 
        "    raise errorclass(errno, errval)", 
        "InternalError: (1050, u\"Table 'migrate_version' already exists\")", 
        "ERROR:nova_wait_for_db_sync:uuups something went wrong: %s (pymysql.err.InternalError) (1050, u\"Table 'migrate_version' already exists\") [SQL: u'\\nCREATE TABLE migrate_version (\\n\\trepository_id VARCHAR(250) NOT NULL, \\n\\trepository_path TEXT, \\n\\tversion INTEGER, \\n\\tPRIMARY KEY (repository_id)\\n)\\n\\n'] (Background on this error at: http://sqlalche.me/e/2j85)", 

Version-Release number of selected component (if applicable):
2018-11-21.2 puddle

How reproducible:
not always

Steps to Reproduce:
1. Deploy OSP14 with 3 controllers

Actual results:
It looks like nova_wait_for_db_sync runs db migrations simultaneously on all 3 controllers.

Expected results:
It should run the db migrations on 1/3 controllers.

Additional info:
Attaching ansible.log

Comment 4 Matthew Booth 2018-11-22 13:12:42 UTC
I think these 2 errors represent 2 slightly different manifestations of the same race in nova's db_version(). We do:

try:
  get db version
except:
  count how many tables we've got
  if no tables:
    create a version table
  else:
    complain about an Essex-era db

So if 2 threads execute this concurrently and both fail to get db version, we've then got a different behaviour based on whether the race winner has started creating the schema yet or not. If not, we get a 1050 as in the description. If yes, we get the Essex warning as in comment 3. The cause is the same, though.

Comment 5 Martin Schuppert 2018-11-26 12:31:18 UTC
*** Bug 1636399 has been marked as a duplicate of this bug. ***

Comment 6 Martin Schuppert 2018-11-27 07:18:08 UTC
*** Bug 1651475 has been marked as a duplicate of this bug. ***

Comment 7 Lee Yarwood 2018-11-28 12:55:36 UTC
*** Bug 1653816 has been marked as a duplicate of this bug. ***

Comment 18 Joe H. Rahme 2018-12-03 14:02:13 UTC
Marking this as verified, after running multiple deployments with 3 controllers and didn't observe the faulty behavior.

If this bug starts reappearing do not hesitate to reopen the bug.

Comment 21 errata-xmlrpc 2019-01-11 11:54:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045


Note You need to log in before you can comment on or make changes to this bug.