Description of problem: Deployment may fail because nova_wait_for_db_sync runs simultaneously on all three controllers, one of the controllers failing with: "ERROR:oslo_db.sqlalchemy.exc_filters:DBAPIError exception wrapped from (pymysql.err.InternalError) (1050, u\"Table 'migrate_version' already exists\") [SQL: u'\\nCREATE TABLE migrate_version (\\n\\trepository_id VARCHAR(250) NOT NULL, \\n\\trepository_path TEXT, \\n\\tversion INTEGER, \\n\\tPRIMARY KEY (repository_id)\\n)\\n\\n'] (Background on this error at: http://sqlalche.me/e/2j85)", "Traceback (most recent call last):", " File \"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py\", line 1193, in _execute_context", " context)", " File \"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py\", line 507, in do_execute", " cursor.execute(statement, parameters)", " File \"/usr/lib/python2.7/site-packages/pymysql/cursors.py\", line 166, in execute", " result = self._query(query)", " File \"/usr/lib/python2.7/site-packages/pymysql/cursors.py\", line 322, in _query", " conn.query(q)", " File \"/usr/lib/python2.7/site-packages/pymysql/connections.py\", line 856, in query", " self._affected_rows = self._read_query_result(unbuffered=unbuffered)", " File \"/usr/lib/python2.7/site-packages/pymysql/connections.py\", line 1057, in _read_query_result", " result.read()", " File \"/usr/lib/python2.7/site-packages/pymysql/connections.py\", line 1340, in read", " first_packet = self.connection._read_packet()", " File \"/usr/lib/python2.7/site-packages/pymysql/connections.py\", line 1014, in _read_packet", " packet.check_error()", " File \"/usr/lib/python2.7/site-packages/pymysql/connections.py\", line 393, in check_error", " err.raise_mysql_exception(self._data)", " File \"/usr/lib/python2.7/site-packages/pymysql/err.py\", line 107, in raise_mysql_exception", " raise errorclass(errno, errval)", "InternalError: (1050, u\"Table 'migrate_version' already exists\")", "ERROR:nova_wait_for_db_sync:uuups something went wrong: %s (pymysql.err.InternalError) (1050, u\"Table 'migrate_version' already exists\") [SQL: u'\\nCREATE TABLE migrate_version (\\n\\trepository_id VARCHAR(250) NOT NULL, \\n\\trepository_path TEXT, \\n\\tversion INTEGER, \\n\\tPRIMARY KEY (repository_id)\\n)\\n\\n'] (Background on this error at: http://sqlalche.me/e/2j85)", Version-Release number of selected component (if applicable): 2018-11-21.2 puddle How reproducible: not always Steps to Reproduce: 1. Deploy OSP14 with 3 controllers Actual results: It looks like nova_wait_for_db_sync runs db migrations simultaneously on all 3 controllers. Expected results: It should run the db migrations on 1/3 controllers. Additional info: Attaching ansible.log
I think these 2 errors represent 2 slightly different manifestations of the same race in nova's db_version(). We do: try: get db version except: count how many tables we've got if no tables: create a version table else: complain about an Essex-era db So if 2 threads execute this concurrently and both fail to get db version, we've then got a different behaviour based on whether the race winner has started creating the schema yet or not. If not, we get a 1050 as in the description. If yes, we get the Essex warning as in comment 3. The cause is the same, though.
*** Bug 1636399 has been marked as a duplicate of this bug. ***
*** Bug 1651475 has been marked as a duplicate of this bug. ***
*** Bug 1653816 has been marked as a duplicate of this bug. ***
Marking this as verified, after running multiple deployments with 3 controllers and didn't observe the faulty behavior. If this bug starts reappearing do not hesitate to reopen the bug.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:0045