Description of problem:
During the upgrade process a wrong table was found in the Heat database.

ERROR: (pymysql.err.InternalError) (1050, u"Table 'raw_template_files' already exists")
[SQL:
CREATE TABLE raw_template_files (
	id INTEGER NOT NULL AUTO_INCREMENT, 
	files LONGTEXT, 
	created_at DATETIME, 
	updated_at DATETIME, 
	PRIMARY KEY (id)
)ENGINE=InnoDB CHARSET=utf8
]

Version-Release number of selected component (if applicable):
OSP9 -> OSP10 upgrade (previously upgraded from OSP7 -> OSP8 -> OSP9)

How reproducible:
Also reported in https://bugzilla.redhat.com/show_bug.cgi?id=1406380#c4

Actual results:

Expected results:

Additional info:
Since the table was empty, it was removed and the migration worked:

MariaDB [heat]> explain raw_template_files;
+------------+----------+------+-----+---------+----------------+
| Field      | Type     | Null | Key | Default | Extra          |
+------------+----------+------+-----+---------+----------------+
| id         | int(11)  | NO   | PRI | NULL    | auto_increment |
| files      | longtext | YES  |     | NULL    |                |
| created_at | datetime | YES  |     | NULL    |                |
| updated_at | datetime | YES  |     | NULL    |                |
+------------+----------+------+-----+---------+----------------+
4 rows in set (0.00 sec)

MariaDB [heat]> select count(id) from raw_template_files;
+-----------+
| count(id) |
+-----------+
|         0 |
+-----------+
1 row in set (0.00 sec)

MariaDB [heat]> drop table raw_template_files;
Query OK, 0 rows affected (0.23 sec)

Then the Heat database migration was executed manually:

heat-manage --config-file /etc/heat/heat.conf db_sync
2017-02-08 11:35:02.271 29739 INFO migrate.versioning.api [-] 71 -> 72...
2017-02-08 11:36:57.673 29739 INFO migrate.versioning.api [-] done
2017-02-08 11:36:57.675 29739 INFO migrate.versioning.api [-] 72 -> 73...
2017-02-08 11:36:59.326 29739 INFO migrate.versioning.api [-] done
This is inexplicable, because "raw_template_files" does not appear in any migration in RHOS 7, 8, or 9, and it appears only once in RHOS 10: http://git.openstack.org/cgit/openstack/heat/tree/heat/db/sqlalchemy/migrate_repo/versions/072_raw_template_files.py?h=stable/newton I can't see any other way for that table to get created without also updating the migration version.
Agree with zaneb here. This could happen if simultaneous calls to 'heat-manage db_sync' are made (or if an initial call is made but doesn't complete). Is there any way to tell from the logs whether that happened?
(In reply to Zane Bitter from comment #1)
> This is inexplicable, because "raw_template_files" does not appear in any
> migration in RHOS 7, 8, or 9, and it appears only once in RHOS 10:
> http://git.openstack.org/cgit/openstack/heat/tree/heat/db/sqlalchemy/
> migrate_repo/versions/072_raw_template_files.py?h=stable/newton
>
> I can't see any other way for that table to get created without also
> updating the migration version.

Ok, so the situation was a bit different. It seems heat-manage db_sync failed at 73 because of [1] and [2]. After cleaning up the DB and running db_sync again, you can no longer get past 72, since there is no check for whether the table was already created. Probably [3] should be modified to create the table only if it does not already exist.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1349111
[2] https://access.redhat.com/solutions/2215131
[3] http://git.openstack.org/cgit/openstack/heat/tree/heat/db/sqlalchemy/migrate_repo/versions/072_raw_template_files.py?h=stable/newton#n21
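The idempotency fix proposed above can be sketched as follows. This is a minimal illustration using the stdlib sqlite3 module, not Heat's actual migration code: in the real 072_raw_template_files.py (SQLAlchemy), the analogous change would be to create the table with checkfirst=True so the CREATE is skipped when the table already exists.

```python
# Sketch: making a table-creation migration safe to re-run after a partial
# failure, so it does not raise "table already exists" the second time.
# Illustrative only; uses sqlite3 rather than Heat's SQLAlchemy migration.
import sqlite3

def upgrade(conn: sqlite3.Connection) -> None:
    # CREATE TABLE IF NOT EXISTS is a no-op when the table is already
    # present, which is exactly what a re-runnable migration needs.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS raw_template_files (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            files TEXT,
            created_at TIMESTAMP,
            updated_at TIMESTAMP
        )
        """
    )

conn = sqlite3.connect(":memory:")
upgrade(conn)
upgrade(conn)  # second run succeeds instead of raising OperationalError
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master "
    "WHERE type='table' AND name='raw_template_files'")]
print(tables)  # ['raw_template_files']
```

With a plain CREATE TABLE (no IF NOT EXISTS, or SQLAlchemy's create() without checkfirst), the second upgrade() call would fail, which is the behavior observed in this bug.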
So, what is the first error that is encountered, something in the 73 upgrade? If so, it would help to see the actual error from that.

In any case, I don't see how a 73 upgrade (which operates on the resource/resource_data relationship - http://git.openstack.org/cgit/openstack/heat/tree/heat/db/sqlalchemy/migrate_repo/versions/073_resource_data_fk_ondelete_cascade.py?h=stable/newton ) would fail due to [1] (extra entries in the raw_template table).

Another puzzling thing here: if the 72 migrate script succeeded and the 73 script failed, heat-manage db_sync should not be re-attempting 72 the next time it is called... unless some other script or SQL command is editing heat's migrate_version table.

I'm still suspicious that multiple near-simultaneous calls are being made to heat-manage db_sync, whether from the same node or not.
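One way to check the hypothesis above about migrate_version being edited (a hedged suggestion, assuming the default sqlalchemy-migrate bookkeeping table that heat-manage db_sync maintains in the heat database):

```sql
-- Inspect sqlalchemy-migrate's version bookkeeping for the heat database.
-- If migration 072 had completed, this should report version 72 or higher;
-- a lower value would suggest something rewound or re-seeded the table.
SELECT * FROM migrate_version;
```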
(In reply to Crag Wolfe from comment #4)
> So, what is the first error that is encountered, something in the 73
> upgrade? If so, it would help to see the actual error from that.
>
> In any case, I don't see how a 73 upgrade (which operates on the
> resource/resource_data relationship -
> http://git.openstack.org/cgit/openstack/heat/tree/heat/db/sqlalchemy/
> migrate_repo/versions/073_resource_data_fk_ondelete_cascade.py?h=stable/
> newton ) would fail due to [1] (extra entries in the raw_template table).
>
> Another puzzling thing here, if the 72 migrate script succeeded and the 73
> script failed, heat-manage db_sync should not be re-attempting 72 the next
> time it is called... unless some other script or sql command is editing
> heat's migrate_version table.
>
> I'm still suspicious that multiple near-simultaneous calls are made to
> heat-manage db_sync, whether from the same node or not.

I'm sorry, but the upgrade logs covering heat migration 73 are gone. The reporter's suspicion is that it failed due to a timeout (> 70% I/O wait).
We suspect this must be happening due to either simultaneous calls to heat-manage or a failure in the middle of heat-manage running. There's nothing special about this particular migration, so putting in special handling to deal with double-creates for this one table is not going to solve the problem in general. The workaround is easy and safe. For those reasons, we don't intend to modify the migration. It's possible that bug 1428845 (OSP 10 incarnation: bug 1428877) may have been the cause of having two simultaneous DB syncs. If that were the case then the patch for that would have prevented the issue.
We might also have to do the following before dropping the table:

alter table raw_template drop foreign key raw_tmpl_files_fkey_ref;
And also:

alter table raw_template drop column files_id;
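Putting the pieces from this thread together, the full manual workaround would look like the sequence below. The constraint and column names (raw_tmpl_files_fkey_ref, files_id) are the ones mentioned in the comments above; verify them on the affected system with SHOW CREATE TABLE raw_template before running anything.

```sql
-- Confirm the leftover table is actually empty before dropping anything.
SELECT COUNT(id) FROM raw_template_files;

-- Drop the foreign key on raw_template and the referencing column first;
-- otherwise DROP TABLE fails with a foreign-key constraint error.
ALTER TABLE raw_template DROP FOREIGN KEY raw_tmpl_files_fkey_ref;
ALTER TABLE raw_template DROP COLUMN files_id;
DROP TABLE raw_template_files;
```

Then re-run the migration as shown earlier in this report: heat-manage --config-file /etc/heat/heat.conf db_sync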