Description of problem:
During the upgrade process a wrong table was found in the Heat database.

ERROR: (pymysql.err.InternalError) (1050, u"Table 'raw_template_files' already exists")
[SQL:
CREATE TABLE raw_template_files (
	id INTEGER NOT NULL AUTO_INCREMENT, 
	files LONGTEXT, 
	created_at DATETIME, 
	updated_at DATETIME, 
	PRIMARY KEY (id)
)ENGINE=InnoDB CHARSET=utf8
]

Version-Release number of selected component (if applicable):
OSP9 -> OSP10 upgrade (previously upgraded from OSP7 -> OSP8 -> OSP9)

How reproducible:
Also reported in https://bugzilla.redhat.com/show_bug.cgi?id=1406380#c4

Actual results:

Expected results:

Additional info:
Since the table was empty, it was removed and the migration worked:

MariaDB [heat]> explain raw_template_files;
+------------+----------+------+-----+---------+----------------+
| Field      | Type     | Null | Key | Default | Extra          |
+------------+----------+------+-----+---------+----------------+
| id         | int(11)  | NO   | PRI | NULL    | auto_increment |
| files      | longtext | YES  |     | NULL    |                |
| created_at | datetime | YES  |     | NULL    |                |
| updated_at | datetime | YES  |     | NULL    |                |
+------------+----------+------+-----+---------+----------------+
4 rows in set (0.00 sec)

MariaDB [heat]> select count(id) from raw_template_files;
+-----------+
| count(id) |
+-----------+
|         0 |
+-----------+
1 row in set (0.00 sec)

MariaDB [heat]> drop table raw_template_files;
Query OK, 0 rows affected (0.23 sec)

Then the Heat database migration was executed manually:

heat-manage --config-file /etc/heat/heat.conf db_sync
2017-02-08 11:35:02.271 29739 INFO migrate.versioning.api [-] 71 -> 72...
2017-02-08 11:36:57.673 29739 INFO migrate.versioning.api [-] done
2017-02-08 11:36:57.675 29739 INFO migrate.versioning.api [-] 72 -> 73...
2017-02-08 11:36:59.326 29739 INFO migrate.versioning.api [-] done
This is inexplicable, because "raw_template_files" does not appear in any migration in RHOS 7, 8, or 9, and it appears only once in RHOS 10: http://git.openstack.org/cgit/openstack/heat/tree/heat/db/sqlalchemy/migrate_repo/versions/072_raw_template_files.py?h=stable/newton I can't see any other way for that table to get created without also updating the migration version.
Agree with zaneb here. This could happen if simultaneous calls to 'heat-manage db_sync' are made (or if an initial call is made but doesn't complete). Is there any way to tell from the logs whether that happened?
(In reply to Zane Bitter from comment #1)
> This is inexplicable, because "raw_template_files" does not appear in any
> migration in RHOS 7, 8, or 9, and it appears only once in RHOS 10:
> http://git.openstack.org/cgit/openstack/heat/tree/heat/db/sqlalchemy/
> migrate_repo/versions/072_raw_template_files.py?h=stable/newton
>
> I can't see any other way for that table to get created without also
> updating the migration version.

Ok, so the situation was a bit different. It seems heat-manage db_sync failed at 73 because of [1] and [2]. After cleaning up the DB and running db_sync again, you can no longer get past 72, since there is no check for whether the table was already created. Probably [3] should be modified to create the table only if it does not already exist.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1349111
[2] https://access.redhat.com/solutions/2215131
[3] http://git.openstack.org/cgit/openstack/heat/tree/heat/db/sqlalchemy/migrate_repo/versions/072_raw_template_files.py?h=stable/newton#n21
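The idempotency fix proposed above can be sketched as follows. This is a minimal illustration using the stdlib sqlite3 module, not Heat's actual migration code: in the real 072_raw_template_files.py (SQLAlchemy), the analogous change would be to create the table with checkfirst=True so the CREATE is skipped when the table already exists.

```python
# Sketch: making a table-creation migration safe to re-run after a partial
# failure, so it does not raise "table already exists" the second time.
# Illustrative only; uses sqlite3 rather than Heat's SQLAlchemy migration.
import sqlite3

def upgrade(conn: sqlite3.Connection) -> None:
    # CREATE TABLE IF NOT EXISTS is a no-op when the table is already
    # present, which is exactly what a re-runnable migration needs.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS raw_template_files (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            files TEXT,
            created_at TIMESTAMP,
            updated_at TIMESTAMP
        )
        """
    )

conn = sqlite3.connect(":memory:")
upgrade(conn)
upgrade(conn)  # second run succeeds instead of raising OperationalError
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master "
    "WHERE type='table' AND name='raw_template_files'")]
print(tables)  # ['raw_template_files']
```

With a plain CREATE TABLE (no IF NOT EXISTS, or SQLAlchemy's create() without checkfirst), the second upgrade() call would fail, which is the behavior observed in this bug.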
So, what is the first error that is encountered, something in the 73 upgrade? If so, it would help to see the actual error from that.

In any case, I don't see how a 73 upgrade (which operates on the resource/resource_data relationship - http://git.openstack.org/cgit/openstack/heat/tree/heat/db/sqlalchemy/migrate_repo/versions/073_resource_data_fk_ondelete_cascade.py?h=stable/newton ) would fail due to [1] (extra entries in the raw_template table).

Another puzzling thing here: if the 72 migrate script succeeded and the 73 script failed, heat-manage db_sync should not be re-attempting 72 the next time it is called... unless some other script or SQL command is editing heat's migrate_version table.

I'm still suspicious that multiple near-simultaneous calls are being made to heat-manage db_sync, whether from the same node or not.
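One way to check the hypothesis above about migrate_version being edited (a hedged suggestion, assuming the default sqlalchemy-migrate bookkeeping table that heat-manage db_sync maintains in the heat database):

```sql
-- Inspect sqlalchemy-migrate's version bookkeeping for the heat database.
-- If migration 072 had completed, this should report version 72 or higher;
-- a lower value would suggest something rewound or re-seeded the table.
SELECT * FROM migrate_version;
```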
(In reply to Crag Wolfe from comment #4)
> So, what is the first error that is encountered, something in the 73
> upgrade? If so, it would help to see the actual error from that.
>
> In any case, I don't see how a 73 upgrade (which operates on the
> resource/resource_data relationship -
> http://git.openstack.org/cgit/openstack/heat/tree/heat/db/sqlalchemy/
> migrate_repo/versions/073_resource_data_fk_ondelete_cascade.py?h=stable/
> newton ) would fail due to [1] (extra entries in the raw_template table).
>
> Another puzzling thing here, if the 72 migrate script succeeded and the 73
> script failed, heat-manage db_sync should not be re-attempting 72 the next
> time it is called... unless some other script or sql command is editing
> heat's migrate_version table.
>
> I'm still suspicious that multiple near-simultaneous calls are made to
> heat-manage db_sync, whether from the same node or not.

I'm sorry, but the upgrade logs covering heat migration 73 are gone. The reporter's suspicion is that it failed due to a timeout (> 70% I/O wait).
We suspect this must be happening due to either simultaneous calls to heat-manage or a failure in the middle of heat-manage running. There's nothing special about this particular migration, so putting in special handling to deal with double-creates for this one table is not going to solve the problem in general. The workaround is easy and safe. For those reasons, we don't intend to modify the migration. It's possible that bug 1428845 (OSP 10 incarnation: bug 1428877) may have been the cause of having two simultaneous DB syncs. If that were the case then the patch for that would have prevented the issue.
We might also have to do the following before dropping the table:

alter table raw_template drop foreign key raw_tmpl_files_fkey_ref;
And also:

alter table raw_template drop column files_id;
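Putting the pieces from this thread together, the full manual workaround would look like the sequence below. The constraint and column names (raw_tmpl_files_fkey_ref, files_id) are the ones mentioned in the comments above; verify them on the affected system with SHOW CREATE TABLE raw_template before running anything.

```sql
-- Confirm the leftover table is actually empty before dropping anything.
SELECT COUNT(id) FROM raw_template_files;

-- Drop the foreign key on raw_template and the referencing column first;
-- otherwise DROP TABLE fails with a foreign-key constraint error.
ALTER TABLE raw_template DROP FOREIGN KEY raw_tmpl_files_fkey_ref;
ALTER TABLE raw_template DROP COLUMN files_id;
DROP TABLE raw_template_files;
```

Then re-run the migration as shown earlier in this report: heat-manage --config-file /etc/heat/heat.conf db_sync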