Description of problem: Customer is doing a major upgrade from OSP9-->OSP10. They just completed a major upgrade from OSP8-->OSP9 and immediately moved into an upgrade from OSP9-->OSP10. They are currently trying to perform the "openstack undercloud upgrade" step [1] and are encountering errors in the heat db_sync. [1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/9/html-single/upgrading_red_hat_openstack_platform/#sect-Major-Updating_Director_Packages Version-Release number of selected component (if applicable): OSP9 $ grep openstack-heat installed-rpms openstack-heat-api-7.0.5-1.el7ost.noarch Fri Oct 20 23:34:07 2017 openstack-heat-api-cfn-7.0.5-1.el7ost.noarch Fri Oct 20 23:34:07 2017 openstack-heat-api-cloudwatch-7.0.5-1.el7ost.noarch Fri Oct 20 23:34:07 2017 openstack-heat-common-7.0.5-1.el7ost.noarch Fri Oct 20 23:31:07 2017 openstack-heat-engine-7.0.5-1.el7ost.noarch Fri Oct 20 23:34:07 2017 openstack-heat-templates-0-0.12.1e6015dgit.el7ost.noarch Fri Oct 20 23:28:46 2017 How reproducible: Everytime Steps to Reproduce: 1. Run "opsenstack undercloud upgrade" 2. Use workaround described in https://bugzilla.redhat.com/show_bug.cgi?id=1420351 to drop the raw_templates_files table; 3. Manually run "heat-manage --config-file /etc/heat/heat.conf db_sync" Actual results: Original error, which was found in above BZ: $ sudo heat-manage --config-file /etc/heat/heat.conf db_sync 2017-10-21 13:48:10.334 29119 INFO migrate.versioning.api [-] 71 -> 72... 2017-10-21 13:48:10.338 29119 ERROR oslo_db.sqlalchemy.exc_filters [-] DBAPIError exception wrapped from (pymysql.err.InternalError) (1050, u"Table 'raw_template_files' already exists") [SQL: u'\nCREATE TABLE raw_template_files (\n\tid INTEGER NOT NULL AUTO_INCREMENT, \n\tfiles LONGTEXT, \n\tcreated_at DATETIME, \n\tupdated_at DATETIME, \n\tPRIMARY KEY (id)\n)ENGINE=InnoDB CHARSET=utf8\n\n'] 2017-10-21 13:48:10.338 29119 ERROR oslo_db.sqlalchemy.exc_filters Traceback (most recent call last): 2017-10-21 13:48:10.338 29119 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context 2017-10-21 13:48:10.338 29119 ERROR oslo_db.sqlalchemy.exc_filters context) 2017-10-21 13:48:10.338 29119 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute 2017-10-21 13:48:10.338 29119 ERROR oslo_db.sqlalchemy.exc_filters cursor.execute(statement, parameters) 2017-10-21 13:48:10.338 29119 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 166, in execute 2017-10-21 13:48:10.338 29119 ERROR oslo_db.sqlalchemy.exc_filters result = self._query(query) 2017-10-21 13:48:10.338 29119 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 322, in _query 2017-10-21 13:48:10.338 29119 ERROR oslo_db.sqlalchemy.exc_filters conn.query(q) 2017-10-21 13:48:10.338 29119 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 841, in query 2017-10-21 13:48:10.338 29119 ERROR oslo_db.sqlalchemy.exc_filters self._affected_rows = self._read_query_result(unbuffered=unbuffered) 2017-10-21 13:48:10.338 29119 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1029, in _read_query_result 2017-10-21 13:48:10.338 29119 ERROR oslo_db.sqlalchemy.exc_filters result.read() 2017-10-21 13:48:10.338 29119 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1312, in read 2017-10-21 13:48:10.338 29119 ERROR oslo_db.sqlalchemy.exc_filters first_packet = self.connection._read_packet() 2017-10-21 13:48:10.338 29119 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 991, in _read_packet 2017-10-21 13:48:10.338 29119 ERROR oslo_db.sqlalchemy.exc_filters packet.check_error() 2017-10-21 13:48:10.338 29119 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 393, in check_error 2017-10-21 13:48:10.338 29119 ERROR oslo_db.sqlalchemy.exc_filters err.raise_mysql_exception(self._data) 2017-10-21 13:48:10.338 29119 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/err.py", line 107, in raise_mysql_exception 2017-10-21 13:48:10.338 29119 ERROR oslo_db.sqlalchemy.exc_filters raise errorclass(errno, errval) 2017-10-21 13:48:10.338 29119 ERROR oslo_db.sqlalchemy.exc_filters InternalError: (1050, u"Table 'raw_template_files' already exists") 2017-10-21 13:48:10.338 29119 ERROR oslo_db.sqlalchemy.exc_filters ERROR: (pymysql.err.InternalError) (1050, u"Table 'raw_template_files' already exists") [SQL: u'\nCREATE TABLE raw_template_files (\n\tid INTEGER NOT NULL AUTO_INCREMENT, \n\tfiles LONGTEXT, \n\tcreated_at DATETIME, \n\tupdated_at DATETIME, \n\tPRIMARY KEY (id)\n)ENGINE=InnoDB CHARSET=utf8\n\n'] After workaround steps of dropping the raw_template_files table: sudo heat-manage --config-file /etc/heat/heat.conf db_sync 2017-10-21 14:00:58.417 31627 INFO migrate.versioning.api [-] 71 -> 72... 2017-10-21 14:00:58.450 31627 ERROR oslo_db.sqlalchemy.exc_filters [-] DBAPIError exception wrapped from (pymysql.err.InternalError) (1060, u"Duplicate column name 'files_id'") [SQL: u'\nALTER TABLE raw_template ADD files_id INTEGER'] 2017-10-21 14:00:58.450 31627 ERROR oslo_db.sqlalchemy.exc_filters Traceback (most recent call last): 2017-10-21 14:00:58.450 31627 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context 2017-10-21 14:00:58.450 31627 ERROR oslo_db.sqlalchemy.exc_filters context) 2017-10-21 14:00:58.450 31627 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute 2017-10-21 14:00:58.450 31627 ERROR oslo_db.sqlalchemy.exc_filters cursor.execute(statement, parameters) 2017-10-21 14:00:58.450 31627 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 166, in execute 2017-10-21 14:00:58.450 31627 ERROR oslo_db.sqlalchemy.exc_filters result = self._query(query) 2017-10-21 14:00:58.450 31627 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 322, in _query 2017-10-21 14:00:58.450 31627 ERROR oslo_db.sqlalchemy.exc_filters conn.query(q) 2017-10-21 14:00:58.450 31627 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 841, in query 2017-10-21 14:00:58.450 31627 ERROR oslo_db.sqlalchemy.exc_filters self._affected_rows = self._read_query_result(unbuffered=unbuffered) 2017-10-21 14:00:58.450 31627 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1029, in _read_query_result 2017-10-21 14:00:58.450 31627 ERROR oslo_db.sqlalchemy.exc_filters result.read() 2017-10-21 14:00:58.450 31627 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1312, in read 2017-10-21 14:00:58.450 31627 ERROR oslo_db.sqlalchemy.exc_filters first_packet = self.connection._read_packet() 2017-10-21 14:00:58.450 31627 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 991, in _read_packet 2017-10-21 14:00:58.450 31627 ERROR oslo_db.sqlalchemy.exc_filters packet.check_error() 2017-10-21 14:00:58.450 31627 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 393, in check_error 2017-10-21 14:00:58.450 31627 ERROR oslo_db.sqlalchemy.exc_filters err.raise_mysql_exception(self._data) 2017-10-21 14:00:58.450 31627 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/err.py", line 107, in raise_mysql_exception 2017-10-21 14:00:58.450 31627 ERROR oslo_db.sqlalchemy.exc_filters raise errorclass(errno, errval) 2017-10-21 14:00:58.450 31627 ERROR oslo_db.sqlalchemy.exc_filters InternalError: (1060, u"Duplicate column name 'files_id'") 2017-10-21 14:00:58.450 31627 ERROR oslo_db.sqlalchemy.exc_filters ERROR: (pymysql.err.InternalError) (1060, u"Duplicate column name 'files_id'") [SQL: u'\nALTER TABLE raw_template ADD files_id INTEGER'] Expected results: "openstack undercloud upgrade" completes successfully. Additional info: Prior to db_sync we also ran purge_deleted: # heat-manage purge_deleted -g days 2 Heat DB is larger than usual, disk IO on the Director seems OK though. # du -hs /var/lib/mysql/* 16K /var/lib/mysql/aria_log.00000001 4.0K /var/lib/mysql/aria_log_control 21G /var/lib/mysql/ceilometer 3.1M /var/lib/mysql/glance 3.1G /var/lib/mysql/heat 26M /var/lib/mysql/ibdata1 5.0M /var/lib/mysql/ib_logfile0 5.0M /var/lib/mysql/ib_logfile1 1.3M /var/lib/mysql/ironic 457M /var/lib/mysql/keystone 14M /var/lib/mysql/mistral 1012K /var/lib/mysql/mysql 0 /var/lib/mysql/mysql.sock 23M /var/lib/mysql/neutron 23M /var/lib/mysql/nova 2.8M /var/lib/mysql/nova_api 212K /var/lib/mysql/performance_schema 0 /var/lib/mysql/test # du -hs /var/lib/mysql/heat/* 4.0K /var/lib/mysql/heat/db.opt 16K /var/lib/mysql/heat/event.frm 81M /var/lib/mysql/heat/event.ibd 12K /var/lib/mysql/heat/migrate_version.frm 96K /var/lib/mysql/heat/migrate_version.ibd 12K /var/lib/mysql/heat/raw_template_files.frm 96K /var/lib/mysql/heat/raw_template_files.ibd 12K /var/lib/mysql/heat/raw_template.frm 3.0G /var/lib/mysql/heat/raw_template.ibd 12K /var/lib/mysql/heat/resource_data.frm 560K /var/lib/mysql/heat/resource_data.ibd 16K /var/lib/mysql/heat/resource.frm 17M /var/lib/mysql/heat/resource.ibd 12K /var/lib/mysql/heat/service.frm 96K /var/lib/mysql/heat/service.ibd 16K /var/lib/mysql/heat/snapshot.frm 128K /var/lib/mysql/heat/snapshot.ibd 12K /var/lib/mysql/heat/software_config.frm 16M /var/lib/mysql/heat/software_config.ibd 16K /var/lib/mysql/heat/software_deployment.frm 12M /var/lib/mysql/heat/software_deployment.ibd 20K /var/lib/mysql/heat/stack.frm 12M /var/lib/mysql/heat/stack.ibd 12K /var/lib/mysql/heat/stack_lock.frm 96K /var/lib/mysql/heat/stack_lock.ibd 12K /var/lib/mysql/heat/stack_tag.frm 112K /var/lib/mysql/heat/stack_tag.ibd 12K /var/lib/mysql/heat/sync_point.frm 112K /var/lib/mysql/heat/sync_point.ibd 16K /var/lib/mysql/heat/user_creds.frm 96K /var/lib/mysql/heat/user_creds.ibd 12K /var/lib/mysql/heat/watch_data.frm 112K /var/lib/mysql/heat/watch_data.ibd 12K /var/lib/mysql/heat/watch_rule.frm 112K /var/lib/mysql/heat/watch_rule.ibd $ sudo iostat -mdx 1 Linux 3.10.0-693.2.2.el7.x86_64 (hostname) 10/21/2017 _x86_64_ (8 CPU) Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util vda 0.25 0.63 47.63 31.13 6.07 0.38 167.79 2.71 34.37 5.08 79.20 0.99 7.78 dm-0 0.00 0.00 0.01 0.00 0.00 0.00 30.12 0.00 0.00 0.00 0.00 0.00 0.00
Updated the attached KCS article to reflect the steps taken to fix the issue. Essentially, they came down to these, but note these steps are very specific for where in the db_sync the failure occurred. These steps might not apply for every single db_sync failure. # mysql MariaDB [none]> use heat; MariaDB [heat]> alter table raw_template drop key raw_tmpl_files_fkey_ref;" MariaDB [heat]> alter table raw_template drop foreign key raw_tmpl_files_fkey_ref;" MariaDB [heat]> alter table raw_template drop column files_id;" MariaDB [heat]> drop table raw_template_files;" Additionally, the puppet timeout can be increased to allow for a larger heat DB time to sync: For future deployments, you could edit etc/puppet/manifests/puppet-stack-config.pp and change: Exec<| title == 'heat-dbsync' |> { refreshonly => false } to have: Exec<| title == 'heat-dbsync' |> { refreshonly => false, timeout => 900 } So the timeout will be set to 15 min for the Heat dbsync command. Keep in mind the undercloud upgrade command override this file, so you might need to edit the file right after running the upgrade command when instack-undercloud package is upgraded. You can also stop the Puppet run with CTLR+C before the Heat dbsync resource, edit the file, and run the upgrade command again.
Reassigning, as the fix is to increase the timeout. Emilien can you ack? Thanks.
Verified on puddle 2018-04-26.3 [stack@undercloud-0 ~]$ rpm -q instack-undercloud instack-undercloud-8.4.1-2.el7ost.noarch 2018-04-30 06:11:31,872 INFO: ############################################################################# Undercloud upgrade complete. openstack undercloud upgrade completed without any issues
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2086