Created attachment 1405378 [details]
gnocchi_db_sync logs

Description of problem:
FFU: the deploy_steps_playbook.yaml playbook fails while running the '/usr/bin/gnocchi-upgrade --sacks-number=128' command on 2 of 3 controllers.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP10 with 3 controllers + 2 computes
2. Upgrade the undercloud to OSP13
3. Apply the FFU patches in https://review.openstack.org/#/q/topic:bp/fast-forward-upgrades+(status:open+OR+status:merged)

Actual results:
While running deploy_steps_playbook.yaml

Expected results:

Additional info:
I now noticed that the initial report was incomplete for some reason, so I am updating it here:

Actual results:
While running deploy_steps_playbook.yaml, the gnocchi_db_sync container fails on 2 of 3 controllers while running the '/usr/bin/gnocchi-upgrade --sacks-number=128' command with:

DBError: (pymysql.err.InternalError) (1138, u'Invalid use of NULL value') [SQL: u'ALTER TABLE resource CHANGE started_at_ts started_at DATETIME(6) NOT NULL'] (Background on this error at: http://sqlalche.me/e/2j85)

Expected results:

Additional info:
Attaching the gnocchi_db_sync container output from all 3 controllers.
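For context on the error above: MySQL error 1138 ("Invalid use of NULL value") means the ALTER is trying to make the renamed started_at column NOT NULL while existing rows still hold NULL in started_at_ts. A minimal sketch of that failure mode, with SQLite standing in for MySQL (the table and column names are taken from the error message; the rest of the schema is invented for illustration):

```python
import sqlite3

# A table with a nullable timestamp column and one row where it is NULL,
# mimicking the state of gnocchi's 'resource' table before the migration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resource (id INTEGER PRIMARY KEY, started_at_ts TEXT)")
conn.execute("INSERT INTO resource (id, started_at_ts) VALUES (1, NULL)")

# MySQL's 'ALTER TABLE resource CHANGE started_at_ts started_at DATETIME(6)
# NOT NULL' must rewrite every row into the stricter column. Emulate that by
# copying the rows into a NOT NULL table: the row holding NULL aborts the
# copy, just as error 1138 aborts the ALTER.
conn.execute(
    "CREATE TABLE resource_migrated (id INTEGER PRIMARY KEY, started_at TEXT NOT NULL)"
)
try:
    conn.execute("INSERT INTO resource_migrated SELECT id, started_at_ts FROM resource")
except sqlite3.IntegrityError as exc:
    print("migration fails:", exc)
```

So the migration can only succeed once the NULL values are backfilled (or the rows removed) before the column is tightened.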
Needs a change to the package update -> db sync ordering; debugging.
Seems like this gnocchi update issue also happens for me on bare metal on a *clean deployment* of OSP13, regardless of FFU (puddle 2018-03-02.2):

overcloud.AllNodesDeploySteps.ControllerDeployment_Step4.1:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: 01066fee-123d-47bd-b55f-e24055ebd051
  status: CREATE_FAILED
  status_reason: |
    Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |

    PLAY [localhost] ***************************************************************

    TASK [Gathering Facts] *********************************************************
    ok: [localhost]

    TASK [Create /var/lib/tripleo-config directory] ********************************
    skipping: [localhost]

    TASK [Write the puppet step_config manifest] ***********************************
    skipping: [localhost]

    TASK [Create /var/lib/docker-puppet] *******************************************
    skipping: [localhost]

    TASK [Write docker-puppet-tasks json files] ************************************
    skipping: [localhost]

    TASK [Create /var/lib/docker-config-scripts] ***********************************
    skipping: [localhost]

    TASK [Clean old /var/lib/docker-container-startup-configs.json file] ***********
    skipping: [localhost]

    TASK [Write docker config scripts] *********************************************
    skipping: [localhost] => (item={'value': {u'content': u'#!/bin/bash\nexport OS_PROJECT_DOMAIN_NAME=$(crudini --get /etc/nova/nova.conf keystone_authtoken project_domain_name)\nexport OS_USER_DOMAIN_NAME=$(crudini --get /etc/nova/nova.conf keystone_authtoken user_domain_name)\nexport OS_PROJECT_NAME=$(crudini --get /etc/nova/nova.conf keystone_authtoken project_name)\nexport OS_USERNAME=$(crudini --get /etc/nova/nova.conf keystone_authtoken username)\nexport OS_PASSWORD=$(crudini --get /etc/nova/nova.conf keystone_authtoken password)\nexport OS_AUTH_URL=$(crudini --get /etc/nova/nova.conf keystone_authtoken auth_url)\nexport OS_AUTH_TYPE=password\nexport OS_IDENTITY_API_VERSION=3\n\necho "(cellv2) Running cell_v2 host discovery"\ntimeout=600\nloop_wait=30\ndeclare -A discoverable_hosts\nfor host in $(hiera -c /etc/puppet/hiera.yaml cellv2_discovery_hosts | sed -e \'/^nil$/d\' | tr "," " "); do discoverable_hosts[$host]=1; done\ntimeout_at=$(( $(date +"%s") + ${timeout} ))\necho "(cellv2) Waiting ${timeout} seconds for hosts to register"\nfinished=0\nwhile : ; do\n for host in $(openstack -q compute service list -c \'Host\' -c \'Zone\' -f value | awk \'$2 != "internal" { print $1 }\'); do\n if (( discoverable_hosts[$host] == 1 )); then\n echo "(cellv2) compute node $host has registered"\n unset discoverable_hosts[$host]\n fi\n done\n finished=1\n for host in "${!discover
Adding the error:

"2018-03-12 10:50:34,561 [1] DEBUG gnocchi.service: archive_policy.default_aggregation_methods = ['mean', 'min', 'max', 'sum', 'std', 'count']",
"2018-03-12 10:50:34,561 [1] DEBUG gnocchi.service: ********************************************************************************",
"2018-03-12 10:50:34,935 [1] INFO gnocchi.cli.manage: Upgrading indexer SQLAlchemyIndexer: mysql+pymysql://gnocchi:24XgRTjBPpWYCpTerytNA3McC.104.12/gnocchi?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf",
"2018-03-12 10:50:35,998 [1] ERROR oslo_db.sqlalchemy.exc_filters: DBAPIError exception wrapped from (pymysql.err.InternalError) (1050, u\"Table 'resource' already exists\") [SQL: u'\\nCREATE TABLE resource (\\n\\tcreator VARCHAR(255), \\n\\tstarted_at DATETIME(6) NOT NULL, \\n\\trevision_start DATETIME(6) NOT NULL, \\n\\tended_at DATETIME(6), \\n\\tuser_id VARCHAR(255), \\n\\tproject_id VARCHAR(255), \\n\\toriginal_resource_id VARCHAR(255) NOT NULL, \\n\\tid BINARY(16) NOT NULL, \\n\\ttype VARCHAR(255) NOT NULL, \\n\\tPRIMARY KEY (id), \\n\\tCONSTRAINT ck_started_before_ended CHECK (started_at <= ended_at), \\n\\tCONSTRAINT fk_resource_resource_type_name FOREIGN KEY(type) REFERENCES resource_type (name) ON DELETE RESTRICT\\n)ENGINE=InnoDB CHARSET=utf8\\n\\n'] (Background on this error at: http://sqlalche.me/e/2j85)",
"Traceback (most recent call last):",
"  File \"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py\", line 1193, in _execute_context",
"    context)",
"  File \"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py\", line 507, in do_execute",
"    cursor.execute(statement, parameters)",
"  File \"/usr/lib/python2.7/site-packages/pymysql/cursors.py\", line 166, in execute",
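Error 1050 ("Table 'resource' already exists") is consistent with several controllers running gnocchi-upgrade against the same database at once: whichever run wins the race creates the table, and the others then fail their own CREATE TABLE. A rough sketch of that race, again with SQLite standing in for MySQL and an invented two-column schema in place of gnocchi's real one:

```python
import os
import sqlite3
import tempfile

# Two "controllers" sharing one database; each runs the same upgrade DDL.
path = os.path.join(tempfile.mkdtemp(), "gnocchi.db")
DDL = "CREATE TABLE resource (id BLOB PRIMARY KEY, type TEXT NOT NULL)"

first = sqlite3.connect(path)   # controller that wins the race
second = sqlite3.connect(path)  # controller that loses it

first.execute(DDL)              # succeeds, table is created
try:
    second.execute(DDL)         # fails: table resource already exists
except sqlite3.OperationalError as exc:
    print("concurrent upgrade fails:", exc)
```

The usual remedy is to let only one node (e.g. the bootstrap controller) perform the schema upgrade, which is presumably the direction of the upstream reviews linked in the following comments.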
Potential upstream fix is being tested here: https://review.openstack.org/#/c/553051/
Bandini already had this patch: https://review.openstack.org/#/c/553028/. We can go with his.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086