Bug 1552685 - FFU: deploy_steps_playbook.yaml playbook fails while running '/usr/bin/gnocchi-upgrade --sacks-number=128' command on 2/3 controllers
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: beta
Target Release: 13.0 (Queens)
Assignee: Emilien Macchi
QA Contact: Marius Cornea
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-03-07 14:33 UTC by Marius Cornea
Modified: 2018-06-27 13:35 UTC

Fixed In Version: openstack-tripleo-heat-templates-8.0.0-0.20180326192239.e59fd2c.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-27 13:35:05 UTC
Target Upstream Version:
mbracho: needinfo+


Attachments
gnocchi_db_sync logs (6.75 KB, application/x-gzip)
2018-03-07 14:33 UTC, Marius Cornea


Links
System ID Priority Status Summary Last Updated
OpenStack gerrit 553028 None master: MERGED tripleo-heat-templates: Fix gnocchi-upgrade Table <..> already exists errors (I106512eeffff3425608a543f9bc5e6a9508d15e5) 2018-03-22 17:03:33 UTC
OpenStack gerrit 553328 None stable/queens: MERGED tripleo-heat-templates: Fix gnocchi-upgrade Table <..> already exists errors (I106512eeffff3425608a543f9bc5e6a9508d15e5) 2018-03-22 17:03:25 UTC
Red Hat Product Errata RHEA-2018:2086 None None None 2018-06-27 13:35:52 UTC

Description Marius Cornea 2018-03-07 14:33:09 UTC
Created attachment 1405378
gnocchi_db_sync logs

Description of problem:
FFU: deploy_steps_playbook.yaml playbook fails while running '/usr/bin/gnocchi-upgrade --sacks-number=128' command on 2/3 controllers

Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP10 with 3 controllers + 2 computes
2. Upgrade undercloud to OSP13
3. Apply FFU patches in https://review.openstack.org/#/q/topic:bp/fast-forward-upgrades+(status:open+OR+status:merged)

Actual results:
While running deploy_steps_playbook.yaml 

Expected results:


Additional info:

Comment 2 Marius Cornea 2018-03-08 19:34:29 UTC
I now noticed that the initial report was incomplete for some reason so I am updating here:

Actual results:
While running deploy_steps_playbook.yaml  gnocchi_db_sync container fails on 2/3 controllers while running the  '/usr/bin/gnocchi-upgrade --sacks-number=128' command with:

DBError: (pymysql.err.InternalError) (1138, u'Invalid use of NULL value') [SQL: u'ALTER TABLE resource CHANGE started_at_ts started_at DATETIME(6) NOT NULL'] (Background on this error at: http://sqlalche.me/e/2j85)


Expected results:


Additional info:
Attaching the gnocchi_db_sync container output from all 3 controllers.
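
MySQL raises error 1138 ("Invalid use of NULL value") when a column is changed to NOT NULL while existing rows still hold NULL in it. A minimal diagnostic sketch follows, assuming that is what happened to the started_at_ts column here. It uses an in-memory SQLite database as a stand-in for the real MySQL indexer, and the helper name count_null_rows and the sample rows are hypothetical, not part of gnocchi.

```python
import sqlite3


def count_null_rows(conn, table, column):
    """Return how many rows hold NULL in the given column."""
    # Identifiers are interpolated directly, so only pass trusted names.
    query = "SELECT COUNT(*) FROM {} WHERE {} IS NULL".format(table, column)
    return conn.execute(query).fetchone()[0]


# Stand-in schema: only the column named in the failing ALTER TABLE.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resource (id INTEGER PRIMARY KEY, started_at_ts TEXT)"
)
# Hypothetical sample data: one populated row and two NULL rows.
conn.executemany(
    "INSERT INTO resource (started_at_ts) VALUES (?)",
    [("2018-03-07 14:33:09",), (None,), (None,)],
)

null_rows = count_null_rows(conn, "resource", "started_at_ts")
print("rows with NULL started_at_ts:", null_rows)
```

A non-zero count on the real database would be consistent with the ALTER TABLE failure above. It does not by itself explain how the NULL rows got there.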

Comment 3 Lukas Bezdicka 2018-03-12 13:38:38 UTC
Needs change to package update -> db sync, debugging.

Comment 4 Omri Hochman 2018-03-14 12:00:08 UTC
It seems this gnocchi update issue also happens for me on BM on a
*clean deployment* of OSP13, regardless of FFU
(puddle 2018-03-02.2):


overcloud.AllNodesDeploySteps.ControllerDeployment_Step4.1:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: 01066fee-123d-47bd-b55f-e24055ebd051
  status: CREATE_FAILED
  status_reason: |
    Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |

    PLAY [localhost] ***************************************************************

    TASK [Gathering Facts] *********************************************************
    ok: [localhost]

    TASK [Create /var/lib/tripleo-config directory] ********************************
    skipping: [localhost]

    TASK [Write the puppet step_config manifest] ***********************************
    skipping: [localhost]

    TASK [Create /var/lib/docker-puppet] *******************************************
    skipping: [localhost]

    TASK [Write docker-puppet-tasks json files] ************************************
    skipping: [localhost]

    TASK [Create /var/lib/docker-config-scripts] ***********************************
    skipping: [localhost]

    TASK [Clean old /var/lib/docker-container-startup-configs.json file] ***********
    skipping: [localhost]

    TASK [Write docker config scripts] *********************************************
    skipping: [localhost] => (item={'value': {u'content': u'#!/bin/bash\nexport OS_PROJECT_DOMAIN_NAME=$(crudini --get /etc/nova/nova.conf keystone_authtoken project_domain_name)\nexport OS_USER_DOMAIN_NAME=$(crudini --get /etc/nova/nova.conf keystone_authtoken user_domain_name)\nexport OS_PROJECT_NAME=$(crudini --get /etc/nova/nova.conf keystone_authtoken project_name)\nexport OS_USERNAME=$(crudini --get /etc/nova/nova.conf keystone_authtoken username)\nexport OS_PASSWORD=$(crudini --get /etc/nova/nova.conf keystone_authtoken password)\nexport OS_AUTH_URL=$(crudini --get /etc/nova/nova.conf keystone_authtoken auth_url)\nexport OS_AUTH_TYPE=password\nexport OS_IDENTITY_API_VERSION=3\n\necho "(cellv2) Running cell_v2 host discovery"\ntimeout=600\nloop_wait=30\ndeclare -A discoverable_hosts\nfor host in $(hiera -c /etc/puppet/hiera.yaml cellv2_discovery_hosts | sed -e \'/^nil$/d\' |  tr "," " "); do discoverable_hosts[$host]=1; done\ntimeout_at=$(( $(date +"%s") + ${timeout} ))\necho "(cellv2) Waiting ${timeout} seconds for hosts to register"\nfinished=0\nwhile : ; do\n  for host in $(openstack -q compute service list -c \'Host\' -c \'Zone\' -f value | awk \'$2 != "internal" { print $1 }\'); do\n    if (( discoverable_hosts[$host] == 1 )); then\n      echo "(cellv2) compute node $host has registered"\n      unset discoverable_hosts[$host]\n    fi\n  done\n  finished=1\n  for host in "${!discover

Comment 5 Omri Hochman 2018-03-14 12:04:50 UTC
Adding the error:

    "2018-03-12 10:50:34,561 [1] DEBUG    gnocchi.service: archive_policy.default_aggregation_methods = ['mean', 'min', 'max', 'sum', 'std', 'count']",
            "2018-03-12 10:50:34,561 [1] DEBUG    gnocchi.service: ********************************************************************************",
            "2018-03-12 10:50:34,935 [1] INFO     gnocchi.cli.manage: Upgrading indexer SQLAlchemyIndexer: mysql+pymysql://gnocchi:24XgRTjBPpWYCpTerytNA3McC@10.19.104.12/gnocchi?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf",
            "2018-03-12 10:50:35,998 [1] ERROR    oslo_db.sqlalchemy.exc_filters: DBAPIError exception wrapped from (pymysql.err.InternalError) (1050, u\"Table 'resource' already exists\") [SQL: u'\\nCREATE TABLE resource (\\n\\tcreator VARCHAR(255), \\n\\tstarted_at DATETIME(6) NOT NULL, \\n\\trevision_start DATETIME(6) NOT NULL, \\n\\tended_at DATETIME(6), \\n\\tuser_id VARCHAR(255), \\n\\tproject_id VARCHAR(255), \\n\\toriginal_resource_id VARCHAR(255) NOT NULL, \\n\\tid BINARY(16) NOT NULL, \\n\\ttype VARCHAR(255) NOT NULL, \\n\\tPRIMARY KEY (id), \\n\\tCONSTRAINT ck_started_before_ended CHECK (started_at <= ended_at), \\n\\tCONSTRAINT fk_resource_resource_type_name FOREIGN KEY(type) REFERENCES resource_type (name) ON DELETE RESTRICT\\n)ENGINE=InnoDB CHARSET=utf8\\n\\n'] (Background on this error at: http://sqlalche.me/e/2j85)",
            "Traceback (most recent call last):",
            "  File \"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py\", line 1193, in _execute_context",
            "    context)",
            "  File \"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py\", line 507, in do_execute",
            "    cursor.execute(statement, parameters)",
            "  File \"/usr/lib/python2.7/site-packages/pymysql/cursors.py\", line 166, in execute",

Comment 8 Dan Prince 2018-03-14 20:35:55 UTC
Potential upstream fix is being tested here: https://review.openstack.org/#/c/553051/

Comment 9 Dan Prince 2018-03-14 20:39:08 UTC
Bandini already had this patch: https://review.openstack.org/#/c/553028/. We can go with his.
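
The "Table 'resource' already exists" error in Comment 5 is what a database returns when the same CREATE TABLE is issued more than once against a shared schema. The sketch below illustrates that failure mode and one common guard: running the schema upgrade on a single designated node only. It is an assumption-laden illustration, not the content of the merged patch. SQLite stands in for MySQL, the three attempts run sequentially rather than concurrently, and the node names and the functions run_schema_upgrade and upgrade_if_bootstrap are hypothetical.

```python
import sqlite3

# Simplified stand-in for the CREATE TABLE statement quoted in Comment 5.
CREATE_RESOURCE = (
    "CREATE TABLE resource (id INTEGER PRIMARY KEY, type TEXT NOT NULL)"
)


def run_schema_upgrade(conn):
    """Create the schema; fails if the table is already there."""
    conn.execute(CREATE_RESOURCE)


def upgrade_if_bootstrap(conn, node, bootstrap_node):
    """Run the upgrade only on the designated node (hypothetical guard)."""
    if node != bootstrap_node:
        return False
    run_schema_upgrade(conn)
    return True


nodes = ["controller-0", "controller-1", "controller-2"]

# Unguarded: every node runs the upgrade against the same database.
shared_db = sqlite3.connect(":memory:")
errors = []
for node in nodes:
    try:
        run_schema_upgrade(shared_db)
    except sqlite3.OperationalError as exc:
        errors.append((node, str(exc)))

# Guarded: only the designated node touches the schema.
guarded_db = sqlite3.connect(":memory:")
ran = [upgrade_if_bootstrap(guarded_db, n, "controller-0") for n in nodes]

print("unguarded failures:", errors)
print("guarded runs:", ran)
```

In the unguarded loop the first node succeeds and the other two fail, which matches the 2/3 controllers pattern in this report. Whether the patch referenced above uses this kind of guard is not stated in this bug.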

Comment 17 errata-xmlrpc 2018-06-27 13:35:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086

