Bug 1538336 - [OSP13][Deployment] Redeployment of overcloud fails during ControllerDeployment_Step4.2 when /usr/bin/gnocchi-upgrade fails badly.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 13.0 (Queens)
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: beta
Target Release: 13.0 (Queens)
Assignee: Pradeep Kilambi
QA Contact: Sasha Smolyak
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-01-24 20:51 UTC by Omri Hochman
Modified: 2018-06-27 13:44 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-8.0.2-0.20180327213843.f25e2d8.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-27 13:43:23 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 553028 0 'None' MERGED Fix gnocchi-upgrade Table <..> already exists errors 2021-01-12 10:20:37 UTC
OpenStack gerrit 553328 0 'None' MERGED Fix gnocchi-upgrade Table <..> already exists errors 2021-01-12 10:20:37 UTC
Red Hat Product Errata RHEA-2018:2086 0 None None None 2018-06-27 13:44:09 UTC

Description Omri Hochman 2018-01-24 20:51:49 UTC
[OSP13][Deployment] OpenStack overcloud deployment fails at ControllerDeployment_Step4.2 after /usr/bin/gnocchi-upgrade fails with a "Table 'resource_history' already exists" database error.

Environment:
-------------
openstack-tripleo-heat-templates-8.0.0-0.20180103192341.el7ost.noarch
openstack-tripleo-ui-7.4.3-4.el7ost.noarch
openstack-tripleo-common-containers-8.3.1-0.20180103233643.el7ost.noarch
openstack-tripleo-puppet-elements-8.0.0-0.20171228195253.002c4ca.el7ost.noarch
openstack-tripleo-common-8.3.1-0.20180103233643.el7ost.noarch
python-tripleoclient-8.1.1-0.20171231084755.el7ost.noarch
openstack-tripleo-validations-8.1.1-0.20171221173840.ac39a91.el7ost.noarch
puppet-tripleo-8.1.1-0.20180102165828.el7ost.noarch
openstack-tripleo-image-elements-8.0.0-0.20180103224254.aad6322.el7ost.noarch


Completed upload for docker image docker-registry.engineering.redhat.com/rhosp13/openstack-gnocchi-api:2018-01-22.1
imagename: docker-registry.engineering.redhat.com/rhosp13/openstack-gnocchi-metricd:2018-01-22.1

Steps : 
--------
- Attempt to deploy OSP13 on bare metal with 3 controller, 1 compute, and 3 Ceph nodes.


Results :
---------
status_code : Deployment exited with non-zero status code: 2
2018-01-24 19:18:18Z [overcloud.AllNodesDeploySteps]: CREATE_FAILED  Resource CREATE failed: Error: resources.ControllerDeployment_Step4.resources[1]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
2018-01-24 19:18:18Z [overcloud.AllNodesDeploySteps]: CREATE_FAILED  Error: resources.AllNodesDeploySteps.resources.ControllerDeployment_Step4.resources[1]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
2018-01-24 19:18:19Z [overcloud]: CREATE_FAILED  Resource CREATE failed: Error: resources.AllNodesDeploySteps.resources.ControllerDeployment_Step4.resources[1]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2

 Stack overcloud CREATE_FAILED

`
overcloud.AllNodesDeploySteps.ControllerDeployment_Step4.1:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: 007164e8-e03b-428c-b9e0-1376b721e897
  status: CREATE_FAILED
  status_reason: |
    Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
            "b2d7a40a9667: Download complete",
            "b2d7a40a9667: Pull complete",
            "Digest: sha256:041c65774210c6eba133bc5b87ad90cf1654d40e1a04d58fde9bd8ccb0950040"
        ]
    }
    	to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/91669d3e-6174-469b-9557-0ceced3c3121_playbook.retry

    PLAY RECAP *********************************************************************
    localhost                  : ok=7    changed=2    unreachable=0    failed=1

    (truncated, view all with --long)
  deploy_stderr: |

overcloud.AllNodesDeploySteps.ControllerDeployment_Step4.2:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: 3161a784-d937-448e-9fd6-0c54b4bcd405
  status: CREATE_FAILED
  status_reason: |
    Error: resources[2]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
            "b2d7a40a9667: Download complete",
            "b2d7a40a9667: Pull complete",
            "Digest: sha256:041c65774210c6eba133bc5b87ad90cf1654d40e1a04d58fde9bd8ccb0950040"
        ]
    }
    	to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/41dbea7d-5ed6-49d7-bb7c-60e6a3eb8c4a_playbook.retry

    PLAY RECAP *********************************************************************
    localhost                  : ok=7    changed=2    unreachable=0    failed=1

    (truncated, view all with --long)
  deploy_stderr: |

Heat Stack create failed.
Heat Stack create failed.
(undercloud) [stack@undercloud74 ~]$ 




Last login: Wed Jan 24 19:44:50 UTC 2018 on pts/0
[root@overcloud-controller-1 ~]# journalctl -u os-collect-config | grep ERROR
Jan 24 19:18:13 overcloud-controller-1 os-collect-config[3582]: ,626 [1] DEBUG    gnocchi.service: ********************************************************************************\", \n        \"2018-01-24 19:17:34,970 [1] INFO     gnocchi.cli.manage: Upgrading indexer SQLAlchemyIndexer: mysql+pymysql://gnocchi:Vudqvzm8RPjTQ4YPMPwbuTyD9.104.11/gnocchi?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf\", \n        \"2018-01-24 19:17:35,293 [1] ERROR    oslo_db.sqlalchemy.exc_filters: DBAPIError exception wrapped from (pymysql.err.InternalError) (1050, u\\\"Table 'resource_history' already exists\\\") [SQL: u'\\\\nCREATE TABLE resource_history (\\\\n\\\\tcreator VARCHAR(255), \\\\n\\\\tstarted_at DATETIME(6) NOT NULL, \\\\n\\\\trevision_start DATETIME(6) NOT NULL, \\\\n\\\\tended_at DATETIME(6), \\\\n\\\\tuser_id VARCHAR(255), \\\\n\\\\tproject_id VARCHAR(255), \\\\n\\\\toriginal_resource_id VARCHAR(255) NOT NULL, \\\\n\\\\trevision INTEGER NOT NULL AUTO_INCREMENT, \\\\n\\\\tid BINARY(16) NOT NULL, \\\\n\\\\trevision_end DATETIME(6) NOT NULL, \\\\n\\\\ttype VARCHAR(255) NOT NULL, \\\\n\\\\tPRIMARY KEY (revision), \\\\n\\\\tCONSTRAINT ck_started_before_ended CHECK (started_at <= ended_at), \\\\n\\\\tCONSTRAINT fk_rh_id_resource_id FOREIGN KEY(id) REFERENCES resource (id) ON DELETE CASCADE, \\\\n\\\\tCONSTRAINT fk_resource_history_resource_type_name FOREIGN KEY(type) REFERENCES resource_type (name) ON DELETE RESTRICT\\\\n)ENGINE=InnoDB CHARSET=utf8\\\\n\\\\n']\", \n        \"Traceback (most recent call last):\", \n        \"  File \\\"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py\\\", line 1182, in _execute_context\", \n        \"    context)\", \n        \"  File \\\"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py\\\", line 470, in do_execute\", \n        \"    cursor.execute(statement, parameters)\", \n        \"  File \\\"/usr/lib/python2.7/site-packages/pymysql/cursors.py\\\", line 166, in execute\", \n        \"  
  result = self._query(query)\", \n        \"  File \\\"/usr/lib/python2.7/site-packages/pymysql/cursors.p
[root@overcloud-controller-1 ~]#
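
The journal output above is hard to scan because the Gnocchi log is embedded as escaped JSON. As a rough aid, the sketch below pulls the MySQL error code and table name out of such text. The regular expression and the function name are illustrative assumptions, written only against the two message shapes seen in this report (plain, and with extra backslash escaping).

```python
import re

# Matches the DB-API error tuple as it appears in the logs in this report,
# with or without the extra backslashes added by the JSON wrapping.
TABLE_EXISTS_RE = re.compile(
    r"\((?P<code>\d+),\s*u?\\*\"Table '(?P<table>[^']+)' already exists"
)


def find_table_exists_errors(log_text):
    """Return a sorted list of unique (mysql_error_code, table_name) pairs."""
    hits = {
        (int(match.group("code")), match.group("table"))
        for match in TABLE_EXISTS_RE.finditer(log_text)
    }
    return sorted(hits)
```

Feeding it the output of the journalctl command above would be expected to report error 1050 for the resource_history table.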



log from controller 1:
----------------------
Jan 24 14:17:35 localhost journal: InternalError: (1050, u"Table 'resource_history' already exists")
Jan 24 14:17:35 localhost journal: 2018-01-24 19:17:35,300 [1] CRITICAL root: Traceback (most recent call last):
Jan 24 14:17:35 localhost journal:  File "/usr/bin/gnocchi-upgrade", line 10, in <module>
Jan 24 14:17:35 localhost journal:    sys.exit(upgrade())
Jan 24 14:17:35 localhost journal:  File "/usr/lib/python2.7/site-packages/gnocchi/cli/manage.py", line 62, in upgrade
Jan 24 14:17:35 localhost journal:    index.upgrade()
Jan 24 14:17:35 localhost journal:  File "/usr/lib/python2.7/site-packages/gnocchi/indexer/sqlalchemy.py", line 323, in upgrade
Jan 24 14:17:35 localhost journal:    Base.metadata.create_all(connection)
Jan 24 14:17:35 localhost journal:  File "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/schema.py", line 3934, in create_all
Jan 24 14:17:35 localhost journal:    tables=tables)
Jan 24 14:17:35 localhost journal:  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1538, in _run_visitor
Jan 24 14:17:35 localhost journal:    **kwargs).traverse_single(element)
Jan 24 14:17:35 localhost journal:  File "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/visitors.py", line 121, in traverse_single
Jan 24 14:17:35 localhost journal:    return meth(obj, **kw)
Jan 24 14:17:35 localhost journal:  File "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/ddl.py", line 733, in visit_metadata
Jan 24 14:17:35 localhost journal:    _is_metadata_operation=True)
Jan 24 14:17:35 localhost journal:  File "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/visitors.py", line 121, in traverse_single
Jan 24 14:17:35 localhost journal:    return meth(obj, **kw)
Jan 24 14:17:35 localhost journal:  File "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/ddl.py", line 767, in visit_table
Jan 24 14:17:35 localhost journal:    include_foreign_key_constraints=include_foreign_key_constraints
Jan 24 14:17:35 localhost journal:  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 945, in execute
Jan 24 14:17:35 localhost journal:    return meth(self, multiparams, params)
Jan 24 14:17:35 localhost journal:  File "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/ddl.py", line 68, in _execute_on_connection
Jan 24 14:17:35 localhost journal:    return connection._execute_ddl(self, multiparams, params)
Jan 24 14:17:35 localhost journal:  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1002, in _execute_ddl
Jan 24 14:17:35 localhost journal:    compiled
Jan 24 14:17:35 localhost journal:  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1189, in _execute_context
Jan 24 14:17:35 localhost journal:    context)
Jan 24 14:17:35 localhost journal:  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1398, in _handle_dbapi_exception
Jan 24 14:17:35 localhost journal:    util.raise_from_cause(newraise, exc_info)
Jan 24 14:17:35 localhost journal:  File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
Jan 24 14:17:35 localhost journal:    reraise(type(exception), exception, tb=exc_tb, cause=cause)
Jan 24 14:17:35 localhost journal:  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
Jan 24 14:17:35 localhost journal:    context)
Jan 24 14:17:35 localhost journal:  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 470, in do_execute
Jan 24 14:17:35 localhost journal:    cursor.execute(statement, parameters)
Jan 24 14:17:35 localhost journal:  File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 166, in execute
Jan 24 14:17:35 localhost journal:    result = self._query(query)
Jan 24 14:17:35 localhost journal:  File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 322, in _query
Jan 24 14:17:35 localhost journal:    conn.query(q)
Jan 24 14:17:35 localhost journal:  File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 856, in query
Jan 24 14:17:35 localhost journal:    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
Jan 24 14:17:35 localhost journal:  File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1057, in _read_query_result
Jan 24 14:17:35 localhost journal:    result.read()
Jan 24 14:17:35 localhost journal:  File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1340, in read
Jan 24 14:17:35 localhost journal:    first_packet = self.connection._read_packet()
Jan 24 14:17:35 localhost journal:  File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1014, in _read_packet
Jan 24 14:17:35 localhost journal:    packet.check_error()
Jan 24 14:17:35 localhost journal:  File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 393, in check_error
Jan 24 14:17:35 localhost journal:    err.raise_mysql_exception(self._data)
Jan 24 14:17:35 localhost journal:  File "/usr/lib/python2.7/site-packages/pymysql/err.py", line 107, in raise_mysql_exception
Jan 24 14:17:35 localhost journal:    raise errorclass(errno, errval)
Jan 24 14:17:35 localhost journal: DBError: (pymysql.err.InternalError) (1050, u"Table 'resource_history' already exists") [SQL: u'\nCREATE TABLE resource_history (\n\tcreator VARCHAR(255), \n\tstarted_at DATETIME(6) NOT NULL, \n\trevision_start DATETIME(6) NOT NULL, \n\tended_at DATETIME(6), \n\tuser_id VARCHAR(255), \n\tproject_id VARCHAR(255), \n\toriginal_resource_id VARCHAR(255) NOT NULL, \n\trevision INTEGER NOT NULL AUTO_INCREMENT, \n\tid BINARY(16) NOT NULL, \n\trevision_end DATETIME(6) NOT NULL, \n\ttype VARCHAR(255) NOT NULL, \n\tPRIMARY KEY (revision), \n\tCONSTRAINT ck_started_before_ended CHECK (started_at <= ended_at), \n\tCONSTRAINT fk_rh_id_resource_id FOREIGN KEY(id) REFERENCES resource (id) ON DELETE CASCADE, \n\tCONSTRAINT fk_resource_history_resource_type_name FOREIGN KEY(type) REFERENCES resource_type (name) ON DELETE RESTRICT\n)ENGINE=InnoDB CHARSET=utf8\n\n']
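
The traceback shows Base.metadata.create_all(connection) issuing CREATE TABLE for a table that already exists. SQLAlchemy's create_all checks for existing tables by default (checkfirst=True), so the table was apparently not visible at check time but present by the time the CREATE ran. This report does not establish why. One general way to get that pattern is a check-then-create race between two schema initialisers, which the sketch below simulates deterministically. It uses the standard-library sqlite3 module rather than MySQL, and the worker names are hypothetical.

```python
import sqlite3


def table_exists(conn, name):
    """Return True if a table with this name is present in the database."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


def simulate_check_then_create():
    """Two hypothetical workers both check first, then both try to create."""
    conn = sqlite3.connect(":memory:")
    # Both checks happen before either CREATE, as in an unlucky interleaving.
    checks = [
        ("worker-a", table_exists(conn, "resource_history")),
        ("worker-b", table_exists(conn, "resource_history")),
    ]
    outcomes = []
    for worker, already_there in checks:
        if already_there:
            outcomes.append((worker, "skipped"))
            continue
        try:
            conn.execute(
                "CREATE TABLE resource_history (revision INTEGER PRIMARY KEY)"
            )
            outcomes.append((worker, "created"))
        except sqlite3.OperationalError as exc:
            # The second CREATE fails because the first one already ran.
            outcomes.append((worker, "failed: %s" % exc))
    conn.close()
    return outcomes
```

In this simulation the first worker creates the table and the second fails with an "already exists" error, even though its own existence check passed.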

Comment 2 Omri Hochman 2018-01-24 21:01:12 UTC
[root@undercloud74 ~]# skopeo inspect --tls-verify=false docker://docker-registry.engineering.redhat.com/rhosp13/openstack-gnocchi-metricd:2018-01-22.1
{
    "Name": "docker-registry.engineering.redhat.com/rhosp13/openstack-gnocchi-metricd",
    "Tag": "latest",
    "Digest": "sha256:2cfc66bc2b99de2d358653f8d5200da38ae85b8e3e2d501944e129506be4b821",
    "RepoTags": [
        "latest",
        "20180112.1",
        "13.0",
        "13.0-20180113.1",
        "2018-01-22.1",
        "13.0-20180112.1",
        "2017-12-20.1",
        "2018-01-03.2",
        "2018-01-10.4",
        "2018-01-12.2",
        "2018-01-17.2",
        "2018-01-19.1"
    ],
    "Created": "2018-01-16T16:58:28.936808Z",
    "DockerVersion": "1.12.6",
    "Labels": {
        "Kolla-SHA": "6.0.0.0b2-54-gbee9ea39",
        "architecture": "x86_64",
        "authoritative-source-url": "registry.access.redhat.com",
        "build-date": "2018-01-16T16:43:56.917775",
        "com.redhat.build-host": "ip-10-29-120-186.ec2.internal",
        "com.redhat.component": "openstack-gnocchi-metricd-docker",
        "description": "Red Hat OpenStack Platform 13.0 gnocchi-metricd",
        "distribution-scope": "public",
        "io.k8s.description": "Red Hat OpenStack Platform 13.0 gnocchi-metricd",
        "io.k8s.display-name": "Red Hat OpenStack Platform 13.0 gnocchi-metricd",
        "io.openshift.tags": "rhosp osp openstack osp-13.0",
        "kolla_version": "bee9ea39ff1b1c960c5f4f8f1a26fcced71a4ec3",
        "name": "rhosp13/openstack-gnocchi-metricd",
        "release": "3",
        "summary": "Red Hat OpenStack Platform 13.0 gnocchi-metricd",
        "tripleo-common_version": "8.3.0-2-g04317ff",
        "url": "https://access.redhat.com/containers/#/registry.access.redhat.com/rhosp13/openstack-gnocchi-metricd/images/13.0-3",
        "vcs-ref": "987c66824e1f7a0072f2c6ad8284a6b1d03762fa",
        "vcs-type": "git",
        "vendor": "Red Hat, Inc.",
        "version": "13.0",
        "version-release": "13.0-20180112.1"
    },
    "Architecture": "amd64",
    "Os": "linux",
    "Layers": [
        "sha256:9cadd93b16ff2a0c51ac967ea2abfadfac50cfa3af8b5bf983d89b8f8647f3e4",
        "sha256:4aa565ad8b7a87248163ce7dba1dd3894821aac97e846b932ff6b8ef9a8a508a",
        "sha256:a45298a8cdc01417dbd6afa9bd3fd661e746b2537a3a952f4bba91fc1c78e824",
        "sha256:50414ef47dd94a43681dc858f0ccfcfc7c8bf26f6cd182807ccafb372ded330b",
        "sha256:602f31cb60adedc3c9d9f8f1e5907142731a7691aabf353b8d16bad9a6a002f1",
        "sha256:d91fde1655b99c5958b36f01c4c448fa8920cf939c0102c5d1870e05f8f02f7c"
    ]
}

Comment 4 Omri Hochman 2018-01-25 16:33:32 UTC
(In reply to Pradeep Kilambi from comment #3)
> This doesnt seem like a fresh deploy or even an upgrade scenario? Looks like
> there is already a gnocchi db on disk, when you started the install. Did
> your initial deploy fail for some reason or was aborted and you re-launched
> the deploy? Can you try another fresh deploy and see if you can reproduce
> this?


The issue reproduced a second time on the same environment, but you are right: the initial deployment failed because of a problem pulling containers from the local registry. After fixing the registry issue, I ran:
  # openstack overcloud delete overcloud

Once the stack was removed, I ran the deployment again, and that is when the bug occurred.

- After discussing it on IRC, we consider this still a valid bug, even though it only happened after deleting the stack and running the deployment again.
The working theory for the root cause is that the delete leaves an unclean environment (DB or containers on the nodes), which then triggers the failure.

Comment 5 Omri Hochman 2018-01-25 19:47:54 UTC
(In reply to Omri Hochman from comment #4)
> - Now after chatting about it in the IRC this is still a valid bug although
> it happened only after deleting the Stack and running deployment again. 
> the theory for root cause of the problem could be that the delete leaves
> unclean environment (DB or containers on the nodes) and therefore the issue
> occurs.


Steps:
------
- Run openstack overcloud delete overcloud
- Power the nodes on using ironic
- SSH to the nodes
- Run sudo docker ps and sudo docker images

Results :
---------
It seems that Docker containers and images are still present on the overcloud nodes after running "openstack overcloud delete overcloud".


Whether or not that is the source of the issue in this bug, I'll open a separate bug for it: https://bugzilla.redhat.com/show_bug.cgi?id=1538777

Comment 7 Omri Hochman 2018-01-29 15:53:58 UTC
Reopened: the issue reproduced on a clean deployment:

\"ObjectNotFound: error opening pool 'metrics'\",
(undercloud) [stack@undercloud74 ~]$ echo -e `heat deployment-show 37bc65f5-6986-4f27-b583-986333b648a4`|grep -i error
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
/usr/lib/python2.7/site-packages/requests/packages/urllib3/connection.py:344: SubjectAltNameWarning: Certificate for 192.168.0.2 has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
/usr/lib/python2.7/site-packages/requests/packages/urllib3/connection.py:344: SubjectAltNameWarning: Certificate for 192.168.0.2 has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
 \"Error running ['docker', 'run', '--name', 'gnocchi_db_sync', '--label', 'config_id=tripleo_step4', '--label', 'container_name=gnocchi_db_sync', '--label', 'managed_by=paunch', '--label', 'config_data={\\"environment\\": [\\"KOLLA_CONFIG_STRATEGY=COPY_ALWAYS\\", \\"TRIPLEO_CONFIG_HASH=1a569d012dc804939398b671bf257703\\"], \\"user\\": \\"root\\", \\"volumes\\": [\\"/etc/hosts:/etc/hosts:ro\\", \\"/etc/localtime:/etc/localtime:ro\\", \\"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\\", \\"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\\", \\"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\\", \\"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\\", \\"/dev/log:/dev/log\\", \\"/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro\\", \\"/etc/puppet:/etc/puppet:ro\\", \\"/var/lib/kolla/config_files/gnocchi_db_sync.json:/var/lib/kolla/config_files/config.json:ro\\", \\"/var/lib/config-data/puppet-generated/gnocchi/:/var/lib/kolla/config_files/src:ro\\", \\"/var/log/containers/gnocchi:/var/log/gnocchi\\", \\"/var/log/containers/httpd/gnocchi-api:/var/log/httpd\\", \\"/etc/ceph:/var/lib/kolla/config_files/src-ceph:ro\\"], \\"image\\": \\"192.168.0.1:8787/rhosp13/openstack-gnocchi-api:13.0-20180112.1\\", \\"detach\\": false, \\"net\\": \\"host\\", \\"privileged\\": false}', '--env=KOLLA_CONFIG_STRATEGY=COPY_ALWAYS', '--env=TRIPLEO_CONFIG_HASH=1a569d012dc804939398b671bf257703', '--net=host', '--privileged=false', '--user=root', '--volume=/etc/hosts:/etc/hosts:ro', '--volume=/etc/localtime:/etc/localtime:ro', '--volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro', '--volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro', '--volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro', '--volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro', '--volume=/dev/log:/dev/log', '--volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro', 
'--volume=/etc/puppet:/etc/puppet:ro', '--volume=/var/lib/kolla/config_files/gnocchi_db_sync.json:/var/lib/kolla/config_files/config.json:ro', '--volume=/var/lib/config-data/puppet-generated/gnocchi/:/var/lib/kolla/config_files/src:ro', '--volume=/var/log/containers/gnocchi:/var/log/gnocchi', '--volume=/var/log/containers/httpd/gnocchi-api:/var/log/httpd', '--volume=/etc/ceph:/var/lib/kolla/config_files/src-ceph:ro', '192.168.0.1:8787/rhosp13/openstack-gnocchi-api:13.0-20180112.1']. [1]\",
 \"ObjectNotFound: error opening pool 'metrics'\",
(undercloud) [stack@undercloud74 ~]$
(undercloud) [stack@undercloud74 ~]$
(undercloud) [stack@undercloud74 ~]$ openstack stack list
+--------------------------------------+------------+----------------------------------+---------------+----------------------+--------------+
| ID                                   | Stack Name | Project                          | Stack Status  | Creation Time        | Updated Time |
+--------------------------------------+------------+----------------------------------+---------------+----------------------+--------------+
| 3b94d14f-b2cf-4fbc-9cff-e4533293c1a3 | overcloud  | d2ad266cecf9419f9fd906d2c916d998 | CREATE_FAILED | 2018-01-28T15:25:13Z | None         |
+--------------------------------------+------------+----------------------------------+---------------+----------------------+--------------+

Comment 8 Pradeep Kilambi 2018-01-29 16:11:00 UTC
(In reply to Omri Hochman from comment #7)
> Re-opened the issue reproduced on clean deployment :

>  \"Error running ['docker', 'run', '--name', 'gnocchi_db_sync', '--label',
> 'config_id=tripleo_step4', '--label', 'container_name=gnocchi_db_sync',
> '--label', 'managed_by=paunch', '--label', 'config_data={\\"environment\\":
> [\\"KOLLA_CONFIG_STRATEGY=COPY_ALWAYS\\",
> \\"TRIPLEO_CONFIG_HASH=1a569d012dc804939398b671bf257703\\"], \\"user\\":
> \\"root\\", \\"volumes\\": [\\"/etc/hosts:/etc/hosts:ro\\",
> \\"/etc/localtime:/etc/localtime:ro\\",
> \\"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\\",
> \\"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\\",
> \\"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.
> crt:ro\\", \\"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\\",
> \\"/dev/log:/dev/log\\",
> \\"/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro\\",
> \\"/etc/puppet:/etc/puppet:ro\\",
> \\"/var/lib/kolla/config_files/gnocchi_db_sync.json:/var/lib/kolla/
> config_files/config.json:ro\\",
> \\"/var/lib/config-data/puppet-generated/gnocchi/:/var/lib/kolla/
> config_files/src:ro\\", \\"/var/log/containers/gnocchi:/var/log/gnocchi\\",
> \\"/var/log/containers/httpd/gnocchi-api:/var/log/httpd\\",
> \\"/etc/ceph:/var/lib/kolla/config_files/src-ceph:ro\\"], \\"image\\":
> \\"192.168.0.1:8787/rhosp13/openstack-gnocchi-api:13.0-20180112.1\\",
> \\"detach\\": false, \\"net\\": \\"host\\", \\"privileged\\": false}',
> '--env=KOLLA_CONFIG_STRATEGY=COPY_ALWAYS',
> '--env=TRIPLEO_CONFIG_HASH=1a569d012dc804939398b671bf257703', '--net=host',
> '--privileged=false', '--user=root', '--volume=/etc/hosts:/etc/hosts:ro',
> '--volume=/etc/localtime:/etc/localtime:ro',
> '--volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro',
> '--volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:
> ro',
> '--volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-
> bundle.trust.crt:ro',
> '--volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro',
> '--volume=/dev/log:/dev/log',
> '--volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro',
> '--volume=/etc/puppet:/etc/puppet:ro',
> '--volume=/var/lib/kolla/config_files/gnocchi_db_sync.json:/var/lib/kolla/
> config_files/config.json:ro',
> '--volume=/var/lib/config-data/puppet-generated/gnocchi/:/var/lib/kolla/
> config_files/src:ro',
> '--volume=/var/log/containers/gnocchi:/var/log/gnocchi',
> '--volume=/var/log/containers/httpd/gnocchi-api:/var/log/httpd',
> '--volume=/etc/ceph:/var/lib/kolla/config_files/src-ceph:ro',
> '192.168.0.1:8787/rhosp13/openstack-gnocchi-api:13.0-20180112.1']. [1]\",
>  \"ObjectNotFound: error opening pool 'metrics'\",



This is not the same issue as the original. In this case, it looks like your Ceph cluster does not have a metrics pool. Is this an external Ceph cluster? If so, make sure you create the required pools.
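
A quick way to confirm a missing pool is to compare the pool names the deployment expects with what the cluster reports. The sketch below assumes the input is the text printed by "ceph osd pool ls" (one pool name per line). The expected names are the ones listed in the ceph-ansible "create openstack pool(s)" task output quoted elsewhere in this report, and the function name is hypothetical.

```python
# Pool names taken from the ceph-ansible task output quoted in this report.
EXPECTED_POOLS = ("images", "metrics", "backups", "vms", "volumes")


def missing_pools(pool_ls_output, expected=EXPECTED_POOLS):
    """Return the expected pool names that do not appear in the listing."""
    existing = {
        line.strip() for line in pool_ls_output.splitlines() if line.strip()
    }
    return [name for name in expected if name not in existing]
```

An empty result means every expected pool exists. A result containing "metrics" would be consistent with the ObjectNotFound error above.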

Comment 9 Omri Hochman 2018-01-29 16:16:37 UTC
(In reply to Pradeep Kilambi from comment #8)
> (In reply to Omri Hochman from comment #7)
> > Re-opened the issue reproduced on clean deployment :

> 
> This is not the same issue as the original. IN this case looks like your
> ceph cluster does not have metrics pool. Is this external ceph? if so, make
> sure you create those pools accordingly.

Thanks for checking on that. It's internal Ceph with the standard deployment:
3 controller, 1 compute, and 3 Ceph nodes.

Comment 10 John Fulton 2018-01-29 17:57:06 UTC
The problem is how TripleO set up the ceph-ansible deployment.

- The ceph-ansible call has no input via extra vars [1]
- The inventory has no input [2]
- Thus no arguments were passed to ceph-ansible
- Thus ceph-ansible skipped all of its tasks because no input was provided, and the playbook run returned no error [3]

[1]
2018-01-28 11:27:28.996 31461 DEBUG oslo_concurrency.processutils [req-1b57e863-20e4-414b-8b0c-62d37514f64b f8716113cb2d44259eeebf97c3570146 d2ad266cecf9419f9fd906d2c916d998 - default default] CMD "ansible-playbook /usr/share/ceph-ansible/site-docker.yml.sample --user tripleo-admin --become --become-user root --inventory-file /tmp/ansible-mistral-action60aVA0/inventory.yaml --private-key /tmp/ansible-mistral-action60aVA0/ssh_private_key --skip-tags package-install,with_pkg" returned: 0 in 873.287s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409
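
The point about missing extra vars can be checked mechanically against a logged command line. The sketch below is a simplified check, not a full ansible-playbook option parser: it only recognises the separate "-e" and "--extra-vars" forms and the "--extra-vars=..." form, and the function name is hypothetical.

```python
import shlex


def has_extra_vars(command_line):
    """Return True if the command line passes extra vars to ansible-playbook."""
    for arg in shlex.split(command_line):
        # Simplification: attached short forms such as -ekey=value are ignored.
        if arg in ("-e", "--extra-vars") or arg.startswith("--extra-vars="):
            return True
    return False


LOGGED_COMMAND = (
    "ansible-playbook /usr/share/ceph-ansible/site-docker.yml.sample "
    "--user tripleo-admin --become --become-user root "
    "--inventory-file /tmp/ansible-mistral-action60aVA0/inventory.yaml "
    "--private-key /tmp/ansible-mistral-action60aVA0/ssh_private_key "
    "--skip-tags package-install,with_pkg"
)
```

Applied to LOGGED_COMMAND, which is copied from the log line above, the check finds no extra vars.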


[2]
As per https://github.com/fultonj/tripleo-ceph-ansible/blob/master/get-inventory.sh:

Inventory from 2018-01-28 16:27:30

{
    "mgr_ips": [
        "192.168.0.8", 
        "192.168.0.19", 
        "192.168.0.17"
    ], 
    "mon_ips": [
        "192.168.0.8", 
        "192.168.0.19", 
        "192.168.0.17"
    ], 
    "mds_ips": [], 
    "osd_ips": [
        "192.168.0.16", 
        "192.168.0.12", 
        "192.168.0.13"
    ], 
    "rbdmirror_ips": [], 
    "rgw_ips": [], 
    "client_ips": [
        "192.168.0.15"
    ], 
    "nfs_ips": []
}

[3]

2018-01-28 11:16:15,456 p=3697 u=mistral |  TASK [ceph-mon : create openstack pool(s)] *************************************
2018-01-28 11:16:15,513 p=3697 u=mistral |  skipping: [192.168.0.8] => (item={u'rule_name': u'', u'pg_num': 128, u'name': u'images'}) 
2018-01-28 11:16:15,536 p=3697 u=mistral |  skipping: [192.168.0.8] => (item={u'rule_name': u'', u'pg_num': 128, u'name': u'metrics'}) 
2018-01-28 11:16:15,559 p=3697 u=mistral |  skipping: [192.168.0.8] => (item={u'rule_name': u'', u'pg_num': 128, u'name': u'backups'}) 
2018-01-28 11:16:15,581 p=3697 u=mistral |  skipping: [192.168.0.8] => (item={u'rule_name': u'', u'pg_num': 128, u'name': u'vms'}) 
2018-01-28 11:16:15,600 p=3697 u=mistral |  skipping: [192.168.0.8] => (item={u'rule_name': u'', u'pg_num': 128, u'name': u'volumes'})
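
To list which pools were skipped without reading the whole ceph-ansible log, the "skipping:" lines can be parsed. This is a minimal sketch written against the line format shown in [3]. The regular expression and function name are assumptions and may need adjusting for other Ansible versions or output callbacks.

```python
import re

# Matches lines such as:
#   skipping: [192.168.0.8] => (item={u'rule_name': u'', ..., u'name': u'images'})
SKIPPED_ITEM_RE = re.compile(
    r"skipping: \[(?P<host>[^\]]+)\] => \(item=\{.*?u'name': u'(?P<name>[^']+)'"
)


def skipped_pools(log_text):
    """Return (host, pool_name) pairs for every skipped pool item, in order."""
    return [
        (match.group("host"), match.group("name"))
        for match in SKIPPED_ITEM_RE.finditer(log_text)
    ]
```

Run over the excerpt in [3], this would be expected to list images, metrics, backups, vms, and volumes, all on 192.168.0.8.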

Comment 11 Omri Hochman 2018-01-29 18:09:50 UTC
The issue from the last comments will be handled in a separate bug: https://bugzilla.redhat.com/show_bug.cgi?id=1539852

Comment 12 Omri Hochman 2018-01-29 18:10:33 UTC

*** This bug has been marked as a duplicate of bug 1538777 ***

Comment 13 Omri Hochman 2018-03-14 20:09:18 UTC
Reopening, as it reproduced again with a clean deploy.

We thought it might be related to bug #1552685, but the error is different.

Next step: trying to work around it by adding the following to the deploy command:
 -e /usr/share/openstack-tripleo-heat-templates/environments/disable-telemetry.yaml
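
For CI jobs that assemble the deploy command programmatically, the workaround amounts to appending one more environment file. The sketch below shows that step for an argument list. The base command in the usage note is a hypothetical placeholder, not the command used in this environment, and the function name is illustrative.

```python
DISABLE_TELEMETRY_ENV = (
    "/usr/share/openstack-tripleo-heat-templates/"
    "environments/disable-telemetry.yaml"
)


def with_environment(deploy_args, env_file=DISABLE_TELEMETRY_ENV):
    """Return a copy of deploy_args with '-e env_file' appended once."""
    args = list(deploy_args)
    # Skip the append if the same environment file is already passed.
    for flag, value in zip(args, args[1:]):
        if flag in ("-e", "--environment-file") and value == env_file:
            return args
    return args + ["-e", env_file]
```

For example, with_environment(["openstack", "overcloud", "deploy", "--templates"]) returns the same list with the two workaround arguments added at the end. Calling it again on the result leaves the list unchanged.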

Comment 17 Omri Hochman 2018-03-14 23:25:33 UTC
(In reply to Omri Hochman from comment #13)
> reopen , as it reproduced again with clean deploy. 
> 
> we thought it might be related to this #1552685 but the error is different, 
> 
> Next move, trying to W/A it by adding to the deploy_command: 
>  -e
> /usr/share/openstack-tripleo-heat-templates/environments/disable-telemetry.
> yaml

With telemetry (including Gnocchi) disabled, the stack reached CREATE_COMPLETE.

Comment 21 Gurenko Alex 2018-03-26 11:46:00 UTC
I have been seeing this for the last 2 days in all RHOS 13 IR deployments with the latest puddle. I've added the workaround to the CI jobs and re-triggered all of them.

Comment 23 Omri Hochman 2018-04-04 02:12:23 UTC
Unable to reproduce with: openstack-tripleo-heat-templates-8.0.2-0.20180327213843.f25e2d8.el7ost.noarch
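
To check whether a given environment already carries the fixed build, the installed package string can be compared with the "Fixed In Version" value. The sketch below is a deliberate simplification and not RPM's own version comparison: it assumes the name-version-0.timestamp layout used by the package strings in this report and compares only the dotted version and the 14-digit snapshot timestamp. All names in it are hypothetical.

```python
import re

# Assumes strings shaped like <name>-<version>-0.<14-digit timestamp>...
NVR_RE = re.compile(
    r"^(?P<name>.+)-(?P<version>\d+(?:\.\d+)*)-0\.(?P<snapshot>\d{14})"
)

REPORTED = "openstack-tripleo-heat-templates-8.0.0-0.20180103192341.el7ost.noarch"
FIXED_IN = "openstack-tripleo-heat-templates-8.0.2-0.20180327213843.f25e2d8.el7ost"


def build_key(package):
    """Return ((major, minor, ...), snapshot_timestamp) for a package string."""
    match = NVR_RE.match(package)
    if match is None:
        raise ValueError("unrecognised package string: %r" % package)
    version = tuple(int(part) for part in match.group("version").split("."))
    return version, int(match.group("snapshot"))


def includes_fix(installed, fixed_in=FIXED_IN):
    """Return True if the installed build is at or after the fixed build."""
    return build_key(installed) >= build_key(fixed_in)
```

With these assumptions, the build from the original report sorts before the fixed build, and the build named in this comment sorts as fixed.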

Comment 25 errata-xmlrpc 2018-06-27 13:43:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086

