Bug 1623417 - ceph-ansible breaks keyrings created from old mon_cap, osd_cap, ... params
Summary: ceph-ansible breaks keyrings created from old mon_cap, osd_cap, ... params
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 3.1
Assignee: Guillaume Abrioux
QA Contact: Rachana Patel
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-08-29 10:11 UTC by karan singh
Modified: 2019-10-24 05:38 UTC (History)

Fixed In Version: RHEL: ceph-ansible-3.1.3-1.el7cp Ubuntu: ceph-ansible_3.1.3-2redhat1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-26 18:24:01 UTC
Embargoed:


Attachments
auth_list_3.1.2 (9.39 KB, text/plain), 2018-09-06 17:48 UTC, Giulio Fidente
ansible_log_3.1.2 (1009.88 KB, text/plain), 2018-09-06 18:02 UTC, Giulio Fidente
inventory_3.1.2 (4.16 KB, text/plain), 2018-09-06 18:35 UTC, Giulio Fidente
auth_list_3.1.2 (9.40 KB, text/plain), 2018-09-06 18:37 UTC, Giulio Fidente
ansible_log_3.1.2 (5.08 MB, text/plain), 2018-09-06 19:01 UTC, Giulio Fidente
auth_list_3.1.0rc10 (9.37 KB, text/plain), 2018-09-06 20:16 UTC, Giulio Fidente
inventory_3.1.0rc10 (4.16 KB, text/plain), 2018-09-06 20:17 UTC, Giulio Fidente


Links
System ID Private Priority Status Summary Last Updated
Github ceph ceph-ansible pull 3106 0 'None' closed Revert "client: add quotes to the dict values" 2020-09-24 20:34:18 UTC
Github ceph ceph-ansible pull 3107 0 'None' closed Automatic backport of pull request #3106 2020-09-24 20:34:21 UTC
Red Hat Product Errata RHBA-2018:2819 0 None None None 2018-09-26 18:24:50 UTC

Description karan singh 2018-08-29 10:11:47 UTC
Description of problem:

The openstack overcloud deploy command ran for about 4 hours (247 minutes) and eventually failed with the following error:

"[overcloud.AllNodesDeploySteps]: CREATE_FAILED  CREATE aborted (Task create from TemplateResource "AllNodesDeploySteps" Stack "overcloud" [404edbf5-6788-4255-95cd-68f0b1044d27] Timed out)"


Version-Release number of selected component (if applicable):

[root@refarch-r220-02 ~]# rpm -qa | grep -i openstack
openstack-nova-placement-api-17.0.3-0.20180420001141.el7ost.noarch
puppet-openstacklib-12.4.0-0.20180329042555.4b30e6f.el7ost.noarch
openstack-glance-16.0.1-2.el7ost.noarch
openstack-tripleo-common-containers-8.6.1-23.el7ost.noarch
openstack-heat-api-10.0.1-0.20180411125640.el7ost.noarch
python2-openstackclient-3.14.1-1.el7ost.noarch
openstack-tempest-18.0.0-2.el7ost.noarch
openstack-mistral-api-6.0.2-1.el7ost.noarch
openstack-zaqar-6.0.1-1.el7ost.noarch
openstack-swift-container-2.17.1-0.20180314165245.caeeb54.el7ost.noarch
openstack-neutron-12.0.2-0.20180421011364.0ec54fd.el7ost.noarch
openstack-ironic-api-10.1.2-4.el7ost.noarch
openstack-tripleo-ui-8.3.1-3.el7ost.noarch
python-openstackclient-lang-3.14.1-1.el7ost.noarch
openstack-tripleo-puppet-elements-8.0.0-2.el7ost.noarch
openstack-nova-scheduler-17.0.3-0.20180420001141.el7ost.noarch
openstack-swift-object-2.17.1-0.20180314165245.caeeb54.el7ost.noarch
openstack-neutron-common-12.0.2-0.20180421011364.0ec54fd.el7ost.noarch
openstack-neutron-ml2-12.0.2-0.20180421011364.0ec54fd.el7ost.noarch
openstack-heat-engine-10.0.1-0.20180411125640.el7ost.noarch
openstack-ironic-common-10.1.2-4.el7ost.noarch
openstack-mistral-executor-6.0.2-1.el7ost.noarch
openstack-tripleo-heat-templates-8.0.2-43.el7ost.noarch
openstack-selinux-0.8.14-12.el7ost.noarch
python2-openstacksdk-0.11.3-1.el7ost.noarch
openstack-nova-api-17.0.3-0.20180420001141.el7ost.noarch
openstack-nova-compute-17.0.3-0.20180420001141.el7ost.noarch
openstack-keystone-13.0.1-0.20180420194847.7bd6454.el7ost.noarch
openstack-nova-common-17.0.3-0.20180420001141.el7ost.noarch
openstack-neutron-openvswitch-12.0.2-0.20180421011364.0ec54fd.el7ost.noarch
openstack-heat-api-cfn-10.0.1-0.20180411125640.el7ost.noarch
openstack-ironic-staging-drivers-0.9.0-4.el7ost.noarch
openstack-ironic-inspector-7.2.1-0.20180409163360.el7ost.noarch
openstack-mistral-engine-6.0.2-1.el7ost.noarch
puppet-openstack_extras-12.4.1-0.20180413042250.2634296.el7ost.noarch
openstack-swift-account-2.17.1-0.20180314165245.caeeb54.el7ost.noarch
openstack-tripleo-common-8.6.1-23.el7ost.noarch
openstack-swift-proxy-2.17.1-0.20180314165245.caeeb54.el7ost.noarch
openstack-ironic-conductor-10.1.2-4.el7ost.noarch
openstack-tripleo-validations-8.4.1-5.el7ost.noarch
openstack-nova-conductor-17.0.3-0.20180420001141.el7ost.noarch
openstack-tripleo-image-elements-8.0.1-1.el7ost.noarch
openstack-heat-common-10.0.1-0.20180411125640.el7ost.noarch
openstack-mistral-common-6.0.2-1.el7ost.noarch
[root@refarch-r220-02 ~]#

[root@refarch-r220-02 ~]# rpm -qa | grep -i ceph
puppet-ceph-2.5.0-1.el7ost.noarch
ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch
[root@refarch-r220-02 ~]#

How reproducible:

100% in my setup 

Steps to Reproduce:
1. Deploy overcloud

time openstack overcloud deploy \
   --templates /usr/share/openstack-tripleo-heat-templates \
   -r /home/stack/templates/roles_data.yaml \
   -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
   -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
   -e /home/stack/templates/overcloud_images.yaml \
   -e /home/stack/templates/network-environment.yaml \
   -e /home/stack/templates/global-config.yaml \
   -e /home/stack/templates/ceph-config.yaml > /tmp/overcloud.logs 2>&1


Actual results:

Overcloud stack creation is failing


Expected results:

Overcloud stack creation should complete successfully.

Additional info:

## Log - 1 : openstack overcloud deploy command ran for 247 minutes (about 4 hours)

(undercloud) [stack@refarch-r220-02 ~]$ time openstack overcloud deploy \
>    --templates /usr/share/openstack-tripleo-heat-templates \
>    -r /home/stack/templates/roles_data.yaml \
>    -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
>    -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
>    -e /home/stack/templates/overcloud_images.yaml \
>    -e /home/stack/templates/network-environment.yaml \
>    -e /home/stack/templates/global-config.yaml \
>    -e /home/stack/templates/ceph-config.yaml > /tmp/overcloud.logs 2>&1



real    247m19.386s
user    0m23.972s
sys     0m1.158s
(undercloud) [stack@refarch-r220-02 ~]$

## Log - 2 : openstack overcloud deploy output

2018-08-28 17:14:18Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step4.1]: CREATE_IN_PROGRESS  state changed
2018-08-28 17:14:19Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step4.2]: CREATE_IN_PROGRESS  state changed
2018-08-28 17:14:20Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step4.4]: CREATE_IN_PROGRESS  state changed
2018-08-28 17:15:04Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step4.4]: SIGNAL_IN_PROGRESS  Signal: deployment b5dcd557-6c14-4261-859d-aeae801c29b5 succeeded
Heat Stack create failed.
Heat Stack create failed.
2018-08-28 17:15:05Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step4.4]: CREATE_COMPLETE  state changed
2018-08-28 17:15:11Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step4.1]: SIGNAL_IN_PROGRESS  Signal: deployment a61c6110-1771-42f1-9a5d-bd06162c6e8a succeeded
2018-08-28 17:15:12Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step4.0]: SIGNAL_IN_PROGRESS  Signal: deployment ebbf92e1-1151-4249-bd7f-e20ee2d0640e succeeded
2018-08-28 17:15:13Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step4.0]: CREATE_COMPLETE  state changed
2018-08-28 17:15:13Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step4.3]: SIGNAL_IN_PROGRESS  Signal: deployment a63f1d2f-e9ef-4180-a97e-280d78440d60 succeeded
2018-08-28 17:15:13Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step4.2]: SIGNAL_IN_PROGRESS  Signal: deployment 69baae25-76fc-40d8-998e-611ffd4d1a7b succeeded
2018-08-28 17:15:13Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step4.1]: CREATE_COMPLETE  state changed
2018-08-28 17:15:13Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step4.3]: CREATE_COMPLETE  state changed
2018-08-28 17:15:14Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step4.2]: CREATE_COMPLETE  state changed
2018-08-28 17:15:14Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step4]: CREATE_COMPLETE  Stack CREATE completed successfully
2018-08-28 17:15:14Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step4]: CREATE_COMPLETE  state changed
2018-08-28 20:07:11Z [overcloud.AllNodesDeploySteps]: CREATE_FAILED  CREATE aborted (Task create from TemplateResource "AllNodesDeploySteps" Stack "overcloud" [404edbf5-6788-4255-95cd-68f0b1044d27] Timed out)
2018-08-28 20:07:11Z [overcloud.AllNodesDeploySteps]: CREATE_FAILED  Stack CREATE cancelled
2018-08-28 20:07:11Z [overcloud]: CREATE_FAILED  Timed out
2018-08-28 20:07:12Z [overcloud.AllNodesDeploySteps.ControllerDeployment_Step4]: CREATE_FAILED  Stack CREATE cancelled
2018-08-28 20:07:12Z [overcloud.AllNodesDeploySteps.ControllerDeployment_Step4]: CREATE_FAILED  resources.ControllerDeployment_Step4: Stack CREATE cancelled
2018-08-28 20:07:12Z [overcloud.AllNodesDeploySteps]: CREATE_FAILED  Resource CREATE failed: resources.ControllerDeployment_Step4: Stack CREATE cancelled

 Stack overcloud CREATE_FAILED

overcloud.AllNodesDeploySteps.ControllerDeployment_Step4:
  resource_type: OS::TripleO::DeploymentSteps
  physical_resource_id: 8bbfde33-eba0-4e1f-b656-8b2f4fd7daf5
  status: CREATE_FAILED
  status_reason: |
    resources.ControllerDeployment_Step4: Stack CREATE cancelled
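
Note: the event timestamps above can be turned into durations to see how long the stack sat without progress before Heat gave up. The following is a minimal sketch, not part of the original report; it uses the last CREATE_COMPLETE event and the "Timed out" event from this log, plus the stack creation time reported by `openstack stack list` in Log 3. The helper name `parse_event_time` is made up for illustration.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def parse_event_time(line: str) -> datetime:
    """Parse the leading 'YYYY-MM-DD HH:MM:SSZ' timestamp of a Heat event line."""
    stamp = line.split("Z ", 1)[0]
    return datetime.strptime(stamp, FMT)


# Last event that reported progress, copied from the log above.
last_progress = parse_event_time(
    "2018-08-28 17:15:14Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step4]: "
    "CREATE_COMPLETE  state changed"
)
# Event where the top-level stack timed out, copied from the log above.
timed_out = parse_event_time("2018-08-28 20:07:11Z [overcloud]: CREATE_FAILED  Timed out")
# Creation time of the overcloud stack, taken from the stack list in Log 3.
stack_created = datetime.strptime("2018-08-28T16:07:11Z", "%Y-%m-%dT%H:%M:%SZ")

idle = timed_out - last_progress
total = timed_out - stack_created
print(f"idle before timeout: {idle}")
print(f"stack lifetime:      {total}")
```

With these inputs the stack lifetime is exactly four hours. That would be consistent with a 240-minute stack timeout, but whether that was the timeout in effect for this deployment is an assumption; the report does not state it.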


## Log - 3 : openstack stack list & openstack stack resource list output


(undercloud) [stack@refarch-r220-02 ~]$ openstack stack list
+--------------------------------------+------------+----------------------------------+---------------+----------------------+--------------+
| ID                                   | Stack Name | Project                          | Stack Status  | Creation Time        | Updated Time |
+--------------------------------------+------------+----------------------------------+---------------+----------------------+--------------+
| 404edbf5-6788-4255-95cd-68f0b1044d27 | overcloud  | d4ae8726a690413cbc62ade7e5e70763 | CREATE_FAILED | 2018-08-28T16:07:11Z | None         |
+--------------------------------------+------------+----------------------------------+---------------+----------------------+--------------+
(undercloud) [stack@refarch-r220-02 ~]$

(undercloud) [stack@refarch-r220-02 ~]$ openstack stack resource list -n5 overcloud | grep FAILED

| AllNodesDeploySteps                    | e446e19a-b580-48ef-bea8-cdebb69b1480                                                                                                                                                 | OS::TripleO::PostDeploySteps                                                                                                    | CREATE_FAILED   | 2018-08-28T16:07:14Z | overcloud                                                                                                                                                |
| ControllerDeployment_Step4             | 8bbfde33-eba0-4e1f-b656-8b2f4fd7daf5                                                                                                                                                 | OS::TripleO::DeploymentSteps                                                                                                    | CREATE_FAILED   | 2018-08-28T16:33:29Z | overcloud-AllNodesDeploySteps-rawlxuoiltb7                                                                                                               |
| 0                                      | 3aaa2d17-aac9-4cd1-a8cd-5408ef530a93                                                                                                                                                 | OS::Heat::StructuredDeployment                                                                                                  | CREATE_FAILED   | 2018-08-28T17:14:15Z | overcloud-AllNodesDeploySteps-rawlxuoiltb7-ControllerDeployment_Step4-jjzyhon333i4                                                                       |
(undercloud) [stack@refarch-r220-02 ~]$


## Log - 4 : heat stack-list --show-nested, openstack stack resource list & openstack stack failures list output


(undercloud) [stack@refarch-r220-02 ~]$ heat stack-list --show-nested -f "status=FAILED"

WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
+--------------------------------------+------------------------------------------------------------------------------------+---------------+----------------------+--------------+--------------------------------------+----------------------------------+
| id                                   | stack_name                                                                         | stack_status  | creation_time        | updated_time | parent                               | project                          |
+--------------------------------------+------------------------------------------------------------------------------------+---------------+----------------------+--------------+--------------------------------------+----------------------------------+
| 404edbf5-6788-4255-95cd-68f0b1044d27 | overcloud                                                                          | CREATE_FAILED | 2018-08-28T16:07:11Z | None         | None                                 | d4ae8726a690413cbc62ade7e5e70763 |
| e446e19a-b580-48ef-bea8-cdebb69b1480 | overcloud-AllNodesDeploySteps-rawlxuoiltb7                                         | CREATE_FAILED | 2018-08-28T16:33:28Z | None         | 404edbf5-6788-4255-95cd-68f0b1044d27 | d4ae8726a690413cbc62ade7e5e70763 |
| 8bbfde33-eba0-4e1f-b656-8b2f4fd7daf5 | overcloud-AllNodesDeploySteps-rawlxuoiltb7-ControllerDeployment_Step4-jjzyhon333i4 | CREATE_FAILED | 2018-08-28T17:14:14Z | None         | e446e19a-b580-48ef-bea8-cdebb69b1480 | d4ae8726a690413cbc62ade7e5e70763 |
+--------------------------------------+------------------------------------------------------------------------------------+---------------+----------------------+--------------+--------------------------------------+----------------------------------+
(undercloud) [stack@refarch-r220-02 ~]$
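
Note: the id and parent columns in the table above form a chain from the top-level stack down to the nested stack where the failure surfaced. The sketch below is an illustration only, not tooling from the report; it hard-codes the three rows shown above and follows the parent links. The function name `failure_chain` is hypothetical.

```python
# (id, stack_name, parent) copied from the `heat stack-list --show-nested` output above.
FAILED_STACKS = [
    ("404edbf5-6788-4255-95cd-68f0b1044d27", "overcloud", None),
    (
        "e446e19a-b580-48ef-bea8-cdebb69b1480",
        "overcloud-AllNodesDeploySteps-rawlxuoiltb7",
        "404edbf5-6788-4255-95cd-68f0b1044d27",
    ),
    (
        "8bbfde33-eba0-4e1f-b656-8b2f4fd7daf5",
        "overcloud-AllNodesDeploySteps-rawlxuoiltb7-ControllerDeployment_Step4-jjzyhon333i4",
        "e446e19a-b580-48ef-bea8-cdebb69b1480",
    ),
]


def failure_chain(stacks):
    """Return one root-to-leaf list of stack names per failed leaf stack."""
    by_id = {sid: (name, parent) for sid, name, parent in stacks}
    parents = {parent for _, _, parent in stacks if parent is not None}
    # A leaf is a failed stack that no other failed stack names as its parent.
    leaves = [sid for sid in by_id if sid not in parents]
    chains = []
    for leaf in leaves:
        chain = []
        current = leaf
        while current is not None and current in by_id:
            name, parent = by_id[current]
            chain.append(name)
            current = parent
        chains.append(list(reversed(chain)))
    return chains


chains = failure_chain(FAILED_STACKS)
for chain in chains:
    print(" -> ".join(chain))
```

For this table there is a single chain, ending at the ControllerDeployment_Step4 nested stack, which is the same resource reported by `openstack stack failures list`.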



(undercloud) [stack@refarch-r220-02 ~]$ openstack stack resource list overcloud
+----------------------------------------+--------------------------------------------------------+--------------------------------------------------+-----------------+----------------------+
| resource_name                          | physical_resource_id                                   | resource_type                                    | resource_status | updated_time         |
+----------------------------------------+--------------------------------------------------------+--------------------------------------------------+-----------------+----------------------+
| ControllerServers                      | overcloud-ControllerServers-mbo7qqn65mjg               | OS::Heat::Value                                  | CREATE_COMPLETE | 2018-08-28T16:07:15Z |
| UpdateWorkflow                         | 93891cf0-be2c-462c-b0e6-1fa557458038                   | OS::TripleO::Tasks::UpdateWorkflow               | CREATE_COMPLETE | 2018-08-28T16:07:14Z |
| ComputeHCI                             | 1e9689bb-7f61-4c4a-9910-d1fc7bd49b97                   | OS::Heat::ResourceGroup                          | CREATE_COMPLETE | 2018-08-28T16:07:16Z |
| StorageVirtualIP                       | f813b501-c35d-4eeb-b1ea-258182cfd0ed                   | OS::TripleO::Network::Ports::StorageVipPort      | CREATE_COMPLETE | 2018-08-28T16:07:19Z |
| ControllerServiceChain                 | 49654455-3641-4d0a-aea5-76d51fc5edfd                   | OS::TripleO::Services                            | CREATE_COMPLETE | 2018-08-28T16:07:17Z |
| ComputeHCIAllNodesValidationDeployment | a27d16e9-6458-446e-8add-7c81f7463d30                   | OS::Heat::StructuredDeployments                  | CREATE_COMPLETE | 2018-08-28T16:07:14Z |
| ServerIdMap                            | overcloud-ServerIdMap-2rjhsgt2txn4                     | OS::Heat::Value                                  | CREATE_COMPLETE | 2018-08-28T16:07:14Z |
| PcsdPassword                           | overcloud-PcsdPassword-sxf5ktf2chz2                    | OS::TripleO::RandomString                        | CREATE_COMPLETE | 2018-08-28T16:07:17Z |
| hostsConfig                            | 4e46b143-e189-428e-8192-1a5c2d4e6577                   | OS::TripleO::Hosts::SoftwareConfig               | CREATE_COMPLETE | 2018-08-28T16:07:16Z |
| ServerOsCollectConfigData              | overcloud-ServerOsCollectConfigData-dbiinxmr7asc       | OS::Heat::Value                                  | CREATE_COMPLETE | 2018-08-28T16:07:14Z |
| AllNodesExtraConfig                    | 7d44cff7-b5f5-4c83-8bdd-07e690b66b72                   | OS::TripleO::AllNodesExtraConfig                 | CREATE_COMPLETE | 2018-08-28T16:07:14Z |
| allNodesConfig                         | 94db604b-8fc1-4012-803c-4d8913025c6d                   | OS::TripleO::AllNodes::SoftwareConfig            | CREATE_COMPLETE | 2018-08-28T16:07:15Z |
| ControllerServiceNames                 | overcloud-ControllerServiceNames-ewnpbi7x5hv3          | OS::Heat::Value                                  | CREATE_COMPLETE | 2018-08-28T16:07:16Z |
| ComputeHCIMergedConfigSettings         | overcloud-ComputeHCIMergedConfigSettings-satfbs3ugqiz  | OS::Heat::Value                                  | CREATE_COMPLETE | 2018-08-28T16:07:14Z |
| ControllerIpListMap                    | a4631385-fd49-49ad-aefe-9b3bb3bed31e                   | OS::TripleO::Network::Ports::NetIpListMap        | CREATE_COMPLETE | 2018-08-28T16:07:15Z |
| RedisVirtualIP                         | b469a70b-4aec-4dc1-9912-534218d27e88                   | OS::TripleO::Network::Ports::RedisVipPort        | CREATE_COMPLETE | 2018-08-28T16:07:15Z |
| ControllerSshKnownHostsDeployment      | 322091cd-f558-41f6-9d7a-f241e4554edd                   | OS::Heat::StructuredDeployments                  | CREATE_COMPLETE | 2018-08-28T16:07:15Z |
| NetCidrMapValue                        | overcloud-NetCidrMapValue-6hphspcljofv                 | OS::Heat::Value                                  | CREATE_COMPLETE | 2018-08-28T16:07:17Z |
| SshKnownHostsConfig                    | 0069ed5c-1253-42c4-ade7-b7b1b976bf56                   | OS::TripleO::Ssh::KnownHostsConfig               | CREATE_COMPLETE | 2018-08-28T16:07:15Z |
| DefaultPasswords                       | 5fd79a85-27e0-4354-82d2-2abc1907f565                   | OS::TripleO::DefaultPasswords                    | CREATE_COMPLETE | 2018-08-28T16:07:17Z |
| BlacklistedHostnames                   | overcloud-BlacklistedHostnames-hxjbfsz3cjoq            | OS::Heat::Value                                  | CREATE_COMPLETE | 2018-08-28T16:07:14Z |
| ControllerAllNodesValidationDeployment | 2498d999-0ad4-4e9e-89b4-b1d412c1ce65                   | OS::Heat::StructuredDeployments                  | CREATE_COMPLETE | 2018-08-28T16:07:14Z |
| VipMap                                 | ec52e32c-605a-4736-b9fc-f411e2c16f86                   | OS::TripleO::Network::Ports::NetVipMap           | CREATE_COMPLETE | 2018-08-28T16:07:18Z |
| ServiceNetMap                          | cfb17c59-6a86-48ae-9ed8-15c66922cfa3                   | OS::TripleO::ServiceNetMap                       | CREATE_COMPLETE | 2018-08-28T16:07:18Z |
| RabbitCookie                           | overcloud-RabbitCookie-ke2fu7othjj6                    | OS::TripleO::RandomString                        | CREATE_COMPLETE | 2018-08-28T16:07:17Z |
| ControlVirtualIP                       | 09805e65-4943-471c-aae9-9830727a9156                   | OS::TripleO::Network::Ports::ControlPlaneVipPort | CREATE_COMPLETE | 2018-08-28T16:07:19Z |
| EndpointMapData                        | overcloud-EndpointMapData-wirieu5zv5wy                 | OS::Heat::Value                                  | CREATE_COMPLETE | 2018-08-28T16:07:17Z |
| Controller                             | 887a4b0b-c913-48c0-9111-40f83a9f4169                   | OS::Heat::ResourceGroup                          | CREATE_COMPLETE | 2018-08-28T16:07:16Z |
| ControllerServiceConfigSettings        | overcloud-ControllerServiceConfigSettings-sw5yalbnrxof | OS::Heat::Value                                  | CREATE_COMPLETE | 2018-08-28T16:07:16Z |
| MysqlRootPassword                      | overcloud-MysqlRootPassword-nohdczm2jb2z               | OS::TripleO::RandomString                        | CREATE_COMPLETE | 2018-08-28T16:07:17Z |
| ControllerServiceChainRoleData         | overcloud-ControllerServiceChainRoleData-aa2lsqlzx44x  | OS::Heat::Value                                  | CREATE_COMPLETE | 2018-08-28T16:07:17Z |
| DeployedServerEnvironment              | 5b356dd7-a7c1-47ef-90b6-017645a20468                   | OS::TripleO::DeployedServerEnvironment           | CREATE_COMPLETE | 2018-08-28T16:07:14Z |
| ComputeHCIServiceChainRoleData         | overcloud-ComputeHCIServiceChainRoleData-e3ltkegmzz7r  | OS::Heat::Value                                  | CREATE_COMPLETE | 2018-08-28T16:07:17Z |
| AllNodesDeploySteps                    | e446e19a-b580-48ef-bea8-cdebb69b1480                   | OS::TripleO::PostDeploySteps                     | CREATE_FAILED   | 2018-08-28T16:07:14Z |
| ComputeHCISshKnownHostsDeployment      | 44d662d5-93ec-469f-a758-4f65165da48d                   | OS::Heat::StructuredDeployments                  | CREATE_COMPLETE | 2018-08-28T16:07:15Z |
| AllNodesValidationConfig               | 3f5bb8d4-f6ec-449d-8498-e3dbf6610ef4                   | OS::TripleO::AllNodes::Validation                | CREATE_COMPLETE | 2018-08-28T16:07:16Z |
| InternalApiVirtualIP                   | c6e811f7-37e3-44dc-b334-b0a06bb43be7                   | OS::TripleO::Network::Ports::InternalApiVipPort  | CREATE_COMPLETE | 2018-08-28T16:07:18Z |
| ComputeHCIHostsDeployment              | aa3ee2da-7f89-4800-9c67-09d416ae422c                   | OS::Heat::StructuredDeployments                  | CREATE_COMPLETE | 2018-08-28T16:07:15Z |
| ControllerAllNodesDeployment           | d2feca33-0a4d-4c14-9aae-a84b1ea4e0f4                   | OS::TripleO::AllNodesDeployment                  | CREATE_COMPLETE | 2018-08-28T16:07:15Z |
| HeatAuthEncryptionKey                  | overcloud-HeatAuthEncryptionKey-dx3ijqv3fegf           | OS::TripleO::RandomString                        | CREATE_COMPLETE | 2018-08-28T16:07:17Z |
| ComputeHCIServiceChain                 | cc16cd91-5c0b-4012-9924-eb218334d7a5                   | OS::TripleO::Services                            | CREATE_COMPLETE | 2018-08-28T16:07:17Z |
| EndpointMap                            | 81ed4610-732b-474a-b355-1f8f7d841d2f                   | OS::TripleO::EndpointMap                         | CREATE_COMPLETE | 2018-08-28T16:07:17Z |
| ControllerHostsDeployment              | 758d948b-31b9-4fd0-b668-ca2cbca3bc90                   | OS::Heat::StructuredDeployments                  | CREATE_COMPLETE | 2018-08-28T16:07:15Z |
| ControllerMergedConfigSettings         | overcloud-ControllerMergedConfigSettings-yc3lit4rmlex  | OS::Heat::Value                                  | CREATE_COMPLETE | 2018-08-28T16:07:16Z |
| ComputeHCIServiceNames                 | overcloud-ComputeHCIServiceNames-wgvcb36tmzfo          | OS::Heat::Value                                  | CREATE_COMPLETE | 2018-08-28T16:07:17Z |
| BlacklistedIpAddresses                 | overcloud-BlacklistedIpAddresses-5ogele4nytis          | OS::Heat::Value                                  | CREATE_COMPLETE | 2018-08-28T16:07:16Z |
| ComputeHCIServers                      | overcloud-ComputeHCIServers-tswrhwpnjnol               | OS::Heat::Value                                  | CREATE_COMPLETE | 2018-08-28T16:07:15Z |
| DeploymentServerBlacklistDict          | overcloud-DeploymentServerBlacklistDict-ycuunj7xcmkh   | OS::Heat::Value                                  | CREATE_COMPLETE | 2018-08-28T16:07:16Z |
| StorageMgmtVirtualIP                   | 3a5f7d31-ecce-45d1-b0bd-6f717ab1f503                   | OS::TripleO::Network::Ports::StorageMgmtVipPort  | CREATE_COMPLETE | 2018-08-28T16:07:18Z |
| ComputeHCIIpListMap                    | c959a285-c128-4e6e-b8ba-bf1b401cd1c1                   | OS::TripleO::Network::Ports::NetIpListMap        | CREATE_COMPLETE | 2018-08-28T16:07:16Z |
| HorizonSecret                          | overcloud-HorizonSecret-rrpmv25ewxuc                   | OS::TripleO::RandomString                        | CREATE_COMPLETE | 2018-08-28T16:07:17Z |
| ComputeHCIAllNodesDeployment           | 89905832-bff3-41ce-926d-dae27c5d7192                   | OS::TripleO::AllNodesDeployment                  | CREATE_COMPLETE | 2018-08-28T16:07:15Z |
| ComputeHCINetworkHostnameMap           | overcloud-ComputeHCINetworkHostnameMap-mgveiiwu7wv2    | OS::Heat::Value                                  | CREATE_COMPLETE | 2018-08-28T16:07:16Z |
| VipHosts                               | overcloud-VipHosts-2ovf5fkuncwm                        | OS::Heat::Value                                  | CREATE_COMPLETE | 2018-08-28T16:07:16Z |
| ComputeHCIServiceConfigSettings        | overcloud-ComputeHCIServiceConfigSettings-trirhpiid7yc | OS::Heat::Value                                  | CREATE_COMPLETE | 2018-08-28T16:07:16Z |
| PublicVirtualIP                        | 6ed28369-cfa1-4c6b-834b-c829573139a7                   | OS::TripleO::Network::Ports::ExternalVipPort     | CREATE_COMPLETE | 2018-08-28T16:07:19Z |
| Networks                               | d9bbce20-96b8-4f85-b20a-6760bb816ee1                   | OS::TripleO::Network                             | CREATE_COMPLETE | 2018-08-28T16:07:19Z |
| ControllerNetworkHostnameMap           | overcloud-ControllerNetworkHostnameMap-iuky4ljobrem    | OS::Heat::Value                                  | CREATE_COMPLETE | 2018-08-28T16:07:16Z |
+----------------------------------------+--------------------------------------------------------+--------------------------------------------------+-----------------+----------------------+
(undercloud) [stack@refarch-r220-02 ~]$


(undercloud) [stack@refarch-r220-02 ~]$ openstack stack failures list --long overcloud

overcloud.AllNodesDeploySteps.ControllerDeployment_Step4.0:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: 3aaa2d17-aac9-4cd1-a8cd-5408ef530a93
  status: CREATE_FAILED
  status_reason: |
    CREATE aborted (Task create from StructuredDeployment "0" Stack "overcloud-AllNodesDeploySteps-rawlxuoiltb7-ControllerDeployment_Step4-jjzyhon333i4" [8bbfde33-eba0-4e1f-b656-8b2f4fd7daf5] Timed out)
  deploy_stdout: |
None
  deploy_stderr: |
None
(undercloud) [stack@refarch-r220-02 ~]$

## Log - 5 : openstack action execution list shows ceph-install tasks still in RUNNING state, although the openstack overcloud deploy command has already failed.

(undercloud) [stack@refarch-r220-02 ~]$ openstack action execution list | grep -v SUCCESS
+--------------------------------------+---------------------------------+-----------------------------------------+--------------------+--------------------------+--------------------------------------+---------+----------+---------------------+---------------------+
| ID                                   | Name                            | Workflow name                           | Workflow namespace | Task name                | Task ID                              | State   | Accepted | Created at          | Updated at          |
+--------------------------------------+---------------------------------+-----------------------------------------+--------------------+--------------------------+--------------------------------------+---------+----------+---------------------+---------------------+
| 35d57a63-08be-4a19-acd1-4014f9b2a211 | tripleo.ansible-playbook        | tripleo.storage.v1.ceph-install         |                    | ceph_install             | 8051cad9-32ec-45d5-ad6e-88ccd64f6f6a | RUNNING | False    | 2018-08-25 21:29:04 | <none>              |
| 7085d5fa-a1ac-4c9a-92c4-93bacdefafae | tripleo.ansible-playbook        | tripleo.storage.v1.ceph-install         |                    | ceph_install             | 0faf2620-748c-4de2-a0b6-0a0648864a89 | RUNNING | False    | 2018-08-26 05:31:27 | <none>              |
+--------------------------------------+---------------------------------+-----------------------------------------+--------------------+--------------------------+--------------------------------------+---------+----------+---------------------+---------------------+
(undercloud) [stack@refarch-r220-02 ~]$


## Log - 6 : /var/log/mistral/ceph-workflow.log shows ceph deployment was successful

2018-08-28 12:54:33,055 p=15144 u=mistral |  TASK [set ceph client install 'Complete'] **************************************
2018-08-28 12:54:33,055 p=15144 u=mistral |  Tuesday 28 August 2018  12:54:33 -0400 (0:00:03.001)       0:09:17.184 ********
2018-08-28 12:54:33,202 p=15144 u=mistral |  ok: [192.168.120.7]
2018-08-28 12:54:33,208 p=15144 u=mistral |  PLAY RECAP *********************************************************************
2018-08-28 12:54:33,208 p=15144 u=mistral |  192.168.120.12             : ok=114  changed=20   unreachable=0    failed=0
2018-08-28 12:54:33,208 p=15144 u=mistral |  192.168.120.13             : ok=107  changed=15   unreachable=0    failed=0
2018-08-28 12:54:33,208 p=15144 u=mistral |  192.168.120.14             : ok=107  changed=15   unreachable=0    failed=0
2018-08-28 12:54:33,208 p=15144 u=mistral |  192.168.120.18             : ok=123  changed=22   unreachable=0    failed=0
2018-08-28 12:54:33,208 p=15144 u=mistral |  192.168.120.6              : ok=107  changed=15   unreachable=0    failed=0
2018-08-28 12:54:33,208 p=15144 u=mistral |  192.168.120.7              : ok=138  changed=19   unreachable=0    failed=0
2018-08-28 12:54:33,209 p=15144 u=mistral |  INSTALLER STATUS ***************************************************************
2018-08-28 12:54:33,228 p=15144 u=mistral |  Install Ceph Monitor        : Complete (0:01:24)
2018-08-28 12:54:33,228 p=15144 u=mistral |  Install Ceph Manager        : Complete (0:00:37)
2018-08-28 12:54:33,228 p=15144 u=mistral |  Install Ceph OSD            : Complete (0:06:02)
2018-08-28 12:54:33,228 p=15144 u=mistral |  Install Ceph Client         : Complete (0:00:54)
2018-08-28 12:54:33,229 p=15144 u=mistral |  Tuesday 28 August 2018  12:54:33 -0400 (0:00:00.173)       0:09:17.358 ********
2018-08-28 12:54:33,229 p=15144 u=mistral |  ===============================================================================


(undercloud) [stack@refarch-r220-02 ~]$ ssh heat-admin.120.16 ceph -s
  cluster:
    id:     214c329a-a79d-11e8-916e-2047478ccfaa
    health: HEALTH_WARN
            too few PGs per OSD (8 < min 30)

  services:
    mon: 1 daemons, quorum controller-0
    mgr: controller-0(active)
    osd: 60 osds: 60 up, 60 in

  data:
    pools:   5 pools, 160 pgs
    objects: 0 objects, 0 bytes
    usage:   6524 MB used, 218 TB / 218 TB avail
    pgs:     160 active+clean

(undercloud) [stack@refarch-r220-02 ~]$
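
Note: the HEALTH_WARN above is about placement group density, not about the deployment timeout. The figure of 8 PGs per OSD can be reproduced from the numbers in the `ceph -s` output if the pools use a replica size of 3. That replica size is an assumption (it is a common Ceph default and is not shown in the output). A minimal arithmetic sketch, with hypothetical function names:

```python
import math


def pgs_per_osd(total_pgs: int, replica_size: int, osd_count: int) -> float:
    """Average number of PG replicas placed on each OSD."""
    return total_pgs * replica_size / osd_count


def min_total_pgs(min_per_osd: int, replica_size: int, osd_count: int) -> int:
    """Smallest total PG count that reaches min_per_osd replicas per OSD."""
    return math.ceil(min_per_osd * osd_count / replica_size)


# 160 PGs and 60 OSDs come from the output above; replica size 3 is assumed.
current = pgs_per_osd(total_pgs=160, replica_size=3, osd_count=60)
needed = min_total_pgs(min_per_osd=30, replica_size=3, osd_count=60)
print(current, needed)
```

Under that assumption the result matches the "8 < min 30" in the warning, and the cluster would need about 600 PGs in total across its pools to clear it.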

Comment 1 karan singh 2018-08-29 10:15:04 UTC
For easier reading, the same logs are available here: https://pastebin.com/raw/SkGknKjX

Comment 2 karan singh 2018-08-29 10:18:12 UTC
In the output below, only the ceph-install task is stuck: it stays in RUNNING state indefinitely, which causes the main openstack overcloud deploy command to run for 4 hours without finishing.


(undercloud) [stack@refarch-r220-02 ~]$ openstack action execution list | grep -v SUCCESS
+--------------------------------------+---------------------------------+-----------------------------------------+--------------------+--------------------------+--------------------------------------+---------+----------+---------------------+---------------------+
| ID                                   | Name                            | Workflow name                           | Workflow namespace | Task name                | Task ID                              | State   | Accepted | Created at          | Updated at          |
+--------------------------------------+---------------------------------+-----------------------------------------+--------------------+--------------------------+--------------------------------------+---------+----------+---------------------+---------------------+
| 35d57a63-08be-4a19-acd1-4014f9b2a211 | tripleo.ansible-playbook        | tripleo.storage.v1.ceph-install         |                    | ceph_install             | 8051cad9-32ec-45d5-ad6e-88ccd64f6f6a | RUNNING | False    | 2018-08-25 21:29:04 | <none>              |
| 7085d5fa-a1ac-4c9a-92c4-93bacdefafae | tripleo.ansible-playbook        | tripleo.storage.v1.ceph-install         |                    | ceph_install             | 0faf2620-748c-4de2-a0b6-0a0648864a89 | RUNNING | False    | 2018-08-26 05:31:27 | <none>              |
+--------------------------------------+---------------------------------+-----------------------------------------+--------------------+--------------------------+--------------------------------------+---------+----------+---------------------+---------------------+
(undercloud) [stack@refarch-r220-02 ~]$

Comment 3 karan singh 2018-08-29 10:22:43 UTC
@John Fulton: Do you think this issue is related to the one you helped fix in https://bugzilla.redhat.com/show_bug.cgi?id=1619263#c6?

FYI, I never managed to deploy the overcloud successfully, even after the changes mentioned in BZ1619263.

Comment 4 John Fulton 2018-08-29 14:18:03 UTC
I think those two stuck workflows are leftovers from when you ran into 1619263, so I'd rather not conflate them with this bug. If they were an issue, you wouldn't have gotten to step 4. Feel free to set the status of those two workflows to ERROR:

 mistral execution-update -s ERROR <ID>
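
A rough Python sketch of how stuck executions could be picked out of the table output before running the command above. It assumes the ASCII table layout printed by `openstack workflow execution list`; the sample is abbreviated to three columns, its RUNNING IDs are taken from the workflow execution listing later in this report, the SUCCESS row is made up, and the helper names are hypothetical. It only prints the commands so they can be reviewed before anything is changed.

```python
def running_execution_ids(table_text):
    """Return the IDs of rows whose State column is RUNNING."""
    header = None
    ids = []
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders, shell prompts and blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # first data line holds the column names
            continue
        row = dict(zip(header, cells))
        if row.get("State") == "RUNNING":
            ids.append(row["ID"])
    return ids


def update_commands(ids):
    # Only builds the command strings; nothing is executed here.
    return ["mistral execution-update -s ERROR %s" % i for i in ids]


# Abbreviated sample; the SUCCESS row is hypothetical.
SAMPLE = """
+--------------------------------------+---------------------------------+---------+
| ID                                   | Workflow name                   | State   |
+--------------------------------------+---------------------------------+---------+
| 1bfef3df-d437-48f9-a5c4-fc306bb63cdd | tripleo.storage.v1.ceph-install | RUNNING |
| ec24d232-54b2-410d-9825-04754d7e1e09 | tripleo.storage.v1.ceph-install | RUNNING |
| 00000000-0000-0000-0000-000000000000 | example.hypothetical.workflow   | SUCCESS |
+--------------------------------------+---------------------------------+---------+
"""

for cmd in update_commands(running_execution_ids(SAMPLE)):
    print(cmd)
```

A RUNNING row may also belong to a deployment that is genuinely in progress, so the printed list should be checked by hand.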



Comment 5 karan singh 2018-08-29 16:17:59 UTC
JohnF graciously provided another pointer to deploy without telemetry.

So I am now performing a clean deployment by disabling telemetry [1]

[1] https://raw.githubusercontent.com/openstack/tripleo-heat-templates/master/environments/disable-telemetry.yaml

Comment 6 karan singh 2018-08-29 20:09:14 UTC
In a fresh overcloud deployment, after disabling telemetry, stack creation was successful


----------
2018-08-29 17:04:04Z [overcloud]: CREATE_COMPLETE  Stack CREATE completed successfully

 Stack overcloud CREATE_COMPLETE

Started Mistral Workflow tripleo.deployment.v1.get_horizon_url. Execution ID: 9558215f-d077-4828-b1a5-871fc8cbcb3f
Overcloud Endpoint: http://172.21.1.159:5000/
Overcloud Horizon Dashboard URL: http://172.21.1.159:80/dashboard
Overcloud rc file: /home/stack/overcloudrc
Overcloud Deployed
(undercloud) [stack@refarch-r220-02 ~]$ openstack stack list
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+--------------+
| ID                                   | Stack Name | Project                          | Stack Status    | Creation Time        | Updated Time |
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+--------------+
| a936509f-8df6-44e5-9dc6-ebd85915d14c | overcloud  | d4ae8726a690413cbc62ade7e5e70763 | CREATE_COMPLETE | 2018-08-29T15:56:25Z | None         |
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+--------------+
(undercloud) [stack@refarch-r220-02 ~]$

----------



However, there are still workflow tasks for ceph_install stuck in the RUNNING state:



(undercloud) [stack@refarch-r220-02 ~]$ openstack action execution list | grep -v SUCCESS
+--------------------------------------+---------------------------------+-----------------------------------------+--------------------+--------------------------+--------------------------------------+---------+----------+---------------------+---------------------+
| ID                                   | Name                            | Workflow name                           | Workflow namespace | Task name                | Task ID                              | State   | Accepted | Created at          | Updated at          |
+--------------------------------------+---------------------------------+-----------------------------------------+--------------------+--------------------------+--------------------------------------+---------+----------+---------------------+---------------------+
| 35d57a63-08be-4a19-acd1-4014f9b2a211 | tripleo.ansible-playbook        | tripleo.storage.v1.ceph-install         |                    | ceph_install             | 8051cad9-32ec-45d5-ad6e-88ccd64f6f6a | RUNNING | False    | 2018-08-25 21:29:04 | <none>              |
| 7085d5fa-a1ac-4c9a-92c4-93bacdefafae | tripleo.ansible-playbook        | tripleo.storage.v1.ceph-install         |                    | ceph_install             | 0faf2620-748c-4de2-a0b6-0a0648864a89 | RUNNING | False    | 2018-08-26 05:31:27 | <none>              |
+--------------------------------------+---------------------------------+-----------------------------------------+--------------------+--------------------------+--------------------------------------+---------+----------+---------------------+---------------------+
(undercloud) [stack@refarch-r220-02 ~]$


(undercloud) [stack@refarch-r220-02 ~]$ openstack workflow execution list | grep -v SUCCESS
+--------------------------------------+--------------------------------------+------------------------------------------------------------------------+--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+---------+------------+---------------------+---------------------+
| ID                                   | Workflow ID                          | Workflow name                                                          | Workflow namespace | Description                                                                                                                                                                                                                       | Task Execution ID                    | State   | State info | Created at          | Updated at          |
+--------------------------------------+--------------------------------------+------------------------------------------------------------------------+--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+---------+------------+---------------------+---------------------+
| 2a21b6ca-18e6-4b02-acca-d689802d9458 | 7383b3f7-3923-489d-8f92-b8fa9306ed01 | tripleo.overcloud.workflow_tasks.step2                                 |                    | Heat managed                                                                                                                                                                                                                      | <none>                               | RUNNING | None       | 2018-08-25 21:27:54 | 2018-08-25 21:27:55 |
| 1bfef3df-d437-48f9-a5c4-fc306bb63cdd | bd484c0e-a8bf-4c5f-abbd-cdd96483affa | tripleo.storage.v1.ceph-install                                        |                    | sub-workflow execution                                                                                                                                                                                                            | 7f2d690c-75de-4221-819d-b8661870d94e | RUNNING | None       | 2018-08-25 21:27:55 | 2018-08-25 21:27:55 |
| cd5f6ac9-5c06-4bb3-8bd6-fb9702e7fbac | 7383b3f7-3923-489d-8f92-b8fa9306ed01 | tripleo.overcloud.workflow_tasks.step2                                 |                    | Heat managed                                                                                                                                                                                                                      | <none>                               | RUNNING | None       | 2018-08-26 05:30:27 | 2018-08-26 05:30:27 |
| ec24d232-54b2-410d-9825-04754d7e1e09 | bd484c0e-a8bf-4c5f-abbd-cdd96483affa | tripleo.storage.v1.ceph-install                                        |                    | sub-workflow execution                                                                                                                                                                                                            | 655d4562-88d7-47ac-833b-763b511fda93 | RUNNING | None       | 2018-08-26 05:30:27 | 2018-08-26 05:30:27 |
| 728a21a7-69a2-403a-995b-08ca7e5cf28b | bd484c0e-a8bf-4c5f-abbd-cdd96483affa | tripleo.storage.v1.ceph-install                                        |                    | sub-workflow execution                                                                                                                                                                                                            | 62c6c1eb-87ff-4f58-8244-5783f0979fe7 | RUNNING | None       | 2018-08-26 09:11:24 | 2018-08-26 09:11:24 |
| 76c9efff-bac5-415a-8988-bc0c21bc2f0a | 02b8310c-45ba-4c89-8989-2d63acb6e030 | tripleo.overcloud.workflow_tasks.step2                                 |                    | Heat managed                                                                                                                                                                                                                      | <none>                               | RUNNING | None       | 2018-08-26 09:11:24 | 2018-08-26 09:11:24 |
+--------------------------------------+--------------------------------------+------------------------------------------------------------------------+--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+---------+------------+---------------------+---------------------+
(undercloud) [stack@refarch-r220-02 ~]$



This is causing issues with Glance, Cinder and Nova:
- Glance image creation gets stuck in the saving state
- Cinder volume creation throws an error
- The volume service is down
- The nova-compute container is unhealthy



(overcloud) [stack@refarch-r220-02 tmp]$ openstack volume service list
+------------------+------------------------+------+---------+-------+----------------------------+
| Binary           | Host                   | Zone | Status  | State | Updated At                 |
+------------------+------------------------+------+---------+-------+----------------------------+
| cinder-scheduler | controller-0           | nova | enabled | up    | 2018-08-29T19:20:18.000000 |
| cinder-volume    | hostgroup@tripleo_ceph | nova | enabled | down  | 2018-08-29T17:04:05.000000 |
+------------------+------------------------+------+---------+-------+----------------------------+
(overcloud) [stack@refarch-r220-02 tmp]$



[root@osd-compute-0 nova]# docker ps | grep -i nova
5e667b10c658        192.168.120.1:8787/rhosp13/openstack-nova-compute:latest                "kolla_start"       2 hours ago         Up 2 hours                                   nova_migration_target
67f09d1d66bb        192.168.120.1:8787/rhosp13/openstack-nova-compute:latest                "kolla_start"       2 hours ago         Up 2 hours (unhealthy)                       nova_compute
fbbe60c4bc17        192.168.120.1:8787/rhosp13/openstack-nova-libvirt:latest                "kolla_start"       2 hours ago         Up 2 hours                                   nova_libvirt
bd5712d3f206        192.168.120.1:8787/rhosp13/openstack-nova-libvirt:latest                "kolla_start"       2 hours ago         Up 2 hours                                   nova_virtlogd
[root@osd-compute-0 nova]#


nova-compute container log
===========================

+ echo 'Running command: '\''/usr/bin/nova-compute '\'''
+ exec /usr/bin/nova-compute
Running command: '/usr/bin/nova-compute '
/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py:332: NotSupportedWarning: Configuration option(s) ['use_tpool'] not supported
  exception.NotSupportedWarning
[root@osd-compute-0 nova]#

Comment 9 Giulio Fidente 2018-09-03 16:13:16 UTC
Heat won't move past Step2 if anything in that step remains stuck. In addition, the ceph-ansible logs report success and the failures appear to be happening in Step4, so this is definitely unrelated to the ceph-ansible workflow.

The issue with the client.openstack keyring (seen when using the disable-telemetry env file) is instead tracked by BZ#1613474. Closing as duplicate.

*** This bug has been marked as a duplicate of bug 1613474 ***

Comment 10 karan singh 2018-09-04 05:23:18 UTC
@Giulio Fidente 

The issue I reported does not only happen when the overcloud is deployed using the disable-telemetry env file; it also occurs when deployed with telemetry. So it's not a duplicate of BZ#1613474. We only tried the disable-telemetry env file to test whether it is a workaround for getting a functional overcloud.

It's a reproducible issue and I can provide access to the environment if anyone is interested in looking at it.

Comment 12 karan singh 2018-09-04 11:31:54 UTC
Thanks for understanding, Giulio. Yes, the environment is available for us to debug together. I will ping you on IRC.

Comment 13 John Fulton 2018-09-05 13:31:44 UTC
Changing subject as I don't think this has anything to do with previously running tasks.

Comment 14 John Fulton 2018-09-05 17:48:14 UTC
WORKAROUND: 

ceph auth get client.openstack > /etc/ceph/ceph.client.openstack.keyring
ceph auth del client.openstack
ceph auth import -i /etc/ceph/ceph.client.openstack.keyring
pcs resource restart openstack-cinder-volume

After running the above as root on the controller node, cinder and glance started working. So this seems to be caused by a corrupted openstack keyring. Variations of the above were also attempted, but the following troubleshooting command only started working once the old key was deleted before re-importing:

rbd --keyring=/etc/ceph/ceph.client.openstack.keyring --id openstack -p images ls

For example, we created a new keyring (ceph.client.john.keyring) and it worked right away, whereas restarting the monitor container made no difference for the openstack keyring. I suspect that some copy of the keyring inside of Ceph was corrupted and that I had to delete it to ensure it was cleaned up.

The unresolved matter is WHY was the keyring corrupted. If you can reproduce this with a fresh deploy, then the next step would be to get someone better versed in ceph key permissions to help. I don't think it's anything to do with the way tripleo asks ceph-ansible to create the keys because I'm unable to reproduce this issue in my environment and neither is CI. That's why I think, if this continues, that someone better at Ceph internals should identify the root cause. Unless this is some environmental issue that goes away when you attempt to redeploy. Recall that you had run into a few other issues earlier including 1613474 so this is the first single deployment we've done ever since 1613474 was worked around.
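
As an illustration only, the ordering constraint in the workaround above (export, then delete, then import) can be written down as a small Python helper that builds the commands without running them. The helper name is hypothetical; the three `ceph auth` invocations are the ones from the workaround, and restarting dependent services (such as the `pcs resource restart` step) is left out.

```python
import shlex


def keyring_reimport_steps(entity, keyring_path):
    """Build (not run) the export / delete / import command sequence."""
    q_entity = shlex.quote(entity)
    q_path = shlex.quote(keyring_path)
    return [
        # 1. Export the current key and caps to a keyring file.
        "ceph auth get %s > %s" % (q_entity, q_path),
        # 2. Delete the stored entry; per the comment above, re-importing
        #    without this step did not help.
        "ceph auth del %s" % q_entity,
        # 3. Import the keyring file again.
        "ceph auth import -i %s" % q_path,
    ]


for step in keyring_reimport_steps(
    "client.openstack", "/etc/ceph/ceph.client.openstack.keyring"
):
    print(step)
```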

Comment 15 karan singh 2018-09-06 07:32:21 UTC
Hi John

I really appreciate your help in troubleshooting this and finding a workaround. I confirm that cinder, glance and nova are now working with Ceph.

Agreed, "The unresolved matter is WHY was the keyring corrupted".

In response to your comment: "Unless this is some environmental issue that goes away when you attempt to redeploy."

Do you remember that on Friday 31st Aug you deleted my overcloud stack and redeployed it, and the problem still existed? My gut feeling is that if we destroy this stack and redeploy it, the problem will reappear (at least in my environment). I want to help you fix this issue once and for all, so let me know when we should attempt to reproduce it and involve someone from Ceph engineering to take a look.

Comment 16 karan singh 2018-09-06 08:31:20 UTC
Building on John's initial work, I ran a series of tests which show that the single quotes (' ') in the Ceph auth capabilities are causing this problem. When I manually remove the single quotes from the capabilities stored in Ceph, things start to work.


Now I am wondering whether tripleo adds these single quotes while creating Ceph users, or whether Ceph itself has dropped support for single quotes that it previously had (not sure).

In my environment we could probably change this behaviour in tripleo and redeploy the cluster to see if the problem gets fixed.

Thoughts?
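
One generic way literal quotes can end up inside a stored value, shown here only as an illustration and not confirmed by anything in this report as the actual cause: a shell strips quotes before the program sees its arguments, whereas an argument list passed directly keeps any quote characters that were baked into the value. The entity name `client.demo` is hypothetical.

```python
import shlex

# Typed in a shell: the quotes only group words and are removed by the shell.
shell_style = "ceph auth get-or-create client.demo mon 'allow r'"
argv_from_shell = shlex.split(shell_style)
print(argv_from_shell[-1])  # the cap value as the program would receive it

# Built as an argument list with the quotes already part of the value:
# no shell is involved, so the quotes are passed through literally.
value = "'allow r'"
argv_direct = ["ceph", "auth", "get-or-create", "client.demo", "mon", value]
print(argv_direct[-1])
```

In the second case the receiving program would see the quote characters as part of the capability string itself.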



## Ceph auth list output for client.openstack before JohnF's workaround, when cinder, glance and Nova were not working.

client.openstack
	key: AQCCAIBbAAAAABAABuJqVYY56PbuQ6KTu8C/Fg==
	caps: [mds] ''
	caps: [mgr] 'allow *'
	caps: [mon] 'profile rbd'
	caps: [osd] 'profile rbd pool=volumes, profile rbd pool=backups, profile rbd pool=vms, profile rbd pool=images, profile rbd pool=metrics'


## Ceph auth list output for client.openstack after JohnF deleted and re-imported the client.openstack user

client.openstack
	key: AQCCAIBbAAAAABAABuJqVYY56PbuQ6KTu8C/Fg==
	caps: [mds]
	caps: [mgr] allow *
	caps: [mon] profile rbd
	caps: [osd] profile rbd pool=volumes, profile rbd pool=backups, profile rbd pool=vms, profile rbd pool=images, profile rbd pool=metrics


If you compare these two outputs, you can see that when the capabilities are wrapped in single quotes (' '), the openstack services do not work. When JohnF deleted and re-imported the client.openstack user, the single quotes were removed and the openstack services started to work.
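
The comparison above can be automated. The following is a minimal Python sketch, assuming the `ceph auth list` text layout shown in this comment (entity name at column 0, indented `key:` and `caps:` lines); the function name is hypothetical and the keys in the sample are redacted placeholders.

```python
import re

# Matches an indented line such as:  caps: [mgr] 'allow *'
CAP_LINE = re.compile(r"^\s*caps:\s*\[(\w+)\]\s*(.*)$")


def quoted_caps(auth_list_text):
    """Map entity -> daemon types whose caps value is wrapped in single quotes."""
    flagged = {}
    entity = None
    for line in auth_list_text.splitlines():
        if line and not line[0].isspace():
            entity = line.strip()  # unindented line starts a new entity
            continue
        m = CAP_LINE.match(line)
        if m and entity:
            daemon, value = m.group(1), m.group(2).strip()
            if len(value) >= 2 and value[0] == "'" and value[-1] == "'":
                flagged.setdefault(entity, []).append(daemon)
    return flagged


SAMPLE = """\
client.john
    key: <redacted>
    caps: [mon] allow *
client.radosgw
    key: <redacted>
    caps: [mds] ''
    caps: [mgr] 'allow *'
    caps: [mon] 'allow rw'
    caps: [osd] 'allow rwx'
"""

print(quoted_caps(SAMPLE))
```

Entities that do not appear in the result have no single-quoted caps in the given text.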

## After workaround, client.openstack.keyring works fine

[heat-admin@controller-0 ~]$ ceph --keyring=/etc/ceph/ceph.client.openstack.keyring --id openstack -s
  cluster:
    id:     214c329a-a79d-11e8-916e-2047478ccfaa
    health: HEALTH_WARN
            application not enabled on 2 pool(s)

  services:
    mon: 1 daemons, quorum controller-0
    mgr: controller-0(active)
    osd: 60 osds: 60 up, 60 in

  data:
    pools:   5 pools, 1120 pgs
    objects: 1318 objects, 1731 MB
    usage:   7945 MB used, 218 TB / 218 TB avail
    pgs:     1120 active+clean


## If I try the manila or radosgw keys, they don't work, because in ceph auth list they also have single quotes (' ') in their capabilities

[heat-admin@controller-0 ceph]$ ceph --keyring=/etc/ceph/ceph.client.manila.keyring --id manila -s
Error EACCES: access denied
[heat-admin@controller-0 ceph]$
[heat-admin@controller-0 ceph]$
[heat-admin@controller-0 ceph]$ ceph --keyring=/etc/ceph/ceph.client.radosgw.keyring --id radosgw -s
Error EACCES: access denied
[heat-admin@controller-0 ceph]$

$ sudo ceph auth list

client.john
	key: AQCnfI5btYWfORAADiH22WsDDkB5v0782g0C2w==
	caps: [mds] allow
	caps: [mon] allow *
	caps: [osd] allow *
client.manila
	key: AQCCAIBbAAAAABAA4kShATMpeQ/aVG4a64VR2Q==
	caps: [mds] 'allow *'
	caps: [mgr] 'allow *'
	caps: [mon] 'allow r, allow command "auth del", allow command "auth caps", allow command "auth get", allow command "auth get-or-create"'
	caps: [osd] 'allow rw'
client.openstack
	key: AQCCAIBbAAAAABAABuJqVYY56PbuQ6KTu8C/Fg==
	caps: [mds]
	caps: [mgr] allow *
	caps: [mon] profile rbd
	caps: [osd] profile rbd pool=volumes, profile rbd pool=backups, profile rbd pool=vms, profile rbd pool=images, profile rbd pool=metrics
client.radosgw
	key: AQCCAIBbAAAAABAAT4DElonMQQbiqSy8CfZyvg==
	caps: [mds] ''
	caps: [mgr] 'allow *'
	caps: [mon] 'allow rw'
	caps: [osd] 'allow rwx'
mgr.controller-0
	key: AQC9eolbZlf1MxAAy+dBzzXJ1odg8C1Wh+4r7w==
	caps: [mds] allow *
	caps: [mon] allow profile mgr
	caps: [osd] allow *

## To prove this theory, let's remove the single quotes (' ') from the Ceph capabilities

[heat-admin@controller-0 tmp]$ ceph auth get client.radosgw > ceph.client.radosgw.keyring
exported keyring for client.radosgw
[heat-admin@controller-0 tmp]$ cat ceph.client.radosgw.keyring
[client.radosgw]
	key = AQCCAIBbAAAAABAAT4DElonMQQbiqSy8CfZyvg==
	caps mds = "''"
	caps mgr = "'allow *'"
	caps mon = "'allow rw'"
	caps osd = "'allow rwx'"
[heat-admin@controller-0 tmp]$

## As you can see, there are single quotes. So let's edit ceph.client.radosgw.keyring and remove the single quotes (' ')


[heat-admin@controller-0 tmp]$ cat ceph.client.radosgw.keyring
[client.radosgw]
	key = AQCCAIBbAAAAABAAT4DElonMQQbiqSy8CfZyvg==
	caps mds = ""
	caps mgr = "allow *"
	caps mon = "allow rw"
	caps osd = "allow rwx"
[heat-admin@controller-0 tmp]$
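
The manual edit above could also be scripted. This is a small Python sketch, assuming the exported keyring format shown above, where each affected line looks like `caps <type> = "'...'"`. The function name is hypothetical and the key is a redacted placeholder. It only rewrites text; deleting and re-importing the user is still a separate step.

```python
import re

# Matches e.g.:  caps mgr = "'allow *'"
# group 1 is everything up to the value, group 2 the value without quotes.
QUOTED_CAP = re.compile(r"""^(\s*caps\s+\w+\s*=\s*)"'(.*)'"\s*$""")


def strip_embedded_quotes(keyring_text):
    """Drop the single quotes nested inside the double-quoted caps values."""
    out = []
    for line in keyring_text.splitlines():
        m = QUOTED_CAP.match(line)
        if m:
            out.append('%s"%s"' % (m.group(1), m.group(2)))
        else:
            out.append(line)  # section header, key line, already-clean caps
    return "\n".join(out) + "\n"


SAMPLE = """\
[client.radosgw]
    key = <redacted>
    caps mds = "''"
    caps mgr = "'allow *'"
    caps mon = "'allow rw'"
    caps osd = "'allow rwx'"
"""

print(strip_embedded_quotes(SAMPLE), end="")
```

Lines that do not match the pattern are passed through unchanged, so running it on an already-clean keyring leaves the text as it was.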


## Delete the client.radosgw user from Ceph and re-import it using the modified keyring without single quotes

sudo ceph auth del client.radosgw
sudo ceph auth import -i /tmp/ceph.client.radosgw.keyring

sudo ceph auth list

## Single quotes removed

client.radosgw
	key: AQCCAIBbAAAAABAAT4DElonMQQbiqSy8CfZyvg==
	caps: [mds]
	caps: [mgr] allow *
	caps: [mon] allow rw
	caps: [osd] allow rwx

## Ceph commands now work for the client.radosgw user, for which we removed the single quotes

[heat-admin@controller-0 ~]$  sudo ceph --keyring=/tmp/ceph.client.radosgw.keyring --id radosgw -s
  cluster:
    id:     214c329a-a79d-11e8-916e-2047478ccfaa
    health: HEALTH_WARN
            application not enabled on 2 pool(s)

  services:
    mon: 1 daemons, quorum controller-0
    mgr: controller-0(active)
    osd: 60 osds: 60 up, 60 in

  data:
    pools:   5 pools, 1120 pgs
    objects: 1318 objects, 1731 MB
    usage:   7945 MB used, 218 TB / 218 TB avail
    pgs:     1120 active+clean

[heat-admin@controller-0 ~]$


## Ceph commands still do not work with client.manila, because in ceph auth list the manila capabilities still have single quotes (' ')


[heat-admin@controller-0 ~]$ sudo ceph --keyring=/etc/ceph/ceph.client.manila.keyring --id manila -s
Error EACCES: access denied
[heat-admin@controller-0 ~]$

$ sudo ceph auth list


client.manila
	key: AQCCAIBbAAAAABAA4kShATMpeQ/aVG4a64VR2Q==
	caps: [mds] 'allow *'
	caps: [mgr] 'allow *'
	caps: [mon] 'allow r, allow command "auth del", allow command "auth caps", allow command "auth get", allow command "auth get-or-create"'
	caps: [osd] 'allow rw'
client.openstack
	key: AQCCAIBbAAAAABAABuJqVYY56PbuQ6KTu8C/Fg==
	caps: [mds]
	caps: [mgr] allow *
	caps: [mon] profile rbd
	caps: [osd] profile rbd pool=volumes, profile rbd pool=backups, profile rbd pool=vms, profile rbd pool=images, profile rbd pool=metrics
client.radosgw
	key: AQCCAIBbAAAAABAAT4DElonMQQbiqSy8CfZyvg==
	caps: [mds]
	caps: [mgr] allow *
	caps: [mon] allow rw
	caps: [osd] allow rwx

Comment 17 Giulio Fidente 2018-09-06 11:26:21 UTC
This might be caused by a change in ceph-ansible; I think we need to test how the version currently shipping in OSP behaves [1] and how the upcoming build [2] shipping with the Ceph 3.1 update behaves.

Are you able to test these two versions in the same environment and with the same parameters?

1. ceph-ansible-3.1.0-0.1.rc10
2. ceph-ansible 3.1.2

Should the 3.1.2 version fail, there has been a change in THT [3] (not included in the recent z2 update) which will probably work around the issue.

3. https://review.openstack.org/#/c/589185 makes that work.

Comment 18 John Fulton 2018-09-06 11:53:30 UTC
(In reply to karan singh from comment #15)
> In response to your comment: "Unless this is some environmental issue that
> goes away when you attempt to redeploy."
> 
> Do you remember that on Friday 31st Aug you deleted my overcloud stack and
> redeployed it, and the problem still existed?

Yes, I remember and I believe that's when THIS problem was introduced. Prior to Aug 31, you were hitting 1613474.

> My gut feeling is that if we
> destroy this stack and redeploy it, the problem will reappear (at
> least in my environment). I want to help you fix this issue once and for
> all, so let me know when we should attempt to reproduce it and involve
> someone from Ceph engineering to take a look.

Thanks for that. If you want to do another test, then let's try again with Giulio's suggestions from the last comment.

Comment 25 Giulio Fidente 2018-09-06 17:48:45 UTC
Created attachment 1481388 [details]
auth_list_3.1.2

This is the auth list output produced by the 3.1.2 deployment, which shows single quotes around the daemon caps

Comment 26 Giulio Fidente 2018-09-06 18:02:48 UTC
Created attachment 1481390 [details]
ansible_log_3.1.2

This is the ceph-ansible playbook log from the 3.1.2 deployment

Comment 27 Giulio Fidente 2018-09-06 18:35:16 UTC
Created attachment 1481394 [details]
inventory_3.1.2

This is the inventory file used with the 3.1.2 deployment

Comment 28 Giulio Fidente 2018-09-06 18:37:02 UTC
Created attachment 1481396 [details]
auth_list_3.1.2

This is the auth list output produced by the 3.1.2 deployment, which shows single quotes around the daemon caps

Comment 30 Giulio Fidente 2018-09-06 19:01:52 UTC
Created attachment 1481400 [details]
ansible_log_3.1.2

This is the ceph-ansible playbook log from the 3.1.2 deployment

Comment 31 Giulio Fidente 2018-09-06 20:16:46 UTC
Created attachment 1481412 [details]
auth_list_3.1.0rc10

This is the auth list output produced by the 3.1.0rc10 deployment, which doesn't have any single quote

Comment 32 Giulio Fidente 2018-09-06 20:17:42 UTC
Created attachment 1481413 [details]
inventory_3.1.0rc10

This is the inventory file used with the 3.1.0rc10 deployment

Comment 44 errata-xmlrpc 2018-09-26 18:24:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2819

