Description of problem:

The openstack overcloud deploy command ran for about 4 hours (247 minutes) and eventually failed with the following error:

"[overcloud.AllNodesDeploySteps]: CREATE_FAILED CREATE aborted (Task create from TemplateResource "AllNodesDeploySteps" Stack "overcloud" [404edbf5-6788-4255-95cd-68f0b1044d27] Timed out)"

Version-Release number of selected component (if applicable):

[root@refarch-r220-02 ~]# rpm -qa | grep -i openstack
openstack-nova-placement-api-17.0.3-0.20180420001141.el7ost.noarch
puppet-openstacklib-12.4.0-0.20180329042555.4b30e6f.el7ost.noarch
openstack-glance-16.0.1-2.el7ost.noarch
openstack-tripleo-common-containers-8.6.1-23.el7ost.noarch
openstack-heat-api-10.0.1-0.20180411125640.el7ost.noarch
python2-openstackclient-3.14.1-1.el7ost.noarch
openstack-tempest-18.0.0-2.el7ost.noarch
openstack-mistral-api-6.0.2-1.el7ost.noarch
openstack-zaqar-6.0.1-1.el7ost.noarch
openstack-swift-container-2.17.1-0.20180314165245.caeeb54.el7ost.noarch
openstack-neutron-12.0.2-0.20180421011364.0ec54fd.el7ost.noarch
openstack-ironic-api-10.1.2-4.el7ost.noarch
openstack-tripleo-ui-8.3.1-3.el7ost.noarch
python-openstackclient-lang-3.14.1-1.el7ost.noarch
openstack-tripleo-puppet-elements-8.0.0-2.el7ost.noarch
openstack-nova-scheduler-17.0.3-0.20180420001141.el7ost.noarch
openstack-swift-object-2.17.1-0.20180314165245.caeeb54.el7ost.noarch
openstack-neutron-common-12.0.2-0.20180421011364.0ec54fd.el7ost.noarch
openstack-neutron-ml2-12.0.2-0.20180421011364.0ec54fd.el7ost.noarch
openstack-heat-engine-10.0.1-0.20180411125640.el7ost.noarch
openstack-ironic-common-10.1.2-4.el7ost.noarch
openstack-mistral-executor-6.0.2-1.el7ost.noarch
openstack-tripleo-heat-templates-8.0.2-43.el7ost.noarch
openstack-selinux-0.8.14-12.el7ost.noarch
python2-openstacksdk-0.11.3-1.el7ost.noarch
openstack-nova-api-17.0.3-0.20180420001141.el7ost.noarch
openstack-nova-compute-17.0.3-0.20180420001141.el7ost.noarch
openstack-keystone-13.0.1-0.20180420194847.7bd6454.el7ost.noarch
openstack-nova-common-17.0.3-0.20180420001141.el7ost.noarch
openstack-neutron-openvswitch-12.0.2-0.20180421011364.0ec54fd.el7ost.noarch
openstack-heat-api-cfn-10.0.1-0.20180411125640.el7ost.noarch
openstack-ironic-staging-drivers-0.9.0-4.el7ost.noarch
openstack-ironic-inspector-7.2.1-0.20180409163360.el7ost.noarch
openstack-mistral-engine-6.0.2-1.el7ost.noarch
puppet-openstack_extras-12.4.1-0.20180413042250.2634296.el7ost.noarch
openstack-swift-account-2.17.1-0.20180314165245.caeeb54.el7ost.noarch
openstack-tripleo-common-8.6.1-23.el7ost.noarch
openstack-swift-proxy-2.17.1-0.20180314165245.caeeb54.el7ost.noarch
openstack-ironic-conductor-10.1.2-4.el7ost.noarch
openstack-tripleo-validations-8.4.1-5.el7ost.noarch
openstack-nova-conductor-17.0.3-0.20180420001141.el7ost.noarch
openstack-tripleo-image-elements-8.0.1-1.el7ost.noarch
openstack-heat-common-10.0.1-0.20180411125640.el7ost.noarch
openstack-mistral-common-6.0.2-1.el7ost.noarch
[root@refarch-r220-02 ~]#
[root@refarch-r220-02 ~]# rpm -qa | grep -i ceph
puppet-ceph-2.5.0-1.el7ost.noarch
ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch
[root@refarch-r220-02 ~]#

How reproducible:

100% in my setup

Steps to Reproduce:

1.
Deploy overcloud

time openstack overcloud deploy \
--templates /usr/share/openstack-tripleo-heat-templates \
-r /home/stack/templates/roles_data.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
-e /home/stack/templates/overcloud_images.yaml \
-e /home/stack/templates/network-environment.yaml \
-e /home/stack/templates/global-config.yaml \
-e /home/stack/templates/ceph-config.yaml > /tmp/overcloud.logs 2>&1

Actual results:

Overcloud stack creation fails.

Expected results:

Overcloud stack creation should succeed.

Additional info:

## Log - 1 : openstack overcloud deploy command ran 247 Minutes (4 hours)

(undercloud) [stack@refarch-r220-02 ~]$ time openstack overcloud deploy \
> --templates /usr/share/openstack-tripleo-heat-templates \
> -r /home/stack/templates/roles_data.yaml \
> -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
> -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
> -e /home/stack/templates/overcloud_images.yaml \
> -e /home/stack/templates/network-environment.yaml \
> -e /home/stack/templates/global-config.yaml \
> -e /home/stack/templates/ceph-config.yaml > /tmp/overcloud.logs 2>&1

real 247m19.386s
user 0m23.972s
sys 0m1.158s
(undercloud) [stack@refarch-r220-02 ~]$

## Log - 2 : openstack overcloud deploy output

2018-08-28 17:14:18Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step4.1]: CREATE_IN_PROGRESS state changed
2018-08-28 17:14:19Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step4.2]: CREATE_IN_PROGRESS state changed
2018-08-28 17:14:20Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step4.4]: CREATE_IN_PROGRESS state changed
2018-08-28 17:15:04Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step4.4]: SIGNAL_IN_PROGRESS Signal: deployment b5dcd557-6c14-4261-859d-aeae801c29b5 succeeded
2018-08-28 17:15:05Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step4.4]: CREATE_COMPLETE state changed
2018-08-28 17:15:11Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step4.1]: SIGNAL_IN_PROGRESS Signal: deployment a61c6110-1771-42f1-9a5d-bd06162c6e8a succeeded
2018-08-28 17:15:12Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step4.0]: SIGNAL_IN_PROGRESS Signal: deployment ebbf92e1-1151-4249-bd7f-e20ee2d0640e succeeded
2018-08-28 17:15:13Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step4.0]: CREATE_COMPLETE state changed
2018-08-28 17:15:13Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step4.3]: SIGNAL_IN_PROGRESS Signal: deployment a63f1d2f-e9ef-4180-a97e-280d78440d60 succeeded
2018-08-28 17:15:13Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step4.2]: SIGNAL_IN_PROGRESS Signal: deployment 69baae25-76fc-40d8-998e-611ffd4d1a7b succeeded
2018-08-28 17:15:13Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step4.1]: CREATE_COMPLETE state changed
2018-08-28 17:15:13Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step4.3]: CREATE_COMPLETE state changed
2018-08-28 17:15:14Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step4.2]: CREATE_COMPLETE state changed
2018-08-28 17:15:14Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step4]: CREATE_COMPLETE Stack CREATE completed successfully
2018-08-28 17:15:14Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step4]: CREATE_COMPLETE state changed
2018-08-28 20:07:11Z [overcloud.AllNodesDeploySteps]: CREATE_FAILED CREATE aborted (Task create from TemplateResource "AllNodesDeploySteps" Stack "overcloud" [404edbf5-6788-4255-95cd-68f0b1044d27] Timed out)
2018-08-28 20:07:11Z [overcloud.AllNodesDeploySteps]: CREATE_FAILED Stack CREATE cancelled
2018-08-28 20:07:11Z [overcloud]: CREATE_FAILED Timed out
2018-08-28 20:07:12Z [overcloud.AllNodesDeploySteps.ControllerDeployment_Step4]: CREATE_FAILED Stack CREATE cancelled
2018-08-28 20:07:12Z [overcloud.AllNodesDeploySteps.ControllerDeployment_Step4]: CREATE_FAILED resources.ControllerDeployment_Step4: Stack CREATE cancelled
2018-08-28 20:07:12Z [overcloud.AllNodesDeploySteps]: CREATE_FAILED Resource CREATE failed: resources.ControllerDeployment_Step4: Stack CREATE cancelled

Stack overcloud CREATE_FAILED

overcloud.AllNodesDeploySteps.ControllerDeployment_Step4:
  resource_type: OS::TripleO::DeploymentSteps
  physical_resource_id: 8bbfde33-eba0-4e1f-b656-8b2f4fd7daf5
  status: CREATE_FAILED
  status_reason: |
    resources.ControllerDeployment_Step4: Stack CREATE cancelled
Heat Stack create failed.
Heat Stack create failed.

## Log - 3 : openstack stack list & openstack stack resource list output

(undercloud) [stack@refarch-r220-02 ~]$ openstack stack list
+--------------------------------------+------------+----------------------------------+---------------+----------------------+--------------+
| ID | Stack Name | Project | Stack Status | Creation Time | Updated Time |
+--------------------------------------+------------+----------------------------------+---------------+----------------------+--------------+
| 404edbf5-6788-4255-95cd-68f0b1044d27 | overcloud | d4ae8726a690413cbc62ade7e5e70763 | CREATE_FAILED | 2018-08-28T16:07:11Z | None |
+--------------------------------------+------------+----------------------------------+---------------+----------------------+--------------+
(undercloud) [stack@refarch-r220-02 ~]$
(undercloud) [stack@refarch-r220-02 ~]$ openstack stack resource list -n5 overcloud | grep FAILED
| AllNodesDeploySteps | e446e19a-b580-48ef-bea8-cdebb69b1480 | OS::TripleO::PostDeploySteps | CREATE_FAILED | 2018-08-28T16:07:14Z | overcloud |
| ControllerDeployment_Step4 | 8bbfde33-eba0-4e1f-b656-8b2f4fd7daf5 | OS::TripleO::DeploymentSteps | CREATE_FAILED | 2018-08-28T16:33:29Z | overcloud-AllNodesDeploySteps-rawlxuoiltb7 |
| 0 | 3aaa2d17-aac9-4cd1-a8cd-5408ef530a93 | OS::Heat::StructuredDeployment | CREATE_FAILED | 2018-08-28T17:14:15Z | overcloud-AllNodesDeploySteps-rawlxuoiltb7-ControllerDeployment_Step4-jjzyhon333i4 |
(undercloud) [stack@refarch-r220-02 ~]$

## Log - 4 : stack-list --show-nested & openstack stack resource list output

(undercloud) [stack@refarch-r220-02 ~]$ heat stack-list --show-nested -f "status=FAILED"
WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
+--------------------------------------+------------------------------------------------------------------------------------+---------------+----------------------+--------------+--------------------------------------+----------------------------------+
| id | stack_name | stack_status | creation_time | updated_time | parent | project |
+--------------------------------------+------------------------------------------------------------------------------------+---------------+----------------------+--------------+--------------------------------------+----------------------------------+
| 404edbf5-6788-4255-95cd-68f0b1044d27 | overcloud | CREATE_FAILED | 2018-08-28T16:07:11Z | None | None | d4ae8726a690413cbc62ade7e5e70763 |
| e446e19a-b580-48ef-bea8-cdebb69b1480 | overcloud-AllNodesDeploySteps-rawlxuoiltb7 | CREATE_FAILED | 2018-08-28T16:33:28Z | None | 404edbf5-6788-4255-95cd-68f0b1044d27 | d4ae8726a690413cbc62ade7e5e70763 |
| 8bbfde33-eba0-4e1f-b656-8b2f4fd7daf5 | overcloud-AllNodesDeploySteps-rawlxuoiltb7-ControllerDeployment_Step4-jjzyhon333i4 | CREATE_FAILED | 2018-08-28T17:14:14Z | None | e446e19a-b580-48ef-bea8-cdebb69b1480 | d4ae8726a690413cbc62ade7e5e70763 |
+--------------------------------------+------------------------------------------------------------------------------------+---------------+----------------------+--------------+--------------------------------------+----------------------------------+
(undercloud) [stack@refarch-r220-02 ~]$
(undercloud) [stack@refarch-r220-02 ~]$ openstack stack resource list overcloud
+----------------------------------------+--------------------------------------------------------+--------------------------------------------------+-----------------+----------------------+
| resource_name | physical_resource_id | resource_type | resource_status | updated_time |
+----------------------------------------+--------------------------------------------------------+--------------------------------------------------+-----------------+----------------------+
| ControllerServers | overcloud-ControllerServers-mbo7qqn65mjg | OS::Heat::Value | CREATE_COMPLETE | 2018-08-28T16:07:15Z |
| UpdateWorkflow | 93891cf0-be2c-462c-b0e6-1fa557458038 | OS::TripleO::Tasks::UpdateWorkflow | CREATE_COMPLETE | 2018-08-28T16:07:14Z |
| ComputeHCI | 1e9689bb-7f61-4c4a-9910-d1fc7bd49b97 | OS::Heat::ResourceGroup | CREATE_COMPLETE | 2018-08-28T16:07:16Z |
| StorageVirtualIP | f813b501-c35d-4eeb-b1ea-258182cfd0ed | OS::TripleO::Network::Ports::StorageVipPort | CREATE_COMPLETE | 2018-08-28T16:07:19Z |
| ControllerServiceChain | 49654455-3641-4d0a-aea5-76d51fc5edfd | OS::TripleO::Services | CREATE_COMPLETE | 2018-08-28T16:07:17Z |
| ComputeHCIAllNodesValidationDeployment | a27d16e9-6458-446e-8add-7c81f7463d30 | OS::Heat::StructuredDeployments | CREATE_COMPLETE | 2018-08-28T16:07:14Z |
| ServerIdMap | overcloud-ServerIdMap-2rjhsgt2txn4 | OS::Heat::Value | CREATE_COMPLETE | 2018-08-28T16:07:14Z |
| PcsdPassword | overcloud-PcsdPassword-sxf5ktf2chz2 | OS::TripleO::RandomString | CREATE_COMPLETE | 2018-08-28T16:07:17Z |
| hostsConfig | 4e46b143-e189-428e-8192-1a5c2d4e6577 | OS::TripleO::Hosts::SoftwareConfig | CREATE_COMPLETE | 2018-08-28T16:07:16Z |
| ServerOsCollectConfigData | overcloud-ServerOsCollectConfigData-dbiinxmr7asc | OS::Heat::Value | CREATE_COMPLETE | 2018-08-28T16:07:14Z |
| AllNodesExtraConfig | 7d44cff7-b5f5-4c83-8bdd-07e690b66b72 | OS::TripleO::AllNodesExtraConfig | CREATE_COMPLETE | 2018-08-28T16:07:14Z |
| allNodesConfig | 94db604b-8fc1-4012-803c-4d8913025c6d | OS::TripleO::AllNodes::SoftwareConfig | CREATE_COMPLETE | 2018-08-28T16:07:15Z |
| ControllerServiceNames | overcloud-ControllerServiceNames-ewnpbi7x5hv3 | OS::Heat::Value | CREATE_COMPLETE | 2018-08-28T16:07:16Z |
| ComputeHCIMergedConfigSettings | overcloud-ComputeHCIMergedConfigSettings-satfbs3ugqiz | OS::Heat::Value | CREATE_COMPLETE | 2018-08-28T16:07:14Z |
| ControllerIpListMap | a4631385-fd49-49ad-aefe-9b3bb3bed31e | OS::TripleO::Network::Ports::NetIpListMap | CREATE_COMPLETE | 2018-08-28T16:07:15Z |
| RedisVirtualIP | b469a70b-4aec-4dc1-9912-534218d27e88 | OS::TripleO::Network::Ports::RedisVipPort | CREATE_COMPLETE | 2018-08-28T16:07:15Z |
| ControllerSshKnownHostsDeployment | 322091cd-f558-41f6-9d7a-f241e4554edd | OS::Heat::StructuredDeployments | CREATE_COMPLETE | 2018-08-28T16:07:15Z |
| NetCidrMapValue | overcloud-NetCidrMapValue-6hphspcljofv | OS::Heat::Value | CREATE_COMPLETE | 2018-08-28T16:07:17Z |
| SshKnownHostsConfig | 0069ed5c-1253-42c4-ade7-b7b1b976bf56 | OS::TripleO::Ssh::KnownHostsConfig | CREATE_COMPLETE | 2018-08-28T16:07:15Z |
| DefaultPasswords | 5fd79a85-27e0-4354-82d2-2abc1907f565 | OS::TripleO::DefaultPasswords | CREATE_COMPLETE | 2018-08-28T16:07:17Z |
| BlacklistedHostnames | overcloud-BlacklistedHostnames-hxjbfsz3cjoq | OS::Heat::Value | CREATE_COMPLETE | 2018-08-28T16:07:14Z |
| ControllerAllNodesValidationDeployment | 2498d999-0ad4-4e9e-89b4-b1d412c1ce65 | OS::Heat::StructuredDeployments | CREATE_COMPLETE | 2018-08-28T16:07:14Z |
| VipMap | ec52e32c-605a-4736-b9fc-f411e2c16f86 | OS::TripleO::Network::Ports::NetVipMap | CREATE_COMPLETE | 2018-08-28T16:07:18Z |
| ServiceNetMap | cfb17c59-6a86-48ae-9ed8-15c66922cfa3 | OS::TripleO::ServiceNetMap | CREATE_COMPLETE | 2018-08-28T16:07:18Z |
| RabbitCookie | overcloud-RabbitCookie-ke2fu7othjj6 | OS::TripleO::RandomString | CREATE_COMPLETE | 2018-08-28T16:07:17Z |
| ControlVirtualIP | 09805e65-4943-471c-aae9-9830727a9156 | OS::TripleO::Network::Ports::ControlPlaneVipPort | CREATE_COMPLETE | 2018-08-28T16:07:19Z |
| EndpointMapData | overcloud-EndpointMapData-wirieu5zv5wy | OS::Heat::Value | CREATE_COMPLETE | 2018-08-28T16:07:17Z |
| Controller | 887a4b0b-c913-48c0-9111-40f83a9f4169 | OS::Heat::ResourceGroup | CREATE_COMPLETE | 2018-08-28T16:07:16Z |
| ControllerServiceConfigSettings | overcloud-ControllerServiceConfigSettings-sw5yalbnrxof | OS::Heat::Value | CREATE_COMPLETE | 2018-08-28T16:07:16Z |
| MysqlRootPassword | overcloud-MysqlRootPassword-nohdczm2jb2z | OS::TripleO::RandomString | CREATE_COMPLETE | 2018-08-28T16:07:17Z |
| ControllerServiceChainRoleData | overcloud-ControllerServiceChainRoleData-aa2lsqlzx44x | OS::Heat::Value | CREATE_COMPLETE | 2018-08-28T16:07:17Z |
| DeployedServerEnvironment | 5b356dd7-a7c1-47ef-90b6-017645a20468 | OS::TripleO::DeployedServerEnvironment | CREATE_COMPLETE | 2018-08-28T16:07:14Z |
| ComputeHCIServiceChainRoleData | overcloud-ComputeHCIServiceChainRoleData-e3ltkegmzz7r | OS::Heat::Value | CREATE_COMPLETE | 2018-08-28T16:07:17Z |
| AllNodesDeploySteps | e446e19a-b580-48ef-bea8-cdebb69b1480 | OS::TripleO::PostDeploySteps | CREATE_FAILED | 2018-08-28T16:07:14Z |
| ComputeHCISshKnownHostsDeployment | 44d662d5-93ec-469f-a758-4f65165da48d | OS::Heat::StructuredDeployments | CREATE_COMPLETE | 2018-08-28T16:07:15Z |
| AllNodesValidationConfig | 3f5bb8d4-f6ec-449d-8498-e3dbf6610ef4 | OS::TripleO::AllNodes::Validation | CREATE_COMPLETE | 2018-08-28T16:07:16Z |
| InternalApiVirtualIP | c6e811f7-37e3-44dc-b334-b0a06bb43be7 | OS::TripleO::Network::Ports::InternalApiVipPort | CREATE_COMPLETE | 2018-08-28T16:07:18Z |
| ComputeHCIHostsDeployment | aa3ee2da-7f89-4800-9c67-09d416ae422c | OS::Heat::StructuredDeployments | CREATE_COMPLETE | 2018-08-28T16:07:15Z |
| ControllerAllNodesDeployment | d2feca33-0a4d-4c14-9aae-a84b1ea4e0f4 | OS::TripleO::AllNodesDeployment | CREATE_COMPLETE | 2018-08-28T16:07:15Z |
| HeatAuthEncryptionKey | overcloud-HeatAuthEncryptionKey-dx3ijqv3fegf | OS::TripleO::RandomString | CREATE_COMPLETE | 2018-08-28T16:07:17Z |
| ComputeHCIServiceChain | cc16cd91-5c0b-4012-9924-eb218334d7a5 | OS::TripleO::Services | CREATE_COMPLETE | 2018-08-28T16:07:17Z |
| EndpointMap | 81ed4610-732b-474a-b355-1f8f7d841d2f | OS::TripleO::EndpointMap | CREATE_COMPLETE | 2018-08-28T16:07:17Z |
| ControllerHostsDeployment | 758d948b-31b9-4fd0-b668-ca2cbca3bc90 | OS::Heat::StructuredDeployments | CREATE_COMPLETE | 2018-08-28T16:07:15Z |
| ControllerMergedConfigSettings | overcloud-ControllerMergedConfigSettings-yc3lit4rmlex | OS::Heat::Value | CREATE_COMPLETE | 2018-08-28T16:07:16Z |
| ComputeHCIServiceNames | overcloud-ComputeHCIServiceNames-wgvcb36tmzfo | OS::Heat::Value | CREATE_COMPLETE | 2018-08-28T16:07:17Z |
| BlacklistedIpAddresses | overcloud-BlacklistedIpAddresses-5ogele4nytis | OS::Heat::Value | CREATE_COMPLETE | 2018-08-28T16:07:16Z |
| ComputeHCIServers | overcloud-ComputeHCIServers-tswrhwpnjnol | OS::Heat::Value | CREATE_COMPLETE | 2018-08-28T16:07:15Z |
| DeploymentServerBlacklistDict | overcloud-DeploymentServerBlacklistDict-ycuunj7xcmkh | OS::Heat::Value | CREATE_COMPLETE | 2018-08-28T16:07:16Z |
| StorageMgmtVirtualIP | 3a5f7d31-ecce-45d1-b0bd-6f717ab1f503 | OS::TripleO::Network::Ports::StorageMgmtVipPort | CREATE_COMPLETE | 2018-08-28T16:07:18Z |
| ComputeHCIIpListMap | c959a285-c128-4e6e-b8ba-bf1b401cd1c1 | OS::TripleO::Network::Ports::NetIpListMap | CREATE_COMPLETE | 2018-08-28T16:07:16Z |
| HorizonSecret | overcloud-HorizonSecret-rrpmv25ewxuc | OS::TripleO::RandomString | CREATE_COMPLETE | 2018-08-28T16:07:17Z |
| ComputeHCIAllNodesDeployment | 89905832-bff3-41ce-926d-dae27c5d7192 | OS::TripleO::AllNodesDeployment | CREATE_COMPLETE | 2018-08-28T16:07:15Z |
| ComputeHCINetworkHostnameMap | overcloud-ComputeHCINetworkHostnameMap-mgveiiwu7wv2 | OS::Heat::Value | CREATE_COMPLETE | 2018-08-28T16:07:16Z |
| VipHosts | overcloud-VipHosts-2ovf5fkuncwm | OS::Heat::Value | CREATE_COMPLETE | 2018-08-28T16:07:16Z |
| ComputeHCIServiceConfigSettings | overcloud-ComputeHCIServiceConfigSettings-trirhpiid7yc | OS::Heat::Value | CREATE_COMPLETE | 2018-08-28T16:07:16Z |
| PublicVirtualIP | 6ed28369-cfa1-4c6b-834b-c829573139a7 | OS::TripleO::Network::Ports::ExternalVipPort | CREATE_COMPLETE | 2018-08-28T16:07:19Z |
| Networks | d9bbce20-96b8-4f85-b20a-6760bb816ee1 | OS::TripleO::Network | CREATE_COMPLETE | 2018-08-28T16:07:19Z |
| ControllerNetworkHostnameMap | overcloud-ControllerNetworkHostnameMap-iuky4ljobrem | OS::Heat::Value | CREATE_COMPLETE | 2018-08-28T16:07:16Z |
+----------------------------------------+--------------------------------------------------------+--------------------------------------------------+-----------------+----------------------+
(undercloud) [stack@refarch-r220-02 ~]$
(undercloud) [stack@refarch-r220-02 ~]$ openstack stack failures list --long overcloud
overcloud.AllNodesDeploySteps.ControllerDeployment_Step4.0:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: 3aaa2d17-aac9-4cd1-a8cd-5408ef530a93
  status: CREATE_FAILED
  status_reason: |
    CREATE aborted (Task create from StructuredDeployment "0" Stack "overcloud-AllNodesDeploySteps-rawlxuoiltb7-ControllerDeployment_Step4-jjzyhon333i4" [8bbfde33-eba0-4e1f-b656-8b2f4fd7daf5] Timed out)
  deploy_stdout: |
    None
  deploy_stderr: |
    None
(undercloud) [stack@refarch-r220-02 ~]$

## Log - 5 : openstack action execution list shows ceph-install tasks still running, although the openstack overcloud deploy command has already failed.

(undercloud) [stack@refarch-r220-02 ~]$ openstack action execution list | grep -v SUCCESS
+--------------------------------------+---------------------------------+-----------------------------------------+--------------------+--------------------------+--------------------------------------+---------+----------+---------------------+---------------------+
| ID | Name | Workflow name | Workflow namespace | Task name | Task ID | State | Accepted | Created at | Updated at |
+--------------------------------------+---------------------------------+-----------------------------------------+--------------------+--------------------------+--------------------------------------+---------+----------+---------------------+---------------------+
| 35d57a63-08be-4a19-acd1-4014f9b2a211 | tripleo.ansible-playbook | tripleo.storage.v1.ceph-install | | ceph_install | 8051cad9-32ec-45d5-ad6e-88ccd64f6f6a | RUNNING | False | 2018-08-25 21:29:04 | <none> |
| 7085d5fa-a1ac-4c9a-92c4-93bacdefafae | tripleo.ansible-playbook | tripleo.storage.v1.ceph-install | | ceph_install | 0faf2620-748c-4de2-a0b6-0a0648864a89 | RUNNING | False | 2018-08-26 05:31:27 | <none> |
+--------------------------------------+---------------------------------+-----------------------------------------+--------------------+--------------------------+--------------------------------------+---------+----------+---------------------+---------------------+
(undercloud) [stack@refarch-r220-02 ~]$

## Log - 6 : /var/log/mistral/ceph-workflow.log shows ceph deployment was successful

2018-08-28 12:54:33,055 p=15144 u=mistral | TASK [set ceph client install 'Complete'] **************************************
2018-08-28 12:54:33,055 p=15144 u=mistral | Tuesday 28 August 2018 12:54:33 -0400 (0:00:03.001) 0:09:17.184 ********
2018-08-28 12:54:33,202 p=15144 u=mistral | ok: [192.168.120.7]
2018-08-28 12:54:33,208 p=15144 u=mistral | PLAY RECAP *********************************************************************
2018-08-28 12:54:33,208 p=15144 u=mistral | 192.168.120.12 : ok=114 changed=20 unreachable=0 failed=0
2018-08-28 12:54:33,208 p=15144 u=mistral | 192.168.120.13 : ok=107 changed=15 unreachable=0 failed=0
2018-08-28 12:54:33,208 p=15144 u=mistral | 192.168.120.14 : ok=107 changed=15 unreachable=0 failed=0
2018-08-28 12:54:33,208 p=15144 u=mistral | 192.168.120.18 : ok=123 changed=22 unreachable=0 failed=0
2018-08-28 12:54:33,208 p=15144 u=mistral | 192.168.120.6 : ok=107 changed=15 unreachable=0 failed=0
2018-08-28 12:54:33,208 p=15144 u=mistral | 192.168.120.7 : ok=138 changed=19 unreachable=0 failed=0
2018-08-28 12:54:33,209 p=15144 u=mistral | INSTALLER STATUS ***************************************************************
2018-08-28 12:54:33,228 p=15144 u=mistral | Install Ceph Monitor : Complete (0:01:24)
2018-08-28 12:54:33,228 p=15144 u=mistral | Install Ceph Manager : Complete (0:00:37)
2018-08-28 12:54:33,228 p=15144 u=mistral | Install Ceph OSD : Complete (0:06:02)
2018-08-28 12:54:33,228 p=15144 u=mistral | Install Ceph Client : Complete (0:00:54)
2018-08-28 12:54:33,229 p=15144 u=mistral | Tuesday 28 August 2018 12:54:33 -0400 (0:00:00.173) 0:09:17.358 ********
2018-08-28 12:54:33,229 p=15144 u=mistral | ===============================================================================

(undercloud) [stack@refarch-r220-02 ~]$ ssh heat-admin.120.16 ceph -s
  cluster:
    id: 214c329a-a79d-11e8-916e-2047478ccfaa
    health: HEALTH_WARN
            too few PGs per OSD (8 < min 30)

  services:
    mon: 1 daemons, quorum controller-0
    mgr: controller-0(active)
    osd: 60 osds: 60 up, 60 in

  data:
    pools: 5 pools, 160 pgs
    objects: 0 objects, 0 bytes
    usage: 6524 MB used, 218 TB / 218 TB avail
    pgs: 160 active+clean

(undercloud) [stack@refarch-r220-02 ~]$
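
The event log in Log - 2 goes quiet between the last ComputeHCI step 4 event (17:15:14Z) and the timeout (20:07:11Z). A minimal sketch for measuring such gaps, assuming GNU date and timestamps written as "YYYY-MM-DD HH:MM:SSZ" like the ones in that log; `event_gap_minutes` is a hypothetical helper name, not part of any OpenStack tool:

```shell
#!/usr/bin/env bash
# Hypothetical helper: whole minutes between two Heat event timestamps
# written as "YYYY-MM-DD HH:MM:SSZ" (the layout used in Log - 2).
# Assumes GNU date (for the -d option); the trailing Z is stripped and
# -u makes date interpret the remaining value as UTC.
event_gap_minutes() {
    local start end
    start=$(date -u -d "${1%Z}" +%s) || return 1
    end=$(date -u -d "${2%Z}" +%s) || return 1
    echo $(( (end - start) / 60 ))
}

# Quiet period: last ComputeHCI step 4 event to the AllNodesDeploySteps timeout.
event_gap_minutes "2018-08-28 17:15:14Z" "2018-08-28 20:07:11Z"

# Whole stack: creation time (from "openstack stack list") to the timeout.
event_gap_minutes "2018-08-28 16:07:11Z" "2018-08-28 20:07:11Z"
```

With the timestamps from this report the two calls give 171 and 240 minutes. The second figure would be consistent with a 240-minute stack timeout, but that is an assumption; the report does not state the configured timeout.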
For easier viewing, the same logs are available here: https://pastebin.com/raw/SkGknKjX
If you look at the output below, only the ceph-install task is stuck (running endlessly, causing the main openstack overcloud deploy command to run for 4 hours and never finish).

(undercloud) [stack@refarch-r220-02 ~]$ openstack action execution list | grep -v SUCCESS
+--------------------------------------+---------------------------------+-----------------------------------------+--------------------+--------------------------+--------------------------------------+---------+----------+---------------------+---------------------+
| ID | Name | Workflow name | Workflow namespace | Task name | Task ID | State | Accepted | Created at | Updated at |
+--------------------------------------+---------------------------------+-----------------------------------------+--------------------+--------------------------+--------------------------------------+---------+----------+---------------------+---------------------+
| 35d57a63-08be-4a19-acd1-4014f9b2a211 | tripleo.ansible-playbook | tripleo.storage.v1.ceph-install | | ceph_install | 8051cad9-32ec-45d5-ad6e-88ccd64f6f6a | RUNNING | False | 2018-08-25 21:29:04 | <none> |
| 7085d5fa-a1ac-4c9a-92c4-93bacdefafae | tripleo.ansible-playbook | tripleo.storage.v1.ceph-install | | ceph_install | 0faf2620-748c-4de2-a0b6-0a0648864a89 | RUNNING | False | 2018-08-26 05:31:27 | <none> |
+--------------------------------------+---------------------------------+-----------------------------------------+--------------------+--------------------------+--------------------------------------+---------+----------+---------------------+---------------------+
(undercloud) [stack@refarch-r220-02 ~]$
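
To compare the "Created at" values of these entries against the stack creation time, the table can be reduced to ID, state and creation time. A rough sketch, assuming the column order shown in the table above; `non_success_rows` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read an "openstack action execution list" table on
# stdin and print "ID STATE CREATED_AT" for every row not in SUCCESS state.
# Field numbers assume the column order shown in this report; with "|" as
# the separator, field 1 is the empty text before the first pipe.
non_success_rows() {
    awk -F'|' '
        NF < 11 { next }   # border lines and anything that is not a data row
        {
            id = $2; state = $8; created = $10
            gsub(/^ +| +$/, "", id)
            gsub(/^ +| +$/, "", state)
            gsub(/^ +| +$/, "", created)
            if (id == "ID" || state == "SUCCESS") next   # header or finished
            print id, state, created
        }'
}
```

Usage on the undercloud would be `openstack action execution list | non_success_rows`. For the two rows above it prints creation times of 2018-08-25 and 2018-08-26, while the failed stack in this report was created on 2018-08-28.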
@John Fulton: Do you think this issue is related to the one you helped fix in https://bugzilla.redhat.com/show_bug.cgi?id=1619263#c6? FYI, I never managed to deploy the overcloud successfully, even after applying the changes mentioned in BZ1619263.
I think those two stuck workflows are leftovers from when you ran into 1619263, so I'd rather not conflate them with this bug. If they were an issue, you wouldn't have gotten to step 4. Feel free to set the status of those two workflows to ERROR:

mistral execution-update -s ERROR <ID>

(In reply to karan singh from comment #2)
> If you pay attention to the below output, only ceph-install task is getting
> stuck (running endlessly, causing the main openstack overcloud deploy
> command to run for 4 hours and never finish)
>
> (undercloud) [stack@refarch-r220-02 ~]$ openstack action execution list | grep -v SUCCESS
> +--------------------------------------+---------------------------------+-----------------------------------------+--------------------+--------------------------+--------------------------------------+---------+----------+---------------------+---------------------+
> | ID | Name | Workflow name | Workflow namespace | Task name | Task ID | State | Accepted | Created at | Updated at |
> +--------------------------------------+---------------------------------+-----------------------------------------+--------------------+--------------------------+--------------------------------------+---------+----------+---------------------+---------------------+
> | 35d57a63-08be-4a19-acd1-4014f9b2a211 | tripleo.ansible-playbook | tripleo.storage.v1.ceph-install | | ceph_install | 8051cad9-32ec-45d5-ad6e-88ccd64f6f6a | RUNNING | False | 2018-08-25 21:29:04 | <none> |
> | 7085d5fa-a1ac-4c9a-92c4-93bacdefafae | tripleo.ansible-playbook | tripleo.storage.v1.ceph-install | | ceph_install | 0faf2620-748c-4de2-a0b6-0a0648864a89 | RUNNING | False | 2018-08-26 05:31:27 | <none> |
> +--------------------------------------+---------------------------------+-----------------------------------------+--------------------+--------------------------+--------------------------------------+---------+----------+---------------------+---------------------+
> (undercloud) [stack@refarch-r220-02 ~]$
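
When several stale executions need the same treatment, the command above can be wrapped in a small loop. This is only a sketch: `mark_executions_error` and the `APPLY` switch are hypothetical names introduced here, and the IDs are whatever the listing on your undercloud reports for the stuck entries.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the suggested status change to every execution
# ID passed as an argument. By default it only prints the commands (dry run);
# set APPLY=1 to run them, which requires the mistral client and undercloud
# credentials to be available in the shell.
mark_executions_error() {
    local id
    for id in "$@"; do
        if [ "${APPLY:-0}" = "1" ]; then
            mistral execution-update -s ERROR "$id"
        else
            echo "mistral execution-update -s ERROR $id"
        fi
    done
}
```

Running `mark_executions_error <ID1> <ID2>` first and reading the printed commands gives a chance to confirm the IDs before anything is changed.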
JohnF graciously provided another pointer: deploy without telemetry. I am now performing a clean deployment with telemetry disabled [1].

[1] https://raw.githubusercontent.com/openstack/tripleo-heat-templates/master/environments/disable-telemetry.yaml
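
For reference, a sketch of how the deploy command from the original report might look with the telemetry environment added. It assumes the installed openstack-tripleo-heat-templates package ships the file from [1] under its environments directory, which should be checked on the undercloud first; `print_deploy_command` is a hypothetical helper that only prints the command and does not run it.

```shell
#!/usr/bin/env bash
# Assumption: the packaged templates contain the same environment file as
# the upstream URL [1] at this path. Verify with "ls" before deploying.
THT=/usr/share/openstack-tripleo-heat-templates
TELEMETRY_ENV="$THT/environments/disable-telemetry.yaml"

# Print the deploy command from the original report with the extra -e file
# appended as the last environment. Nothing is executed here.
print_deploy_command() {
    printf '%s \\\n' \
        "openstack overcloud deploy" \
        "  --templates $THT" \
        "  -r /home/stack/templates/roles_data.yaml" \
        "  -e $THT/environments/network-isolation.yaml" \
        "  -e $THT/environments/ceph-ansible/ceph-ansible.yaml" \
        "  -e /home/stack/templates/overcloud_images.yaml" \
        "  -e /home/stack/templates/network-environment.yaml" \
        "  -e /home/stack/templates/global-config.yaml" \
        "  -e /home/stack/templates/ceph-config.yaml"
    printf '%s\n' "  -e $TELEMETRY_ENV"
}

print_deploy_command
```

The telemetry file is placed last so that its settings are not overridden by the earlier environment files.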
In a fresh overcloud deployment with telemetry disabled, stack creation was successful.

----------
2018-08-29 17:04:04Z [overcloud]: CREATE_COMPLETE Stack CREATE completed successfully

Stack overcloud CREATE_COMPLETE

Started Mistral Workflow tripleo.deployment.v1.get_horizon_url. Execution ID: 9558215f-d077-4828-b1a5-871fc8cbcb3f
Overcloud Endpoint: http://172.21.1.159:5000/
Overcloud Horizon Dashboard URL: http://172.21.1.159:80/dashboard
Overcloud rc file: /home/stack/overcloudrc
Overcloud Deployed

(undercloud) [stack@refarch-r220-02 ~]$ openstack stack list
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+--------------+
| ID | Stack Name | Project | Stack Status | Creation Time | Updated Time |
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+--------------+
| a936509f-8df6-44e5-9dc6-ebd85915d14c | overcloud | d4ae8726a690413cbc62ade7e5e70763 | CREATE_COMPLETE | 2018-08-29T15:56:25Z | None |
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+--------------+
(undercloud) [stack@refarch-r220-02 ~]$
----------

However, there are still workflow tasks for ceph_install that are stuck in the RUNNING state:

(undercloud) [stack@refarch-r220-02 ~]$ openstack action execution list | grep -v SUCCESS
+--------------------------------------+---------------------------------+-----------------------------------------+--------------------+--------------------------+--------------------------------------+---------+----------+---------------------+---------------------+
| ID | Name | Workflow name | Workflow namespace | Task name | Task ID | State | Accepted | Created at | Updated at |
+--------------------------------------+---------------------------------+-----------------------------------------+--------------------+--------------------------+--------------------------------------+---------+----------+---------------------+---------------------+
| 35d57a63-08be-4a19-acd1-4014f9b2a211 | tripleo.ansible-playbook | tripleo.storage.v1.ceph-install | | ceph_install | 8051cad9-32ec-45d5-ad6e-88ccd64f6f6a | RUNNING | False | 2018-08-25 21:29:04 | <none> |
| 7085d5fa-a1ac-4c9a-92c4-93bacdefafae | tripleo.ansible-playbook | tripleo.storage.v1.ceph-install | | ceph_install | 0faf2620-748c-4de2-a0b6-0a0648864a89 | RUNNING | False | 2018-08-26 05:31:27 | <none> |
+--------------------------------------+---------------------------------+-----------------------------------------+--------------------+--------------------------+--------------------------------------+---------+----------+---------------------+---------------------+
(undercloud) [stack@refarch-r220-02 ~]$
(undercloud) [stack@refarch-r220-02 ~]$ openstack workflow execution list | grep -v SUCCESS
+--------------------------------------+--------------------------------------+------------------------------------------------------------------------+--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+---------+------------+---------------------+---------------------+
| ID | Workflow ID | Workflow name | Workflow namespace | Description | Task Execution ID | State | State info | Created at | Updated at |
+--------------------------------------+--------------------------------------+------------------------------------------------------------------------+--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+---------+------------+---------------------+---------------------+
| 2a21b6ca-18e6-4b02-acca-d689802d9458 | 7383b3f7-3923-489d-8f92-b8fa9306ed01 | tripleo.overcloud.workflow_tasks.step2 | | Heat managed | <none> | RUNNING | None | 2018-08-25 21:27:54 | 2018-08-25 21:27:55 |
| 1bfef3df-d437-48f9-a5c4-fc306bb63cdd | bd484c0e-a8bf-4c5f-abbd-cdd96483affa | tripleo.storage.v1.ceph-install | | sub-workflow execution | 7f2d690c-75de-4221-819d-b8661870d94e | RUNNING | None | 2018-08-25 21:27:55 | 2018-08-25 21:27:55 |
| cd5f6ac9-5c06-4bb3-8bd6-fb9702e7fbac | 7383b3f7-3923-489d-8f92-b8fa9306ed01 | tripleo.overcloud.workflow_tasks.step2 | | Heat managed | <none> | RUNNING | None | 2018-08-26 05:30:27 | 2018-08-26 05:30:27 |
| ec24d232-54b2-410d-9825-04754d7e1e09 | bd484c0e-a8bf-4c5f-abbd-cdd96483affa | tripleo.storage.v1.ceph-install | | sub-workflow execution | 655d4562-88d7-47ac-833b-763b511fda93 | RUNNING | None | 2018-08-26 05:30:27 | 2018-08-26 05:30:27 |
| 728a21a7-69a2-403a-995b-08ca7e5cf28b | bd484c0e-a8bf-4c5f-abbd-cdd96483affa | tripleo.storage.v1.ceph-install | | sub-workflow execution | 62c6c1eb-87ff-4f58-8244-5783f0979fe7 | RUNNING | None | 2018-08-26 09:11:24 | 2018-08-26 09:11:24 |
| 76c9efff-bac5-415a-8988-bc0c21bc2f0a | 02b8310c-45ba-4c89-8989-2d63acb6e030 | tripleo.overcloud.workflow_tasks.step2 | | Heat managed | <none> | RUNNING | None | 2018-08-26 09:11:24 | 2018-08-26 09:11:24 |
+--------------------------------------+--------------------------------------+------------------------------------------------------------------------+--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+---------+------------+---------------------+---------------------+
(undercloud) [stack@refarch-r220-02 ~]$

This is causing issues with Glance, Cinder and Nova:

- Glance image creation gets stuck in the "saving" state
- Cinder volume creation throws an error
- The volume service is down
- The nova-compute container is unhealthy

(overcloud) [stack@refarch-r220-02 tmp]$ openstack volume service list
+------------------+------------------------+------+---------+-------+----------------------------+
| Binary | Host | Zone | Status | State | Updated At |
+------------------+------------------------+------+---------+-------+----------------------------+
| cinder-scheduler | controller-0 | nova | enabled | up | 2018-08-29T19:20:18.000000 |
| cinder-volume | hostgroup@tripleo_ceph | nova | enabled | down | 2018-08-29T17:04:05.000000 |
+------------------+------------------------+------+---------+-------+----------------------------+
(overcloud) [stack@refarch-r220-02 tmp]$

[root@osd-compute-0 nova]# docker ps | grep -i nova
5e667b10c658 192.168.120.1:8787/rhosp13/openstack-nova-compute:latest "kolla_start" 2 hours ago Up 2 hours nova_migration_target
67f09d1d66bb 192.168.120.1:8787/rhosp13/openstack-nova-compute:latest "kolla_start" 2 hours ago Up 2 hours (unhealthy) nova_compute
fbbe60c4bc17 192.168.120.1:8787/rhosp13/openstack-nova-libvirt:latest "kolla_start" 2 hours ago Up 2 hours nova_libvirt
bd5712d3f206 192.168.120.1:8787/rhosp13/openstack-nova-libvirt:latest "kolla_start" 2 hours ago Up 2 hours nova_virtlogd
[root@osd-compute-0
nova]# nova-compute container log =========================== + echo 'Running command: '\''/usr/bin/nova-compute '\''' + exec /usr/bin/nova-compute Running command: '/usr/bin/nova-compute ' /usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py:332: NotSupportedWarning: Configuration option(s) ['use_tpool'] not supported exception.NotSupportedWarning [root@osd-compute-0 nova]#
Heat won't move past Step2 if anything in that step remains stuck ... in addition to the ceph-ansible logs reporting success, the failures appear to be happening in Step4; this is definitely unrelated to the ceph-ansible workflow.

The issue with the client.openstack keyring (seen when using the disable-telemetry env file) is instead tracked by BZ#1613474. Closing as duplicate.

*** This bug has been marked as a duplicate of bug 1613474 ***
@Giulio Fidente The issue I reported does not only happen when the overcloud is deployed using the disable-telemetry env file. It also occurs when the overcloud is deployed together with telemetry. So it's not a duplicate of BZ#1613474; we only tried the disable-telemetry env file to test whether it's a workaround for getting a functional overcloud. It's a reproducible issue and I can provide access to the environment if anyone is interested in looking into it.
Thanks for understanding Giulio. Yes the environment is available for us to debug together. I will ping you on IRC.
Changing subject as I don't think this has anything to do with previously running tasks.
WORKAROUND:

ceph auth get client.openstack > /etc/ceph/ceph.client.openstack.keyring
ceph auth del client.openstack
ceph auth import -i /etc/ceph/ceph.client.openstack.keyring
pcs resource restart openstack-cinder-volume

After running the above as root on the controller node, cinder and glance started working. So this seems to be caused by a corrupted openstack keyring.

Variations of the above were also attempted, but it wasn't until the old key was deleted before re-importing that the following troubleshooting command started working:

rbd --keyring=/etc/ceph/ceph.client.openstack.keyring --id openstack -p images ls

E.g. we created a new keyring (ceph.client.john.keyring) and it worked right away, and we restarted the monitor container, but that made no difference for the openstack keyring. I suspect that some copy of the keyring inside of Ceph was corrupted and I had to delete it to ensure it was cleaned.

The unresolved matter is WHY was the keyring corrupted. If you can reproduce this with a fresh deploy, then the next step would be to get someone better versed in ceph key permissions to help. I don't think it's anything to do with the way tripleo asks ceph-ansible to create the keys, because I'm unable to reproduce this issue in my environment and neither is CI. That's why I think, if this continues, that someone better at Ceph internals should identify the root cause. Unless this is some environmental issue that goes away when you attempt to redeploy. Recall that you had run into a few other issues earlier, including 1613474, so this is the first single deployment we've done ever since 1613474 was worked around.
Hi John,

I really appreciate your help in troubleshooting this and finding a workaround. I confirm that cinder, glance and nova are now working with Ceph. Agreed, "The unresolved matter is WHY was the keyring corrupted".

In response to your comment >> "Unless this is some environmental issue that goes away when you attempt to redeploy."

Do you remember that on Friday 31st Aug you deleted my overcloud stack and redeployed it, and the problem still existed? My gut feeling is that if we destroy this stack and redeploy it again, this problem will reappear (at least in my environment). I want to help you guys fix this issue once and for all, so let me know when we should attempt to reproduce this and involve someone from Ceph engineering to take a look.
Based on John's initial work, I ran a series of tests, which show that single quotes (' ') in the Ceph auth capabilities are causing this problem. When I manually remove the single quotes from the capabilities stored in Ceph, things start to work.

Now I am wondering whether tripleo is adding these single quotes while creating the Ceph users, or whether Ceph itself has dropped support for single quotes that it previously had (not sure).

In my environment we could probably change this behaviour in tripleo and redeploy the cluster to see if the problem gets fixed. Thoughts?

## Ceph auth list output for client.openstack before JohnF's workaround, when Cinder, Glance and Nova were not working

client.openstack
    key: AQCCAIBbAAAAABAABuJqVYY56PbuQ6KTu8C/Fg==
    caps: [mds] ''
    caps: [mgr] 'allow *'
    caps: [mon] 'profile rbd'
    caps: [osd] 'profile rbd pool=volumes, profile rbd pool=backups, profile rbd pool=vms, profile rbd pool=images, profile rbd pool=metrics'

## Ceph auth list output for client.openstack after JohnF deleted and imported client.openstack user

client.openstack
    key: AQCCAIBbAAAAABAABuJqVYY56PbuQ6KTu8C/Fg==
    caps: [mds]
    caps: [mgr] allow *
    caps: [mon] profile rbd
    caps: [osd] profile rbd pool=volumes, profile rbd pool=backups, profile rbd pool=vms, profile rbd pool=images, profile rbd pool=metrics

If you compare these two outputs, you can see that when the capabilities are wrapped in single quotes (' '), the OpenStack services do not work. When JohnF deleted and re-imported the client.openstack user, the single quotes were removed and the OpenStack services started to work.
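The comparison above can be automated. The following is a minimal sketch, not something taken from the original report: it reads `ceph auth list` output on stdin and prints every capability whose value starts and ends with a literal single quote. The function name find_quoted_caps is hypothetical, and the sketch assumes the usual layout in which each entity name starts in column one and its key/caps lines are indented beneath it.

```shell
# Hypothetical helper: list capabilities that are wrapped in single quotes.
# Reads "ceph auth list" style text on stdin.
find_quoted_caps() {
    awk -v q="'" '
        # Unindented lines name the entity, e.g. client.openstack
        /^[^ \t]/ { entity = $1; next }
        # Indented caps lines look like: caps: [mon] <value>
        $1 == "caps:" {
            idx = index($0, "] ")
            if (idx == 0) next              # empty capability, nothing to check
            val = substr($0, idx + 2)
            # Report values that start and end with a literal single quote
            if (val ~ ("^" q ".*" q "$")) print entity, $2, val
        }
    '
}
```

It could be used as `sudo ceph auth list | find_quoted_caps`; on a healthy cluster the expectation is that it prints nothing.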
## After the workaround, client.openstack.keyring works fine

[heat-admin@controller-0 ~]$ ceph --keyring=/etc/ceph/ceph.client.openstack.keyring --id openstack -s
  cluster:
    id:     214c329a-a79d-11e8-916e-2047478ccfaa
    health: HEALTH_WARN
            application not enabled on 2 pool(s)

  services:
    mon: 1 daemons, quorum controller-0
    mgr: controller-0(active)
    osd: 60 osds: 60 up, 60 in

  data:
    pools:   5 pools, 1120 pgs
    objects: 1318 objects, 1731 MB
    usage:   7945 MB used, 218 TB / 218 TB avail
    pgs:     1120 active+clean

## If I try the manila or radosgw keys, they don't work, because in ceph auth list they also have single quotes (' ') in their capabilities

[heat-admin@controller-0 ceph]$ ceph --keyring=/etc/ceph/ceph.client.manila.keyring --id manila -s
Error EACCES: access denied
[heat-admin@controller-0 ceph]$
[heat-admin@controller-0 ceph]$
[heat-admin@controller-0 ceph]$ ceph --keyring=/etc/ceph/ceph.client.radosgw.keyring --id radosgw -s
Error EACCES: access denied
[heat-admin@controller-0 ceph]$

$ sudo ceph auth list
client.john
    key: AQCnfI5btYWfORAADiH22WsDDkB5v0782g0C2w==
    caps: [mds] allow
    caps: [mon] allow *
    caps: [osd] allow *
client.manila
    key: AQCCAIBbAAAAABAA4kShATMpeQ/aVG4a64VR2Q==
    caps: [mds] 'allow *'
    caps: [mgr] 'allow *'
    caps: [mon] 'allow r, allow command "auth del", allow command "auth caps", allow command "auth get", allow command "auth get-or-create"'
    caps: [osd] 'allow rw'
client.openstack
    key: AQCCAIBbAAAAABAABuJqVYY56PbuQ6KTu8C/Fg==
    caps: [mds]
    caps: [mgr] allow *
    caps: [mon] profile rbd
    caps: [osd] profile rbd pool=volumes, profile rbd pool=backups, profile rbd pool=vms, profile rbd pool=images, profile rbd pool=metrics
client.radosgw
    key: AQCCAIBbAAAAABAAT4DElonMQQbiqSy8CfZyvg==
    caps: [mds] ''
    caps: [mgr] 'allow *'
    caps: [mon] 'allow rw'
    caps: [osd] 'allow rwx'
mgr.controller-0
    key: AQC9eolbZlf1MxAAy+dBzzXJ1odg8C1Wh+4r7w==
    caps: [mds] allow *
    caps: [mon] allow profile mgr
    caps: [osd] allow *

## To prove this theory, let's remove the single quotes (' ') from the Ceph capabilities
[heat-admin@controller-0 tmp]$ ceph auth get client.radosgw > ceph.client.radosgw.keyring
exported keyring for client.radosgw
[heat-admin@controller-0 tmp]$ cat ceph.client.radosgw.keyring
[client.radosgw]
    key = AQCCAIBbAAAAABAAT4DElonMQQbiqSy8CfZyvg==
    caps mds = "''"
    caps mgr = "'allow *'"
    caps mon = "'allow rw'"
    caps osd = "'allow rwx'"
[heat-admin@controller-0 tmp]$

## You can see there are single quotes. So let's edit ceph.client.radosgw.keyring and remove the single quotes (' ')

[heat-admin@controller-0 tmp]$ cat ceph.client.radosgw.keyring
[client.radosgw]
    key = AQCCAIBbAAAAABAAT4DElonMQQbiqSy8CfZyvg==
    caps mds = ""
    caps mgr = "allow *"
    caps mon = "allow rw"
    caps osd = "allow rwx"
[heat-admin@controller-0 tmp]$

## Deleted the client.radosgw user from Ceph and re-imported it using the modified key without single quotes

sudo ceph auth del client.radosgw
sudo ceph auth import -i /tmp/ceph.client.radosgw.keyring
sudo ceph auth list

## Single quotes removed

client.radosgw
    key: AQCCAIBbAAAAABAAT4DElonMQQbiqSy8CfZyvg==
    caps: [mds]
    caps: [mgr] allow *
    caps: [mon] allow rw
    caps: [osd] allow rwx

## Able to run Ceph commands as the client.radosgw user, for which we have removed the single quotes

[heat-admin@controller-0 ~]$ sudo ceph --keyring=/tmp/ceph.client.radosgw.keyring --id radosgw -s
  cluster:
    id:     214c329a-a79d-11e8-916e-2047478ccfaa
    health: HEALTH_WARN
            application not enabled on 2 pool(s)

  services:
    mon: 1 daemons, quorum controller-0
    mgr: controller-0(active)
    osd: 60 osds: 60 up, 60 in

  data:
    pools:   5 pools, 1120 pgs
    objects: 1318 objects, 1731 MB
    usage:   7945 MB used, 218 TB / 218 TB avail
    pgs:     1120 active+clean

[heat-admin@controller-0 ~]$

## Ceph commands still do not work with client.manila, because in ceph auth list the manila capabilities are still wrapped in single quotes (' ')

[heat-admin@controller-0 ~]$ sudo ceph --keyring=/etc/ceph/ceph.client.manila.keyring --id manila -s
Error EACCES: access denied
[heat-admin@controller-0 ~]$

$ sudo ceph auth list
client.manila
    key: AQCCAIBbAAAAABAA4kShATMpeQ/aVG4a64VR2Q==
    caps: [mds] 'allow *'
    caps: [mgr] 'allow *'
    caps: [mon] 'allow r, allow command "auth del", allow command "auth caps", allow command "auth get", allow command "auth get-or-create"'
    caps: [osd] 'allow rw'
client.openstack
    key: AQCCAIBbAAAAABAABuJqVYY56PbuQ6KTu8C/Fg==
    caps: [mds]
    caps: [mgr] allow *
    caps: [mon] profile rbd
    caps: [osd] profile rbd pool=volumes, profile rbd pool=backups, profile rbd pool=vms, profile rbd pool=images, profile rbd pool=metrics
client.radosgw
    key: AQCCAIBbAAAAABAAT4DElonMQQbiqSy8CfZyvg==
    caps: [mds]
    caps: [mgr] allow *
    caps: [mon] allow rw
    caps: [osd] allow rwx
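The manual keyring edit shown for client.radosgw could be scripted for the remaining users such as client.manila. This is only a sketch under stated assumptions: the function name strip_cap_quotes is hypothetical, and it assumes the exported keyring has the `caps <daemon> = "'...'"` layout seen in the export above. It rewrites only those caps lines whose double-quoted value is itself wrapped in single quotes and passes everything else through unchanged.

```shell
# Hypothetical helper: remove the wrapping single quotes from caps values
# in an exported keyring. Reads the keyring on stdin, writes it to stdout.
strip_cap_quotes() {
    awk -v q="'" '
        # Only touch caps lines whose quoted value is itself wrapped in single quotes
        $1 == "caps" && $0 ~ ("= \"" q ".*" q "\"$") {
            sub("= \"" q, "= \"")   # drop the opening single quote
            sub(q "\"$", "\"")      # drop the closing single quote
        }
        { print }
    '
}
```

A cleaned copy could then be produced with something like `strip_cap_quotes < ceph.client.manila.keyring > /tmp/ceph.client.manila.keyring` and re-imported with the same `ceph auth del` and `ceph auth import -i` steps used for client.radosgw. Capabilities that contain embedded double quotes, like the manila mon caps, are worth checking by hand before importing.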
This might be caused by a change in ceph-ansible; I think we need to test how the version currently shipping in OSP behaves [1] and how the upcoming build [2] shipping with the Ceph 3.1 update behaves.

Are you able to test these two versions on the same environment and with the same parameters?

1. ceph-ansible-3.1.0-0.1.rc10
2. ceph-ansible 3.1.2

Should the 3.1.2 version fail, then there has been a change in THT [3] (not included in the recent z2 update) which will probably work around the issue.

3. https://review.openstack.org/#/c/589185 makes that work.
(In reply to karan singh from comment #15)
> In response To your comment >> "Unless this is some environmental issue that
> goes away when you attempt to redeploy."
>
> Do you remember on Friday 31st Aug, you have deleted my overcloud stack and
> redeployed it and the problem still existed.

Yes, I remember and I believe that's when THIS problem was introduced. Prior to Aug 31, you were hitting 1613474.

> My gut feeling is that if we
> destroy this stack and redeploy it again, this problem will re-appear (at
> least in my environment). I want to help you guys fix this issue once for
> all, so let me know when should we attempt to reproduce this and involve
> someone from Ceph engineering to take a look.

Thanks for that. If you want to do another test, then let's try again with Giulio's suggestions from the last comment.
Created attachment 1481388 [details] auth_list_3.1.2 This is the auth list output produced by the 3.1.2 deployment, which shows single quotes around the daemon caps
Created attachment 1481390 [details] ansible_log_3.1.2 This is the ceph-ansible playbook log from the 3.1.2 deployment
Created attachment 1481394 [details] inventory_3.1.2 This is the inventory file used with the 3.1.2 deployment
Created attachment 1481396 [details] auth_list_3.1.2 This is the auth list output produced by the 3.1.2 deployment, which shows single quotes around the daemon caps
Created attachment 1481400 [details] ansible_log_3.1.2 This is the ceph-ansible playbook log from the 3.1.2 deployment
Created attachment 1481412 [details] auth_list_3.1.0rc10 This is the auth list output produced by the 3.1.0rc10 deployment, which doesn't have any single quotes
Created attachment 1481413 [details] inventory_3.1.0rc10 This is the inventory file used with the 3.1.0rc10 deployment
https://github.com/ceph/ceph-ansible/releases/tag/v3.1.3
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2819