rhel-osp-director: unable to delete a heat stack deployed with "--rhel-reg --reg-method portal --reg-org <rel-org> --reg-activation-key '<key>'" after a failed attempt to update it with "openstack overcloud update stack --templates -e <yaml> -i overcloud"

Environment:
openstack-heat-engine-2015.1.0-6.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-46.el7ost.noarch
instack-undercloud-2.1.2-23.el7ost.noarch

Steps to reproduce:
1. Deploy an overcloud with "openstack overcloud deploy --templates --control-scale <num> --compute-scale <num> --ceph-storage-scale <num> -e <yaml> --compute-flavor compute --control-flavor control --ceph-storage-flavor ceph --rhel-reg --reg-method portal --reg-org <rel-org> --reg-activation-key '<key>'"
2. Attempt to update the stack with "openstack overcloud update stack --templates -e <yaml> -i overcloud"
3. If the update fails (for example, because there are not enough active subscriptions), the stack can no longer be deleted. Run "heat stack-delete overcloud".

Result:
The deletion gets stuck or fails; the stack cannot be deleted.

Expected result:
The stack should be deleted.
Reproduced.

heat resource-list -n5 overcloud|grep -v COMPLETE
+-------------------------------+--------------------------------------+---------------------------------------+--------------------+----------------------+-------------------------------+
| resource_name                 | physical_resource_id                 | resource_type                         | resource_status    | updated_time         | parent_resource               |
+-------------------------------+--------------------------------------+---------------------------------------+--------------------+----------------------+-------------------------------+
| RHELUnregistrationDeployment  | 4b6cb843-cda1-4529-b787-24f2eaefbda5 | OS::Heat::StructuredDeployments       | DELETE_IN_PROGRESS | 2015-08-24T20:27:55Z | ExtraConfig                   |
| 0                             | 15fefbb8-882d-416a-a8c6-9744a1e5a05d | OS::Heat::StructuredDeployment        | DELETE_IN_PROGRESS | 2015-08-24T20:27:58Z | RHELUnregistrationDeployment  |
| RHELUnregistrationDeployment  | 3ef23240-23a4-43f2-984a-54cc1c7f16c1 | OS::Heat::StructuredDeployments       | DELETE_IN_PROGRESS | 2015-08-24T20:32:55Z | ExtraConfig                   |
| 0                             | f3c04c97-a619-42f3-8512-4ce757ee173e | OS::Heat::StructuredDeployment        | DELETE_IN_PROGRESS | 2015-08-24T20:32:57Z | RHELUnregistrationDeployment  |
| ControllerNodesPostDeployment | 2476441b-5aca-4d8e-96e0-32124486087b | OS::TripleO::ControllerPostDeployment | DELETE_IN_PROGRESS | 2015-08-25T13:15:20Z |                               |
| ComputeNodesPostDeployment    | f2a690a7-fb59-4fa1-8ca2-cdbf9242ade2 | OS::TripleO::ComputePostDeployment    | DELETE_IN_PROGRESS | 2015-08-25T13:15:24Z |                               |
| ExtraConfig                   | 15cb8528-a9a6-4953-8c27-ff40747cdcd4 | OS::TripleO::NodeExtraConfigPost      | DELETE_IN_PROGRESS | 2015-08-25T13:15:34Z | ComputeNodesPostDeployment    |
| ExtraConfig                   | 1cb87b3a-8746-492c-8341-781b933e6965 | OS::TripleO::NodeExtraConfigPost      | DELETE_IN_PROGRESS | 2015-08-25T13:16:08Z | ControllerNodesPostDeployment |
+-------------------------------+--------------------------------------+---------------------------------------+--------------------+----------------------+-------------------------------+
I keep reproducing it: the overcloud deployment against the portal failed, and I'm not able to delete the stack:

heat resource-list -n5 overcloud |grep -v COMPLE
+-------------------------------+--------------------------------------+---------------------------------------+--------------------+----------------------+-------------------------------+
| resource_name                 | physical_resource_id                 | resource_type                         | resource_status    | updated_time         | parent_resource               |
+-------------------------------+--------------------------------------+---------------------------------------+--------------------+----------------------+-------------------------------+
| ControllerNodesPostDeployment | b0d2468b-4e43-46ce-b4a7-c2ad3df7447d | OS::TripleO::ControllerPostDeployment | DELETE_IN_PROGRESS | 2015-08-25T17:19:05Z |                               |
| ExtraConfig                   | 3512b75d-6331-4f15-9a2c-f9cd1d91c6a1 | OS::TripleO::NodeExtraConfigPost      | DELETE_IN_PROGRESS | 2015-08-25T17:28:03Z | ControllerNodesPostDeployment |
| RHELUnregistrationDeployment  | dc0b5196-fe11-4907-8416-21ca5876f12c | OS::Heat::StructuredDeployments       | DELETE_IN_PROGRESS | 2015-08-25T17:43:59Z | ExtraConfig                   |
| 0                             | 084c2ab9-d2f1-42ff-8669-a556836e5063 | OS::Heat::StructuredDeployment        | DELETE_IN_PROGRESS | 2015-08-25T17:44:01Z | RHELUnregistrationDeployment  |
+-------------------------------+--------------------------------------+---------------------------------------+--------------------+----------------------+-------------------------------+

Note the RHELUnregistrationDeployment...
The problem is that the deploy command creates an extra environment file on the fly, containing the registration details, and passes it to Heat. Since the user never sees this file, there's no way for them to pass it again on a subsequent update, hence this inevitable failure. We decided to fix this by making the environment 'sticky' on PATCH updates (in the same way that parameters are), so the fix for bug 1257717 should resolve this issue too.
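To make the 'sticky' idea concrete, here is a minimal Python sketch, assuming an environment can be modelled as a dict with resource_registry and parameters sections. The helpers layer() and update_env() and the keys reg_org and example_param are hypothetical; this is not Heat's code, only an illustration of why an environment generated at deploy time is lost on a full update but kept when the stored environment is reused on a PATCH update.

```python
"""Simplified model of 'sticky' environments on PATCH stack updates.

Not Heat's implementation: helper names and the two-section dict layout
are hypothetical and only illustrate the behaviour described above.
"""
from copy import deepcopy
from typing import Any, Dict

Env = Dict[str, Dict[str, Any]]


def layer(base: Env, override: Env) -> Env:
    """Apply override on top of base, section by section; later values win."""
    merged = deepcopy(base)  # never mutate the stored environment
    for section, values in override.items():
        merged.setdefault(section, {}).update(values)
    return merged


def update_env(stored: Env, new: Env, patch: bool) -> Env:
    """Environment an update would use in this model.

    patch=False: only what the user passes now (the stored env is dropped).
    patch=True:  the stored env is kept ('sticky') and the new one layered on.
    """
    return layer(stored, new) if patch else deepcopy(new)


# Environment stored at create time, including the generated registration env.
# "reg_org" is a placeholder key, not the real template parameter name.
stored = {
    "resource_registry": {
        "OS::TripleO::NodeExtraConfigPost": "rhel-registration.yaml",
    },
    "parameters": {"reg_org": "<rel-org>"},
}
# What the user passes on update with -e <yaml>: unrelated settings only.
user_env = {"parameters": {"example_param": "value"}}

full = update_env(stored, user_env, patch=False)
sticky = update_env(stored, user_env, patch=True)
print("full update registry:", full.get("resource_registry", {}))
print("PATCH update registry:", sticky["resource_registry"])
```

On the full update the registration mapping disappears because the user never had the generated file to pass; on the PATCH update it survives alongside the user's new parameter.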
FailedQA

Environment:
openstack-heat-engine-2015.1.1-3.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-62.el7ost.noarch
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch
openstack-heat-api-2015.1.1-3.el7ost.noarch
openstack-heat-common-2015.1.1-3.el7ost.noarch
python-heatclient-0.6.0-1.el7ost.noarch
openstack-heat-api-cfn-2015.1.1-3.el7ost.noarch
openstack-heat-api-cloudwatch-2015.1.1-3.el7ost.noarch
instack-undercloud-2.1.2-26.el7ost.noarch

Still unable to delete the stack.
heat resource-list -n 5 overcloud|grep DELETE_IN_PROGRESS
| ExtraConfig                   | 9fe21a57-c8f7-4432-be3b-a2af2246c02a | OS::TripleO::NodeExtraConfigPost      | DELETE_IN_PROGRESS | 2015-09-17T18:52:41Z | ComputeNodesPostDeployment    |
| ExtraConfig                   | 7b584351-2aba-4490-b92a-ab3f238e4906 | OS::TripleO::NodeExtraConfigPost      | DELETE_IN_PROGRESS | 2015-09-17T18:53:06Z | ControllerNodesPostDeployment |
| RHELUnregistrationDeployment  | d9e9be47-8341-4ff9-b28c-fb5bca23790d | OS::Heat::StructuredDeployments       | DELETE_IN_PROGRESS | 2015-09-17T19:07:56Z | ExtraConfig                   |
| 0                             | 4124106f-45c7-4760-afe9-ff4bcdba0e5b | OS::Heat::StructuredDeployment        | DELETE_IN_PROGRESS | 2015-09-17T19:08:01Z | RHELUnregistrationDeployment  |
| RHELUnregistrationDeployment  | 5985c027-c429-4332-b332-27e9e67d6145 | OS::Heat::StructuredDeployments       | DELETE_IN_PROGRESS | 2015-09-17T19:13:53Z | ExtraConfig                   |
| 1                             | d8330f76-cebe-4b6b-8a39-74b5a1ca5de8 | OS::Heat::StructuredDeployment        | DELETE_IN_PROGRESS | 2015-09-17T19:13:58Z | RHELUnregistrationDeployment  |
| 0                             | 8bb3776e-a40a-4e21-acb8-7f53a631a52d | OS::Heat::StructuredDeployment        | DELETE_IN_PROGRESS | 2015-09-17T19:13:59Z | RHELUnregistrationDeployment  |
| 2                             | 261bd039-e7d5-4d6f-8310-f45625548f81 | OS::Heat::StructuredDeployment        | DELETE_IN_PROGRESS | 2015-09-17T19:13:59Z | RHELUnregistrationDeployment  |
| ComputeNodesPostDeployment    | 259ca4b9-849e-468b-8882-307d7fc15ea2 | OS::TripleO::ComputePostDeployment    | DELETE_IN_PROGRESS | 2015-09-17T20:49:15Z |                               |
| ControllerNodesPostDeployment | e86fec51-2cb5-45a5-80d5-7eaf9499395e | OS::TripleO::ControllerPostDeployment | DELETE_IN_PROGRESS | 2015-09-17T20:49:21Z |                               |
I wonder if explicitly passing -e <yaml> to the overcloud update command is causing the other environment that does the registration (the one not being passed again) to be overwritten. Can you try without passing any environment files to the overcloud update command? Another possibility is that the UnregistrationDeployment has a bug and will never complete, and it has nothing to do with Heat at all.
Tried without providing the yaml file - failed right away:

openstack overcloud update stack --templates -i overcloud
starting package update on stack overcloud
IN_PROGRESS
IN_PROGRESS
FAILED
update finished with status FAILED

heat resource-list -n 5 overcloud|grep -v COMPLE
+--------------------+--------------------------------------+-----------------------------------+-----------------+----------------------+--------------------+
| resource_name      | physical_resource_id                 | resource_type                     | resource_status | updated_time         | parent_resource    |
+--------------------+--------------------------------------+-----------------------------------+-----------------+----------------------+--------------------+
| ExternalSubnet     | 5224619f-0b4d-4b7f-bd91-373ef54d6af1 | OS::Neutron::Subnet               | DELETE_FAILED   | 2015-09-18T14:27:46Z | ExternalNetwork    |
| TenantSubnet       | 8ed62f9f-1c9c-4c53-a7a5-b43f2f1989fe | OS::Neutron::Subnet               | DELETE_FAILED   | 2015-09-18T14:27:47Z | TenantNetwork      |
| StorageSubnet      | 311fcb10-fde1-4ed4-aadc-6da1d0971f6f | OS::Neutron::Subnet               | DELETE_FAILED   | 2015-09-18T14:27:48Z | StorageNetwork     |
| InternalApiSubnet  | 59d79223-2df6-4263-8c7b-4884766e94ba | OS::Neutron::Subnet               | DELETE_FAILED   | 2015-09-18T14:27:49Z | InternalNetwork    |
| StorageMgmtSubnet  | 521b0677-7689-4649-b81e-2ef5818e78f7 | OS::Neutron::Subnet               | DELETE_FAILED   | 2015-09-18T14:27:49Z | StorageMgmtNetwork |
| Networks           | 81780252-77df-4559-9778-651f2f7d3d30 | OS::TripleO::Network              | UPDATE_FAILED   | 2015-09-18T15:11:19Z |                    |
| ExternalNetwork    | 88155ba8-0c6a-483d-a59a-c9633ee6a973 | OS::TripleO::Network::External    | UPDATE_FAILED   | 2015-09-18T15:11:24Z | Networks           |
| StorageNetwork     | c3cb6a9f-c2ff-49dd-a608-71cea5225236 | OS::TripleO::Network::Storage     | UPDATE_FAILED   | 2015-09-18T15:11:25Z | Networks           |
| TenantNetwork      | 525ddd74-addc-46af-b9ad-31420c6e049b | OS::TripleO::Network::Tenant      | UPDATE_FAILED   | 2015-09-18T15:11:26Z | Networks           |
| InternalNetwork    | 862971cb-6510-439b-8475-2e9686cc238a | OS::TripleO::Network::InternalApi | UPDATE_FAILED   | 2015-09-18T15:11:27Z | Networks           |
| StorageMgmtNetwork | 1bf7dc27-0a96-4773-97d0-82f952471e67 | OS::TripleO::Network::StorageMgmt | UPDATE_FAILED   | 2015-09-18T15:11:28Z | Networks           |
+--------------------+--------------------------------------+-----------------------------------+-----------------+----------------------+--------------------+
So I observed this issue today, and it does appear to be a Heat issue: we see the signal in the os-collect-config logs, but no corresponding signal event appears in the heat event-list output. As far as I can tell, this is because Heat can't find the resource, even though it's visible in both resource-list and deployment-show:

[stack@instack ~]$ heat resource-list overcloud-ControllerNodesPostDeployment-dqduzcl6unz3-ExtraConfig-gf4u4arkc75r-RHELUnregistrationDeployment-jdcj
+---------------+--------------------------------------+--------------------------------+--------------------+----------------------+
| resource_name | physical_resource_id                 | resource_type                  | resource_status    | updated_time         |
+---------------+--------------------------------------+--------------------------------+--------------------+----------------------+
| 0             | cb3f6080-aa17-427d-aec3-bbdf167922c4 | OS::Heat::StructuredDeployment | DELETE_IN_PROGRESS | 2015-09-21T08:30:22Z |
+---------------+--------------------------------------+--------------------------------+--------------------+----------------------+
[stack@instack ~]$ heat resource-show overcloud-ControllerNodesPostDeployment-dqduzcl6unz3-ExtraConfig-gf4u4arkc75r-RHELUnregistrationDeployment-jdcj
Stack or resource not found: overcloud-ControllerNodesPostDeployment-dqduzcl6unz3-ExtraConfig-gf4u4arkc75r-RHELUnregistrationDeployment-jdcj 0
[stack@instack ~]$ heat resource-signal overcloud-ControllerNodesPostDeployment-dqduzcl6unz3-ExtraConfig-gf4u4arkc75r-RHELUnregistrationDeployment-jd
Stack or resource not found: overcloud-ControllerNodesPostDeployment-dqduzcl6unz3-ExtraConfig-gf4u4arkc75r-RHELUnregistrationDeployment-jdcj 0
[stack@instack ~]$ heat deployment-show cb3f6080-aa17-427d-aec3-bbdf167922c4
{
  "status": "IN_PROGRESS",
  "server_id": "1118af86-bfc7-42fc-8a48-355f7a9de338",
  "config_id": "5d6e14d6-2e93-4a67-b790-14a98aba0d09",
  "output_values": null,
  "creation_time": "2015-09-21T08:39:57Z",
  "updated_time": "2015-09-21T09:18:01Z",
  "input_values": {},
  "action": "DELETE",
  "status_reason": "Deploy data available",
  "id": "cb3f6080-aa17-427d-aec3-bbdf167922c4"
}

Here, resource-signal should have forced the IN_PROGRESS deployment to complete, but it can't because it fails to find the resource. I assume the curl from the node is failing in a similar way.
It seems that in this case the problem is that the OS::TripleO::NodeExtraConfigPost resource type is mapped in more than one place:

overcloud-resource-registry-puppet.yaml:
  OS::TripleO::NodeExtraConfigPost: extraconfig/post_deploy/default.yaml
extraconfig/post_deploy/rhel-registration/rhel-registration-resource-registry.yaml:
  OS::TripleO::NodeExtraConfigPost: rhel-registration.yaml

The registration environment registry file is passed to Heat only when the overcloud is created (the CLI includes it dynamically in https://github.com/openstack/python-tripleoclient/blob/master/tripleoclient/v1/overcloud_deploy.py#L372). It is not included in the package update command, which passes only the general overcloud-resource-registry-puppet.yaml. As a result, the mapping of OS::TripleO::NodeExtraConfigPost changes from rhel-registration.yaml to extraconfig/post_deploy/default.yaml during the stack update, which causes the RHEL registration resources to be replaced. Thanks to Steven Hardy, who found this.
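The override can be illustrated with a short Python sketch. It is a simplified model, not Heat's implementation: it assumes resource registries are merged in the order the environments are passed, with the later mapping for the same type winning, and it reduces each registry to the single NodeExtraConfigPost mapping discussed above. Because registration did run at create time, the sketch assumes the registration registry is applied after the default one. resolve() and changed_mappings() are hypothetical helpers.

```python
"""Sketch: how the NodeExtraConfigPost mapping flips between create and update.

Assumption: registries are applied in order and a later mapping for the same
resource type overrides an earlier one. Only the one relevant mapping from
each registry file is modelled.
"""
from typing import Dict, List, Optional, Tuple

Registry = Dict[str, str]

# The two registries that both map OS::TripleO::NodeExtraConfigPost.
PUPPET_REGISTRY: Registry = {
    "OS::TripleO::NodeExtraConfigPost": "extraconfig/post_deploy/default.yaml",
}
RHEL_REG_REGISTRY: Registry = {
    "OS::TripleO::NodeExtraConfigPost": "rhel-registration.yaml",
}


def resolve(registries: List[Registry]) -> Registry:
    """Merge registries in order; later entries win for the same type."""
    merged: Registry = {}
    for registry in registries:
        merged.update(registry)
    return merged


def changed_mappings(
    before: Registry, after: Registry
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {type: (old_template, new_template)} for every remapped type."""
    return {
        rtype: (before.get(rtype), after.get(rtype))
        for rtype in set(before) | set(after)
        if before.get(rtype) != after.get(rtype)
    }


# Create: the registration registry is layered after the default one.
at_create = resolve([PUPPET_REGISTRY, RHEL_REG_REGISTRY])
# Package update: only the default registry is passed.
at_update = resolve([PUPPET_REGISTRY])

for rtype, (old, new) in changed_mappings(at_create, at_update).items():
    print(f"{rtype}: {old} -> {new}")
```

Running it reports a single remapped type, OS::TripleO::NodeExtraConfigPost, going from rhel-registration.yaml to extraconfig/post_deploy/default.yaml, which is the change the analysis above identifies as triggering the replacement.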
Verified:
openstack-heat-common-2015.1.1-5.el7ost.noarch
openstack-heat-api-cfn-2015.1.1-5.el7ost.noarch
openstack-heat-api-cloudwatch-2015.1.1-5.el7ost.noarch
openstack-heat-api-2015.1.1-5.el7ost.noarch
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch
openstack-heat-engine-2015.1.1-5.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-69.el7ost.noarch

Was able to delete the overcloud after a failed attempt to update it.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2015:1862