Bug 1255931 - rhel-osp-director: unable to delete a heat stack deployed with "--rhel-reg --reg-method portal --reg-org <rel-org> --reg-activation-key '<key>'", following a failed attempt to update it with "openstack overcloud update stack --templates
Summary: rhel-osp-director: unable to delete a heat stack deployed ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: unspecified
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: y1
Target Release: 7.0 (Kilo)
Assignee: Zane Bitter
QA Contact: Alexander Chuzhoy
URL:
Whiteboard:
Depends On: 1257717 1265010
Blocks:
Reported: 2015-08-22 00:45 UTC by Alexander Chuzhoy
Modified: 2023-02-22 23:02 UTC (History)

Fixed In Version: openstack-heat-2015.1.1-1.el7ost python-rdomanager-oscplugin-0.0.10-6.el7ost openstack-tripleo-common-0.0.1.dev6-3.git49b57eb.el7ost
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-10-08 12:17:18 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2015:1862 0 normal SHIPPED_LIVE Moderate: Red Hat Enterprise Linux OpenStack Platform 7 director update 2015-10-08 16:05:50 UTC

Description Alexander Chuzhoy 2015-08-22 00:45:48 UTC
rhel-osp-director: unable to delete a heat stack deployed with "--rhel-reg --reg-method portal --reg-org <rel-org> --reg-activation-key '<key>'", following a failed attempt to update it with "openstack overcloud update stack --templates  -e <yaml> -i overcloud"

Environment:
openstack-heat-engine-2015.1.0-6.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-46.el7ost.noarch
instack-undercloud-2.1.2-23.el7ost.noarch



Steps to reproduce:
1. Deploy an overcloud with "openstack overcloud deploy --templates --control-scale <num> --compute-scale <num> --ceph-storage-scale <num> -e <yaml> --compute-flavor compute --control-flavor control --ceph-storage-flavor ceph --rhel-reg --reg-method portal --reg-org <rel-org> --reg-activation-key '<key>'"


2. Attempt to update the stack with: "openstack overcloud update stack --templates -e <yaml> -i overcloud"

3. If the update fails (for example, because there are not enough active subscriptions), run "heat stack-delete overcloud". At this point it is no longer possible to delete the stack.

Result:
The deletion gets stuck or fails, so the stack cannot be deleted.

Expected result:
The stack should get deleted.

Comment 3 Alexander Chuzhoy 2015-08-25 15:15:34 UTC
Reproduced.


heat resource-list -n5 overcloud|grep -v COMPLETE

+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+--------------------+----------------------+---------------------------------------------+
| resource_name                               | physical_resource_id                          | resource_type                                     | resource_status    | updated_time         | parent_resource                             |
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+--------------------+----------------------+---------------------------------------------+
| RHELUnregistrationDeployment                | 4b6cb843-cda1-4529-b787-24f2eaefbda5          | OS::Heat::StructuredDeployments                   | DELETE_IN_PROGRESS | 2015-08-24T20:27:55Z | ExtraConfig                                 |
| 0                                           | 15fefbb8-882d-416a-a8c6-9744a1e5a05d          | OS::Heat::StructuredDeployment                    | DELETE_IN_PROGRESS | 2015-08-24T20:27:58Z | RHELUnregistrationDeployment                |
| RHELUnregistrationDeployment                | 3ef23240-23a4-43f2-984a-54cc1c7f16c1          | OS::Heat::StructuredDeployments                   | DELETE_IN_PROGRESS | 2015-08-24T20:32:55Z | ExtraConfig                                 |
| 0                                           | f3c04c97-a619-42f3-8512-4ce757ee173e          | OS::Heat::StructuredDeployment                    | DELETE_IN_PROGRESS | 2015-08-24T20:32:57Z | RHELUnregistrationDeployment                |
| ControllerNodesPostDeployment               | 2476441b-5aca-4d8e-96e0-32124486087b          | OS::TripleO::ControllerPostDeployment             | DELETE_IN_PROGRESS | 2015-08-25T13:15:20Z |                                             |
| ComputeNodesPostDeployment                  | f2a690a7-fb59-4fa1-8ca2-cdbf9242ade2          | OS::TripleO::ComputePostDeployment                | DELETE_IN_PROGRESS | 2015-08-25T13:15:24Z |                                             |
| ExtraConfig                                 | 15cb8528-a9a6-4953-8c27-ff40747cdcd4          | OS::TripleO::NodeExtraConfigPost                  | DELETE_IN_PROGRESS | 2015-08-25T13:15:34Z | ComputeNodesPostDeployment                  |
| ExtraConfig                                 | 1cb87b3a-8746-492c-8341-781b933e6965          | OS::TripleO::NodeExtraConfigPost                  | DELETE_IN_PROGRESS | 2015-08-25T13:16:08Z | ControllerNodesPostDeployment               |
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+--------------------+----------------------+---------------------------------------------+

Comment 4 Alexander Chuzhoy 2015-08-25 18:08:28 UTC
I keep reproducing it:

The deployment of the overcloud against the portal failed, and I'm not able to delete the stack:
heat resource-list -n5 overcloud |grep -v COMPLE
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+--------------------+----------------------+---------------------------------------------+
| resource_name                               | physical_resource_id                          | resource_type                                     | resource_status    | updated_time         | parent_resource                             |
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+--------------------+----------------------+---------------------------------------------+
| ControllerNodesPostDeployment               | b0d2468b-4e43-46ce-b4a7-c2ad3df7447d          | OS::TripleO::ControllerPostDeployment             | DELETE_IN_PROGRESS | 2015-08-25T17:19:05Z |                                             |
| ExtraConfig                                 | 3512b75d-6331-4f15-9a2c-f9cd1d91c6a1          | OS::TripleO::NodeExtraConfigPost                  | DELETE_IN_PROGRESS | 2015-08-25T17:28:03Z | ControllerNodesPostDeployment               |
| RHELUnregistrationDeployment                | dc0b5196-fe11-4907-8416-21ca5876f12c          | OS::Heat::StructuredDeployments                   | DELETE_IN_PROGRESS | 2015-08-25T17:43:59Z | ExtraConfig                                 |
| 0                                           | 084c2ab9-d2f1-42ff-8669-a556836e5063          | OS::Heat::StructuredDeployment                    | DELETE_IN_PROGRESS | 2015-08-25T17:44:01Z | RHELUnregistrationDeployment                |
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+--------------------+----------------------+---------------------------------------------+



Note the RHELUnregistrationDeployment...
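
To pick the stuck resources out of output like the tables above without relying on grep, the table can be parsed. This is only a minimal sketch: the function names are made up for illustration, the sample rows are shortened copies of rows from this report, and the last row (HypotheticalDone, OS::Example::Thing) is invented so that the filter has something to drop.

```python
def parse_heat_table(text):
    """Parse the ASCII table printed by `heat resource-list` into dicts."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        # Skip borders (+---+) and anything else that is not a table row.
        if not line.startswith("|"):
            continue
        inner = line[1:-1] if line.endswith("|") else line[1:]
        cells = [cell.strip() for cell in inner.split("|")]
        if header is None:
            header = cells  # the first row holds the column names
            continue
        rows.append(dict(zip(header, cells)))
    return rows


def unfinished(rows):
    """Return the rows whose status does not end in COMPLETE."""
    return [r for r in rows if not r["resource_status"].endswith("COMPLETE")]


# Two rows taken from this report (padding removed) plus one invented row.
SAMPLE = """
+---------------+----------------------+---------------+-----------------+--------------+-----------------+
| resource_name | physical_resource_id | resource_type | resource_status | updated_time | parent_resource |
+---------------+----------------------+---------------+-----------------+--------------+-----------------+
| RHELUnregistrationDeployment | 4b6cb843-cda1-4529-b787-24f2eaefbda5 | OS::Heat::StructuredDeployments | DELETE_IN_PROGRESS | 2015-08-24T20:27:55Z | ExtraConfig |
| ControllerNodesPostDeployment | 2476441b-5aca-4d8e-96e0-32124486087b | OS::TripleO::ControllerPostDeployment | DELETE_IN_PROGRESS | 2015-08-25T13:15:20Z | |
| HypotheticalDone | 00000000-0000-0000-0000-000000000000 | OS::Example::Thing | CREATE_COMPLETE | 2015-08-24T20:00:00Z | |
"""

for row in unfinished(parse_heat_table(SAMPLE)):
    print(row["resource_name"], row["resource_status"], row["parent_resource"])
```

An empty parent_resource marks a resource in the top-level stack, which helps when tracing a stuck nested resource back to its parent.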

Comment 5 Zane Bitter 2015-09-03 16:35:00 UTC
The problem is that the deploy command is creating an extra environment file to pass on the fly, containing the registration details. Since the user never gets to see this file, there's no way for them to correctly pass this on a subsequent update, hence this inevitable failure.

We decided to fix this by making the environment 'sticky' on PATCH updates (in the same way that parameters are). So the fix for bug 1257717 should resolve this issue too.
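
The idea of a 'sticky' environment can be illustrated roughly as follows. This is a sketch of the concept only, not Heat's actual code: the function names and the example_param parameter are hypothetical, and the registry entry simply mirrors the mapping used by the registration environment.

```python
import copy


def merge_environment(stored, supplied):
    """Deep-merge `supplied` into a copy of `stored`; supplied values win."""
    merged = copy.deepcopy(stored)
    for key, value in supplied.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_environment(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def environment_for_update(stored, supplied, patch):
    """Pick the environment an update would run with.

    patch=True  -> sticky: start from the stored environment and merge.
    patch=False -> the supplied environment replaces the stored one.
    """
    if patch:
        return merge_environment(stored, supplied)
    return copy.deepcopy(supplied)


# Hypothetical environment stored at create time, including the entry
# that the deploy command generated on the fly.
STORED = {
    "resource_registry": {
        "OS::TripleO::NodeExtraConfigPost": "rhel-registration.yaml",
    },
    "parameters": {"example_param": "old"},
}
# Hypothetical environment the user passes on update; no registry entry.
SUPPLIED = {"parameters": {"example_param": "new"}}

sticky = environment_for_update(STORED, SUPPLIED, patch=True)
replaced = environment_for_update(STORED, SUPPLIED, patch=False)
print(sticky["resource_registry"])        # registration mapping is kept
print(replaced.get("resource_registry"))  # mapping is gone
```

Note that in this sketch a supplied value still wins over a stored one, so a sticky environment only helps when the update does not explicitly remap the same resource type.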

Comment 6 Alexander Chuzhoy 2015-09-17 21:52:30 UTC
FailedQA

Environment:
openstack-heat-engine-2015.1.1-3.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-62.el7ost.noarch
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch
openstack-heat-api-2015.1.1-3.el7ost.noarch
openstack-heat-common-2015.1.1-3.el7ost.noarch
python-heatclient-0.6.0-1.el7ost.noarch
openstack-heat-api-cfn-2015.1.1-3.el7ost.noarch
openstack-heat-api-cloudwatch-2015.1.1-3.el7ost.noarch
instack-undercloud-2.1.2-26.el7ost.noarch



Still unable to delete the stack.

Comment 7 Alexander Chuzhoy 2015-09-17 22:09:03 UTC
heat resource-list -n 5 overcloud|grep DELETE_IN_PROGRESS
| ExtraConfig                                 | 9fe21a57-c8f7-4432-be3b-a2af2246c02a          | OS::TripleO::NodeExtraConfigPost                  | DELETE_IN_PROGRESS | 2015-09-17T18:52:41Z | ComputeNodesPostDeployment                  |
| ExtraConfig                                 | 7b584351-2aba-4490-b92a-ab3f238e4906          | OS::TripleO::NodeExtraConfigPost                  | DELETE_IN_PROGRESS | 2015-09-17T18:53:06Z | ControllerNodesPostDeployment               |
| RHELUnregistrationDeployment                | d9e9be47-8341-4ff9-b28c-fb5bca23790d          | OS::Heat::StructuredDeployments                   | DELETE_IN_PROGRESS | 2015-09-17T19:07:56Z | ExtraConfig                                 |
| 0                                           | 4124106f-45c7-4760-afe9-ff4bcdba0e5b          | OS::Heat::StructuredDeployment                    | DELETE_IN_PROGRESS | 2015-09-17T19:08:01Z | RHELUnregistrationDeployment                |
| RHELUnregistrationDeployment                | 5985c027-c429-4332-b332-27e9e67d6145          | OS::Heat::StructuredDeployments                   | DELETE_IN_PROGRESS | 2015-09-17T19:13:53Z | ExtraConfig                                 |
| 1                                           | d8330f76-cebe-4b6b-8a39-74b5a1ca5de8          | OS::Heat::StructuredDeployment                    | DELETE_IN_PROGRESS | 2015-09-17T19:13:58Z | RHELUnregistrationDeployment                |
| 0                                           | 8bb3776e-a40a-4e21-acb8-7f53a631a52d          | OS::Heat::StructuredDeployment                    | DELETE_IN_PROGRESS | 2015-09-17T19:13:59Z | RHELUnregistrationDeployment                |
| 2                                           | 261bd039-e7d5-4d6f-8310-f45625548f81          | OS::Heat::StructuredDeployment                    | DELETE_IN_PROGRESS | 2015-09-17T19:13:59Z | RHELUnregistrationDeployment                |
| ComputeNodesPostDeployment                  | 259ca4b9-849e-468b-8882-307d7fc15ea2          | OS::TripleO::ComputePostDeployment                | DELETE_IN_PROGRESS | 2015-09-17T20:49:15Z |                                             |
| ControllerNodesPostDeployment               | e86fec51-2cb5-45a5-80d5-7eaf9499395e          | OS::TripleO::ControllerPostDeployment             | DELETE_IN_PROGRESS | 2015-09-17T20:49:21Z |                                             |

Comment 8 Zane Bitter 2015-09-17 22:16:36 UTC
I wonder if explicitly passing -e <yaml> to the overcloud update command is causing the other environment that does the registration (the one not being passed again) to be overwritten. Can you try without passing any environment files to the overcloud update command?

Another possibility is that the UnregistrationDeployment has a bug and will just not complete ever, and it's nothing to do with Heat at all.

Comment 9 Alexander Chuzhoy 2015-09-18 15:14:13 UTC
Tried without providing the yaml file - failed right away
openstack overcloud update stack --templates -i overcloud
starting package update on stack overcloud
IN_PROGRESS
IN_PROGRESS
FAILED
update finished with status FAILED




heat resource-list -n 5 overcloud|grep -v COMPLE
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+----------------------+---------------------------------------------+
| resource_name                               | physical_resource_id                          | resource_type                                     | resource_status | updated_time         | parent_resource                             |
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+----------------------+---------------------------------------------+
| ExternalSubnet                              | 5224619f-0b4d-4b7f-bd91-373ef54d6af1          | OS::Neutron::Subnet                               | DELETE_FAILED   | 2015-09-18T14:27:46Z | ExternalNetwork                             |
| TenantSubnet                                | 8ed62f9f-1c9c-4c53-a7a5-b43f2f1989fe          | OS::Neutron::Subnet                               | DELETE_FAILED   | 2015-09-18T14:27:47Z | TenantNetwork                               |
| StorageSubnet                               | 311fcb10-fde1-4ed4-aadc-6da1d0971f6f          | OS::Neutron::Subnet                               | DELETE_FAILED   | 2015-09-18T14:27:48Z | StorageNetwork                              |
| InternalApiSubnet                           | 59d79223-2df6-4263-8c7b-4884766e94ba          | OS::Neutron::Subnet                               | DELETE_FAILED   | 2015-09-18T14:27:49Z | InternalNetwork                             |
| StorageMgmtSubnet                           | 521b0677-7689-4649-b81e-2ef5818e78f7          | OS::Neutron::Subnet                               | DELETE_FAILED   | 2015-09-18T14:27:49Z | StorageMgmtNetwork                          |
| Networks                                    | 81780252-77df-4559-9778-651f2f7d3d30          | OS::TripleO::Network                              | UPDATE_FAILED   | 2015-09-18T15:11:19Z |                                             |
| ExternalNetwork                             | 88155ba8-0c6a-483d-a59a-c9633ee6a973          | OS::TripleO::Network::External                    | UPDATE_FAILED   | 2015-09-18T15:11:24Z | Networks                                    |
| StorageNetwork                              | c3cb6a9f-c2ff-49dd-a608-71cea5225236          | OS::TripleO::Network::Storage                     | UPDATE_FAILED   | 2015-09-18T15:11:25Z | Networks                                    |
| TenantNetwork                               | 525ddd74-addc-46af-b9ad-31420c6e049b          | OS::TripleO::Network::Tenant                      | UPDATE_FAILED   | 2015-09-18T15:11:26Z | Networks                                    |
| InternalNetwork                             | 862971cb-6510-439b-8475-2e9686cc238a          | OS::TripleO::Network::InternalApi                 | UPDATE_FAILED   | 2015-09-18T15:11:27Z | Networks                                    |
| StorageMgmtNetwork                          | 1bf7dc27-0a96-4773-97d0-82f952471e67          | OS::TripleO::Network::StorageMgmt                 | UPDATE_FAILED   | 2015-09-18T15:11:28Z | Networks                                    |
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+----------------------+---------------------------------------------+

Comment 10 Steven Hardy 2015-09-21 10:13:47 UTC
So I observed this issue today, and it does appear to be a heat issue, because we see the signal in the os-collect-config logs, but then no corresponding signal event exists in the heat event-list output.

AFAICT the reason for this is heat can't find the resource, even though it's visible in both resource-list and deployment-show:

[stack@instack ~]$ heat resource-list overcloud-ControllerNodesPostDeployment-dqduzcl6unz3-ExtraConfig-gf4u4arkc75r-RHELUnregistrationDeployment-jdcj
+---------------+--------------------------------------+--------------------------------+--------------------+----------------------+
| resource_name | physical_resource_id                 | resource_type                  | resource_status    | updated_time         |
+---------------+--------------------------------------+--------------------------------+--------------------+----------------------+
| 0             | cb3f6080-aa17-427d-aec3-bbdf167922c4 | OS::Heat::StructuredDeployment | DELETE_IN_PROGRESS | 2015-09-21T08:30:22Z |
+---------------+--------------------------------------+--------------------------------+--------------------+----------------------+
        
[stack@instack ~]$ heat resource-show overcloud-ControllerNodesPostDeployment-dqduzcl6unz3-ExtraConfig-gf4u4arkc75r-RHELUnregistrationDeployment-jdcj
Stack or resource not found: overcloud-ControllerNodesPostDeployment-dqduzcl6unz3-ExtraConfig-gf4u4arkc75r-RHELUnregistrationDeployment-jdcj 0

[stack@instack ~]$ heat resource-signal overcloud-ControllerNodesPostDeployment-dqduzcl6unz3-ExtraConfig-gf4u4arkc75r-RHELUnregistrationDeployment-jd
Stack or resource not found: overcloud-ControllerNodesPostDeployment-dqduzcl6unz3-ExtraConfig-gf4u4arkc75r-RHELUnregistrationDeployment-jdcj 0

[stack@instack ~]$ heat deployment-show cb3f6080-aa17-427d-aec3-bbdf167922c4
{
  "status": "IN_PROGRESS", 
  "server_id": "1118af86-bfc7-42fc-8a48-355f7a9de338", 
  "config_id": "5d6e14d6-2e93-4a67-b790-14a98aba0d09", 
  "output_values": null, 
  "creation_time": "2015-09-21T08:39:57Z", 
  "updated_time": "2015-09-21T09:18:01Z", 
  "input_values": {}, 
  "action": "DELETE", 
  "status_reason": "Deploy data available", 
  "id": "cb3f6080-aa17-427d-aec3-bbdf167922c4"
}

Here, the resource-signal should have forced the IN_PROGRESS deployment to complete, but it can't because it's failing to find the resource - I assume the curl from the node is failing in a similar way.

Comment 11 Jan Provaznik 2015-09-21 13:10:23 UTC
It seems that in this case the problem is that the OS::TripleO::NodeExtraConfigPost resource is mapped in more than one place:

overcloud-resource-registry-puppet.yaml:
  OS::TripleO::NodeExtraConfigPost: extraconfig/post_deploy/default.yaml

extraconfig/post_deploy/rhel-registration/rhel-registration-resource-registry.yaml:
  OS::TripleO::NodeExtraConfigPost: rhel-registration.yaml

The rhel-registration-resource-registry.yaml env registry file is passed to heat only when creating the overcloud (the CLI includes it dynamically in https://github.com/openstack/python-tripleoclient/blob/master/tripleoclient/v1/overcloud_deploy.py#L372). It is not included by the package update command, which includes only the general overcloud-resource-registry-puppet.yaml. As a result, the mapping of OS::TripleO::NodeExtraConfigPost changes from rhel-registration.yaml to extraconfig/post_deploy/default.yaml during the stack update operation, which causes the RHEL registration resources to be replaced.

Thanks to Steven Hardy, who found this.
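
The registry override described in this comment can be sketched as below. It assumes that environments are merged in order with later files winning, ignores how relative template paths are resolved, and uses helper names invented for this illustration; the two mappings are the ones quoted above.

```python
def resolve_registry(environments):
    """Merge resource_registry sections in order; later files win."""
    resolved = {}
    for env in environments:
        resolved.update(env.get("resource_registry", {}))
    return resolved


def changed_mappings(before, after):
    """Return {type: (old, new)} for every type whose mapping differs."""
    return {
        rtype: (before.get(rtype), after.get(rtype))
        for rtype in sorted(set(before) | set(after))
        if before.get(rtype) != after.get(rtype)
    }


# overcloud-resource-registry-puppet.yaml (only the relevant entry)
PUPPET_REGISTRY = {
    "resource_registry": {
        "OS::TripleO::NodeExtraConfigPost": "extraconfig/post_deploy/default.yaml",
    }
}
# rhel-registration-resource-registry.yaml (only the relevant entry)
RHEL_REGISTRY = {
    "resource_registry": {
        "OS::TripleO::NodeExtraConfigPost": "rhel-registration.yaml",
    }
}

at_create = resolve_registry([PUPPET_REGISTRY, RHEL_REGISTRY])
at_update = resolve_registry([PUPPET_REGISTRY])  # registration file missing
print(changed_mappings(at_create, at_update))
```

Any type reported by changed_mappings is one whose template differs between create and update, which is the condition that leads to the resources being replaced.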

Comment 13 Alexander Chuzhoy 2015-09-25 22:59:35 UTC
Verified:
openstack-heat-common-2015.1.1-5.el7ost.noarch
openstack-heat-api-cfn-2015.1.1-5.el7ost.noarch
openstack-heat-api-cloudwatch-2015.1.1-5.el7ost.noarch
openstack-heat-api-2015.1.1-5.el7ost.noarch
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch
openstack-heat-engine-2015.1.1-5.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-69.el7ost.noarch


Was able to delete the overcloud after a failed attempt to update it.

Comment 15 errata-xmlrpc 2015-10-08 12:17:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:1862

