Bug 1255931
| Summary: | rhel-osp-director: unable to delete a heat stack deployed with "--rhel-reg --reg-method portal --reg-org <rel-org> --reg-activation-key '<key>'", following a failed attempt to update it with "openstack overcloud update stack --templates" | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Alexander Chuzhoy <sasha> |
| Component: | rhosp-director | Assignee: | Zane Bitter <zbitter> |
| Status: | CLOSED ERRATA | QA Contact: | Alexander Chuzhoy <sasha> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | unspecified | CC: | jprovazn, kbasil, mburns, rhel-osp-director-maint, shardy |
| Target Milestone: | y1 | Keywords: | TestOnly, Triaged |
| Target Release: | 7.0 (Kilo) | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | openstack-heat-2015.1.1-1.el7ost python-rdomanager-oscplugin-0.0.10-6.el7ost openstack-tripleo-common-0.0.1.dev6-3.git49b57eb.el7ost | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2015-10-08 12:17:18 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1257717, 1265010 | ||
| Bug Blocks: | |||
Description
Alexander Chuzhoy
2015-08-22 00:45:48 UTC
Reproduced.
heat resource-list -n5 overcloud|grep -v COMPLETE
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+--------------------+----------------------+---------------------------------------------+
| resource_name | physical_resource_id | resource_type | resource_status | updated_time | parent_resource |
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+--------------------+----------------------+---------------------------------------------+
| RHELUnregistrationDeployment | 4b6cb843-cda1-4529-b787-24f2eaefbda5 | OS::Heat::StructuredDeployments | DELETE_IN_PROGRESS | 2015-08-24T20:27:55Z | ExtraConfig |
| 0 | 15fefbb8-882d-416a-a8c6-9744a1e5a05d | OS::Heat::StructuredDeployment | DELETE_IN_PROGRESS | 2015-08-24T20:27:58Z | RHELUnregistrationDeployment |
| RHELUnregistrationDeployment | 3ef23240-23a4-43f2-984a-54cc1c7f16c1 | OS::Heat::StructuredDeployments | DELETE_IN_PROGRESS | 2015-08-24T20:32:55Z | ExtraConfig |
| 0 | f3c04c97-a619-42f3-8512-4ce757ee173e | OS::Heat::StructuredDeployment | DELETE_IN_PROGRESS | 2015-08-24T20:32:57Z | RHELUnregistrationDeployment |
| ControllerNodesPostDeployment | 2476441b-5aca-4d8e-96e0-32124486087b | OS::TripleO::ControllerPostDeployment | DELETE_IN_PROGRESS | 2015-08-25T13:15:20Z | |
| ComputeNodesPostDeployment | f2a690a7-fb59-4fa1-8ca2-cdbf9242ade2 | OS::TripleO::ComputePostDeployment | DELETE_IN_PROGRESS | 2015-08-25T13:15:24Z | |
| ExtraConfig | 15cb8528-a9a6-4953-8c27-ff40747cdcd4 | OS::TripleO::NodeExtraConfigPost | DELETE_IN_PROGRESS | 2015-08-25T13:15:34Z | ComputeNodesPostDeployment |
| ExtraConfig | 1cb87b3a-8746-492c-8341-781b933e6965 | OS::TripleO::NodeExtraConfigPost | DELETE_IN_PROGRESS | 2015-08-25T13:16:08Z | ControllerNodesPostDeployment |
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+--------------------+----------------------+---------------------------------------------+

Keep on reproducing it:
The deployment of overcloud against portal failed and I'm not able to delete the stack:
heat resource-list -n5 overcloud |grep -v COMPLE
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+--------------------+----------------------+---------------------------------------------+
| resource_name | physical_resource_id | resource_type | resource_status | updated_time | parent_resource |
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+--------------------+----------------------+---------------------------------------------+
| ControllerNodesPostDeployment | b0d2468b-4e43-46ce-b4a7-c2ad3df7447d | OS::TripleO::ControllerPostDeployment | DELETE_IN_PROGRESS | 2015-08-25T17:19:05Z | |
| ExtraConfig | 3512b75d-6331-4f15-9a2c-f9cd1d91c6a1 | OS::TripleO::NodeExtraConfigPost | DELETE_IN_PROGRESS | 2015-08-25T17:28:03Z | ControllerNodesPostDeployment |
| RHELUnregistrationDeployment | dc0b5196-fe11-4907-8416-21ca5876f12c | OS::Heat::StructuredDeployments | DELETE_IN_PROGRESS | 2015-08-25T17:43:59Z | ExtraConfig |
| 0 | 084c2ab9-d2f1-42ff-8669-a556836e5063 | OS::Heat::StructuredDeployment | DELETE_IN_PROGRESS | 2015-08-25T17:44:01Z | RHELUnregistrationDeployment |
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+--------------------+----------------------+---------------------------------------------+
Note the RHELUnregistrationDeployment...

The problem is that the deploy command is creating an extra environment file to pass on the fly, containing the registration details. Since the user never gets to see this file, there's no way for them to correctly pass this on a subsequent update, hence this inevitable failure.
We decided to fix this by making the environment 'sticky' on PATCH updates (in the same way that parameters are). So the fix for bug 1257717 should resolve this issue too.

FailedQA
Environment:
openstack-heat-engine-2015.1.1-3.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-62.el7ost.noarch
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch
openstack-heat-api-2015.1.1-3.el7ost.noarch
openstack-heat-common-2015.1.1-3.el7ost.noarch
python-heatclient-0.6.0-1.el7ost.noarch
openstack-heat-api-cfn-2015.1.1-3.el7ost.noarch
openstack-heat-api-cloudwatch-2015.1.1-3.el7ost.noarch
instack-undercloud-2.1.2-26.el7ost.noarch
Still unable to delete the stack.
heat resource-list -n 5 overcloud|grep DELETE_IN_PROGRESS
| ExtraConfig | 9fe21a57-c8f7-4432-be3b-a2af2246c02a | OS::TripleO::NodeExtraConfigPost | DELETE_IN_PROGRESS | 2015-09-17T18:52:41Z | ComputeNodesPostDeployment |
| ExtraConfig | 7b584351-2aba-4490-b92a-ab3f238e4906 | OS::TripleO::NodeExtraConfigPost | DELETE_IN_PROGRESS | 2015-09-17T18:53:06Z | ControllerNodesPostDeployment |
| RHELUnregistrationDeployment | d9e9be47-8341-4ff9-b28c-fb5bca23790d | OS::Heat::StructuredDeployments | DELETE_IN_PROGRESS | 2015-09-17T19:07:56Z | ExtraConfig |
| 0 | 4124106f-45c7-4760-afe9-ff4bcdba0e5b | OS::Heat::StructuredDeployment | DELETE_IN_PROGRESS | 2015-09-17T19:08:01Z | RHELUnregistrationDeployment |
| RHELUnregistrationDeployment | 5985c027-c429-4332-b332-27e9e67d6145 | OS::Heat::StructuredDeployments | DELETE_IN_PROGRESS | 2015-09-17T19:13:53Z | ExtraConfig |
| 1 | d8330f76-cebe-4b6b-8a39-74b5a1ca5de8 | OS::Heat::StructuredDeployment | DELETE_IN_PROGRESS | 2015-09-17T19:13:58Z | RHELUnregistrationDeployment |
| 0 | 8bb3776e-a40a-4e21-acb8-7f53a631a52d | OS::Heat::StructuredDeployment | DELETE_IN_PROGRESS | 2015-09-17T19:13:59Z | RHELUnregistrationDeployment |
| 2 | 261bd039-e7d5-4d6f-8310-f45625548f81 | OS::Heat::StructuredDeployment | DELETE_IN_PROGRESS | 2015-09-17T19:13:59Z | RHELUnregistrationDeployment |
| ComputeNodesPostDeployment | 259ca4b9-849e-468b-8882-307d7fc15ea2 | OS::TripleO::ComputePostDeployment | DELETE_IN_PROGRESS | 2015-09-17T20:49:15Z | |
| ControllerNodesPostDeployment | e86fec51-2cb5-45a5-80d5-7eaf9499395e | OS::TripleO::ControllerPostDeployment | DELETE_IN_PROGRESS | 2015-09-17T20:49:21Z | |

I wonder if explicitly passing -e <yaml> to the overcloud update command is causing the other environment that does the registration (the one not being passed again) to be overwritten. Can you try without passing any environment files to the overcloud update command?
Another possibility is that the UnregistrationDeployment has a bug and will just not complete ever, and it's nothing to do with Heat at all.
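
As a rough illustration of the 'sticky' environment idea discussed in this thread, the following minimal Python sketch contrasts a full update (the supplied environment replaces the stored one) with a PATCH-style update (the supplied environment is merged over the stored one). This is not Heat's implementation: the function names, the dictionary layout, and the sample parameter values are assumptions made only for this example.

```python
import copy


def merge_environment(stored, supplied):
    """Return a new environment: stored values, overridden by supplied ones.

    Hypothetical helper; environments are modelled as a dict of sections
    (e.g. "parameters", "resource_registry"), each a flat dict.
    """
    merged = copy.deepcopy(stored)
    for section, values in supplied.items():
        merged.setdefault(section, {}).update(values)
    return merged


def update_environment(stored, supplied, patch):
    """Pick the environment a stack update would use.

    patch=True  -> "sticky": keep what was stored, apply changes on top.
    patch=False -> the supplied environment replaces the stored one.
    """
    if patch:
        return merge_environment(stored, supplied)
    return copy.deepcopy(supplied)


# Environment as it might look after the initial deploy (illustrative values).
stored = {
    "resource_registry": {
        "OS::TripleO::NodeExtraConfigPost": "rhel-registration.yaml",
    },
    "parameters": {"rhel_reg_method": "portal"},
}

# Environment supplied with the later update (illustrative values).
supplied = {"parameters": {"UpdateIdentifier": "1442600000"}}

print(update_environment(stored, supplied, patch=False))
print(update_environment(stored, supplied, patch=True))
```

With `patch=False` the registration mapping disappears from the result because the user never re-supplied it; with `patch=True` it survives alongside the new parameter.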

Tried without providing the yaml file - failed right away
openstack overcloud update stack --templates -i overcloud
starting package update on stack overcloud
IN_PROGRESS
IN_PROGRESS
FAILED
update finished with status FAILED
heat resource-list -n 5 overcloud|grep -v COMPLE
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+----------------------+---------------------------------------------+
| resource_name | physical_resource_id | resource_type | resource_status | updated_time | parent_resource |
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+----------------------+---------------------------------------------+
| ExternalSubnet | 5224619f-0b4d-4b7f-bd91-373ef54d6af1 | OS::Neutron::Subnet | DELETE_FAILED | 2015-09-18T14:27:46Z | ExternalNetwork |
| TenantSubnet | 8ed62f9f-1c9c-4c53-a7a5-b43f2f1989fe | OS::Neutron::Subnet | DELETE_FAILED | 2015-09-18T14:27:47Z | TenantNetwork |
| StorageSubnet | 311fcb10-fde1-4ed4-aadc-6da1d0971f6f | OS::Neutron::Subnet | DELETE_FAILED | 2015-09-18T14:27:48Z | StorageNetwork |
| InternalApiSubnet | 59d79223-2df6-4263-8c7b-4884766e94ba | OS::Neutron::Subnet | DELETE_FAILED | 2015-09-18T14:27:49Z | InternalNetwork |
| StorageMgmtSubnet | 521b0677-7689-4649-b81e-2ef5818e78f7 | OS::Neutron::Subnet | DELETE_FAILED | 2015-09-18T14:27:49Z | StorageMgmtNetwork |
| Networks | 81780252-77df-4559-9778-651f2f7d3d30 | OS::TripleO::Network | UPDATE_FAILED | 2015-09-18T15:11:19Z | |
| ExternalNetwork | 88155ba8-0c6a-483d-a59a-c9633ee6a973 | OS::TripleO::Network::External | UPDATE_FAILED | 2015-09-18T15:11:24Z | Networks |
| StorageNetwork | c3cb6a9f-c2ff-49dd-a608-71cea5225236 | OS::TripleO::Network::Storage | UPDATE_FAILED | 2015-09-18T15:11:25Z | Networks |
| TenantNetwork | 525ddd74-addc-46af-b9ad-31420c6e049b | OS::TripleO::Network::Tenant | UPDATE_FAILED | 2015-09-18T15:11:26Z | Networks |
| InternalNetwork | 862971cb-6510-439b-8475-2e9686cc238a | OS::TripleO::Network::InternalApi | UPDATE_FAILED | 2015-09-18T15:11:27Z | Networks |
| StorageMgmtNetwork | 1bf7dc27-0a96-4773-97d0-82f952471e67 | OS::TripleO::Network::StorageMgmt | UPDATE_FAILED | 2015-09-18T15:11:28Z | Networks |
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+----------------------+---------------------------------------------+

So I observed this issue today, and it does appear to be a heat issue, because we see the signal in the os-collect-config logs, but then no corresponding signal event exists in the heat event-list output.
AFAICT the reason for this is heat can't find the resource, even though it's visible in both resource-list and deployment-show:
[stack@instack ~]$ heat resource-list overcloud-ControllerNodesPostDeployment-dqduzcl6unz3-ExtraConfig-gf4u4arkc75r-RHELUnregistrationDeployment-jdcj
+---------------+--------------------------------------+--------------------------------+--------------------+----------------------+
| resource_name | physical_resource_id | resource_type | resource_status | updated_time |
+---------------+--------------------------------------+--------------------------------+--------------------+----------------------+
| 0 | cb3f6080-aa17-427d-aec3-bbdf167922c4 | OS::Heat::StructuredDeployment | DELETE_IN_PROGRESS | 2015-09-21T08:30:22Z |
+---------------+--------------------------------------+--------------------------------+--------------------+----------------------+
[stack@instack ~]$ heat resource-show overcloud-ControllerNodesPostDeployment-dqduzcl6unz3-ExtraConfig-gf4u4arkc75r-RHELUnregistrationDeployment-jdcj
Stack or resource not found: overcloud-ControllerNodesPostDeployment-dqduzcl6unz3-ExtraConfig-gf4u4arkc75r-RHELUnregistrationDeployment-jdcj 0
[stack@instack ~]$ heat resource-signal overcloud-ControllerNodesPostDeployment-dqduzcl6unz3-ExtraConfig-gf4u4arkc75r-RHELUnregistrationDeployment-jd
Stack or resource not found: overcloud-ControllerNodesPostDeployment-dqduzcl6unz3-ExtraConfig-gf4u4arkc75r-RHELUnregistrationDeployment-jdcj 0
[stack@instack ~]$ heat deployment-show cb3f6080-aa17-427d-aec3-bbdf167922c4
{
"status": "IN_PROGRESS",
"server_id": "1118af86-bfc7-42fc-8a48-355f7a9de338",
"config_id": "5d6e14d6-2e93-4a67-b790-14a98aba0d09",
"output_values": null,
"creation_time": "2015-09-21T08:39:57Z",
"updated_time": "2015-09-21T09:18:01Z",
"input_values": {},
"action": "DELETE",
"status_reason": "Deploy data available",
"id": "cb3f6080-aa17-427d-aec3-bbdf167922c4"
}
Here, the resource-signal should have forced the IN_PROGRESS deployment to complete, but it can't because it's failing to find the resource - I assume the curl from the node is failing in a similar way.
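
For anyone triaging a similar stuck stack, the `grep -v COMPLETE` filtering used throughout this report can also be done on captured `heat resource-list` output in Python. The sketch below is only an illustration: `parse_table` and `unfinished` are hypothetical helper names, and the second sample row is made up to show a resource that gets filtered out.

```python
def parse_table(text):
    """Parse an ASCII table (as printed by the heat CLI) into a list of dicts."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip border lines ("+----+") and anything that is not a table row.
        if not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def unfinished(rows):
    """Return rows whose status does not end in COMPLETE."""
    return [r for r in rows if not r["resource_status"].endswith("COMPLETE")]


SAMPLE = """
+---------------+--------------------------------------+--------------------------------+--------------------+----------------------+
| resource_name | physical_resource_id | resource_type | resource_status | updated_time |
+---------------+--------------------------------------+--------------------------------+--------------------+----------------------+
| 0 | cb3f6080-aa17-427d-aec3-bbdf167922c4 | OS::Heat::StructuredDeployment | DELETE_IN_PROGRESS | 2015-09-21T08:30:22Z |
| made_up_row | 00000000-0000-0000-0000-000000000000 | OS::Heat::StructuredDeployment | CREATE_COMPLETE | 2015-09-21T08:30:22Z |
+---------------+--------------------------------------+--------------------------------+--------------------+----------------------+
"""

for row in unfinished(parse_table(SAMPLE)):
    print(row["resource_name"], row["physical_resource_id"], row["resource_status"])
```

The `physical_resource_id` of each remaining row is the value that was passed to `heat deployment-show` in the transcript above.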

It seems that in this case the problem is the multiple mapping of the OS::TripleO::NodeExtraConfigPost resource:
overcloud-resource-registry-puppet.yaml:
OS::TripleO::NodeExtraConfigPost: extraconfig/post_deploy/default.yaml
extraconfig/post_deploy/rhel-registration/rhel-registration-resource-registry.yaml:
OS::TripleO::NodeExtraConfigPost: rhel-registration.yaml
The overcloud-resource-registry.yaml env registry file is passed to heat only when creating the OC (the CLI includes it dynamically in https://github.com/openstack/python-tripleoclient/blob/master/tripleoclient/v1/overcloud_deploy.py#L372). But this file is not included by the pkg update command, where only the general overcloud-resource-registry-puppet.yaml is included. As a result, the mapping of OS::TripleO::NodeExtraConfigPost changes from rhel-registration.yaml to extraconfig/post_deploy/default.yaml during the stack update operation, which causes replacement of the RHEL-reg resources.
Thanks to Steven Hardy, who found this.

Verified:
openstack-heat-common-2015.1.1-5.el7ost.noarch
openstack-heat-api-cfn-2015.1.1-5.el7ost.noarch
openstack-heat-api-cloudwatch-2015.1.1-5.el7ost.noarch
openstack-heat-api-2015.1.1-5.el7ost.noarch
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch
openstack-heat-engine-2015.1.1-5.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-69.el7ost.noarch
Was able to delete the overcloud after a failed attempt to update it.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2015:1862
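
The root cause described in this report (a type mapped in two registry files, with only one of them passed on update) can be illustrated with a small Python sketch. It assumes a simple "last environment file wins" merge of `resource_registry` sections; `resolve_registry` and `changed_mappings` are hypothetical helpers written for this example, not part of Heat or the TripleO client.

```python
def resolve_registry(env_files):
    """Merge resource_registry sections in order; later files override earlier ones."""
    registry = {}
    for env in env_files:
        registry.update(env.get("resource_registry", {}))
    return registry


def changed_mappings(before, after):
    """Return {type: (old_template, new_template)} for every mapping that differs."""
    changed = {}
    for rtype in sorted(set(before) | set(after)):
        if before.get(rtype) != after.get(rtype):
            changed[rtype] = (before.get(rtype), after.get(rtype))
    return changed


# The two mappings quoted in this report, modelled as parsed environment files.
puppet_registry = {
    "resource_registry": {
        "OS::TripleO::NodeExtraConfigPost": "extraconfig/post_deploy/default.yaml",
    }
}
rhel_registration_registry = {
    "resource_registry": {
        "OS::TripleO::NodeExtraConfigPost": "rhel-registration.yaml",
    }
}

# Deploy passes both files; the package update passes only the general one.
at_create = resolve_registry([puppet_registry, rhel_registration_registry])
at_update = resolve_registry([puppet_registry])

for rtype, (old, new) in changed_mappings(at_create, at_update).items():
    print(f"{rtype}: {old} -> {new}")
```

Any type listed by `changed_mappings` is one whose template changes between deploy and update; per the analysis in this report, that change is what led to the RHEL registration resources being replaced.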