Bug 1278544 - Unrecoverable heat stack in UPDATE_FAILED
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-heat
Version: 7.0 (Kilo)
Hardware: Unspecified OS: Unspecified
Priority: unspecified Severity: unspecified
Target Milestone: z3
Target Release: 7.0 (Kilo)
Assigned To: Steve Baker
QA Contact: Amit Ugol
Keywords: ZStream
Depends On:
Blocks: 1278975
Reported: 2015-11-05 13:24 EST by James Slagle
Modified: 2016-04-26 10:47 EDT
Fixed In Version: openstack-heat-2015.1.2-2.el7ost
Doc Type: Bug Fix
Doc Text:
After a failed stack update, Heat was ignoring the contents of the new environment when reading backed up resources; that is, those that were set aside while their replacements were being created. In particular, it was not picking up any new resource type aliases. As a consequence, if a new resource was successfully created using a new type alias in the environment before the update failed, further attempts to update the stack failed due to the inability to load a resource with an unknown type alias. With this update, backup resources are now stored with a merged combination of the old and new environments. As a result, after an update failure in this scenario, a subsequent update can now recover the stack.
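The merge described in the Doc Text can be illustrated with a small sketch. This is not Heat's implementation: the dict layout, the helper names `merge_environments` and `resolve_type`, and the template file names are all assumptions made for illustration. It only shows why a resource created under a new type alias cannot be loaded from the old environment alone, and why a merged environment resolves both old and new aliases.

```python
import copy


def merge_environments(old_env, new_env):
    """Return a new environment dict in which new_env entries override old_env.

    Simplification: only the 'resource_registry' and 'parameters' sections
    are merged, key by key. Neither input is modified.
    """
    merged = copy.deepcopy(old_env)
    for section in ("resource_registry", "parameters"):
        merged.setdefault(section, {}).update(new_env.get(section, {}))
    return merged


def resolve_type(env, type_alias):
    """Look up a type alias in the environment's resource registry."""
    try:
        return env["resource_registry"][type_alias]
    except KeyError:
        raise LookupError("Unknown resource type: %s" % type_alias)


# Hypothetical environments: the update introduces one new type alias.
old_env = {"resource_registry": {"OS::TripleO::Controller": "controller.yaml"}}
new_env = {"resource_registry": {"OS::TripleO::NodeExtraConfig": "extraconfig.yaml"}}

merged_env = merge_environments(old_env, new_env)
print(resolve_type(merged_env, "OS::TripleO::NodeExtraConfig"))
```

With only `old_env`, the same lookup raises `LookupError`, which mirrors the failure mode described above.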
Last Closed: 2015-12-21 12:03:14 EST
Type: Bug
External Trackers
Tracker                | ID             | Priority | Status       | Summary                         | Last Updated
Red Hat Product Errata | RHBA-2015:2680 | normal   | SHIPPED_LIVE | openstack-heat bug fix advisory | 2015-12-21 16:51:10 EST

Description James Slagle 2015-11-05 13:24:21 EST
I'm trying updates from 7.0 to the latest poodle (to make sure I get all the fixes).

I've got workarounds applied on my 7.0 overcloud for bug 1278181; otherwise I always get stuck on bug 1278004.

With that workaround applied, and all the needed environments being passed (including update.yaml with the other update workarounds), I'm at least now getting to where yum_update.sh is getting kicked off on all the nodes.

Unfortunately, I had DNS misconfigured on one of the overcloud controllers, so the update failed:

[stack@instack ~]$ heat resource-list c9163ce9-c8ba-4514-871f-e289914c43f9
+---------------+--------------------------------------+-------------------------+--------------------+----------------------+
| resource_name | physical_resource_id                 | resource_type           | resource_status    | updated_time         |
+---------------+--------------------------------------+-------------------------+--------------------+----------------------+
| 1             | 7e8d9da6-cd45-4047-bbc8-e6f266c8de12 | OS::TripleO::Controller | UPDATE_IN_PROGRESS | 2015-11-05T15:37:03Z |
| 2             | 87550c9b-25bb-41a5-8946-27806bc3fa1c | OS::TripleO::Controller | UPDATE_FAILED      | 2015-11-05T15:38:06Z |
| 0             | d70d6756-61d5-4e57-8237-29f048f7d0ce | OS::TripleO::Controller | UPDATE_IN_PROGRESS | 2015-11-05T15:38:42Z |
+---------------+--------------------------------------+-------------------------+--------------------+----------------------+

yum just failed downloading metadata because there was no DNS, so no big surprise there.

The issue is that I can't seem to figure out how to recover this stack.

After fixing the DNS problem, I want to try another update.

I thought I was supposed to clear the existing hooks first. I try to poll for all hooks on the overcloud stack, and I can't:

[stack@instack ~]$ heat hook-poll -n 5 overcloud
Stack status UPDATE_FAILED not IN_PROGRESS

But I can if I poll the individual Controller stacks (even though they're all in UPDATE_FAILED, only the one that actually failed prohibits polling):

[stack@instack ~]$ heat resource-list c9163ce9-c8ba-4514-871f-e289914c43f9
+---------------+--------------------------------------+-------------------------+-----------------+----------------------+
| resource_name | physical_resource_id                 | resource_type           | resource_status | updated_time         |
+---------------+--------------------------------------+-------------------------+-----------------+----------------------+
| 1             | 7e8d9da6-cd45-4047-bbc8-e6f266c8de12 | OS::TripleO::Controller | UPDATE_FAILED   | 2015-11-05T15:37:03Z |
| 2             | 87550c9b-25bb-41a5-8946-27806bc3fa1c | OS::TripleO::Controller | UPDATE_FAILED   | 2015-11-05T15:38:06Z |
| 0             | d70d6756-61d5-4e57-8237-29f048f7d0ce | OS::TripleO::Controller | UPDATE_FAILED   | 2015-11-05T15:38:42Z |
+---------------+--------------------------------------+-------------------------+-----------------+----------------------+
[stack@instack ~]$ heat hook-poll 7e8d9da6-cd45-4047-bbc8-e6f266c8de12
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+
| resource_name    | id                                   | resource_status_reason                         | resource_status | event_time           |
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+
| UpdateDeployment | 621b20e4-4c0b-4451-9e29-b888e67c64be | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-11-05T15:39:31Z |
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+
[stack@instack ~]$ heat hook-poll 87550c9b-25bb-41a5-8946-27806bc3fa1c
Stack status UPDATE_FAILED not IN_PROGRESS
[stack@instack ~]$ heat hook-poll d70d6756-61d5-4e57-8237-29f048f7d0ce
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+
| resource_name    | id                                   | resource_status_reason                         | resource_status | event_time           |
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+
| UpdateDeployment | 84806a45-031f-455e-9ecb-9ad2836532a7 | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-11-05T15:42:52Z |
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+
[stack@instack ~]$ heat hook-clear 7e8d9da6-cd45-4047-bbc8-e6f266c8de12 UpdateDeployment
[stack@instack ~]$ heat hook-clear d70d6756-61d5-4e57-8237-29f048f7d0ce UpdateDeployment


At this point, I kick off another update, and it just stays in UPDATE_IN_PROGRESS (it has been this way for several hours). The overcloud nodes aren't doing anything; it seems nothing new is available to tell os-collect-config to restart. The client command never prompted me to clear breakpoints (I always run with -i).

Is this the right flow to recover a failed update? Did I do something wrong? Is it possible to recover my stack at all?
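
The per-stack polling shown above can be scripted. The following is a minimal sketch, assuming the client keeps the ASCII table layout seen in this report. `parse_cli_table` and `hook_poll_commands` are hypothetical helper names, and the script only builds command strings; it does not run them or decide whether clearing hooks is the right recovery step.

```python
def parse_cli_table(text):
    """Parse an ASCII table as printed by the heat client into a list of dicts."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, shell prompts, and blank lines
        cells = [cell.strip() for cell in line.split("|")[1:-1]]
        if header is None:
            header = cells  # first data line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def hook_poll_commands(resource_list_output):
    """Build one 'heat hook-poll' command per nested stack in the listing."""
    return ["heat hook-poll %s" % row["physical_resource_id"]
            for row in parse_cli_table(resource_list_output)]


# Rows copied from the 'heat resource-list' output in this report.
SAMPLE = """
+---------------+--------------------------------------+-------------------------+-----------------+----------------------+
| resource_name | physical_resource_id                 | resource_type           | resource_status | updated_time         |
+---------------+--------------------------------------+-------------------------+-----------------+----------------------+
| 1             | 7e8d9da6-cd45-4047-bbc8-e6f266c8de12 | OS::TripleO::Controller | UPDATE_FAILED   | 2015-11-05T15:37:03Z |
| 2             | 87550c9b-25bb-41a5-8946-27806bc3fa1c | OS::TripleO::Controller | UPDATE_FAILED   | 2015-11-05T15:38:06Z |
| 0             | d70d6756-61d5-4e57-8237-29f048f7d0ce | OS::TripleO::Controller | UPDATE_FAILED   | 2015-11-05T15:38:42Z |
+---------------+--------------------------------------+-------------------------+-----------------+----------------------+
"""

for command in hook_poll_commands(SAMPLE):
    print(command)
```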
Comment 2 James Slagle 2015-11-05 13:30:12 EST
The controller resources never seem to go back into UPDATE_IN_PROGRESS: 

[stack@instack ~]$ heat resource-list c9163ce9-c8ba-4514-871f-e289914c43f9
+---------------+--------------------------------------+-------------------------+-----------------+----------------------+
| resource_name | physical_resource_id                 | resource_type           | resource_status | updated_time         |
+---------------+--------------------------------------+-------------------------+-----------------+----------------------+
| 1             | 7e8d9da6-cd45-4047-bbc8-e6f266c8de12 | OS::TripleO::Controller | UPDATE_FAILED   | 2015-11-05T15:37:03Z |
| 2             | 87550c9b-25bb-41a5-8946-27806bc3fa1c | OS::TripleO::Controller | UPDATE_FAILED   | 2015-11-05T15:38:06Z |
| 0             | d70d6756-61d5-4e57-8237-29f048f7d0ce | OS::TripleO::Controller | UPDATE_FAILED   | 2015-11-05T15:38:42Z |
+---------------+--------------------------------------+-------------------------+-----------------+----------------------+
Comment 3 James Slagle 2015-11-05 13:30:55 EST
Here's an event-list for one of the controllers:

[stack@instack ~]$ heat event-list 7e8d9da6-cd45-4047-bbc8-e6f266c8de12
+--------------------------------------------------+--------------------------------------+------------------------------------------------+--------------------+----------------------+
| resource_name                                    | id                                   | resource_status_reason                         | resource_status    | event_time           |
+--------------------------------------------------+--------------------------------------+------------------------------------------------+--------------------+----------------------+
| overcloud-Controller-hja56vtbtibv-1-dt6boa7xta3j | 17e78a45-521b-40f4-9fb3-11f443f01f22 | Stack CREATE started                           | CREATE_IN_PROGRESS | 2015-11-05T03:16:26Z |
| NodeUserData                                     | 53c537f2-b031-4221-a7d0-96d933ac4759 | state changed                                  | CREATE_IN_PROGRESS | 2015-11-05T03:16:26Z |
| UpdateConfig                                     | 15309b4f-c4f7-47d0-b7c2-b66ceba6f4fd | state changed                                  | CREATE_IN_PROGRESS | 2015-11-05T03:16:28Z |
| NodeUserData                                     | df94bdf8-ca4f-4a16-a949-1eb3404cc6a2 | state changed                                  | CREATE_COMPLETE    | 2015-11-05T03:16:31Z |
| Controller                                       | a7d0f6a4-9766-46ac-a849-0940378e65d6 | state changed                                  | CREATE_IN_PROGRESS | 2015-11-05T03:16:32Z |
| UpdateConfig                                     | ad90f69b-8e0c-4f9b-a2a8-bdb1733ca1b8 | state changed                                  | CREATE_COMPLETE    | 2015-11-05T03:16:37Z |
| Controller                                       | 8eb2cbfb-08d8-47a0-98d4-0871865bd681 | state changed                                  | CREATE_COMPLETE    | 2015-11-05T03:24:07Z |
| StorageMgmtPort                                  | e09678c5-97cf-4a91-8a7b-2cc67fb8fd52 | state changed                                  | CREATE_IN_PROGRESS | 2015-11-05T03:24:08Z |
| ExternalPort                                     | 51be2123-647a-456f-b1c3-30a5840feed8 | state changed                                  | CREATE_IN_PROGRESS | 2015-11-05T03:24:15Z |
| StoragePort                                      | 7fc925bb-7cf5-48b5-b0ea-4215d52c3a6c | state changed                                  | CREATE_IN_PROGRESS | 2015-11-05T03:24:19Z |
| UpdateDeployment                                 | 2826d707-8ad9-4bd0-9157-661d5cf03fc9 | state changed                                  | CREATE_IN_PROGRESS | 2015-11-05T03:24:29Z |
| InternalApiPort                                  | fa6ffe85-7bf3-4760-871d-4abc7dc23798 | state changed                                  | CREATE_IN_PROGRESS | 2015-11-05T03:24:38Z |
| TenantPort                                       | 7dc2a63e-cb1f-4870-99ae-697a9f8be223 | state changed                                  | CREATE_IN_PROGRESS | 2015-11-05T03:24:45Z |
| StorageMgmtPort                                  | d4913db7-44b8-44c4-912c-5b1708b533d1 | state changed                                  | CREATE_COMPLETE    | 2015-11-05T03:24:50Z |
| InternalApiPort                                  | 1e71f067-1c66-4366-a127-9fdeb0a105e7 | state changed                                  | CREATE_COMPLETE    | 2015-11-05T03:24:55Z |
| ExternalPort                                     | 2cf72ce4-f1eb-4898-8168-31ed4c49c84a | state changed                                  | CREATE_COMPLETE    | 2015-11-05T03:24:56Z |
| TenantPort                                       | 25b396f6-483c-4ca4-961f-ca8c3284cb14 | state changed                                  | CREATE_COMPLETE    | 2015-11-05T03:24:57Z |
| StoragePort                                      | 8c5c1200-ba22-426b-b537-171234064644 | state changed                                  | CREATE_COMPLETE    | 2015-11-05T03:24:58Z |
| NetworkConfig                                    | 8871608e-576e-4974-ae19-913a81cc4ab9 | state changed                                  | CREATE_IN_PROGRESS | 2015-11-05T03:25:01Z |
| NetIpMap                                         | f1d85348-9e6e-4382-bb6a-d64753897b89 | state changed                                  | CREATE_IN_PROGRESS | 2015-11-05T03:25:06Z |
| NetIpSubnetMap                                   | d910b261-aae0-4c48-8615-9ea94eeda067 | state changed                                  | CREATE_IN_PROGRESS | 2015-11-05T03:25:08Z |
| NetIpMap                                         | 9962587a-b5c8-4f91-baff-f6118d7fd7d6 | state changed                                  | CREATE_COMPLETE    | 2015-11-05T03:25:13Z |
| NetIpSubnetMap                                   | 34f8a077-b027-49ed-ab14-a64ac0abe9e1 | state changed                                  | CREATE_COMPLETE    | 2015-11-05T03:25:14Z |
| NetworkConfig                                    | 01bd7785-b4f5-4fd3-a801-c1e4ba02accc | state changed                                  | CREATE_COMPLETE    | 2015-11-05T03:25:14Z |
| ControllerConfig                                 | e1868780-1319-4b20-ae9c-897f85de8446 | state changed                                  | CREATE_IN_PROGRESS | 2015-11-05T03:25:14Z |
| NetworkDeployment                                | e1fbe77f-7b74-4826-9318-0f0833e9beab | state changed                                  | CREATE_IN_PROGRESS | 2015-11-05T03:25:14Z |
| ControllerConfig                                 | 033c899a-2123-49dc-ad86-b612f755ce25 | state changed                                  | CREATE_COMPLETE    | 2015-11-05T03:25:17Z |
| UpdateDeployment                                 | 8911cc70-09e1-4296-9071-ed975585faf2 | Signal: deployment succeeded                   | SIGNAL_IN_PROGRESS | 2015-11-05T03:29:26Z |
| NetworkDeployment                                | e7820c39-40a0-4a5c-b627-080c03c884fd | Signal: deployment succeeded                   | SIGNAL_IN_PROGRESS | 2015-11-05T03:29:29Z |
| UpdateDeployment                                 | cf45cec3-6b39-4305-b96e-2ea05502626d | state changed                                  | CREATE_COMPLETE    | 2015-11-05T03:29:30Z |
| NetworkDeployment                                | 6d8b44b4-38ad-4431-aef3-d11bb813d2b6 | state changed                                  | CREATE_COMPLETE    | 2015-11-05T03:29:31Z |
| ControllerDeployment                             | 7a54d409-9d5d-49e1-b43b-1922b0e9938a | state changed                                  | CREATE_IN_PROGRESS | 2015-11-05T03:29:33Z |
| ControllerDeployment                             | 0c834309-aac7-4e90-9d40-1e28213fb4ce | Signal: deployment succeeded                   | SIGNAL_IN_PROGRESS | 2015-11-05T03:30:26Z |
| NetworkDeployment                                | db5dfdbb-f019-42b4-bc17-51ceba9b5e59 | Unknown                                        | SIGNAL_COMPLETE    | 2015-11-05T03:30:27Z |
| ControllerDeployment                             | 3fbe851d-fa61-4d0f-8ea9-a3daf860cd92 | state changed                                  | CREATE_COMPLETE    | 2015-11-05T03:30:28Z |
| ControllerExtraConfigPre                         | befd264c-628e-4e67-b830-87967fac8d02 | state changed                                  | CREATE_IN_PROGRESS | 2015-11-05T03:30:29Z |
| ControllerExtraConfigPre                         | d9d6c72f-0811-48b7-9792-03dd11a7d97a | state changed                                  | CREATE_COMPLETE    | 2015-11-05T03:30:36Z |
| overcloud-Controller-hja56vtbtibv-1-dt6boa7xta3j | ebf2fe96-f636-4a24-bd29-750669cc16da | Stack CREATE completed successfully            | CREATE_COMPLETE    | 2015-11-05T03:30:36Z |
| ControllerDeployment                             | ecca66da-b87a-4202-a6b8-6105c7fd0062 | Unknown                                        | SIGNAL_COMPLETE    | 2015-11-05T03:31:43Z |
| NetworkDeployment                                | 7fcf443d-562f-4a24-8687-a6b1bd969101 | Unknown                                        | SIGNAL_COMPLETE    | 2015-11-05T03:31:44Z |
| ControllerDeployment                             | 2ddb77c0-5e6f-4d5b-af4f-b7f150bbe5d1 | Unknown                                        | SIGNAL_COMPLETE    | 2015-11-05T03:34:42Z |
| NetworkDeployment                                | 9ff5ebb2-b6c5-419f-b1c6-dc448da13f08 | Unknown                                        | SIGNAL_COMPLETE    | 2015-11-05T03:34:43Z |
| ControllerDeployment                             | c13aff29-09ff-436d-92c3-5623df222a15 | Unknown                                        | SIGNAL_COMPLETE    | 2015-11-05T03:37:36Z |
| NetworkDeployment                                | 4796a100-de14-4e5f-b7ba-3a41041b089a | Unknown                                        | SIGNAL_COMPLETE    | 2015-11-05T03:37:36Z |
| ControllerDeployment                             | cd37b9d0-88ed-428a-a025-9d4fe2ac1ba6 | Unknown                                        | SIGNAL_COMPLETE    | 2015-11-05T03:39:01Z |
| NetworkDeployment                                | 477f572f-a422-41d5-8057-3277246848cc | Unknown                                        | SIGNAL_COMPLETE    | 2015-11-05T03:39:02Z |
| ControllerDeployment                             | 07f2364f-2db6-4246-84ca-beb51b03ca56 | Unknown                                        | SIGNAL_COMPLETE    | 2015-11-05T03:42:28Z |
| NetworkDeployment                                | 44cad5bb-40f8-4911-a4fc-997ad13acc8b | Unknown                                        | SIGNAL_COMPLETE    | 2015-11-05T03:42:30Z |
| ControllerDeployment                             | 23125460-37a0-42c5-ab2f-7d185ff8e491 | Unknown                                        | SIGNAL_COMPLETE    | 2015-11-05T03:45:37Z |
| NetworkDeployment                                | 50408a12-c1c8-4fdf-a273-5d5026f52a54 | Unknown                                        | SIGNAL_COMPLETE    | 2015-11-05T03:45:38Z |
| ControllerDeployment                             | a7fac72c-cae6-437f-8b5a-aa667be56df8 | Unknown                                        | SIGNAL_COMPLETE    | 2015-11-05T03:48:44Z |
| NetworkDeployment                                | 211d2bbf-792e-44c5-88f4-8b470e0e8df2 | Unknown                                        | SIGNAL_COMPLETE    | 2015-11-05T03:48:44Z |
| ControllerDeployment                             | 542686b7-0961-414a-9249-a8c8c1aac12d | Unknown                                        | SIGNAL_COMPLETE    | 2015-11-05T03:52:39Z |
| NetworkDeployment                                | fc53abc0-f7b7-4830-b7d4-4999fdbd0e93 | Unknown                                        | SIGNAL_COMPLETE    | 2015-11-05T03:52:39Z |
| overcloud-Controller-hja56vtbtibv-1-dt6boa7xta3j | ad5eb07f-77a3-49d7-8648-6eecbdee078d | Stack UPDATE started                           | UPDATE_IN_PROGRESS | 2015-11-05T15:38:06Z |
| NodeUserData                                     | 8cf6ff67-ddbf-46fe-bb2e-844e7de8cb36 | state changed                                  | UPDATE_IN_PROGRESS | 2015-11-05T15:38:13Z |
| UpdateConfig                                     | 64d3556e-1255-4afc-97f2-24ea104b3de3 | state changed                                  | UPDATE_IN_PROGRESS | 2015-11-05T15:38:16Z |
| NodeUserData                                     | 6d3dc8e5-f77d-43ec-84dc-5e6c80d48310 | state changed                                  | UPDATE_COMPLETE    | 2015-11-05T15:38:25Z |
| StorageMgmtPort                                  | 3727d758-717a-467a-8130-47f208c3b1c2 | state changed                                  | UPDATE_IN_PROGRESS | 2015-11-05T15:38:27Z |
| StoragePort                                      | 171dd65a-424b-40ad-ad79-0f3ae7c65738 | state changed                                  | UPDATE_IN_PROGRESS | 2015-11-05T15:38:29Z |
| InternalApiPort                                  | 7a026c8c-4a8e-419c-9bce-ee02e43b6c35 | state changed                                  | UPDATE_IN_PROGRESS | 2015-11-05T15:38:35Z |
| ExternalPort                                     | 3337ec1d-2533-4c89-8fce-bac27f252631 | state changed                                  | UPDATE_IN_PROGRESS | 2015-11-05T15:38:43Z |
| TenantPort                                       | 328127d9-4a96-4a3c-a0c5-e763854cb412 | state changed                                  | UPDATE_IN_PROGRESS | 2015-11-05T15:38:56Z |
| StorageMgmtPort                                  | 84922ac0-836b-48bb-b01d-72304e4f2069 | state changed                                  | UPDATE_COMPLETE    | 2015-11-05T15:39:12Z |
| UpdateConfig                                     | 9bef6715-5703-4ea6-97f1-738af23c35de | state changed                                  | UPDATE_COMPLETE    | 2015-11-05T15:39:12Z |
| ExternalPort                                     | 8e27061a-dd0c-4c96-b5b0-8610f8a67710 | state changed                                  | UPDATE_COMPLETE    | 2015-11-05T15:39:13Z |
| StoragePort                                      | 5dded725-3fde-4791-a517-76b88ff44763 | state changed                                  | UPDATE_COMPLETE    | 2015-11-05T15:39:14Z |
| InternalApiPort                                  | 8d05c966-3449-4d99-9fc1-da33736cbb59 | state changed                                  | UPDATE_COMPLETE    | 2015-11-05T15:39:15Z |
| TenantPort                                       | 8c56d989-4d98-4be6-bf53-3325ef5784a5 | state changed                                  | UPDATE_COMPLETE    | 2015-11-05T15:39:16Z |
| NetIpSubnetMap                                   | 1b48901e-6889-442d-9075-f77addbf8e65 | state changed                                  | CREATE_IN_PROGRESS | 2015-11-05T15:39:19Z |
| UpdateDeployment                                 | 621b20e4-4c0b-4451-9e29-b888e67c64be | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE    | 2015-11-05T15:39:31Z |
| NetIpMap                                         | ce71b163-dbe5-4c59-bb03-8f94755011f3 | state changed                                  | UPDATE_IN_PROGRESS | 2015-11-05T15:39:33Z |
| NetworkConfig                                    | afd816cb-ad23-4948-b3d8-2e91bbaad304 | state changed                                  | UPDATE_IN_PROGRESS | 2015-11-05T15:39:42Z |
| NetIpSubnetMap                                   | d78bf9f4-e002-4e33-a1bf-60d52c79baaf | state changed                                  | CREATE_COMPLETE    | 2015-11-05T15:39:51Z |
| NetIpMap                                         | 502eb3f4-4848-47c6-939e-9e840b8a84ee | state changed                                  | UPDATE_COMPLETE    | 2015-11-05T15:39:52Z |
| ControllerConfig                                 | b33d29bd-fa6a-4e69-b6e5-58e5c2f59f29 | state changed                                  | UPDATE_IN_PROGRESS | 2015-11-05T15:39:54Z |
| ControllerConfig                                 | 8c0f795a-4c6d-4908-851a-29bfe3fea1fb | state changed                                  | CREATE_IN_PROGRESS | 2015-11-05T15:39:56Z |
| ControllerConfig                                 | 1842aca0-7a6c-4270-ad2a-4396043e553b | state changed                                  | CREATE_COMPLETE    | 2015-11-05T15:40:00Z |
| NetworkConfig                                    | 04b2330e-43fe-4a0b-82c3-8d18bffabe1d | state changed                                  | UPDATE_COMPLETE    | 2015-11-05T15:40:05Z |
| NetworkDeployment                                | 956f8eda-bb13-47f0-96d7-ffd503f4f1a2 | state changed                                  | UPDATE_IN_PROGRESS | 2015-11-05T15:40:07Z |
| ControllerDeployment                             | bf13eacb-2b5b-4780-a6a5-cd9c5fe78a67 | Unknown                                        | SIGNAL_COMPLETE    | 2015-11-05T15:42:33Z |
| NetworkDeployment                                | e26f2a08-34c8-4d04-9efd-92e6ea96adb4 | Signal: deployment succeeded                   | SIGNAL_IN_PROGRESS | 2015-11-05T15:42:34Z |
| NetworkDeployment                                | c2eb6e5f-3b3a-4eb9-bf04-6369292b09e8 | state changed                                  | UPDATE_COMPLETE    | 2015-11-05T15:42:35Z |
| UpdateDeployment                                 | 929c7e72-ec34-465c-886f-0b20f480cabd | Hook pre-update is cleared                     | CREATE_COMPLETE    | 2015-11-05T17:00:43Z |
| UpdateDeployment                                 | 912316dd-38e3-434c-8e84-9399b0705fe4 | state changed                                  | UPDATE_IN_PROGRESS | 2015-11-05T17:00:44Z |
| UpdateDeployment                                 | c5fc528d-90f5-44fb-903a-8e36ef20658b | Signal: deployment succeeded                   | SIGNAL_IN_PROGRESS | 2015-11-05T17:14:25Z |
| UpdateDeployment                                 | 0e32df7d-7363-41c6-a399-9e875fe2158c | state changed                                  | UPDATE_COMPLETE    | 2015-11-05T17:14:25Z |
| ControllerDeployment                             | 02dcfb86-26e6-4875-bfc8-d3d1d5e6f17c | state changed                                  | UPDATE_IN_PROGRESS | 2015-11-05T17:14:25Z |
| ControllerDeployment                             | 6d98b01e-0fbc-4f68-b9fd-8abc69818dbf | Signal: deployment succeeded                   | SIGNAL_IN_PROGRESS | 2015-11-05T17:14:28Z |
| NetworkDeployment                                | 868fb6f1-09ac-49b0-9056-3dad1b055bb2 | Unknown                                        | SIGNAL_COMPLETE    | 2015-11-05T17:14:28Z |
| ControllerDeployment                             | ceaddd04-fa76-426d-8193-f43b59a427dc | state changed                                  | UPDATE_COMPLETE    | 2015-11-05T17:14:29Z |
| ControllerExtraConfigPre                         | 2f58f504-22f9-4af8-a5df-1a9578eb2ed3 | state changed                                  | UPDATE_IN_PROGRESS | 2015-11-05T17:14:29Z |
| ControllerExtraConfigPre                         | 91c58a9e-a163-4b48-bbcc-745a6baeb5f5 | state changed                                  | UPDATE_COMPLETE    | 2015-11-05T17:14:32Z |
| NodeExtraConfig                                  | 9f8d9703-a951-4b1b-865a-5ef53e8fa71b | state changed                                  | CREATE_IN_PROGRESS | 2015-11-05T17:14:32Z |
| NodeExtraConfig                                  | 46746abe-cace-4d5c-9996-3acd15aecd5b | state changed                                  | CREATE_COMPLETE    | 2015-11-05T17:14:35Z |
| NetworkDeployment                                | 94abc64c-b24b-45e4-8c9d-807d83569894 | Unknown                                        | SIGNAL_COMPLETE    | 2015-11-05T17:15:11Z |
| ControllerDeployment                             | 962d8bf0-3809-46ee-a7f9-3ed95e6d4a37 | Unknown                                        | SIGNAL_COMPLETE    | 2015-11-05T17:15:11Z |
+--------------------------------------------------+--------------------------------------+------------------------------------------------+--------------------+----------------------+
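
In an event list this long it is easy to miss whether a hook is still pending. The sketch below pairs each "paused until Hook ... is cleared" event with a later "Hook ... is cleared" event for the same resource and reports what is left over. It assumes the reason strings have exactly the wording shown in this event list; `pending_hooks` is a hypothetical helper name, and events are passed in as plain tuples rather than read from the client.

```python
import re

# Reason strings as they appear in the event list above.
PAUSED = re.compile(r"^\w+ paused until Hook (\S+) is cleared$")
CLEARED = re.compile(r"^Hook (\S+) is cleared$")


def pending_hooks(events):
    """Return the (resource_name, hook) pairs that were paused but never cleared.

    `events` is an iterable of (resource_name, resource_status_reason)
    tuples in chronological order.
    """
    pending = set()
    for resource_name, reason in events:
        paused = PAUSED.match(reason)
        if paused:
            pending.add((resource_name, paused.group(1)))
            continue
        cleared = CLEARED.match(reason)
        if cleared:
            pending.discard((resource_name, cleared.group(1)))
    return pending


# A few events taken from the listing above, in order.
events = [
    ("UpdateDeployment", "UPDATE paused until Hook pre-update is cleared"),
    ("NetIpMap", "state changed"),
    ("UpdateDeployment", "Hook pre-update is cleared"),
]

print(pending_hooks(events))
```

For the events above the result is empty, which agrees with the later hook-poll output showing no outstanding breakpoints.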
Comment 4 James Slagle 2015-11-05 13:32:20 EST
No new breakpoints ever get added:

[stack@instack ~]$ heat hook-poll -n 5 overcloud
+----+------------------------+-----------------+------------+------------+
| id | resource_status_reason | resource_status | event_time | stack_name |
+----+------------------------+-----------------+------------+------------+
+----+------------------------+-----------------+------------+------------+
Comment 5 James Slagle 2015-11-05 13:34:00 EST
I've uploaded the heat logs here: http://file.rdu.redhat.com/~jslagle/bug-1278544/
Comment 6 James Slagle 2015-11-05 13:44:47 EST
Note that the output from the client is just a seemingly infinite list of "IN_PROGRESS".

It's weird, because it appears nothing is in progress:

[stack@instack ~]$ heat stack-list
+--------------------------------------+------------+--------------------+----------------------+
| id                                   | stack_name | stack_status       | creation_time        |
+--------------------------------------+------------+--------------------+----------------------+
| d8eae9e6-c64e-4ce6-aef7-244979bfc0f1 | overcloud  | UPDATE_IN_PROGRESS | 2015-11-05T03:15:41Z |
+--------------------------------------+------------+--------------------+----------------------+
[stack@instack ~]$ heat resource-list overcloud
+-----------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+----------------------+
| resource_name                     | physical_resource_id                          | resource_type                                     | resource_status | updated_time         |
+-----------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+----------------------+
| BlockStorageAllNodesDeployment    | c7586f0e-4c6f-40e3-b20a-dc92298589cc          | OS::Heat::StructuredDeployments                   | CREATE_COMPLETE | 2015-11-05T03:15:42Z |
| BlockStorageNodesPostDeployment   | d37e0e34-9654-4543-b286-2668fec896a4          | OS::TripleO::BlockStoragePostDeployment           | CREATE_COMPLETE | 2015-11-05T03:15:42Z |
| CephClusterConfig                 | 33e939b1-54a6-48ff-a235-18459c6ec36c          | OS::TripleO::CephClusterConfig::SoftwareConfig    | CREATE_COMPLETE | 2015-11-05T03:15:42Z |
| CephStorageAllNodesDeployment     | ea28a95a-aaa8-4b4f-b1b0-0d4b6a7d4737          | OS::Heat::StructuredDeployments                   | CREATE_COMPLETE | 2015-11-05T03:15:42Z |
| CephStorageCephDeployment         | 58f40370-606b-4ec8-be06-070b24373778          | OS::Heat::StructuredDeployments                   | CREATE_COMPLETE | 2015-11-05T03:15:42Z |
| CephStorageNodesPostDeployment    | 5224cd58-9723-4679-8185-d06215a7f3d7          | OS::TripleO::CephStoragePostDeployment            | CREATE_COMPLETE | 2015-11-05T03:15:42Z |
| ComputeAllNodesDeployment         | 9a86b367-f991-47dd-bc0f-9b857d0bfdcd          | OS::Heat::StructuredDeployments                   | CREATE_COMPLETE | 2015-11-05T03:15:42Z |
| ComputeCephDeployment             | e6bf8562-b529-4123-bdb0-ac10a63fbba9          | OS::Heat::StructuredDeployments                   | CREATE_COMPLETE | 2015-11-05T03:15:42Z |
| ComputeNodesPostDeployment        | 22db48bb-6dc2-4c4c-ad3d-58e299fc3500          | OS::TripleO::ComputePostDeployment                | CREATE_COMPLETE | 2015-11-05T03:15:42Z |
| ControllerAllNodesDeployment      | 443ac7d7-837e-401b-ae14-045bddc08e86          | OS::Heat::StructuredDeployments                   | CREATE_COMPLETE | 2015-11-05T03:15:42Z |
| ControllerBootstrapNodeConfig     | 4163b982-a79f-416e-b123-afacddba8e27          | OS::TripleO::BootstrapNode::SoftwareConfig        | CREATE_COMPLETE | 2015-11-05T03:15:42Z |
| ControllerBootstrapNodeDeployment | 87534065-3b51-461c-8ddf-8148dfe2f198          | OS::Heat::StructuredDeployments                   | CREATE_COMPLETE | 2015-11-05T03:15:42Z |
| ControllerCephDeployment          | e787564e-fa56-4e7c-8f5e-d9a9304132fa          | OS::Heat::StructuredDeployments                   | CREATE_COMPLETE | 2015-11-05T03:15:42Z |
| ControllerClusterConfig           | f3dc9941-ce27-4713-955c-1dca9ced8d23          | OS::Heat::StructuredConfig                        | CREATE_COMPLETE | 2015-11-05T03:15:42Z |
| ControllerClusterDeployment       | 853073c9-f694-4592-bd87-c7bdfee95d96          | OS::Heat::StructuredDeployments                   | CREATE_COMPLETE | 2015-11-05T03:15:42Z |
| ControllerIpListMap               | 85630f7d-c4cf-4d37-9bd4-af9234b91737          | OS::TripleO::Network::Ports::NetIpListMap         | CREATE_COMPLETE | 2015-11-05T03:15:42Z |
| ControllerNodesPostDeployment     | afe25c74-a231-4df6-bb61-c84697d8277d          | OS::TripleO::ControllerPostDeployment             | CREATE_COMPLETE | 2015-11-05T03:15:42Z |
| ControllerSwiftDeployment         | dc3bbe28-6a44-462e-b0e6-f8773e96fe04          | OS::Heat::StructuredDeployments                   | CREATE_COMPLETE | 2015-11-05T03:15:42Z |
| HeatAuthEncryptionKey             | overcloud-HeatAuthEncryptionKey-6ljv5gid5hxi  | OS::Heat::RandomString                            | CREATE_COMPLETE | 2015-11-05T03:15:42Z |
| HorizonSecret                     | overcloud-HorizonSecret-6x5wuqwd6dq6          | OS::Heat::RandomString                            | CREATE_COMPLETE | 2015-11-05T03:15:42Z |
| MysqlClusterUniquePart            | overcloud-MysqlClusterUniquePart-kesbckxdev67 | OS::Heat::RandomString                            | CREATE_COMPLETE | 2015-11-05T03:15:42Z |
| MysqlRootPassword                 | overcloud-MysqlRootPassword-e3olccywlv67      | OS::Heat::RandomString                            | CREATE_COMPLETE | 2015-11-05T03:15:42Z |
| ObjectStorageAllNodesDeployment   | 7c2529c8-f4f3-4e41-869e-b3e3af936829          | OS::Heat::StructuredDeployments                   | CREATE_COMPLETE | 2015-11-05T03:15:42Z |
| ObjectStorageNodesPostDeployment  | a15a83fe-91fe-4757-bc2a-31846c48bcdd          | OS::TripleO::ObjectStoragePostDeployment          | CREATE_COMPLETE | 2015-11-05T03:15:42Z |
| ObjectStorageSwiftDeployment      | 5a09f483-79ce-4454-bbe1-52f5a3017bed          | OS::Heat::StructuredDeployments                   | CREATE_COMPLETE | 2015-11-05T03:15:42Z |
| PcsdPassword                      | overcloud-PcsdPassword-otnh56wmmlyo           | OS::Heat::RandomString                            | CREATE_COMPLETE | 2015-11-05T03:15:42Z |
| RabbitCookie                      | overcloud-RabbitCookie-suxlvfl5cd3m           | OS::Heat::RandomString                            | CREATE_COMPLETE | 2015-11-05T03:15:42Z |
| SwiftDevicesAndProxyConfig        | e49da330-9376-4046-bd8e-116b43d7a2b1          | OS::TripleO::SwiftDevicesAndProxy::SoftwareConfig | CREATE_COMPLETE | 2015-11-05T03:15:42Z |
| VipDeployment                     | 53768f74-70c6-40e5-ab27-ec374616edb1          | OS::Heat::StructuredDeployments                   | CREATE_COMPLETE | 2015-11-05T03:15:42Z |
| allNodesConfig                    | e95a6953-b590-49a4-a993-d5fbfc0efd8c          | OS::TripleO::AllNodes::SoftwareConfig             | CREATE_COMPLETE | 2015-11-05T03:15:42Z |
| Controller                        | c9163ce9-c8ba-4514-871f-e289914c43f9          | OS::Heat::ResourceGroup                           | UPDATE_FAILED   | 2015-11-05T15:36:57Z |
| BlockStorage                      | 7075b578-1939-46d0-935d-3dbf6969673f          | OS::Heat::ResourceGroup                           | UPDATE_COMPLETE | 2015-11-05T15:37:03Z |
| VipConfig                         | 58d0557a-7068-4221-bf6a-6e5f534df6c4          | OS::TripleO::VipConfig                            | UPDATE_COMPLETE | 2015-11-05T16:26:10Z |
| Networks                          | 8976fd6f-06a7-4ff6-8d0f-657563160a5d          | OS::TripleO::Network                              | UPDATE_COMPLETE | 2015-11-05T16:26:11Z |
| ControlVirtualIP                  | 6b1b4ac1-0dd3-40f6-a7ac-cf6333983092          | OS::TripleO::Network::Ports::CtlplaneVipPort      | UPDATE_COMPLETE | 2015-11-05T16:26:41Z |
| ObjectStorage                     | 6d3259eb-93fe-4932-b652-1b69eb589e43          | OS::Heat::ResourceGroup                           | UPDATE_COMPLETE | 2015-11-05T16:26:46Z |
| CephStorage                       | c5eaa74d-9a34-4fc8-8490-1a33bb4e3704          | OS::Heat::ResourceGroup                           | UPDATE_COMPLETE | 2015-11-05T16:26:48Z |
| StorageMgmtVirtualIP              | 9875176b-f7f7-4765-9913-ad6b3fb751a0          | OS::TripleO::Network::Ports::StorageMgmtVipPort   | UPDATE_COMPLETE | 2015-11-05T16:26:53Z |
| RedisVirtualIP                    | 3d9d95c3-d7e3-4cc0-8891-9f131ddde6ad          | OS::TripleO::Network::Ports::RedisVipPort         | UPDATE_COMPLETE | 2015-11-05T16:26:55Z |
| PublicVirtualIP                   | 109a58f9-2ce4-4f33-8172-5b442fee7dfb          | OS::TripleO::Network::Ports::ExternalVipPort      | UPDATE_COMPLETE | 2015-11-05T16:27:02Z |
| StorageVirtualIP                  | c08bfd65-9831-465b-a7a7-a43a330f2536          | OS::TripleO::Network::Ports::StorageVipPort       | UPDATE_COMPLETE | 2015-11-05T16:27:09Z |
| InternalApiVirtualIP              | a6dffc7e-59a5-4594-91c7-c4c7bd8c64a8          | OS::TripleO::Network::Ports::InternalApiVipPort   | UPDATE_COMPLETE | 2015-11-05T16:27:12Z |
| VipMap                            | 7868bf41-836a-4508-acfd-291e8caede35          | OS::TripleO::Network::Ports::NetVipMap            | UPDATE_COMPLETE | 2015-11-05T16:27:16Z |
| Compute                           | eb7dcb92-07d4-4f2e-8839-4f24c8bf7b53          | OS::Heat::ResourceGroup                           | UPDATE_FAILED   | 2015-11-05T16:27:20Z |
+-----------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+----------------------+
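
In a listing this long, the rows that did not reach a *_COMPLETE state are easy to miss. The following is a minimal sketch of a filter for pasted `heat resource-list` output, assuming the five-column layout shown above; the function names `parse_resource_table` and `not_complete` are made up for illustration and are not part of any Heat tooling.

```python
def parse_resource_table(text):
    """Parse the ASCII table printed by `heat resource-list` into row tuples.

    Assumes five columns: name, physical id, type, status, updated time.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip border lines (+----+) and anything that is not a table row.
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Keep only data rows: five cells and an upper-case status such as
        # UPDATE_FAILED (the header row is lower-case, so it is skipped).
        if len(cells) != 5 or not cells[3].isupper():
            continue
        rows.append(tuple(cells))
    return rows


def not_complete(rows):
    """Return (name, status) for every resource whose status is not *_COMPLETE."""
    return [(name, status)
            for name, _phys_id, _rtype, status, _updated in rows
            if not status.endswith("_COMPLETE")]


# Three rows copied from the listing above.
sample = """
| Controller   | c9163ce9-c8ba-4514-871f-e289914c43f9 | OS::Heat::ResourceGroup | UPDATE_FAILED   | 2015-11-05T15:36:57Z |
| BlockStorage | 7075b578-1939-46d0-935d-3dbf6969673f | OS::Heat::ResourceGroup | UPDATE_COMPLETE | 2015-11-05T15:37:03Z |
| Compute      | eb7dcb92-07d4-4f2e-8839-4f24c8bf7b53 | OS::Heat::ResourceGroup | UPDATE_FAILED   | 2015-11-05T16:27:20Z |
"""

for name, status in not_complete(parse_resource_table(sample)):
    print(name, status)
```

Run against the full listing, this would report only the Controller and Compute resource groups, both in UPDATE_FAILED.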
Comment 7 James Slagle 2015-11-05 13:45:50 EST
Even doing a heat resource-list -n 10 overcloud, nothing shows IN_PROGRESS
Comment 8 James Slagle 2015-11-05 14:01:03 EST
I tailed the heat-engine.log for a minute or so and saved the output to a file:
http://file.rdu.redhat.com/~jslagle/bug-1278544/heat-engine-tailed.log

it's just the same pattern over and over again. I wonder if it's stuck in an infinite loop?
Comment 9 Zane Bitter 2015-11-05 14:28:11 EST
If a stack gets stuck in IN_PROGRESS, it's almost certainly because there was an uncaught exception (https://bugs.launchpad.net/heat/+bug/1492433), and because it's uncaught it's also not logged in heat-engine.log (https://bugs.launchpad.net/heat/+bug/1492427). You should be able to find the traceback in the journal (thanks systemd!), and from there we can diagnose the bug.
Comment 10 Zane Bitter 2015-11-05 14:30:05 EST
BTW the workaround would be to restart heat-engine and then use the workaround for bug 1267364. That will get your stacks back to the FAILED state so that they're not stuck IN_PROGRESS any more.
Comment 11 Steve Baker 2015-11-05 15:28:21 EST
When you say you're upgrading from 7.0 to the latest poodle, did you start by upgrading the undercloud to latest poodle first?
Comment 12 James Slagle 2015-11-06 09:01:46 EST
(In reply to Steve Baker from comment #11)
> When you say you're upgrading from 7.0 to the latest poodle, did you start
> by upgrading the undercloud to latest poodle first?

Yes, first step in updating from 7.0 is to update the undercloud, and make sure services are restarted (should happen automatically via package updates).

I didn't have the newest heat build with the 2 patches for bug 1267364, though; it looks like that build was done yesterday, shortly after I had updated the undercloud ;).

I updated to those today, and restarted heat-engine. The stack is still stuck in UPDATE_IN_PROGRESS. Is that still expected?

It wasn't clear to me from the bug whether the 2 patches remove the need for the manual db sql workaround, or whether I still have to do that.
Comment 13 James Slagle 2015-11-06 09:03:04 EST
(In reply to Zane Bitter from comment #9)
> If a stack gets stuck in IN_PROGRESS, it's almost certainly because there
> was an uncaught exception (https://bugs.launchpad.net/heat/+bug/1492433),
> and because it's uncaught it's also not logged in heat-engine.log
> (https://bugs.launchpad.net/heat/+bug/1492427). You should be able to find
> the traceback in the journal (thanks systemd!), and from there we can
> diagnose the bug.

it looks like the journal got rotated, and we weren't saving old journal files (no /var/log/journal) in 7.0.

If this happens again, what unit should I look at to see the traceback? Would it be openstack-heat-engine?
Comment 14 James Slagle 2015-11-06 10:08:57 EST
using the newest heat build with the 2 patches from https://bugzilla.redhat.com/show_bug.cgi?id=1267364 and restarting heat-engine, this issue still remains.

zaneb indicated the manual sql workaround shouldn't be needed anymore, so there must be something else going on.
Comment 15 Zane Bitter 2015-11-06 10:27:39 EST
> I updated to those today, and restarted heat-engine. The stack is still stuck in UPDATE_IN_PROGRESS. Is that still expected?

No, not expected. At startup, Heat goes through all stacks that are IN_PROGRESS and tries to break their locks (i.e. we ping the engine that owns the lock, and if it doesn't reply we steal it) and move them to FAILED.

This was broken for nested stacks, and didn't move the member resources to FAILED, and that's what the patches for bug 1267364 fixed.
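
The startup behaviour described above could be sketched roughly as follows. This is not Heat's actual code: the function `reset_stale_stacks`, the dict layout, and the `engine_is_alive` callable are all hypothetical stand-ins for the real lock and RPC machinery.

```python
def reset_stale_stacks(stacks, engine_is_alive):
    """Move IN_PROGRESS stacks whose lock owner is gone to FAILED.

    `stacks` is a list of dicts with keys "name", "status", "lock_owner"
    and "resources" (hypothetical layout). `engine_is_alive` stands in for
    pinging the engine that owns the lock.
    """
    reset = []
    for stack in stacks:
        if stack["status"] != "IN_PROGRESS":
            continue
        owner = stack["lock_owner"]
        if owner is not None and engine_is_alive(owner):
            # The owning engine replied, so the operation is still running.
            continue
        # No reply: steal the lock, then fail the stack and its
        # in-progress member resources.
        stack["lock_owner"] = None
        stack["status"] = "FAILED"
        for resource in stack["resources"]:
            if resource["status"] == "IN_PROGRESS":
                resource["status"] = "FAILED"
        reset.append(stack["name"])
    return reset


stacks = [
    {"name": "overcloud", "status": "IN_PROGRESS", "lock_owner": "engine-1",
     "resources": [{"name": "Controller", "status": "IN_PROGRESS"},
                   {"name": "Networks", "status": "COMPLETE"}]},
    {"name": "running", "status": "IN_PROGRESS", "lock_owner": "engine-2",
     "resources": [{"name": "Server", "status": "IN_PROGRESS"}]},
]

# Pretend only engine-2 answers the ping.
print(reset_stale_stacks(stacks, lambda engine_id: engine_id == "engine-2"))
```

In this sketch the member resources are failed together with the stack, which is the part that was not happening for nested stacks.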

> what unit should I look at to see the traceback? Would it be openstack-heat-engine?

Yes.
Comment 16 James Slagle 2015-11-06 17:10:09 EST
i've reproduced this now and captured the actual Heat exception in a new bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1278975

given the traceback there, my initial suspicion is that python-rdomanager-oscplugin did something wrong (didn't send the environment I asked it to, or something else).

Still, I think this existing bug is valid. We ought to be able to recover the stack somehow in such a situation, even if we have to bounce the heat-engine service.
Comment 17 James Slagle 2015-11-06 17:19:04 EST
i tried the sql from https://bugzilla.redhat.com/show_bug.cgi?id=1267364 :

UPDATE stack SET status="FAILED" WHERE status="IN_PROGRESS" AND action="UPDATE";
UPDATE resource SET status="FAILED" WHERE status="IN_PROGRESS" AND action="UPDATE";
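
To see which rows a statement of this shape touches before running it against the real Heat database, it can be tried on a throwaway table first. The sketch below uses an in-memory SQLite database with a toy `stack` table; the three-column schema and the row values are assumptions for illustration only, not Heat's actual schema. It uses single-quoted string literals, which are the portable form.

```python
import sqlite3

# Throwaway in-memory database; nothing here touches a real Heat database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stack (name TEXT, action TEXT, status TEXT);
    INSERT INTO stack VALUES ('overcloud', 'UPDATE', 'IN_PROGRESS');
    INSERT INTO stack VALUES ('other', 'CREATE', 'IN_PROGRESS');
    INSERT INTO stack VALUES ('done', 'UPDATE', 'COMPLETE');
""")

# Same shape as the workaround: only rows that are both UPDATE and
# IN_PROGRESS are moved to FAILED.
conn.execute(
    "UPDATE stack SET status = 'FAILED' "
    "WHERE status = 'IN_PROGRESS' AND action = 'UPDATE'"
)

rows = conn.execute(
    "SELECT name, action, status FROM stack ORDER BY rowid"
).fetchall()
for row in rows:
    print(row)
```

Only the first row changes: a CREATE that is still IN_PROGRESS and an UPDATE that already completed are left alone.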


obviously my stack is in UPDATE_FAILED now :). going to see if i can figure out what causes https://bugzilla.redhat.com/show_bug.cgi?id=1278975 now
Comment 19 Jiri Stransky 2015-11-11 10:00:10 EST
Are there any possible workarounds? I also have a stack in progress, but no resources in progress and no hooks waiting to be cleared.

[stack@instack ~]$ heat resource-list overcloud -n10 | grep PROG
[stack@instack ~]$ heat hook-poll overcloud -n10
+----+------------------------+-----------------+------------+------------+
| id | resource_status_reason | resource_status | event_time | stack_name |
+----+------------------------+-----------------+------------+------------+
+----+------------------------+-----------------+------------+------------+
[stack@instack ~]$ heat stack-list
+--------------------------------------+------------+--------------------+----------------------+
| id                                   | stack_name | stack_status       | creation_time        |
+--------------------------------------+------------+--------------------+----------------------+
| 6de1d453-b12d-421f-96be-d42b8bf93f5a | overcloud  | UPDATE_IN_PROGRESS | 2015-11-11T12:42:36Z |
+--------------------------------------+------------+--------------------+----------------------+
Comment 21 Amit Ugol 2015-11-23 11:38:14 EST
Works better now.
Comment 27 errata-xmlrpc 2015-12-21 12:03:14 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:2680
