I'm testing updates from 7.0 to the latest poodle (to make sure I get all the fixes). I've applied the workarounds for bug 1278181 on my 7.0 overcloud; otherwise I always get stuck on bug 1278004. With that workaround applied, and all the needed environments being passed (including update.yaml with the other update workarounds), I'm at least now getting to the point where yum_update.sh is kicked off on all the nodes. Unfortunately, I had DNS misconfigured on one of the overcloud controllers, so the update failed:

[stack@instack ~]$ heat resource-list c9163ce9-c8ba-4514-871f-e289914c43f9
+---------------+--------------------------------------+-------------------------+--------------------+----------------------+
| resource_name | physical_resource_id                 | resource_type           | resource_status    | updated_time         |
+---------------+--------------------------------------+-------------------------+--------------------+----------------------+
| 1             | 7e8d9da6-cd45-4047-bbc8-e6f266c8de12 | OS::TripleO::Controller | UPDATE_IN_PROGRESS | 2015-11-05T15:37:03Z |
| 2             | 87550c9b-25bb-41a5-8946-27806bc3fa1c | OS::TripleO::Controller | UPDATE_FAILED      | 2015-11-05T15:38:06Z |
| 0             | d70d6756-61d5-4e57-8237-29f048f7d0ce | OS::TripleO::Controller | UPDATE_IN_PROGRESS | 2015-11-05T15:38:42Z |
+---------------+--------------------------------------+-------------------------+--------------------+----------------------+

yum just failed downloading metadata because there was no DNS, so no big surprise there. The issue is that I can't figure out how to recover this stack. After fixing the DNS problem, I want to try another update. I thought I was supposed to clear the existing hooks first.
I try to poll for all hooks on overcloud, and I can't:

[stack@instack ~]$ heat hook-poll -n 5 overcloud
Stack status UPDATE_FAILED not IN_PROGRESS

But I can if I poll the individual Controller stacks. Even though they're all in FAILED, only the one that actually failed refuses to be polled:

[stack@instack ~]$ heat resource-list c9163ce9-c8ba-4514-871f-e289914c43f9
+---------------+--------------------------------------+-------------------------+-----------------+----------------------+
| resource_name | physical_resource_id                 | resource_type           | resource_status | updated_time         |
+---------------+--------------------------------------+-------------------------+-----------------+----------------------+
| 1             | 7e8d9da6-cd45-4047-bbc8-e6f266c8de12 | OS::TripleO::Controller | UPDATE_FAILED   | 2015-11-05T15:37:03Z |
| 2             | 87550c9b-25bb-41a5-8946-27806bc3fa1c | OS::TripleO::Controller | UPDATE_FAILED   | 2015-11-05T15:38:06Z |
| 0             | d70d6756-61d5-4e57-8237-29f048f7d0ce | OS::TripleO::Controller | UPDATE_FAILED   | 2015-11-05T15:38:42Z |
+---------------+--------------------------------------+-------------------------+-----------------+----------------------+
[stack@instack ~]$ heat hook-poll 7e8d9da6-cd45-4047-bbc8-e6f266c8de12
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+
| resource_name    | id                                   | resource_status_reason                         | resource_status | event_time           |
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+
| UpdateDeployment | 621b20e4-4c0b-4451-9e29-b888e67c64be | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-11-05T15:39:31Z |
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+
[stack@instack ~]$ heat hook-poll 87550c9b-25bb-41a5-8946-27806bc3fa1c
Stack status UPDATE_FAILED not IN_PROGRESS
[stack@instack ~]$ heat hook-poll d70d6756-61d5-4e57-8237-29f048f7d0ce
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+
| resource_name    | id                                   | resource_status_reason                         | resource_status | event_time           |
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+
| UpdateDeployment | 84806a45-031f-455e-9ecb-9ad2836532a7 | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-11-05T15:42:52Z |
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+
[stack@instack ~]$ heat hook-clear 7e8d9da6-cd45-4047-bbc8-e6f266c8de12 UpdateDeployment
[stack@instack ~]$ heat hook-clear d70d6756-61d5-4e57-8237-29f048f7d0ce UpdateDeployment

At this point, I kick off another update, and it just stays in UPDATE_IN_PROGRESS (it has been this way for several hours). The overcloud nodes aren't doing anything; it seems nothing new is available to tell os-collect-config to restart. The client command never prompted me to clear breakpoints (I always run with -i). Is this the right flow to recover a failed update? Did I do something wrong? Is it possible to recover my stack at all?
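The flow in this comment (list the members of the Controller group, hook-poll each nested stack, then hook-clear whatever is paused) can be sketched roughly as below. This is only a sketch that parses captured CLI output; it does not call heat itself. The helper names (parse_cli_table, nested_stack_ids, pending_hooks) are made up for illustration, and the sample text is copied from the output above.

```python
def parse_cli_table(text):
    """Parse a '+---+' / '| a | b |' style CLI table into a list of dicts."""
    header, rows = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, shell prompts, plain messages, blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first '|' row is the column header
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def nested_stack_ids(resource_list_text):
    """Physical IDs of the group members; each one is a nested stack to poll."""
    return [row["physical_resource_id"] for row in parse_cli_table(resource_list_text)]


def pending_hooks(hook_poll_text):
    """(resource_name, reason) for every event that reports a paused hook."""
    return [
        (row["resource_name"], row["resource_status_reason"])
        for row in parse_cli_table(hook_poll_text)
        if "paused until Hook" in row["resource_status_reason"]
    ]


# Sample text copied from the transcript in this comment.
RESOURCE_LIST = """
+---------------+--------------------------------------+-------------------------+-----------------+----------------------+
| resource_name | physical_resource_id                 | resource_type           | resource_status | updated_time         |
+---------------+--------------------------------------+-------------------------+-----------------+----------------------+
| 1             | 7e8d9da6-cd45-4047-bbc8-e6f266c8de12 | OS::TripleO::Controller | UPDATE_FAILED   | 2015-11-05T15:37:03Z |
| 2             | 87550c9b-25bb-41a5-8946-27806bc3fa1c | OS::TripleO::Controller | UPDATE_FAILED   | 2015-11-05T15:38:06Z |
| 0             | d70d6756-61d5-4e57-8237-29f048f7d0ce | OS::TripleO::Controller | UPDATE_FAILED   | 2015-11-05T15:38:42Z |
+---------------+--------------------------------------+-------------------------+-----------------+----------------------+
"""

HOOK_POLL = """
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+
| resource_name    | id                                   | resource_status_reason                         | resource_status | event_time           |
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+
| UpdateDeployment | 621b20e4-4c0b-4451-9e29-b888e67c64be | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-11-05T15:39:31Z |
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+
"""

HOOK_POLL_REFUSED = "Stack status UPDATE_FAILED not IN_PROGRESS"

if __name__ == "__main__":
    for stack_id in nested_stack_ids(RESOURCE_LIST):
        print("would run: heat hook-poll", stack_id)
    for name, reason in pending_hooks(HOOK_POLL):
        print("would clear:", name, "->", reason)
```

A refused poll (the "Stack status ... not IN_PROGRESS" message) contains no table rows, so pending_hooks simply returns an empty list for it.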
The controller resources never seem to go back into UPDATE_IN_PROGRESS:

[stack@instack ~]$ heat resource-list c9163ce9-c8ba-4514-871f-e289914c43f9
+---------------+--------------------------------------+-------------------------+-----------------+----------------------+
| resource_name | physical_resource_id                 | resource_type           | resource_status | updated_time         |
+---------------+--------------------------------------+-------------------------+-----------------+----------------------+
| 1             | 7e8d9da6-cd45-4047-bbc8-e6f266c8de12 | OS::TripleO::Controller | UPDATE_FAILED   | 2015-11-05T15:37:03Z |
| 2             | 87550c9b-25bb-41a5-8946-27806bc3fa1c | OS::TripleO::Controller | UPDATE_FAILED   | 2015-11-05T15:38:06Z |
| 0             | d70d6756-61d5-4e57-8237-29f048f7d0ce | OS::TripleO::Controller | UPDATE_FAILED   | 2015-11-05T15:38:42Z |
+---------------+--------------------------------------+-------------------------+-----------------+----------------------+
Here's an event-list for one of the controllers: [stack@instack ~]$ heat event-list 7e8d9da6-cd45-4047-bbc8-e6f266c8de12 +--------------------------------------------------+--------------------------------------+------------------------------------------------+--------------------+----------------------+ | resource_name | id | resource_status_reason | resource_status | event_time | +--------------------------------------------------+--------------------------------------+------------------------------------------------+--------------------+----------------------+ | overcloud-Controller-hja56vtbtibv-1-dt6boa7xta3j | 17e78a45-521b-40f4-9fb3-11f443f01f22 | Stack CREATE started | CREATE_IN_PROGRESS | 2015-11-05T03:16:26Z | | NodeUserData | 53c537f2-b031-4221-a7d0-96d933ac4759 | state changed | CREATE_IN_PROGRESS | 2015-11-05T03:16:26Z | | UpdateConfig | 15309b4f-c4f7-47d0-b7c2-b66ceba6f4fd | state changed | CREATE_IN_PROGRESS | 2015-11-05T03:16:28Z | | NodeUserData | df94bdf8-ca4f-4a16-a949-1eb3404cc6a2 | state changed | CREATE_COMPLETE | 2015-11-05T03:16:31Z | | Controller | a7d0f6a4-9766-46ac-a849-0940378e65d6 | state changed | CREATE_IN_PROGRESS | 2015-11-05T03:16:32Z | | UpdateConfig | ad90f69b-8e0c-4f9b-a2a8-bdb1733ca1b8 | state changed | CREATE_COMPLETE | 2015-11-05T03:16:37Z | | Controller | 8eb2cbfb-08d8-47a0-98d4-0871865bd681 | state changed | CREATE_COMPLETE | 2015-11-05T03:24:07Z | | StorageMgmtPort | e09678c5-97cf-4a91-8a7b-2cc67fb8fd52 | state changed | CREATE_IN_PROGRESS | 2015-11-05T03:24:08Z | | ExternalPort | 51be2123-647a-456f-b1c3-30a5840feed8 | state changed | CREATE_IN_PROGRESS | 2015-11-05T03:24:15Z | | StoragePort | 7fc925bb-7cf5-48b5-b0ea-4215d52c3a6c | state changed | CREATE_IN_PROGRESS | 2015-11-05T03:24:19Z | | UpdateDeployment | 2826d707-8ad9-4bd0-9157-661d5cf03fc9 | state changed | CREATE_IN_PROGRESS | 2015-11-05T03:24:29Z | | InternalApiPort | fa6ffe85-7bf3-4760-871d-4abc7dc23798 | state changed | CREATE_IN_PROGRESS | 2015-11-05T03:24:38Z 
| | TenantPort | 7dc2a63e-cb1f-4870-99ae-697a9f8be223 | state changed | CREATE_IN_PROGRESS | 2015-11-05T03:24:45Z | | StorageMgmtPort | d4913db7-44b8-44c4-912c-5b1708b533d1 | state changed | CREATE_COMPLETE | 2015-11-05T03:24:50Z | | InternalApiPort | 1e71f067-1c66-4366-a127-9fdeb0a105e7 | state changed | CREATE_COMPLETE | 2015-11-05T03:24:55Z | | ExternalPort | 2cf72ce4-f1eb-4898-8168-31ed4c49c84a | state changed | CREATE_COMPLETE | 2015-11-05T03:24:56Z | | TenantPort | 25b396f6-483c-4ca4-961f-ca8c3284cb14 | state changed | CREATE_COMPLETE | 2015-11-05T03:24:57Z | | StoragePort | 8c5c1200-ba22-426b-b537-171234064644 | state changed | CREATE_COMPLETE | 2015-11-05T03:24:58Z | | NetworkConfig | 8871608e-576e-4974-ae19-913a81cc4ab9 | state changed | CREATE_IN_PROGRESS | 2015-11-05T03:25:01Z | | NetIpMap | f1d85348-9e6e-4382-bb6a-d64753897b89 | state changed | CREATE_IN_PROGRESS | 2015-11-05T03:25:06Z | | NetIpSubnetMap | d910b261-aae0-4c48-8615-9ea94eeda067 | state changed | CREATE_IN_PROGRESS | 2015-11-05T03:25:08Z | | NetIpMap | 9962587a-b5c8-4f91-baff-f6118d7fd7d6 | state changed | CREATE_COMPLETE | 2015-11-05T03:25:13Z | | NetIpSubnetMap | 34f8a077-b027-49ed-ab14-a64ac0abe9e1 | state changed | CREATE_COMPLETE | 2015-11-05T03:25:14Z | | NetworkConfig | 01bd7785-b4f5-4fd3-a801-c1e4ba02accc | state changed | CREATE_COMPLETE | 2015-11-05T03:25:14Z | | ControllerConfig | e1868780-1319-4b20-ae9c-897f85de8446 | state changed | CREATE_IN_PROGRESS | 2015-11-05T03:25:14Z | | NetworkDeployment | e1fbe77f-7b74-4826-9318-0f0833e9beab | state changed | CREATE_IN_PROGRESS | 2015-11-05T03:25:14Z | | ControllerConfig | 033c899a-2123-49dc-ad86-b612f755ce25 | state changed | CREATE_COMPLETE | 2015-11-05T03:25:17Z | | UpdateDeployment | 8911cc70-09e1-4296-9071-ed975585faf2 | Signal: deployment succeeded | SIGNAL_IN_PROGRESS | 2015-11-05T03:29:26Z | | NetworkDeployment | e7820c39-40a0-4a5c-b627-080c03c884fd | Signal: deployment succeeded | SIGNAL_IN_PROGRESS | 2015-11-05T03:29:29Z | | 
UpdateDeployment | cf45cec3-6b39-4305-b96e-2ea05502626d | state changed | CREATE_COMPLETE | 2015-11-05T03:29:30Z | | NetworkDeployment | 6d8b44b4-38ad-4431-aef3-d11bb813d2b6 | state changed | CREATE_COMPLETE | 2015-11-05T03:29:31Z | | ControllerDeployment | 7a54d409-9d5d-49e1-b43b-1922b0e9938a | state changed | CREATE_IN_PROGRESS | 2015-11-05T03:29:33Z | | ControllerDeployment | 0c834309-aac7-4e90-9d40-1e28213fb4ce | Signal: deployment succeeded | SIGNAL_IN_PROGRESS | 2015-11-05T03:30:26Z | | NetworkDeployment | db5dfdbb-f019-42b4-bc17-51ceba9b5e59 | Unknown | SIGNAL_COMPLETE | 2015-11-05T03:30:27Z | | ControllerDeployment | 3fbe851d-fa61-4d0f-8ea9-a3daf860cd92 | state changed | CREATE_COMPLETE | 2015-11-05T03:30:28Z | | ControllerExtraConfigPre | befd264c-628e-4e67-b830-87967fac8d02 | state changed | CREATE_IN_PROGRESS | 2015-11-05T03:30:29Z | | ControllerExtraConfigPre | d9d6c72f-0811-48b7-9792-03dd11a7d97a | state changed | CREATE_COMPLETE | 2015-11-05T03:30:36Z | | overcloud-Controller-hja56vtbtibv-1-dt6boa7xta3j | ebf2fe96-f636-4a24-bd29-750669cc16da | Stack CREATE completed successfully | CREATE_COMPLETE | 2015-11-05T03:30:36Z | | ControllerDeployment | ecca66da-b87a-4202-a6b8-6105c7fd0062 | Unknown | SIGNAL_COMPLETE | 2015-11-05T03:31:43Z | | NetworkDeployment | 7fcf443d-562f-4a24-8687-a6b1bd969101 | Unknown | SIGNAL_COMPLETE | 2015-11-05T03:31:44Z | | ControllerDeployment | 2ddb77c0-5e6f-4d5b-af4f-b7f150bbe5d1 | Unknown | SIGNAL_COMPLETE | 2015-11-05T03:34:42Z | | NetworkDeployment | 9ff5ebb2-b6c5-419f-b1c6-dc448da13f08 | Unknown | SIGNAL_COMPLETE | 2015-11-05T03:34:43Z | | ControllerDeployment | c13aff29-09ff-436d-92c3-5623df222a15 | Unknown | SIGNAL_COMPLETE | 2015-11-05T03:37:36Z | | NetworkDeployment | 4796a100-de14-4e5f-b7ba-3a41041b089a | Unknown | SIGNAL_COMPLETE | 2015-11-05T03:37:36Z | | ControllerDeployment | cd37b9d0-88ed-428a-a025-9d4fe2ac1ba6 | Unknown | SIGNAL_COMPLETE | 2015-11-05T03:39:01Z | | NetworkDeployment | 
477f572f-a422-41d5-8057-3277246848cc | Unknown | SIGNAL_COMPLETE | 2015-11-05T03:39:02Z | | ControllerDeployment | 07f2364f-2db6-4246-84ca-beb51b03ca56 | Unknown | SIGNAL_COMPLETE | 2015-11-05T03:42:28Z | | NetworkDeployment | 44cad5bb-40f8-4911-a4fc-997ad13acc8b | Unknown | SIGNAL_COMPLETE | 2015-11-05T03:42:30Z | | ControllerDeployment | 23125460-37a0-42c5-ab2f-7d185ff8e491 | Unknown | SIGNAL_COMPLETE | 2015-11-05T03:45:37Z | | NetworkDeployment | 50408a12-c1c8-4fdf-a273-5d5026f52a54 | Unknown | SIGNAL_COMPLETE | 2015-11-05T03:45:38Z | | ControllerDeployment | a7fac72c-cae6-437f-8b5a-aa667be56df8 | Unknown | SIGNAL_COMPLETE | 2015-11-05T03:48:44Z | | NetworkDeployment | 211d2bbf-792e-44c5-88f4-8b470e0e8df2 | Unknown | SIGNAL_COMPLETE | 2015-11-05T03:48:44Z | | ControllerDeployment | 542686b7-0961-414a-9249-a8c8c1aac12d | Unknown | SIGNAL_COMPLETE | 2015-11-05T03:52:39Z | | NetworkDeployment | fc53abc0-f7b7-4830-b7d4-4999fdbd0e93 | Unknown | SIGNAL_COMPLETE | 2015-11-05T03:52:39Z | | overcloud-Controller-hja56vtbtibv-1-dt6boa7xta3j | ad5eb07f-77a3-49d7-8648-6eecbdee078d | Stack UPDATE started | UPDATE_IN_PROGRESS | 2015-11-05T15:38:06Z | | NodeUserData | 8cf6ff67-ddbf-46fe-bb2e-844e7de8cb36 | state changed | UPDATE_IN_PROGRESS | 2015-11-05T15:38:13Z | | UpdateConfig | 64d3556e-1255-4afc-97f2-24ea104b3de3 | state changed | UPDATE_IN_PROGRESS | 2015-11-05T15:38:16Z | | NodeUserData | 6d3dc8e5-f77d-43ec-84dc-5e6c80d48310 | state changed | UPDATE_COMPLETE | 2015-11-05T15:38:25Z | | StorageMgmtPort | 3727d758-717a-467a-8130-47f208c3b1c2 | state changed | UPDATE_IN_PROGRESS | 2015-11-05T15:38:27Z | | StoragePort | 171dd65a-424b-40ad-ad79-0f3ae7c65738 | state changed | UPDATE_IN_PROGRESS | 2015-11-05T15:38:29Z | | InternalApiPort | 7a026c8c-4a8e-419c-9bce-ee02e43b6c35 | state changed | UPDATE_IN_PROGRESS | 2015-11-05T15:38:35Z | | ExternalPort | 3337ec1d-2533-4c89-8fce-bac27f252631 | state changed | UPDATE_IN_PROGRESS | 2015-11-05T15:38:43Z | | TenantPort | 
328127d9-4a96-4a3c-a0c5-e763854cb412 | state changed | UPDATE_IN_PROGRESS | 2015-11-05T15:38:56Z | | StorageMgmtPort | 84922ac0-836b-48bb-b01d-72304e4f2069 | state changed | UPDATE_COMPLETE | 2015-11-05T15:39:12Z | | UpdateConfig | 9bef6715-5703-4ea6-97f1-738af23c35de | state changed | UPDATE_COMPLETE | 2015-11-05T15:39:12Z | | ExternalPort | 8e27061a-dd0c-4c96-b5b0-8610f8a67710 | state changed | UPDATE_COMPLETE | 2015-11-05T15:39:13Z | | StoragePort | 5dded725-3fde-4791-a517-76b88ff44763 | state changed | UPDATE_COMPLETE | 2015-11-05T15:39:14Z | | InternalApiPort | 8d05c966-3449-4d99-9fc1-da33736cbb59 | state changed | UPDATE_COMPLETE | 2015-11-05T15:39:15Z | | TenantPort | 8c56d989-4d98-4be6-bf53-3325ef5784a5 | state changed | UPDATE_COMPLETE | 2015-11-05T15:39:16Z | | NetIpSubnetMap | 1b48901e-6889-442d-9075-f77addbf8e65 | state changed | CREATE_IN_PROGRESS | 2015-11-05T15:39:19Z | | UpdateDeployment | 621b20e4-4c0b-4451-9e29-b888e67c64be | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-11-05T15:39:31Z | | NetIpMap | ce71b163-dbe5-4c59-bb03-8f94755011f3 | state changed | UPDATE_IN_PROGRESS | 2015-11-05T15:39:33Z | | NetworkConfig | afd816cb-ad23-4948-b3d8-2e91bbaad304 | state changed | UPDATE_IN_PROGRESS | 2015-11-05T15:39:42Z | | NetIpSubnetMap | d78bf9f4-e002-4e33-a1bf-60d52c79baaf | state changed | CREATE_COMPLETE | 2015-11-05T15:39:51Z | | NetIpMap | 502eb3f4-4848-47c6-939e-9e840b8a84ee | state changed | UPDATE_COMPLETE | 2015-11-05T15:39:52Z | | ControllerConfig | b33d29bd-fa6a-4e69-b6e5-58e5c2f59f29 | state changed | UPDATE_IN_PROGRESS | 2015-11-05T15:39:54Z | | ControllerConfig | 8c0f795a-4c6d-4908-851a-29bfe3fea1fb | state changed | CREATE_IN_PROGRESS | 2015-11-05T15:39:56Z | | ControllerConfig | 1842aca0-7a6c-4270-ad2a-4396043e553b | state changed | CREATE_COMPLETE | 2015-11-05T15:40:00Z | | NetworkConfig | 04b2330e-43fe-4a0b-82c3-8d18bffabe1d | state changed | UPDATE_COMPLETE | 2015-11-05T15:40:05Z | | NetworkDeployment | 
956f8eda-bb13-47f0-96d7-ffd503f4f1a2 | state changed | UPDATE_IN_PROGRESS | 2015-11-05T15:40:07Z | | ControllerDeployment | bf13eacb-2b5b-4780-a6a5-cd9c5fe78a67 | Unknown | SIGNAL_COMPLETE | 2015-11-05T15:42:33Z | | NetworkDeployment | e26f2a08-34c8-4d04-9efd-92e6ea96adb4 | Signal: deployment succeeded | SIGNAL_IN_PROGRESS | 2015-11-05T15:42:34Z | | NetworkDeployment | c2eb6e5f-3b3a-4eb9-bf04-6369292b09e8 | state changed | UPDATE_COMPLETE | 2015-11-05T15:42:35Z | | UpdateDeployment | 929c7e72-ec34-465c-886f-0b20f480cabd | Hook pre-update is cleared | CREATE_COMPLETE | 2015-11-05T17:00:43Z | | UpdateDeployment | 912316dd-38e3-434c-8e84-9399b0705fe4 | state changed | UPDATE_IN_PROGRESS | 2015-11-05T17:00:44Z | | UpdateDeployment | c5fc528d-90f5-44fb-903a-8e36ef20658b | Signal: deployment succeeded | SIGNAL_IN_PROGRESS | 2015-11-05T17:14:25Z | | UpdateDeployment | 0e32df7d-7363-41c6-a399-9e875fe2158c | state changed | UPDATE_COMPLETE | 2015-11-05T17:14:25Z | | ControllerDeployment | 02dcfb86-26e6-4875-bfc8-d3d1d5e6f17c | state changed | UPDATE_IN_PROGRESS | 2015-11-05T17:14:25Z | | ControllerDeployment | 6d98b01e-0fbc-4f68-b9fd-8abc69818dbf | Signal: deployment succeeded | SIGNAL_IN_PROGRESS | 2015-11-05T17:14:28Z | | NetworkDeployment | 868fb6f1-09ac-49b0-9056-3dad1b055bb2 | Unknown | SIGNAL_COMPLETE | 2015-11-05T17:14:28Z | | ControllerDeployment | ceaddd04-fa76-426d-8193-f43b59a427dc | state changed | UPDATE_COMPLETE | 2015-11-05T17:14:29Z | | ControllerExtraConfigPre | 2f58f504-22f9-4af8-a5df-1a9578eb2ed3 | state changed | UPDATE_IN_PROGRESS | 2015-11-05T17:14:29Z | | ControllerExtraConfigPre | 91c58a9e-a163-4b48-bbcc-745a6baeb5f5 | state changed | UPDATE_COMPLETE | 2015-11-05T17:14:32Z | | NodeExtraConfig | 9f8d9703-a951-4b1b-865a-5ef53e8fa71b | state changed | CREATE_IN_PROGRESS | 2015-11-05T17:14:32Z | | NodeExtraConfig | 46746abe-cace-4d5c-9996-3acd15aecd5b | state changed | CREATE_COMPLETE | 2015-11-05T17:14:35Z | | NetworkDeployment | 
94abc64c-b24b-45e4-8c9d-807d83569894 | Unknown | SIGNAL_COMPLETE | 2015-11-05T17:15:11Z | | ControllerDeployment | 962d8bf0-3809-46ee-a7f9-3ed95e6d4a37 | Unknown | SIGNAL_COMPLETE | 2015-11-05T17:15:11Z | +--------------------------------------------------+--------------------------------------+------------------------------------------------+--------------------+----------------------+
No new breakpoints ever get added:

[stack@instack ~]$ heat hook-poll -n 5 overcloud
+----+------------------------+-----------------+------------+------------+
| id | resource_status_reason | resource_status | event_time | stack_name |
+----+------------------------+-----------------+------------+------------+
+----+------------------------+-----------------+------------+------------+
I've uploaded the heat logs here: http://file.rdu.redhat.com/~jslagle/bug-1278544/
note that the output from the client is just a seemingly infinite list of "IN_PROGRESS" It's weird, b/c it appears nothing is in progress: [stack@instack ~]$ heat stack-list heat +--------------------------------------+------------+--------------------+----------------------+ | id | stack_name | stack_status | creation_time | +--------------------------------------+------------+--------------------+----------------------+ | d8eae9e6-c64e-4ce6-aef7-244979bfc0f1 | overcloud | UPDATE_IN_PROGRESS | 2015-11-05T03:15:41Z | +--------------------------------------+------------+--------------------+----------------------+ [stack@instack ~]$ heat resource-list overcloud +-----------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+----------------------+ | resource_name | physical_resource_id | resource_type | resource_status | updated_time | +-----------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+----------------------+ | BlockStorageAllNodesDeployment | c7586f0e-4c6f-40e3-b20a-dc92298589cc | OS::Heat::StructuredDeployments | CREATE_COMPLETE | 2015-11-05T03:15:42Z | | BlockStorageNodesPostDeployment | d37e0e34-9654-4543-b286-2668fec896a4 | OS::TripleO::BlockStoragePostDeployment | CREATE_COMPLETE | 2015-11-05T03:15:42Z | | CephClusterConfig | 33e939b1-54a6-48ff-a235-18459c6ec36c | OS::TripleO::CephClusterConfig::SoftwareConfig | CREATE_COMPLETE | 2015-11-05T03:15:42Z | | CephStorageAllNodesDeployment | ea28a95a-aaa8-4b4f-b1b0-0d4b6a7d4737 | OS::Heat::StructuredDeployments | CREATE_COMPLETE | 2015-11-05T03:15:42Z | | CephStorageCephDeployment | 58f40370-606b-4ec8-be06-070b24373778 | OS::Heat::StructuredDeployments | CREATE_COMPLETE | 2015-11-05T03:15:42Z | | CephStorageNodesPostDeployment | 5224cd58-9723-4679-8185-d06215a7f3d7 | OS::TripleO::CephStoragePostDeployment | CREATE_COMPLETE | 
2015-11-05T03:15:42Z | | ComputeAllNodesDeployment | 9a86b367-f991-47dd-bc0f-9b857d0bfdcd | OS::Heat::StructuredDeployments | CREATE_COMPLETE | 2015-11-05T03:15:42Z | | ComputeCephDeployment | e6bf8562-b529-4123-bdb0-ac10a63fbba9 | OS::Heat::StructuredDeployments | CREATE_COMPLETE | 2015-11-05T03:15:42Z | | ComputeNodesPostDeployment | 22db48bb-6dc2-4c4c-ad3d-58e299fc3500 | OS::TripleO::ComputePostDeployment | CREATE_COMPLETE | 2015-11-05T03:15:42Z | | ControllerAllNodesDeployment | 443ac7d7-837e-401b-ae14-045bddc08e86 | OS::Heat::StructuredDeployments | CREATE_COMPLETE | 2015-11-05T03:15:42Z | | ControllerBootstrapNodeConfig | 4163b982-a79f-416e-b123-afacddba8e27 | OS::TripleO::BootstrapNode::SoftwareConfig | CREATE_COMPLETE | 2015-11-05T03:15:42Z | | ControllerBootstrapNodeDeployment | 87534065-3b51-461c-8ddf-8148dfe2f198 | OS::Heat::StructuredDeployments | CREATE_COMPLETE | 2015-11-05T03:15:42Z | | ControllerCephDeployment | e787564e-fa56-4e7c-8f5e-d9a9304132fa | OS::Heat::StructuredDeployments | CREATE_COMPLETE | 2015-11-05T03:15:42Z | | ControllerClusterConfig | f3dc9941-ce27-4713-955c-1dca9ced8d23 | OS::Heat::StructuredConfig | CREATE_COMPLETE | 2015-11-05T03:15:42Z | | ControllerClusterDeployment | 853073c9-f694-4592-bd87-c7bdfee95d96 | OS::Heat::StructuredDeployments | CREATE_COMPLETE | 2015-11-05T03:15:42Z | | ControllerIpListMap | 85630f7d-c4cf-4d37-9bd4-af9234b91737 | OS::TripleO::Network::Ports::NetIpListMap | CREATE_COMPLETE | 2015-11-05T03:15:42Z | | ControllerNodesPostDeployment | afe25c74-a231-4df6-bb61-c84697d8277d | OS::TripleO::ControllerPostDeployment | CREATE_COMPLETE | 2015-11-05T03:15:42Z | | ControllerSwiftDeployment | dc3bbe28-6a44-462e-b0e6-f8773e96fe04 | OS::Heat::StructuredDeployments | CREATE_COMPLETE | 2015-11-05T03:15:42Z | | HeatAuthEncryptionKey | overcloud-HeatAuthEncryptionKey-6ljv5gid5hxi | OS::Heat::RandomString | CREATE_COMPLETE | 2015-11-05T03:15:42Z | | HorizonSecret | overcloud-HorizonSecret-6x5wuqwd6dq6 | 
OS::Heat::RandomString | CREATE_COMPLETE | 2015-11-05T03:15:42Z | | MysqlClusterUniquePart | overcloud-MysqlClusterUniquePart-kesbckxdev67 | OS::Heat::RandomString | CREATE_COMPLETE | 2015-11-05T03:15:42Z | | MysqlRootPassword | overcloud-MysqlRootPassword-e3olccywlv67 | OS::Heat::RandomString | CREATE_COMPLETE | 2015-11-05T03:15:42Z | | ObjectStorageAllNodesDeployment | 7c2529c8-f4f3-4e41-869e-b3e3af936829 | OS::Heat::StructuredDeployments | CREATE_COMPLETE | 2015-11-05T03:15:42Z | | ObjectStorageNodesPostDeployment | a15a83fe-91fe-4757-bc2a-31846c48bcdd | OS::TripleO::ObjectStoragePostDeployment | CREATE_COMPLETE | 2015-11-05T03:15:42Z | | ObjectStorageSwiftDeployment | 5a09f483-79ce-4454-bbe1-52f5a3017bed | OS::Heat::StructuredDeployments | CREATE_COMPLETE | 2015-11-05T03:15:42Z | | PcsdPassword | overcloud-PcsdPassword-otnh56wmmlyo | OS::Heat::RandomString | CREATE_COMPLETE | 2015-11-05T03:15:42Z | | RabbitCookie | overcloud-RabbitCookie-suxlvfl5cd3m | OS::Heat::RandomString | CREATE_COMPLETE | 2015-11-05T03:15:42Z | | SwiftDevicesAndProxyConfig | e49da330-9376-4046-bd8e-116b43d7a2b1 | OS::TripleO::SwiftDevicesAndProxy::SoftwareConfig | CREATE_COMPLETE | 2015-11-05T03:15:42Z | | VipDeployment | 53768f74-70c6-40e5-ab27-ec374616edb1 | OS::Heat::StructuredDeployments | CREATE_COMPLETE | 2015-11-05T03:15:42Z | | allNodesConfig | e95a6953-b590-49a4-a993-d5fbfc0efd8c | OS::TripleO::AllNodes::SoftwareConfig | CREATE_COMPLETE | 2015-11-05T03:15:42Z | | Controller | c9163ce9-c8ba-4514-871f-e289914c43f9 | OS::Heat::ResourceGroup | UPDATE_FAILED | 2015-11-05T15:36:57Z | | BlockStorage | 7075b578-1939-46d0-935d-3dbf6969673f | OS::Heat::ResourceGroup | UPDATE_COMPLETE | 2015-11-05T15:37:03Z | | VipConfig | 58d0557a-7068-4221-bf6a-6e5f534df6c4 | OS::TripleO::VipConfig | UPDATE_COMPLETE | 2015-11-05T16:26:10Z | | Networks | 8976fd6f-06a7-4ff6-8d0f-657563160a5d | OS::TripleO::Network | UPDATE_COMPLETE | 2015-11-05T16:26:11Z | | ControlVirtualIP | 
6b1b4ac1-0dd3-40f6-a7ac-cf6333983092 | OS::TripleO::Network::Ports::CtlplaneVipPort | UPDATE_COMPLETE | 2015-11-05T16:26:41Z | | ObjectStorage | 6d3259eb-93fe-4932-b652-1b69eb589e43 | OS::Heat::ResourceGroup | UPDATE_COMPLETE | 2015-11-05T16:26:46Z | | CephStorage | c5eaa74d-9a34-4fc8-8490-1a33bb4e3704 | OS::Heat::ResourceGroup | UPDATE_COMPLETE | 2015-11-05T16:26:48Z | | StorageMgmtVirtualIP | 9875176b-f7f7-4765-9913-ad6b3fb751a0 | OS::TripleO::Network::Ports::StorageMgmtVipPort | UPDATE_COMPLETE | 2015-11-05T16:26:53Z | | RedisVirtualIP | 3d9d95c3-d7e3-4cc0-8891-9f131ddde6ad | OS::TripleO::Network::Ports::RedisVipPort | UPDATE_COMPLETE | 2015-11-05T16:26:55Z | | PublicVirtualIP | 109a58f9-2ce4-4f33-8172-5b442fee7dfb | OS::TripleO::Network::Ports::ExternalVipPort | UPDATE_COMPLETE | 2015-11-05T16:27:02Z | | StorageVirtualIP | c08bfd65-9831-465b-a7a7-a43a330f2536 | OS::TripleO::Network::Ports::StorageVipPort | UPDATE_COMPLETE | 2015-11-05T16:27:09Z | | InternalApiVirtualIP | a6dffc7e-59a5-4594-91c7-c4c7bd8c64a8 | OS::TripleO::Network::Ports::InternalApiVipPort | UPDATE_COMPLETE | 2015-11-05T16:27:12Z | | VipMap | 7868bf41-836a-4508-acfd-291e8caede35 | OS::TripleO::Network::Ports::NetVipMap | UPDATE_COMPLETE | 2015-11-05T16:27:16Z | | Compute | eb7dcb92-07d4-4f2e-8839-4f24c8bf7b53 | OS::Heat::ResourceGroup | UPDATE_FAILED | 2015-11-05T16:27:20Z | +-----------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+----------------------+
Even doing a heat resource-list -n 10 overcloud, nothing shows IN_PROGRESS
I tailed the heat-engine.log for a minute or so and saved that to a file: http://file.rdu.redhat.com/~jslagle/bug-1278544/heat-engine-tailed.log

It's just the same pattern over and over again. I wonder if it's stuck in an infinite loop?
If a stack gets stuck in IN_PROGRESS, it's almost certainly because there was an uncaught exception (https://bugs.launchpad.net/heat/+bug/1492433), and because it's uncaught it's also not logged in heat-engine.log (https://bugs.launchpad.net/heat/+bug/1492427). You should be able to find the traceback in the journal (thanks systemd!), and from there we can diagnose the bug.
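As a rough sketch of digging the traceback out of the journal: the snippet below pulls Python tracebacks out of a stream of log lines. It assumes the lines come from something like `journalctl -u openstack-heat-engine -o cat` (the unit name is the one confirmed later in this thread; `-o cat` is assumed here so each line is the bare message without the journal prefix). The function name extract_tracebacks and the sample lines are invented for illustration, not taken from a real heat-engine log.

```python
def extract_tracebacks(lines):
    """Yield each Python traceback found in an iterable of bare log lines."""
    block = None
    for raw in lines:
        line = raw.rstrip("\n")
        if block is None:
            if line.startswith("Traceback (most recent call last):"):
                block = [line]  # start collecting a new traceback
        elif line.startswith((" ", "\t")):
            block.append(line)  # indented 'File ...' and source lines
        else:
            block.append(line)  # first unindented line: the exception itself
            yield "\n".join(block)
            block = None
    if block:
        yield "\n".join(block)  # log ended in the middle of a traceback


# Invented sample lines, only to show the shape of the input.
SAMPLE_LINES = [
    "engine started",
    "Traceback (most recent call last):",
    '  File "example.py", line 1, in <module>',
    "    raise ValueError('boom')",
    "ValueError: boom",
    "engine still running",
]

if __name__ == "__main__":
    import sys
    # e.g. journalctl -u openstack-heat-engine -o cat | python this_script.py
    source = sys.stdin if not sys.stdin.isatty() else SAMPLE_LINES
    for tb in extract_tracebacks(source):
        print(tb)
        print("-" * 40)
```

Chained exceptions ("During handling of the above exception...") would come out as separate blocks with this approach, which is good enough for spotting the failure.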
BTW the workaround would be to restart heat-engine and then use the workaround for bug 1267364. That will get your stacks back to the FAILED state so that they're not stuck IN_PROGRESS any more.
When you say you're upgrading from 7.0 to the latest poodle, did you start by upgrading the undercloud to latest poodle first?
(In reply to Steve Baker from comment #11)
> When you say you're upgrading from 7.0 to the latest poodle, did you start
> by upgrading the undercloud to latest poodle first?

Yes, the first step in updating from 7.0 is to update the undercloud and make sure services are restarted (that should happen automatically via the package updates). I didn't have the newest heat build with the 2 patches for bug 1267364, though; it looks like that build was done yesterday, shortly after I had updated the undercloud ;). I updated to it today and restarted heat-engine. The stack is still stuck in UPDATE_IN_PROGRESS. Is that still expected? It wasn't clear to me from the bug whether the 2 patches remove the need for the manual db sql workaround, or if I still have to do that.
(In reply to Zane Bitter from comment #9)
> If a stack gets stuck in IN_PROGRESS, it's almost certainly because there
> was an uncaught exception (https://bugs.launchpad.net/heat/+bug/1492433),
> and because it's uncaught it's also not logged in heat-engine.log
> (https://bugs.launchpad.net/heat/+bug/1492427). You should be able to find
> the traceback in the journal (thanks systemd!), and from there we can
> diagnose the bug.

It looks like the journal got rotated, and we weren't saving old journal files (no /var/log/journal) in 7.0. If this happens again, what unit should I look at to see the traceback? Would it be openstack-heat-engine?
Using the newest heat build with the 2 patches from https://bugzilla.redhat.com/show_bug.cgi?id=1267364 and restarting heat-engine, this issue still remains. zaneb indicated the manual sql workaround shouldn't be needed anymore, so there must be something else going on.
> I updated to those today, and restarted heat-engine. The stack is still stuck in UPDATE_IN_PROGRESS. Is that still expected?

No, not expected. At startup, Heat goes through all stacks that are IN_PROGRESS and tries to break their locks (i.e. we ping the engine that owns the lock, and if it doesn't reply we steal it) and move them to FAILED. This was broken for nested stacks, and didn't move the member resources to FAILED, and that's what the patches for bug 1267364 fixed.

> what unit should I look at to see the traceback? Would it be openstack-heat-engine?

Yes.
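To make the startup behaviour described above a bit more concrete, here is a toy model using plain dicts. None of this is Heat's code or data model: the field names (lock_owner, resources), the function name reset_stale_stacks, and the engine_is_alive callback are all made up for illustration. It only mirrors the described steps: for each IN_PROGRESS stack, ping the lock owner, steal the lock if there is no reply, and move the stack and its in-progress resources to FAILED.

```python
def reset_stale_stacks(stacks, engine_is_alive):
    """Move IN_PROGRESS stacks whose lock owner is dead to FAILED.

    stacks: list of dicts with 'name', 'status', 'lock_owner', 'resources'.
    engine_is_alive: callable taking an engine id, returning True if it replies.
    Returns the names of the stacks that were reset.
    """
    reset = []
    for stack in stacks:
        if stack["status"] != "IN_PROGRESS":
            continue
        owner = stack.get("lock_owner")
        if owner is not None and engine_is_alive(owner):
            continue  # a live engine still holds the lock, leave the stack alone
        stack["lock_owner"] = None  # "steal" (here: just drop) the stale lock
        stack["status"] = "FAILED"
        for resource in stack["resources"]:
            if resource["status"] == "IN_PROGRESS":
                resource["status"] = "FAILED"  # member resources follow the stack
        reset.append(stack["name"])
    return reset


if __name__ == "__main__":
    example = [
        {
            "name": "overcloud",
            "status": "IN_PROGRESS",
            "lock_owner": "engine-a",  # hypothetical engine id that no longer replies
            "resources": [{"name": "Controller", "status": "IN_PROGRESS"}],
        }
    ]
    print(reset_stale_stacks(example, lambda engine_id: False))
    print(example[0]["status"], example[0]["resources"][0]["status"])
```

In this toy version a nested stack would just be another entry in the list; the real fix for bug 1267364 lives in Heat itself and is not reproduced here.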
I've reproduced this now and captured the actual Heat exception in a new bug: https://bugzilla.redhat.com/show_bug.cgi?id=1278975

Given the traceback there, my initial suspicion is that python-rdomanager-oscplugin did something wrong (didn't send the environment I asked it to, or something else). Still, I think this existing bug is valid. We ought to be able to recover the stack somehow in such a situation, even if we have to bounce the heat-engine service.
I tried the sql from https://bugzilla.redhat.com/show_bug.cgi?id=1267364 :

UPDATE stack SET status="FAILED" WHERE status="IN_PROGRESS" AND action="UPDATE";
UPDATE resource SET status="FAILED" WHERE status="IN_PROGRESS" AND action="UPDATE";

Obviously my stack is in UPDATE_FAILED now :). Going to see if I can figure out what causes https://bugzilla.redhat.com/show_bug.cgi?id=1278975 now.
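For anyone unsure what those two statements touch, here is a small sketch that runs them against a throwaway in-memory SQLite database. The table layout is a made-up minimum (name, action, status), not Heat's real schema, and the string literals use single quotes because that is the portable SQL form. The point is only that rows with action UPDATE and status IN_PROGRESS are changed, and everything else is left alone.

```python
import sqlite3

# The two workaround statements, with portable single-quoted literals.
WORKAROUND_SQL = """
UPDATE stack SET status='FAILED' WHERE status='IN_PROGRESS' AND action='UPDATE';
UPDATE resource SET status='FAILED' WHERE status='IN_PROGRESS' AND action='UPDATE';
"""


def demo():
    """Apply the workaround to toy tables and return the resulting rows."""
    db = sqlite3.connect(":memory:")
    # Hypothetical minimal tables; Heat's real tables have many more columns.
    db.executescript("""
        CREATE TABLE stack (name TEXT, action TEXT, status TEXT);
        CREATE TABLE resource (name TEXT, action TEXT, status TEXT);
        INSERT INTO stack VALUES ('overcloud', 'UPDATE', 'IN_PROGRESS');
        INSERT INTO stack VALUES ('other', 'CREATE', 'IN_PROGRESS');
        INSERT INTO resource VALUES ('Controller', 'UPDATE', 'IN_PROGRESS');
        INSERT INTO resource VALUES ('Networks', 'UPDATE', 'COMPLETE');
    """)
    db.executescript(WORKAROUND_SQL)
    stacks = db.execute(
        "SELECT name, action, status FROM stack ORDER BY name").fetchall()
    resources = db.execute(
        "SELECT name, action, status FROM resource ORDER BY name").fetchall()
    db.close()
    return stacks, resources


if __name__ == "__main__":
    for rows in demo():
        for row in rows:
            print(row)
```

Note that the statements are not scoped to one stack, so on a real undercloud they would affect every stack and resource stuck in UPDATE / IN_PROGRESS.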
Are there any possible workarounds? I also have a stack in progress, but no resources in progress and no hooks waiting to be cleared.

[stack@instack ~]$ heat resource-list overcloud -n10 | grep PROG
[stack@instack ~]$ heat hook-poll overcloud -n10
+----+------------------------+-----------------+------------+------------+
| id | resource_status_reason | resource_status | event_time | stack_name |
+----+------------------------+-----------------+------------+------------+
+----+------------------------+-----------------+------------+------------+
[stack@instack ~]$ heat stack-list
+--------------------------------------+------------+--------------------+----------------------+
| id                                   | stack_name | stack_status       | creation_time        |
+--------------------------------------+------------+--------------------+----------------------+
| 6de1d453-b12d-421f-96be-d42b8bf93f5a | overcloud  | UPDATE_IN_PROGRESS | 2015-11-11T12:42:36Z |
+--------------------------------------+------------+--------------------+----------------------+
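The symptom in this comment (stack IN_PROGRESS, nothing underneath it IN_PROGRESS, no hooks pending) can be checked mechanically from captured CLI output. The sketch below is one way to do that; the function names parse_cli_table and looks_stuck are invented, the stack-list and hook-poll samples are copied from this comment, and the resource-list sample reuses two rows from the earlier overcloud listing in this bug purely as illustration.

```python
def parse_cli_table(text):
    """Parse a '+---+' / '| a | b |' style CLI table into a list of dicts."""
    header, rows = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, prompts, and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def looks_stuck(stack_name, stack_list_text, resource_list_text, hook_poll_text):
    """True if the stack is IN_PROGRESS but no resource is and no hook is pending."""
    statuses = {
        row["stack_name"]: row["stack_status"]
        for row in parse_cli_table(stack_list_text)
    }
    if not statuses.get(stack_name, "").endswith("IN_PROGRESS"):
        return False
    busy = [
        row for row in parse_cli_table(resource_list_text)
        if row["resource_status"].endswith("IN_PROGRESS")
    ]
    hooks = parse_cli_table(hook_poll_text)  # an empty table parses to no rows
    return not busy and not hooks


STACK_LIST = """
+--------------------------------------+------------+--------------------+----------------------+
| id                                   | stack_name | stack_status       | creation_time        |
+--------------------------------------+------------+--------------------+----------------------+
| 6de1d453-b12d-421f-96be-d42b8bf93f5a | overcloud  | UPDATE_IN_PROGRESS | 2015-11-11T12:42:36Z |
+--------------------------------------+------------+--------------------+----------------------+
"""

HOOK_POLL = """
+----+------------------------+-----------------+------------+------------+
| id | resource_status_reason | resource_status | event_time | stack_name |
+----+------------------------+-----------------+------------+------------+
+----+------------------------+-----------------+------------+------------+
"""

# Illustrative rows only, borrowed from the earlier resource listing in this bug.
RESOURCE_LIST = """
| resource_name | physical_resource_id                 | resource_type           | resource_status | updated_time         |
| Controller    | c9163ce9-c8ba-4514-871f-e289914c43f9 | OS::Heat::ResourceGroup | UPDATE_FAILED   | 2015-11-05T15:36:57Z |
| BlockStorage  | 7075b578-1939-46d0-935d-3dbf6969673f | OS::Heat::ResourceGroup | UPDATE_COMPLETE | 2015-11-05T15:37:03Z |
"""

if __name__ == "__main__":
    print(looks_stuck("overcloud", STACK_LIST, RESOURCE_LIST, HOOK_POLL))
```

If this returns True, the situation matches the one discussed earlier in this bug, where the traceback in the journal is the next thing to look for.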
Works better now.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2015:2680