Bug 1304797

Summary: Heat nested stack has CREATE_IN_PROGRESS status after deployment is complete
Product: Red Hat OpenStack Reporter: Marius Cornea <mcornea>
Component: openstack-heat Assignee: Zane Bitter <zbitter>
Status: CLOSED WONTFIX QA Contact: Amit Ugol <augol>
Severity: high Docs Contact:
Priority: unspecified    
Version: 7.0 (Kilo) CC: dbecker, jslagle, mburns, morazi, rhel-osp-director-maint, sbaker, shardy, yeylon
Target Milestone: z4 Keywords: ZStream
Target Release: 7.0 (Kilo)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-02-15 23:57:52 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Marius Cornea 2016-02-04 16:03:45 UTC
Description of problem:
I am doing a 7.3 deployment with IPv6:
export THT=/home/stack/templates/my-overcloud 
openstack overcloud deploy --templates $THT \
-e $THT/environments/network-isolation-v6.yaml \
-e /home/stack/templates/network-environment-v6.yaml \
-e /home/stack/templates/ceph.yaml \
--control-scale 3 \
--compute-scale 1 \
--ceph-storage-scale 2 \
--neutron-disable-tunneling \
--neutron-network-type vlan \
--neutron-network-vlan-ranges tenantvlan:50:60 \
--neutron-bridge-mappings datacentre:br-ex,tenantvlan:br-nic4 \
--ntp-server clock.redhat.com 

Deployment completes successfully:
[stack@undercloud72 ~]$ time bash deploy.command.vlan 
Deploying templates in the directory /home/stack/templates/my-overcloud
Overcloud Endpoint: http://[2620:52:0:13b8:5054:ff:fd3e:1]:5000/v2.0
Overcloud Deployed

real	48m27.949s
user	0m5.575s
sys	0m0.616s
[stack@undercloud72 ~]$ 
[stack@undercloud72 ~]$ 
[stack@undercloud72 ~]$ heat stack-list
+--------------------------------------+------------+-----------------+----------------------+
| id                                   | stack_name | stack_status    | creation_time        |
+--------------------------------------+------------+-----------------+----------------------+
| e11bc489-fc10-4c2b-ad3f-fdf8babfa69c | overcloud  | CREATE_COMPLETE | 2016-02-04T14:49:44Z |
+--------------------------------------+------------+-----------------+----------------------+

One nested stack still shows as CREATE_IN_PROGRESS:

[stack@undercloud72 ~]$ heat stack-list -n | grep PROGRESS
| 3e29f882-bb99-4f37-8169-32b1f1840b5d | overcloud-ControllerAllNodesDeployment-ccvzbn2lsi7b                                                                                               | CREATE_IN_PROGRESS | 2016-02-04T14:42:50Z | 4178e4e9-28d2-454f-b2e6-47986b0d42bc |

Version-Release number of selected component (if applicable):

openstack-heat-api-cloudwatch-2015.1.2-7.el7ost.noarch
openstack-heat-templates-0-0.8.20150605git.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-112.el7ost.noarch
openstack-heat-api-cfn-2015.1.2-7.el7ost.noarch
python-heatclient-0.6.0-1.el7ost.noarch
openstack-heat-engine-2015.1.2-7.el7ost.noarch
openstack-heat-api-2015.1.2-7.el7ost.noarch
heat-cfntools-1.2.8-2.el7.noarch
openstack-heat-common-2015.1.2-7.el7ost.noarch


Expected results:
All nested stacks show as CREATE_COMPLETE.

Comment 2 Zane Bitter 2016-02-04 17:03:30 UTC
Can't think of a mechanism whereby that would be legit, so changing component to Heat.

Comment 5 Zane Bitter 2016-02-15 23:57:52 UTC
What appears to be happening is that the stack overcloud-ControllerAllNodesDeployment-ccvzbn2lsi7b is *not* a nested stack of the current "overcloud" stack, e11bc489-fc10-4c2b-ad3f-fdf8babfa69c. Rather, it must belong to a previous incarnation of the overcloud, because it was created about 7 minutes before the current overcloud stack was created:

2016-02-04 09:42:50.065 28447 INFO heat.engine.service [req-0a60a214-de6a-4810-a871-e7d55bfcd0b4 5b890f3d27e44322a53cde34da1122bb 68e747fc4d6747c09d56b2e43e1ef395] Creating stack overcloud-ControllerAllNodesDeployment-ccvzbn2lsi7b

2016-02-04 09:49:38.076 28447 INFO heat.engine.service [req-42a337bf-9f28-41e1-b4c1-4fe564e70a29 5b890f3d27e44322a53cde34da1122bb 68e747fc4d6747c09d56b2e43e1ef395] Creating stack overcloud

So the apparent discrepancy between the overcloud being complete and one of its children being in progress is not actually an issue. The issue is that the previous overcloud was deleted and yet one of its child stacks is still around.
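A leftover child like this can be spotted mechanically. The following is a minimal Python sketch, not part of Heat or python-heatclient. It assumes the five columns printed by `heat stack-list -n` are id, stack_name, stack_status, creation_time and parent, that a top-level stack's parent cell prints as `None`, and that deleted stacks are absent from the listing. The names `StackRow`, `parse_rows` and `find_orphans` are hypothetical, and the sample table is an illustrative reconstruction that combines the two listings from the description.

```python
from typing import List, NamedTuple


class StackRow(NamedTuple):
    stack_id: str
    name: str
    status: str
    created: str
    parent: str  # "None" or "" for a top-level stack (assumption)


def parse_rows(table: str) -> List[StackRow]:
    """Parse the data rows of an ASCII table shaped like `heat stack-list -n` output."""
    rows = []
    for raw in table.splitlines():
        line = raw.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue  # skip borders ("+---+") and blank lines
        cells = [c.strip() for c in line[1:-1].split("|")]
        if len(cells) != 5 or cells[0] == "id":
            continue  # skip the header row and anything malformed
        rows.append(StackRow(*cells))
    return rows


def find_orphans(rows: List[StackRow]) -> List[StackRow]:
    """Return stacks that name a parent which is not itself in the listing."""
    known_ids = {r.stack_id for r in rows}
    return [
        r for r in rows
        if r.parent not in ("", "None") and r.parent not in known_ids
    ]


# Illustrative input: the overcloud row plus the nested row from the description.
SAMPLE = """
+----+------------+--------------+---------------+--------+
| id | stack_name | stack_status | creation_time | parent |
+----+------------+--------------+---------------+--------+
| e11bc489-fc10-4c2b-ad3f-fdf8babfa69c | overcloud | CREATE_COMPLETE | 2016-02-04T14:49:44Z | None |
| 3e29f882-bb99-4f37-8169-32b1f1840b5d | overcloud-ControllerAllNodesDeployment-ccvzbn2lsi7b | CREATE_IN_PROGRESS | 2016-02-04T14:42:50Z | 4178e4e9-28d2-454f-b2e6-47986b0d42bc |
+----+------------+--------------+---------------+--------+
"""

orphans = find_orphans(parse_rows(SAMPLE))
for row in orphans:
    print(row.name, row.status, "parent:", row.parent)
```

On this sample the only row reported is overcloud-ControllerAllNodesDeployment-ccvzbn2lsi7b: its last column (4178e4e9-28d2-454f-b2e6-47986b0d42bc) is not the ID of the current overcloud stack (e11bc489-fc10-4c2b-ad3f-fdf8babfa69c), which is consistent with it belonging to an earlier overcloud if that column is indeed the parent ID.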

The cause of this is that the delete started at an inopportune moment, between when we requested that the nested stack be created and when we received the UUID of the new stack back. The overcloud stack delete started less than 400 ms after the overcloud-ControllerAllNodesDeployment-ccvzbn2lsi7b stack create:

2016-02-04 09:42:50.456 28448 INFO heat.engine.service [req-b095ac8b-0f0a-4777-a708-d9e8245abfde 5b890f3d27e44322a53cde34da1122bb 68e747fc4d6747c09d56b2e43e1ef395] Deleting stack overcloud
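The intervals can be checked directly from the three heat-engine log timestamps quoted in this comment. This is a small Python sketch using only the standard library; the variable names are illustrative, and the format string assumes the timestamp layout shown in the log lines.

```python
from datetime import datetime, timedelta

# Timestamp layout as it appears in the quoted heat-engine log lines.
FMT = "%Y-%m-%d %H:%M:%S.%f"

nested_create = datetime.strptime("2016-02-04 09:42:50.065", FMT)     # Creating stack overcloud-ControllerAllNodesDeployment-...
overcloud_delete = datetime.strptime("2016-02-04 09:42:50.456", FMT)  # Deleting stack overcloud
overcloud_create = datetime.strptime("2016-02-04 09:49:38.076", FMT)  # Creating stack overcloud

# Gap between the nested stack create and the start of the overcloud delete.
race_window_ms = (overcloud_delete - nested_create) // timedelta(milliseconds=1)

# How long the leftover nested stack existed before the new overcloud was created.
head_start = overcloud_create - nested_create

print(race_window_ms)  # 391
print(head_start)      # 0:06:48.011000
```

The delete therefore began 391 ms after the nested stack create, and the nested stack predates the current overcloud stack by 6 minutes 48 seconds.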

Losing track of resources in this way is a rare but known issue:

https://bugs.launchpad.net/heat/+bug/1536451

This really needs to be fixed upstream, and it's unlikely that any such fix would get backported as far as Kilo.
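The race described in this comment can be modelled with a toy example. This is a deliberately simplified Python sketch under stated assumptions, not Heat's actual code: `ToyEngine`, `ToyParent` and `run` are hypothetical names, and the only behaviour modelled is that the parent learns a child's ID some time after the child has been created, and that a delete only cleans up the children the parent knows about.

```python
class ToyEngine:
    """Hypothetical stand-in for the engine's stack store; not Heat's API."""

    def __init__(self):
        self.stacks = {}  # stack id -> stack name
        self._counter = 0

    def create_stack(self, name):
        self._counter += 1
        stack_id = f"stack-{self._counter}"
        self.stacks[stack_id] = name
        return stack_id

    def delete_stack(self, stack_id):
        self.stacks.pop(stack_id, None)


class ToyParent:
    """Parent stack that tracks its children by the IDs it has recorded."""

    def __init__(self, engine):
        self.engine = engine
        self.stack_id = engine.create_stack("overcloud")
        self.child_ids = []

    def request_child(self, name):
        # The child now exists in the engine, but the parent has not
        # recorded its ID yet: this is the vulnerable window.
        return self.engine.create_stack(name)

    def record_child(self, child_id):
        self.child_ids.append(child_id)

    def delete(self):
        # Only children the parent knows about are cleaned up.
        for child_id in self.child_ids:
            self.engine.delete_stack(child_id)
        self.engine.delete_stack(self.stack_id)


def run(delete_in_window):
    """Return the names of stacks left behind after the parent is deleted."""
    engine = ToyEngine()
    parent = ToyParent(engine)
    pending_id = parent.request_child("overcloud-ControllerAllNodesDeployment")
    if not delete_in_window:
        parent.record_child(pending_id)  # ID arrives before the delete
    parent.delete()
    return sorted(engine.stacks.values())


print(run(delete_in_window=False))  # []
print(run(delete_in_window=True))   # ['overcloud-ControllerAllNodesDeployment']
```

In the second call the delete lands before the child ID is recorded, so the child survives the parent, which mirrors the leftover CREATE_IN_PROGRESS stack seen in this report.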