Bug 1306502

Summary: rhel-osp-director: Update from 7.2: Heat stack times out with no resources IN_PROGRESS after engine restart (include-password or export HEAT_INCLUDE_PASSWORD=1)
Product: Red Hat OpenStack
Reporter: Alexander Chuzhoy <sasha>
Component: rhosp-director
Assignee: Zane Bitter <zbitter>
Status: CLOSED DUPLICATE
QA Contact: yeylon <yeylon>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.0 (Kilo)
CC: dbecker, mburns, morazi, ohochman, rhel-osp-director-maint, srevivo, zbitter
Target Milestone: y3
Target Release: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-02-11 05:09:41 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Alexander Chuzhoy 2016-02-11 03:21:53 UTC
rhel-osp-director: Update from 7.0:  Heat stack times out with no resources IN_PROGRESS after engine restart


Environment:
python-heatclient-0.6.0-1.el7ost.noarch
openstack-heat-api-2015.1.2-8.el7ost.noarch
heat-cfntools-1.2.8-2.el7.noarch
openstack-heat-templates-0-0.8.20150605git.el7ost.noarch
instack-undercloud-2.1.2-39.el7ost.noarch
openstack-heat-engine-2015.1.2-8.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-117.el7ost.noarch
openstack-heat-api-cloudwatch-2015.1.2-8.el7ost.noarch
openstack-heat-api-cfn-2015.1.2-8.el7ost.noarch
openstack-heat-common-2015.1.2-8.el7ost.noarch

Steps to reproduce:
1. Deploy 7.0 with openstack overcloud deploy --templates --control-scale 3 --compute-scale 2 --ceph-storage-scale 1 --neutron-network-type vxlan --neutron-tunnel-types vxlan --ntp-server x.x.x.x --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml
2. Attempt to update the deployment.
3. The update fails with a timeout. To work around it, raise rpc_response_timeout to 600 in /etc/heat/heat.conf (see https://bugzilla.redhat.com/show_bug.cgi?id=1305947) and restart the heat engine.
4. Resume the overcloud update.
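
The configuration change in step 3 can be sketched as below. This is a minimal illustration, not the exact commands used in this report: the helper name set_rpc_response_timeout is hypothetical, it assumes GNU sed, and it assumes the option belongs in the [DEFAULT] section of heat.conf. The service name in the commented usage lines (openstack-heat-engine) is likewise an assumption based on the package name listed under Environment.

```shell
#!/bin/sh
# Hypothetical helper (not part of any OpenStack tooling): set
# rpc_response_timeout in an INI-style config file.
# Assumes GNU sed and that the option lives in the [DEFAULT] section.
set_rpc_response_timeout() {
    conf_file="$1"
    timeout_value="$2"

    if grep -Eq '^[[:space:]]*rpc_response_timeout[[:space:]]*=' "$conf_file"; then
        # An uncommented setting already exists: rewrite its value in place.
        # Note: this matches the option in any section, not only [DEFAULT].
        sed -i -E "s|^[[:space:]]*rpc_response_timeout[[:space:]]*=.*|rpc_response_timeout = ${timeout_value}|" "$conf_file"
    elif grep -q '^\[DEFAULT\]' "$conf_file"; then
        # No active setting: add one directly below the [DEFAULT] header.
        sed -i "/^\[DEFAULT\]/a rpc_response_timeout = ${timeout_value}" "$conf_file"
    else
        # No [DEFAULT] section at all: append one with the setting.
        printf '[DEFAULT]\nrpc_response_timeout = %s\n' "$timeout_value" >> "$conf_file"
    fi
}

# Assumed usage on the undercloud, run as root:
#   set_rpc_response_timeout /etc/heat/heat.conf 600
#   systemctl restart openstack-heat-engine
```

The helper only edits the file; the engine restart is left as a separate, explicit command so that the file can be inspected before the change takes effect.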

Result:
The deployment times out with this message:
"ERROR: openstack ERROR: Authentication failed. Please try again with option --include-password or export HEAT_INCLUDE_PASSWORD=1"

Expected result:
The update should complete successfully.

Comment 2 Alexander Chuzhoy 2016-02-11 03:36:49 UTC
Correction, the initially deployed version is 7.2GA.

Comment 3 Zane Bitter 2016-02-11 03:43:23 UTC
The Controller-0 and Controller-1 nested stacks were the only resources still in progress when the stack timed out:

[stack@instack ~]$ heat resource-list -n5 overcloud|grep -v COMPLE
+-----------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+----------------------+-----------------------------------------------+
| resource_name                                 | physical_resource_id                          | resource_type                                     | resource_status | updated_time         | parent_resource                               |
+-----------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+----------------------+-----------------------------------------------+
| Controller                                    | 280e7277-1646-4b51-8fa7-d7b50cb0310e          | OS::Heat::ResourceGroup                           | UPDATE_FAILED   | 2016-02-10T21:30:47Z |                                               |
| 1                                             | 348fbeed-1521-439c-8dc0-85de4197a438          | OS::TripleO::Controller                           | UPDATE_FAILED   | 2016-02-10T21:30:59Z | Controller                                    |
| 0                                             | b0997bf0-dc0c-4380-ac4b-e101658a6c02          | OS::TripleO::Controller                           | UPDATE_FAILED   | 2016-02-10T21:31:44Z | Controller                                    |
+-----------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+----------------------+-----------------------------------------------+

Comment 4 Zane Bitter 2016-02-11 04:25:04 UTC
Analysis of the log shows that this is actually a duplicate of bug 1290950. However, I'm going to leave this open for the moment because there are reports of a similar failure mode *not* involving a heat-engine restart, which presumably couldn't have the same cause.

Comment 5 Zane Bitter 2016-02-11 05:09:41 UTC
The other issue was unrelated, so closing this as a duplicate.

*** This bug has been marked as a duplicate of bug 1290950 ***