Bug 1319944 - rhel-osp-director: 7.3->8.0 overcloud upgrade fails on "rerunning the deploy command with additional parameters (point it to the checked-out tripleo-heat-templates)"
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 8.0 (Liberty)
Assignee: Angus Thomas
QA Contact: Arik Chernetsky
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-03-21 22:46 UTC by Alexander Chuzhoy
Modified: 2016-03-24 11:50 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-03-24 11:50:08 UTC
Target Upstream Version:


Attachments
heat-engine.log from the undercloud. (766.41 KB, application/x-gzip)
2016-03-21 22:51 UTC, Alexander Chuzhoy

Description Alexander Chuzhoy 2016-03-21 22:46:54 UTC
rhel-osp-director: 7.3->8.0 overcloud upgrade fails on "rerunning the deploy command with additional parameters (point it to the checked-out tripleo-heat-templates)"

Environment:
openstack-tripleo-heat-templates-0.8.12-1.el7ost.noarch
openstack-tripleo-heat-templates-kilo-0.8.12-1.el7ost.noarch
openstack-puppet-modules-7.0.15-1.el7ost.noarch
instack-undercloud-2.2.6-1.el7ost.noarch


Steps to reproduce:
1. Deploy overcloud 7.3
2. Update the undercloud.
3. Complete this step:
openstack overcloud deploy --templates tripleo-heat-templates -e tripleo-heat-templates/overcloud-resource-registry-puppet.yaml -e tripleo-heat-templates/environments/puppet-pacemaker.yaml -e tripleo-heat-templates/environments/network-isolation.yaml -e tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e network_env.yaml -e tripleo-heat-templates/environments/major-upgrade-pacemaker-init.yaml -e rhos-release-8.yaml
4. Re-run the deploy command with additional parameters (point it to the checked-out tripleo-heat-templates).
Command:
openstack overcloud deploy --templates tripleo-heat-templates/ --control-scale 3 --compute-scale 2    --neutron-network-type vxlan --neutron-tunnel-types vxlan  --ntp-server 10.5.26.10 --timeout 90 -e tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml


Result:
2016-03-21 21:48:16 [2]: UPDATE_COMPLETE  state changed                                                                                                                                         
2016-03-21 21:48:24 [overcloud-Controller-j6zrllsd5qwx]: UPDATE_FAILED  Engine went down during stack UPDATE                                                                                    
Stack overcloud UPDATE_FAILED                                                                                                                                                                   
Heat Stack update failed.    


Expected result:
The command should complete successfully.


[stack@instack ~]$ heat resource-list -n5 overcloud|grep -v COMPLETE
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+---------------------+------------------------------------------------------------------------------------------+                                                                                                                                                    
| resource_name                               | physical_resource_id                          | resource_type                                     | resource_status | updated_time        | stack_name                                                                               |                                                                                                                                                    
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+---------------------+------------------------------------------------------------------------------------------+                                                                                                                                                    
| Controller                                  | 351a6bf2-fe52-4e38-9c93-46c77ed0fa0c          | OS::Heat::ResourceGroup                           | UPDATE_FAILED   | 2016-03-21T21:42:00 | overcloud                                                                                |                                                                                                                                                    
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+---------------------+------------------------------------------------------------------------------------------+                                                                                                                                                    
[stack@instack ~]$ heat resource-show overcloud Controller
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+
| Property               | Value                                                                                                                                            |
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+
| attributes             | {                                                                                                                                                |
|                        |   "attributes": null,                                                                                                                            |
|                        |   "refs": null                                                                                                                                   |
|                        | }                                                                                                                                                |
| creation_time          | 2016-03-21T17:36:53                                                                                                                              |
| description            |                                                                                                                                                  |
| links                  | http://192.0.2.1:8004/v1/3f1ed20dad4b40359d0a890b05992b40/stacks/overcloud/f9610699-cb4c-49c2-a96e-d5c39519bf59/resources/Controller (self)      |
|                        | http://192.0.2.1:8004/v1/3f1ed20dad4b40359d0a890b05992b40/stacks/overcloud/f9610699-cb4c-49c2-a96e-d5c39519bf59 (stack)                          |
|                        | http://192.0.2.1:8004/v1/3f1ed20dad4b40359d0a890b05992b40/stacks/overcloud-Controller-j6zrllsd5qwx/351a6bf2-fe52-4e38-9c93-46c77ed0fa0c (nested) |
| logical_resource_id    | Controller                                                                                                                                       |
| physical_resource_id   | 351a6bf2-fe52-4e38-9c93-46c77ed0fa0c                                                                                                             |
| required_by            | ControllerClusterDeployment                                                                                                                      |
|                        | SwiftDevicesAndProxyConfig                                                                                                                       |
|                        | ControllerNodesPostDeployment                                                                                                                    |
|                        | ControllerBootstrapNodeDeployment                                                                                                                |
|                        | allNodesConfig                                                                                                                                   |
|                        | ControllerClusterConfig                                                                                                                          |
|                        | ControllerIpListMap                                                                                                                              |
|                        | ControllerAllNodesValidationDeployment                                                                                                           |
|                        | VipDeployment                                                                                                                                    |
|                        | UpdateWorkflow                                                                                                                                   |
|                        | AllNodesValidationConfig                                                                                                                         |
|                        | CephClusterConfig                                                                                                                                |
|                        | ControllerCephDeployment                                                                                                                         |
|                        | ControllerAllNodesDeployment                                                                                                                     |
|                        | AllNodesExtraConfig                                                                                                                              |
|                        | ControllerBootstrapNodeConfig                                                                                                                    |
|                        | ControllerSwiftDeployment                                                                                                                        |
| resource_name          | Controller                                                                                                                                       |
| resource_status        | UPDATE_FAILED                                                                                                                                    |
| resource_status_reason | Engine went down during resource UPDATE                                                                                                          |
| resource_type          | OS::Heat::ResourceGroup                                                                                                                          |
| updated_time           | 2016-03-21T21:42:00                                                                                                                              |
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+

Comment 2 Alexander Chuzhoy 2016-03-21 22:51:05 UTC
Created attachment 1138820
heat-engine.log from the undercloud.

Comment 3 Marios Andreou 2016-03-22 07:45:16 UTC
Hi Sasha, I think the problem here is insufficient memory on the undercloud. I used to hit this too ("Engine went down...") with a 4GB undercloud and then even with a 6GB one. I just logged onto the box to confirm: the undercloud has 2 vCPUs but just under 6GB of RAM and no swap space.

I'd suggest a minimum of 8GB of RAM for the undercloud (and 2 vCPUs) as well as some swap space. For example, you can define 4GB of swap like this:

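# Create a 4GB file (4096 blocks of 1MiB) and restrict it to root before formatting it
sudo dd if=/dev/zero of=/swapfile count=4k bs=1M
sudo chmod 0600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# Confirm the new swap area is active
cat /proc/swaps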
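
A quick pre-check along these lines could flag an undersized node before the upgrade is started. This is only a sketch: check_mem is a hypothetical helper, and its default floors (7168 MiB of RAM, 4095 MiB of swap) are assumptions derived from the 8GB and 4GB figures suggested here. They sit a little under the nominal sizes because the totals reported in /proc/meminfo come out slightly below what was allocated.

```shell
#!/bin/sh
# Sketch only: check_mem is a hypothetical helper, not part of director.
# Usage: check_mem [meminfo_file] [min_ram_mib] [min_swap_mib]
# The default floors are assumptions based on the 8GB RAM / 4GB swap suggestion.
check_mem() {
    meminfo=${1:-/proc/meminfo}
    min_ram_mib=${2:-7168}
    min_swap_mib=${3:-4095}

    # /proc/meminfo reports both totals in kB; convert to whole MiB.
    ram_kb=$(awk '/^MemTotal:/ {print $2}' "$meminfo")
    swap_kb=$(awk '/^SwapTotal:/ {print $2}' "$meminfo")
    ram_mib=$(( ${ram_kb:-0} / 1024 ))
    swap_mib=$(( ${swap_kb:-0} / 1024 ))

    rc=0
    if [ "$ram_mib" -lt "$min_ram_mib" ]; then
        echo "RAM: ${ram_mib} MiB is below the ${min_ram_mib} MiB floor"
        rc=1
    fi
    if [ "$swap_mib" -lt "$min_swap_mib" ]; then
        echo "swap: ${swap_mib} MiB is below the ${min_swap_mib} MiB floor"
        rc=1
    fi
    return "$rc"
}
```

Called with no arguments, check_mem reads /proc/meminfo on the current node and returns non-zero if either total is under its floor, so it can gate a wrapper script around the deploy command.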

I'd also suggest editing /etc/heat/heat.conf: uncomment the "#num_engine_workers = 4" line, change it to "num_engine_workers=2", and then run sudo systemctl restart openstack-heat-engine (as per https://bugzilla.redhat.com/show_bug.cgi?id=1290949#c8). It seems you may not need to set this explicitly, since you've got 2 vCPUs, but I'm just sharing what I do on my env for a successful run.
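
The heat.conf edit described above could be scripted roughly as follows. This is a sketch under assumptions: set_engine_workers is a hypothetical helper, it relies on GNU sed for in-place editing, and it only rewrites an existing (commented or uncommented) num_engine_workers line. The worker count is left as a parameter, since this comment uses 2 while comment 4 advises against running fewer than 4.

```shell
#!/bin/sh
# Sketch only: set_engine_workers is a hypothetical helper, not part of Heat.
# Usage: set_engine_workers <conf_file> <count>
# Assumes GNU sed (for -i) and that the option line already exists in the file.
set_engine_workers() {
    conf=$1
    workers=$2

    # Keep a backup next to the original before editing in place.
    cp -p "$conf" "$conf.bak"

    # Replace the line whether or not it is currently commented out.
    sed -i "s/^[[:space:]]*#\{0,1\}[[:space:]]*num_engine_workers[[:space:]]*=.*/num_engine_workers=${workers}/" "$conf"

    # Succeed only if the setting is now present with the requested value.
    grep -q "^num_engine_workers=${workers}\$" "$conf"
}

# Assumed invocation on the undercloud, run as root:
#   set_engine_workers /etc/heat/heat.conf 2
#   systemctl restart openstack-heat-engine
```

The function returns non-zero when no matching line was found, in which case the file is unchanged apart from the .bak copy and the option would need to be added by hand.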

As a side note, the default 4GB of RAM given to the overcloud nodes has also caused problems for me ('cannot allocate memory' during the last 'convergence' upgrade step), so I am also using 5GB for those (and 4GB of swap as above for the overcloud nodes too).

thanks, marios.

Comment 4 Zane Bitter 2016-03-22 16:47:41 UTC
I wouldn't recommend running heat-engine with <4 workers, even on 2 cores.

Comment 5 Angus Thomas 2016-03-22 18:12:27 UTC
Hi Sasha,

Can you confirm that this issue is resolved by increasing the RAM on the undercloud node?


Angus

Comment 6 Jaromir Coufal 2016-03-23 15:34:47 UTC
Waiting with pm_ack for QE confirmation.

Comment 7 Alexander Chuzhoy 2016-03-24 01:44:40 UTC
Confirmed.
Let's close it. Will file a new one if it reproduces on a stronger setup.

