rhel-osp-director: 7.3->8.0 overcloud upgrade fails on "rerunning the deploy command with additional parameters (point it to the checked-out tripleo-heat-templates)"

Environment:
openstack-tripleo-heat-templates-0.8.12-1.el7ost.noarch
openstack-tripleo-heat-templates-kilo-0.8.12-1.el7ost.noarch
openstack-puppet-modules-7.0.15-1.el7ost.noarch
instack-undercloud-2.2.6-1.el7ost.noarch

Steps to reproduce:
1. Deploy overcloud 7.3.
2. Update the undercloud.
3. Complete this step:

openstack overcloud deploy --templates tripleo-heat-templates \
  -e tripleo-heat-templates/overcloud-resource-registry-puppet.yaml \
  -e tripleo-heat-templates/environments/puppet-pacemaker.yaml \
  -e tripleo-heat-templates/environments/network-isolation.yaml \
  -e tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml \
  -e network_env.yaml \
  -e tripleo-heat-templates/environments/major-upgrade-pacemaker-init.yaml \
  -e rhos-release-8.yaml

4. Re-run the deploy command with additional parameters (point it to the checked-out tripleo-heat-templates):

openstack overcloud deploy --templates tripleo-heat-templates/ \
  --control-scale 3 --compute-scale 2 \
  --neutron-network-type vxlan --neutron-tunnel-types vxlan \
  --ntp-server 10.5.26.10 --timeout 90 \
  -e tripleo-heat-templates/environments/network-isolation.yaml \
  -e network-environment.yaml

Result:
2016-03-21 21:48:16 [2]: UPDATE_COMPLETE state changed
2016-03-21 21:48:24 [overcloud-Controller-j6zrllsd5qwx]: UPDATE_FAILED Engine went down during stack UPDATE
Stack overcloud UPDATE_FAILED
Heat Stack update failed.

Expected result:
The command should complete successfully.
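"Engine went down during stack UPDATE" usually means the heat-engine process died mid-operation rather than the stack itself failing. A quick way to check whether the kernel OOM killer took it out is to grep the kernel log; a minimal sketch (the helper name `grep_oom` is hypothetical, and it takes a saved log file so it can be tried against captured output):

```shell
# Sketch: check whether heat-engine was OOM-killed on the undercloud.
# grep_oom takes a saved kernel log (e.g. the output of `dmesg`)
# rather than reading the live log directly.
grep_oom() {
    # Typical OOM-killer lines look like:
    #   Out of memory: Kill process 1234 (heat-engine) ...
    grep -i -E 'out of memory|oom-killer' "$1"
}

# On a live undercloud you would run something like:
#   dmesg | grep -i -E 'out of memory|oom-killer'
#   sudo journalctl -u openstack-heat-engine | tail -n 50
```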
[stack@instack ~]$ heat resource-list -n5 overcloud | grep -v COMPLETE
+---------------+--------------------------------------+-------------------------+-----------------+---------------------+------------+
| resource_name | physical_resource_id                 | resource_type           | resource_status | updated_time        | stack_name |
+---------------+--------------------------------------+-------------------------+-----------------+---------------------+------------+
| Controller    | 351a6bf2-fe52-4e38-9c93-46c77ed0fa0c | OS::Heat::ResourceGroup | UPDATE_FAILED   | 2016-03-21T21:42:00 | overcloud  |
+---------------+--------------------------------------+-------------------------+-----------------+---------------------+------------+

[stack@instack ~]$ heat resource-show overcloud Controller
+------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+
| Property               | Value                                                                                                                                       |
+------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+
| attributes             | { "attributes": null, "refs": null }                                                                                                        |
| creation_time          | 2016-03-21T17:36:53                                                                                                                         |
| description            |                                                                                                                                             |
| links                  | http://192.0.2.1:8004/v1/3f1ed20dad4b40359d0a890b05992b40/stacks/overcloud/f9610699-cb4c-49c2-a96e-d5c39519bf59/resources/Controller (self) |
|                        | http://192.0.2.1:8004/v1/3f1ed20dad4b40359d0a890b05992b40/stacks/overcloud/f9610699-cb4c-49c2-a96e-d5c39519bf59 (stack)                     |
|                        | http://192.0.2.1:8004/v1/3f1ed20dad4b40359d0a890b05992b40/stacks/overcloud-Controller-j6zrllsd5qwx/351a6bf2-fe52-4e38-9c93-46c77ed0fa0c (nested) |
| logical_resource_id    | Controller                                                                                                                                  |
| physical_resource_id   | 351a6bf2-fe52-4e38-9c93-46c77ed0fa0c                                                                                                        |
| required_by            | ControllerClusterDeployment, SwiftDevicesAndProxyConfig, ControllerNodesPostDeployment, ControllerBootstrapNodeDeployment,                  |
|                        | allNodesConfig, ControllerClusterConfig, ControllerIpListMap, ControllerAllNodesValidationDeployment, VipDeployment,                        |
|                        | UpdateWorkflow, AllNodesValidationConfig, CephClusterConfig, ControllerCephDeployment, ControllerAllNodesDeployment,                        |
|                        | AllNodesExtraConfig, ControllerBootstrapNodeConfig, ControllerSwiftDeployment                                                               |
| resource_name          | Controller                                                                                                                                  |
| resource_status        | UPDATE_FAILED                                                                                                                               |
| resource_status_reason | Engine went down during resource UPDATE                                                                                                     |
| resource_type          | OS::Heat::ResourceGroup                                                                                                                     |
| updated_time           | 2016-03-21T21:42:00                                                                                                                         |
+------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+
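The `heat resource-list -n5 overcloud | grep -v COMPLETE` drill-down above can be narrowed further to just the failed resource and stack names. A small sketch (the helper name `failed_resources` is hypothetical; the awk field numbers assume the default table layout shown above, and it reads saved command output from a file):

```shell
# Sketch: pull resource_name and stack_name of failed resources out of
# saved `heat resource-list -n5 overcloud` output.
failed_resources() {
    # Split on the table's '|' separators; for data rows containing
    # FAILED, strip padding spaces and print resource_name (field 2)
    # and stack_name (field 7).
    awk -F'|' '/FAILED/ { gsub(/ /, "", $2); gsub(/ /, "", $7); print $2, $7 }' "$1"
}

# Usage sketch:
#   heat resource-list -n5 overcloud > resources.txt
#   failed_resources resources.txt
```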
Created attachment 1138820 [details] heat-engine.log from the undercloud.
Hi Sasha,

I think the problem here is insufficient memory on the undercloud. I used to hit this too ("Engine went down...") with a 4GB and then even with a 6GB undercloud (RAM). I just logged onto the box to confirm: the undercloud has 2 vCPUs but just under 6GB of RAM, and also no swap space.

I'd suggest a minimum of 8GB of RAM for the undercloud (and 2 vCPUs), as well as some swap space. For example, you can define 4GB of swap like this:

sudo dd if=/dev/zero of=/swapfile count=4k bs=1M
sudo mkswap /swapfile
sudo chmod 0600 /swapfile
sudo swapon /swapfile
cat /proc/swaps

I'd also suggest you edit /etc/heat/heat.conf, uncomment the "#num_engine_workers = 4" line but make it "num_engine_workers=2", and then:

sudo systemctl restart openstack-heat-engine

(as per https://bugzilla.redhat.com/show_bug.cgi?id=1290949#c8; it seems you may not need to set this explicitly since you've got 2 vCPUs, but I am just sharing what I do on my environment for a successful run).

As a side note, the default 4GB of RAM given to the overcloud nodes has also caused problems for me ('cannot allocate memory' during the last 'convergence' upgrade step), so I am also using 5GB for those (plus 4GB of swap, as above, for the overcloud nodes too).

thanks, marios.
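The heat.conf tweak described above can be scripted; a minimal sketch, where the helper name `set_engine_workers` is hypothetical and the worker count is a parameter so the value is easy to revisit:

```shell
# Sketch: uncomment num_engine_workers in a heat.conf-style file and
# pin it to the given count. Takes the config path as $1 and the
# worker count as $2, so it can be tried on a copy first.
set_engine_workers() {
    conf="$1"
    workers="$2"
    # Rewrite "#num_engine_workers = 4" (the commented default) or an
    # existing setting into "num_engine_workers=<count>".
    sed -i "s/^#\{0,1\}num_engine_workers *=.*/num_engine_workers=${workers}/" "$conf"
}

# Usage sketch (then restart the engine as described above):
#   set_engine_workers /etc/heat/heat.conf 2
#   sudo systemctl restart openstack-heat-engine
```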
I wouldn't recommend running heat-engine with <4 workers, even on 2 cores.
Hi Sasha, Can you confirm that this issue is resolved by increasing the RAM on the undercloud node? Angus
Waiting with pm_ack for QE confirmation.
Confirmed. Let's close it. Will file a new one if it reproduces on a stronger setup.