| Summary: | rhel-osp-director: 7.3->8.0 overcloud upgrade fails on "rerunning the deploy command with additional parameters (point it to the checked-out tripleo-heat-templates)" | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Alexander Chuzhoy <sasha> |
| Component: | rhosp-director | Assignee: | Angus Thomas <athomas> |
| Status: | CLOSED NOTABUG | QA Contact: | Arik Chernetsky <achernet> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 8.0 (Liberty) | CC: | dbecker, jcoufal, mandreou, mburns, morazi, rhel-osp-director-maint, sasha |
| Target Milestone: | --- | | |
| Target Release: | 8.0 (Liberty) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-03-24 11:50:08 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Created attachment 1138820
heat-engine.log from the undercloud.
Hi Sasha, I think the problem here is insufficient memory on the undercloud. I used to hit this too ("Engine went down...") with a 4GB undercloud and then even with a 6GB one. I just logged onto the box to confirm: the undercloud has 2 vCPUs but just under 6GB of RAM and no swap space.
I'd suggest a minimum of 8GB of RAM for the undercloud (and 2 vCPUs) as well as some swap space. For example, you can define 4GB of swap like this:
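# Create a 4GB file (4096 blocks of 1MiB) and restrict it to root before formatting it.
sudo dd if=/dev/zero of=/swapfile count=4k bs=1M
sudo chmod 0600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# Confirm that the swap file is active.
cat /proc/swaps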
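Before rerunning the upgrade, it may help to check whether a node meets these suggestions. The following is a minimal bash sketch, not part of the director tooling: the function name `check_undercloud_memory` and the default 7 GiB threshold are assumptions made for illustration (a machine with 8GB installed usually reports a `MemTotal` slightly below 8 GiB, so the threshold is set a little lower).

```shell
#!/usr/bin/env bash
# check_undercloud_memory [MEMINFO_FILE] [MIN_GIB]
# Hypothetical helper: warns when RAM is below MIN_GIB (default 7, an assumed
# value chosen so that an "8GB" machine passes) or when no swap is configured.
# Reads /proc/meminfo by default; returns 0 when both checks pass, 1 otherwise.
check_undercloud_memory() {
    local meminfo="${1:-/proc/meminfo}"
    local min_gb="${2:-7}"
    local min_kb=$((min_gb * 1024 * 1024))
    local mem_kb swap_kb status=0

    # Both fields are reported in kB in /proc/meminfo.
    mem_kb=$(awk '/^MemTotal:/ {print $2}' "$meminfo")
    swap_kb=$(awk '/^SwapTotal:/ {print $2}' "$meminfo")

    if [ "${mem_kb:-0}" -lt "$min_kb" ]; then
        echo "WARN: MemTotal is ${mem_kb:-0} kB, below the ${min_gb} GiB threshold"
        status=1
    fi
    if [ "${swap_kb:-0}" -eq 0 ]; then
        echo "WARN: no swap space is configured"
        status=1
    fi
    if [ "$status" -eq 0 ]; then
        echo "OK: ${mem_kb} kB RAM, ${swap_kb} kB swap"
    fi
    return "$status"
}
```

Running `check_undercloud_memory` with no arguments on the undercloud inspects the live `/proc/meminfo`; passing a file path makes it easy to try against a saved copy.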
I'd also suggest editing /etc/heat/heat.conf: uncomment the "#num_engine_workers = 4" line, change it to "num_engine_workers=2", and then run sudo systemctl restart openstack-heat-engine (as per https://bugzilla.redhat.com/show_bug.cgi?id=1290949#c8). You may not need to set this explicitly since you've got 2 vCPUs, but I'm just sharing what I do in my env for a successful run.
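The heat.conf edit described above could be scripted roughly as follows. This is only a sketch: `set_engine_workers` is a hypothetical helper name, it assumes GNU sed (as shipped on RHEL), and the worker count is a parameter because a later comment in this bug advises against going below 4.

```shell
#!/usr/bin/env bash
# set_engine_workers CONF_FILE WORKERS
# Hypothetical helper: rewrites the num_engine_workers line in CONF_FILE,
# whether or not it is currently commented out. Assumes GNU sed (-i, -E).
set_engine_workers() {
    local conf="$1" workers="$2"
    sed -i -E \
        "s/^[[:space:]]*#?[[:space:]]*num_engine_workers[[:space:]]*=.*/num_engine_workers=${workers}/" \
        "$conf"
}
```

One cautious way to use it is to run it on a copy first, for example `cp /etc/heat/heat.conf /tmp/heat.conf && set_engine_workers /tmp/heat.conf 2`, review the result, and only then apply the change as root and restart openstack-heat-engine.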
As a side note, the default 4GB of RAM given to the overcloud nodes has also caused problems for me ('cannot allocate memory' during the last 'convergence' upgrade step), so I am using 5GB for those, plus 4GB of swap as above on the overcloud nodes too.
Thanks, marios.
I wouldn't recommend running heat-engine with <4 workers, even on 2 cores.
Hi Sasha, can you confirm that this issue is resolved by increasing the RAM on the undercloud node? Angus
Waiting with pm_ack for QE confirmation.
Confirmed. Let's close it. Will file a new one if it reproduces on a stronger setup.
rhel-osp-director: 7.3->8.0 overcloud upgrade fails on "rerunning the deploy command with additional parameters (point it to the checked-out tripleo-heat-templates)"

Environment:
openstack-tripleo-heat-templates-0.8.12-1.el7ost.noarch
openstack-tripleo-heat-templates-kilo-0.8.12-1.el7ost.noarch
openstack-puppet-modules-7.0.15-1.el7ost.noarch
instack-undercloud-2.2.6-1.el7ost.noarch

Steps to reproduce:
1. Deploy overcloud 7.3.
2. Update the undercloud.
3. Complete this step:
openstack overcloud deploy --templates tripleo-heat-templates -e tripleo-heat-templates/overcloud-resource-registry-puppet.yaml -e tripleo-heat-templates/environments/puppet-pacemaker.yaml -e tripleo-heat-templates/environments/network-isolation.yaml -e tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e network_env.yaml -e tripleo-heat-templates/environments/major-upgrade-pacemaker-init.yaml -e rhos-release-8.yaml
4. Re-run the deploy command with additional parameters (point it to the checked-out tripleo-heat-templates). Command:
openstack overcloud deploy --templates tripleo-heat-templates/ --control-scale 3 --compute-scale 2 --neutron-network-type vxlan --neutron-tunnel-types vxlan --ntp-server 10.5.26.10 --timeout 90 -e tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml

Result:
2016-03-21 21:48:16 [2]: UPDATE_COMPLETE state changed
2016-03-21 21:48:24 [overcloud-Controller-j6zrllsd5qwx]: UPDATE_FAILED Engine went down during stack UPDATE
Stack overcloud UPDATE_FAILED
Heat Stack update failed.

Expected result:
The command should complete successfully.
[stack@instack ~]$ heat resource-list -n5 overcloud|grep -v COMPLETE
+---------------+--------------------------------------+-------------------------+-----------------+---------------------+------------+
| resource_name | physical_resource_id                 | resource_type           | resource_status | updated_time        | stack_name |
+---------------+--------------------------------------+-------------------------+-----------------+---------------------+------------+
| Controller    | 351a6bf2-fe52-4e38-9c93-46c77ed0fa0c | OS::Heat::ResourceGroup | UPDATE_FAILED   | 2016-03-21T21:42:00 | overcloud  |
+---------------+--------------------------------------+-------------------------+-----------------+---------------------+------------+

[stack@instack ~]$ heat resource-show overcloud Controller
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+
| Property               | Value                                                                                                                                            |
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+
| attributes             | {                                                                                                                                                |
|                        |   "attributes": null,                                                                                                                            |
|                        |   "refs": null                                                                                                                                   |
|                        | }                                                                                                                                                |
| creation_time          | 2016-03-21T17:36:53                                                                                                                              |
| description            |                                                                                                                                                  |
| links                  | http://192.0.2.1:8004/v1/3f1ed20dad4b40359d0a890b05992b40/stacks/overcloud/f9610699-cb4c-49c2-a96e-d5c39519bf59/resources/Controller (self)      |
|                        | http://192.0.2.1:8004/v1/3f1ed20dad4b40359d0a890b05992b40/stacks/overcloud/f9610699-cb4c-49c2-a96e-d5c39519bf59 (stack)                          |
|                        | http://192.0.2.1:8004/v1/3f1ed20dad4b40359d0a890b05992b40/stacks/overcloud-Controller-j6zrllsd5qwx/351a6bf2-fe52-4e38-9c93-46c77ed0fa0c (nested) |
| logical_resource_id    | Controller                                                                                                                                       |
| physical_resource_id   | 351a6bf2-fe52-4e38-9c93-46c77ed0fa0c                                                                                                             |
| required_by            | ControllerClusterDeployment                                                                                                                      |
|                        | SwiftDevicesAndProxyConfig                                                                                                                       |
|                        | ControllerNodesPostDeployment                                                                                                                    |
|                        | ControllerBootstrapNodeDeployment                                                                                                                |
|                        | allNodesConfig                                                                                                                                   |
|                        | ControllerClusterConfig                                                                                                                          |
|                        | ControllerIpListMap                                                                                                                              |
|                        | ControllerAllNodesValidationDeployment                                                                                                           |
|                        | VipDeployment                                                                                                                                    |
|                        | UpdateWorkflow                                                                                                                                   |
|                        | AllNodesValidationConfig                                                                                                                         |
|                        | CephClusterConfig                                                                                                                                |
|                        | ControllerCephDeployment                                                                                                                         |
|                        | ControllerAllNodesDeployment                                                                                                                     |
|                        | AllNodesExtraConfig                                                                                                                              |
|                        | ControllerBootstrapNodeConfig                                                                                                                    |
|                        | ControllerSwiftDeployment                                                                                                                        |
| resource_name          | Controller                                                                                                                                       |
| resource_status        | UPDATE_FAILED                                                                                                                                    |
| resource_status_reason | Engine went down during resource UPDATE                                                                                                          |
| resource_type          | OS::Heat::ResourceGroup                                                                                                                          |
| updated_time           | 2016-03-21T21:42:00                                                                                                                              |
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+
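
When the nested resource list is long, the failed rows can be reduced to a compact summary. The sketch below is an illustration only: `failed_resources` is a hypothetical helper, and it assumes the six-column table layout printed by `heat resource-list -n5` above (resource name in the first column, status in the fourth, stack name in the sixth).

```shell
#!/usr/bin/env bash
# failed_resources
# Hypothetical helper: reads `heat resource-list` table output on stdin and
# prints "<resource_name> <resource_status> <stack_name>" for each row whose
# status contains _FAILED. Assumes the six-column layout shown in this report.
failed_resources() {
    awk -F'|' '
        $5 ~ /_FAILED/ {
            # Splitting on "|" leaves padding around each cell; strip it.
            gsub(/^ +| +$/, "", $2)
            gsub(/^ +| +$/, "", $5)
            gsub(/^ +| +$/, "", $7)
            print $2, $5, $7
        }'
}
```

For example, `heat resource-list -n5 overcloud | failed_resources` would list only the failing resources and the stacks they belong to.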