Bug 1319926

Summary: rhel-osp-director: 7.3->8.0 Overcloud upgrade fails on running the deployment command with major-upgrade-pacemaker-init.yaml
Product: Red Hat OpenStack Reporter: Alexander Chuzhoy <sasha>
Component: rhosp-director Assignee: Angus Thomas <athomas>
Status: CLOSED NOTABUG QA Contact: Arik Chernetsky <achernet>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 8.0 (Liberty) CC: dbecker, jcoufal, mandreou, mburns, morazi, rhel-osp-director-maint, sasha
Target Milestone: ---   
Target Release: 8.0 (Liberty)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-03-24 11:49:32 UTC Type: Bug
Attachments:
Description: heat-engine.log from the undercloud. Flags: none

Description Alexander Chuzhoy 2016-03-21 20:36:00 UTC
rhel-osp-director: 7.3->8.0 Overcloud upgrade fails on running the deployment command with major-upgrade-pacemaker-init.yaml


Environment:
openstack-tripleo-heat-templates-0.8.12-1.el7ost.noarch
openstack-tripleo-heat-templates-kilo-0.8.12-1.el7ost.noarch
openstack-puppet-modules-7.0.15-1.el7ost.noarch
instack-undercloud-2.2.6-1.el7ost.noarch


Steps to reproduce:
1. Deploy overcloud 7.3
2. Upgrade undercloud 7.3->8.0
3. Attempt to upgrade the overcloud nodes:
openstack overcloud deploy --templates tripleo-heat-templates -e tripleo-heat-templates/overcloud-resource-registry-puppet.yaml -e tripleo-heat-templates/environments/puppet-pacemaker.yaml -e tripleo-heat-templates/environments/network-isolation.yaml -e tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e network_env.yaml -e tripleo-heat-templates/environments/major-upgrade-pacemaker-init.yaml -e rhos-release-8.yaml


Result:
The command fails:

heat resource-list -n5 overcloud|grep -v COMPLETE
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+---------------------+----------------------------------------------------------------------------------------------+
| resource_name                               | physical_resource_id                          | resource_type                                     | resource_status | updated_time        | stack_name                                                                                   |
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+---------------------+----------------------------------------------------------------------------------------------+
| UpdateWorkflow                              | eab49f7f-7985-47bd-9699-cd59e72dcac3          | OS::TripleO::Tasks::UpdateWorkflow                | CREATE_FAILED   | 2016-03-21T19:45:11 | overcloud                                                                                    |
| UpgradeInitControllerDeployment             | 668f203a-5857-40c6-b23c-c9041a8850b1          | OS::Heat::SoftwareDeploymentGroup                 | CREATE_FAILED   | 2016-03-21T19:45:15 | overcloud-UpdateWorkflow-2ji4kbavyqs5                                                        |
| 2                                           |                                               | OS::Heat::SoftwareDeployment                      | CREATE_FAILED   | 2016-03-21T19:45:58 | overcloud-UpdateWorkflow-2ji4kbavyqs5-UpgradeInitControllerDeployment-otsfhz7dpip4           |
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+---------------------+----------------------------------------------------------------------------------------------+
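When the resource list is long, the failed rows can also be pulled out programmatically instead of with grep. The following is a minimal sketch, assuming the table text has already been captured from `heat resource-list -n5 overcloud`; the helper names `parse_heat_table` and `failed_resources` are hypothetical, and the sample rows are copied from the output above with the column padding collapsed.

```python
def parse_heat_table(text):
    """Parse an ASCII table as printed by `heat resource-list` into a list of dicts."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the +-----+ border lines.
        if not line.startswith("|"):
            continue
        # Drop the empty strings produced by the leading and trailing pipes.
        cells = [c.strip() for c in line.split("|")[1:-1]]
        if header is None:
            header = cells  # the first pipe-delimited line is the column header
            continue
        rows.append(dict(zip(header, cells)))
    return rows


def failed_resources(text):
    """Return (stack_name, resource_name, resource_status) for rows whose status ends in _FAILED."""
    return [
        (r["stack_name"], r["resource_name"], r["resource_status"])
        for r in parse_heat_table(text)
        if r.get("resource_status", "").endswith("_FAILED")
    ]


# Rows taken from the output in this report (padding collapsed for readability).
SAMPLE = """\
+----------------+----------------------+---------------+-----------------+--------------+------------+
| resource_name | physical_resource_id | resource_type | resource_status | updated_time | stack_name |
+----------------+----------------------+---------------+-----------------+--------------+------------+
| UpdateWorkflow | eab49f7f-7985-47bd-9699-cd59e72dcac3 | OS::TripleO::Tasks::UpdateWorkflow | CREATE_FAILED | 2016-03-21T19:45:11 | overcloud |
| UpgradeInitControllerDeployment | 668f203a-5857-40c6-b23c-c9041a8850b1 | OS::Heat::SoftwareDeploymentGroup | CREATE_FAILED | 2016-03-21T19:45:15 | overcloud-UpdateWorkflow-2ji4kbavyqs5 |
| 2 | | OS::Heat::SoftwareDeployment | CREATE_FAILED | 2016-03-21T19:45:58 | overcloud-UpdateWorkflow-2ji4kbavyqs5-UpgradeInitControllerDeployment-otsfhz7dpip4 |
+----------------+----------------------+---------------+-----------------+--------------+------------+
"""

if __name__ == "__main__":
    for stack, name, status in failed_resources(SAMPLE):
        print(f"{status}  {stack} / {name}")
```

The deepest failing row (resource `2` in the UpgradeInitControllerDeployment nested stack) is usually the one worth inspecting first, since the parent resources fail as a consequence of it.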

Comment 2 Alexander Chuzhoy 2016-03-21 20:38:46 UTC
Created attachment 1138790
heat-engine.log from the undercloud.

Comment 3 Alexander Chuzhoy 2016-03-21 21:47:46 UTC
The issue seems to be intermittent.

Comment 4 Marios Andreou 2016-03-22 08:07:44 UTC
Hi Sasha, I had a look at the box you gave me the details for. The first thing I noticed is that you forgot to include storage-environment.yaml in the update command (you have a Ceph node, and the deploy had -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml).

Can you try again and include all the environment files, please? Also note that the info about undercloud resources at https://bugzilla.redhat.com/show_bug.cgi?id=1319944#c3 applies here too, so you may want a slightly bigger undercloud, especially more RAM and some swap.

thanks, marios
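
The check described in this comment (every environment file passed at deploy time must also be passed to the upgrade command) can be automated. This is a minimal sketch, not part of any TripleO tooling: the helper names are hypothetical, both command strings are abridged illustrations (only the storage-environment.yaml path and the upgrade file names come from this report), and files are compared by base name because the deploy and upgrade commands reference different template directories.

```python
import os
import shlex


def env_files(command):
    """Return the file arguments that follow each -e flag in a deploy command."""
    args = shlex.split(command)
    return [args[i + 1] for i, arg in enumerate(args[:-1]) if arg == "-e"]


def missing_env_files(deploy_cmd, upgrade_cmd):
    """List deploy-time environment files whose base name is absent from the upgrade command."""
    upgrade_names = {os.path.basename(f) for f in env_files(upgrade_cmd)}
    return [
        f for f in env_files(deploy_cmd)
        if os.path.basename(f) not in upgrade_names
    ]


# ASSUMPTION: illustrative, abridged stand-in for the original deploy command.
DEPLOY_CMD = (
    "openstack overcloud deploy --templates "
    "-e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml "
    "-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml"
)

# Abridged form of the failing upgrade command from the description.
UPGRADE_CMD = (
    "openstack overcloud deploy --templates tripleo-heat-templates "
    "-e tripleo-heat-templates/environments/network-isolation.yaml "
    "-e tripleo-heat-templates/environments/major-upgrade-pacemaker-init.yaml "
    "-e rhos-release-8.yaml"
)

if __name__ == "__main__":
    for path in missing_env_files(DEPLOY_CMD, UPGRADE_CMD):
        print("missing from upgrade command:", path)
```

Comparing base names is only a heuristic: two files with the same name in different directories can still differ in content, so a match here does not prove the environments are equivalent.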

Comment 5 Marios Andreou 2016-03-22 08:12:36 UTC
> Can you try again and include all the environment files please (also note
> the info about undercloud resources at
> https://bugzilla.redhat.com/show_bug.cgi?id=1319944#c3 applies here too so
> you may want a slightly bigger undercloud, esp, more ram and some swap).
>

FYI, I added some swap to this env and re-ran with:

openstack overcloud deploy --templates tripleo-heat-templates -e tripleo-heat-templates/overcloud-resource-registry-puppet.yaml -e tripleo-heat-templates/environments/puppet-pacemaker.yaml -e tripleo-heat-templates/environments/storage-environment.yaml -e tripleo-heat-templates/environments/network-isolation.yaml -e tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e network-environment.yaml -e tripleo-heat-templates/environments/major-upgrade-pacemaker-init.yaml -e rhos-release-8.yaml

2016-03-22 08:02:28 [overcloud]: UPDATE_IN_PROGRESS  Stack UPDATE started
....
2016-03-22 08:08:41 [overcloud-AllNodesExtraConfig-asirvl26u7sh]: UPDATE_COMPLETE  Stack UPDATE completed successfully


It completed OK in about 6 minutes. I confirmed the presence of /root/tripleo_upgrade_node.sh on the compute and Ceph nodes.
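
The duration can be read directly off the two event lines quoted above. A small sketch, assuming each line starts with a `YYYY-MM-DD HH:MM:SS` timestamp as it does in this output; `event_time` is a hypothetical helper name.

```python
from datetime import datetime


def event_time(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS' timestamp of a stack event line."""
    return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")


# First and last event lines as quoted in this comment.
start = "2016-03-22 08:02:28 [overcloud]: UPDATE_IN_PROGRESS  Stack UPDATE started"
end = (
    "2016-03-22 08:08:41 [overcloud-AllNodesExtraConfig-asirvl26u7sh]: "
    "UPDATE_COMPLETE  Stack UPDATE completed successfully"
)

elapsed = event_time(end) - event_time(start)
print(elapsed)  # 0:06:13
```

Note that the last quoted line belongs to the nested AllNodesExtraConfig stack, so this measures the time up to that event rather than the final event of the top-level overcloud stack.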

Comment 6 Angus Thomas 2016-03-22 18:09:33 UTC
Hi Sasha,

Can you confirm that Marios' steps have resolved this?



Angus

Comment 7 Jaromir Coufal 2016-03-23 19:51:27 UTC
Waiting with pm_ack on QE feedback.

Comment 8 Alexander Chuzhoy 2016-03-24 01:45:52 UTC
Confirmed.
Let's close it.