Bug 1323987

Summary: [docs] [director] Rerunning the initial overcloud deploy command from an upgraded undercloud fails
Product: Red Hat OpenStack
Reporter: Marius Cornea <mcornea>
Component: documentation
Assignee: Dan Macpherson <dmacpher>
Status: CLOSED CURRENTRELEASE
QA Contact: RHOS Documentation Team <rhos-docs>
Severity: unspecified
Priority: unspecified
Version: 8.0 (Liberty)
CC: brad, dbecker, dmacpher, dyasny, jcoufal, mandreou, mburns, mcornea, morazi, rhel-osp-director-maint, srevivo
Keywords: Documentation
Target Release: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2016-05-03 02:11:56 UTC
Type: Bug

Description Marius Cornea 2016-04-05 09:35:48 UTC
Description of problem:
Rerunning the overcloud deploy command from an upgraded undercloud fails and leaves the overcloud in a non-functional state.

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-0.8.14-5.el7ost.noarch
openstack-tripleo-heat-templates-kilo-0.8.14-5.el7ost.noarch

How reproducible:


Steps to Reproduce:
1. Deploy a 7.3 overcloud, where ~/templates/my-overcloud-7.3 is a copy of the templates in /usr/share/openstack-tripleo-heat-templates/:

export THT=~/templates/my-overcloud-7.3
openstack overcloud deploy --templates $THT \
-e $THT/environments/network-isolation-v6.yaml \
-e ~/templates/network-environment-7.3-v6.yaml \
-e $THT/environments/storage-environment.yaml \
-e ~/templates/enable-tls.yaml \
-e ~/templates/inject-trust-anchor.yaml \
--control-scale 3 \
--compute-scale 1 \
--ceph-storage-scale 2 \
--ntp-server clock.redhat.com \
--libvirt-type qemu

2. Upgrade the undercloud.

3. Rerun the deploy command:
export THT=~/templates/my-overcloud-7.3
openstack overcloud deploy --templates $THT \
-e $THT/environments/network-isolation-v6.yaml \
-e ~/templates/network-environment-7.3-v6.yaml \
-e $THT/environments/storage-environment.yaml \
-e ~/templates/enable-tls.yaml \
-e ~/templates/inject-trust-anchor.yaml \
--control-scale 3 \
--compute-scale 1 \
--ceph-storage-scale 2 \
--ntp-server clock.redhat.com \
--libvirt-type qemu

Actual results:
The deploy command fails and the overcloud is left in a non-functional state.

Expected results:
The deploy command succeeds and the overcloud is accessible.

Comment 2 Marius Cornea 2016-04-05 10:44:07 UTC
The problem appears to have the same cause as BZ#1321132, but since I was using a copy of the 7.3 templates, the patch for it wasn't present.

Comment 3 Marios Andreou 2016-04-05 11:51:54 UTC
To be clear, this is a mixed-version issue: an upgraded undercloud is used to manage an existing overcloud. To do this, the deployer uses backed-up versions of the (pre-undercloud-upgrade) templates rather than the newly upgraded openstack-tripleo-heat-templates package installed in /usr/share.
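This workflow assumes the pre-upgrade templates were saved before the undercloud upgrade, as step 1 of the description did with ~/templates/my-overcloud-7.3. A minimal shell sketch of that backup, assuming it runs on the undercloud before the upgrade; backup_templates is a hypothetical helper name, not part of any tripleo tooling:

```shell
#!/bin/bash
# Sketch only: keep a private copy of the installed (pre-upgrade) templates so
# the existing overcloud can still be managed with them after the undercloud
# upgrade. backup_templates is a hypothetical helper, not a tripleo command.

# backup_templates SRC DEST
# Copies the template tree SRC to DEST. Refuses to overwrite an existing DEST,
# so running it again after the upgrade cannot clobber the pre-upgrade copy.
backup_templates() {
    local src="$1" dest="$2"
    if [ ! -d "$src" ]; then
        echo "backup_templates: source '$src' is not a directory" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "backup_templates: '$dest' already exists, not overwriting" >&2
        return 1
    fi
    # Create the parent directory, then copy preserving modes and timestamps.
    mkdir -p "$(dirname "$dest")" || return 1
    cp -a "$src" "$dest"
}

# Usage, before upgrading the undercloud (paths from the description):
#   backup_templates /usr/share/openstack-tripleo-heat-templates \
#       ~/templates/my-overcloud-7.3
```

Refusing to overwrite is a deliberate choice: the copy is only useful if it still holds the 7.3 templates after the package in /usr/share has been upgraded.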

However, the undercloud upgrade also updates the tripleo client (not just the templates), and it is the updated client that causes the issue seen here. Mcornea is right: it has the same root cause as BZ#1321132. The tripleoclient now sets random passwords for RabbitMQ, and since the fix from https://review.openstack.org/#/c/298834/ isn't applied, Puppet tries and fails to restart the neutron-server service.

I think the workaround is to override the tripleo-passwords-file and use it to set the values to whatever your existing overcloud has. Since we are using backed-up templates precisely because we don't want to update, upgrade, or change our overcloud just yet, it makes sense (to me at least) to keep the existing passwords.
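A minimal shell sketch of that override, assuming the passwords file is the KEY=VALUE file at $HOME/tripleo-overcloud-passwords named in comment 4. set_overcloud_password is a hypothetical helper, and the current value has to come from your existing overcloud; this report does not say how to retrieve it:

```shell
#!/bin/bash
# Sketch only: pin a password in the tripleo passwords file to the value the
# existing overcloud already uses. Assumes a KEY=VALUE file such as
# $HOME/tripleo-overcloud-passwords (path taken from comment 4).

# set_overcloud_password FILE KEY VALUE  (hypothetical helper)
# Removes any existing KEY= line from FILE and appends KEY=VALUE, so the file
# ends up with exactly one line for KEY.
set_overcloud_password() {
    local file="$1" key="$2" value="$3"
    local tmp
    touch "$file" || return 1
    tmp="$(mktemp)" || return 1
    # grep -v exits 1 when it prints nothing (e.g. an empty file); harmless here.
    grep -v "^${key}=" "$file" > "$tmp"
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # Write back through the original file so its permissions are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Usage ('<current value>' is a placeholder for the overcloud's real password):
#   set_overcloud_password "$HOME/tripleo-overcloud-passwords" \
#       OVERCLOUD_RABBITMQ_PASSWORD '<current value>'
```

Replacing rather than appending keeps a single line per key, so there is no question of which of two conflicting values the client would pick up.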

Alternatively, we could try to backport the fix from https://review.openstack.org/#/c/298834, though that adds the complication of working out how to trigger the post-puppet service restarts. For upgrades we do this by setting the update_identifier, as in https://review.openstack.org/#/c/297175/, but that wouldn't happen during the 'normal' stack update attempted here.

Comment 4 Brad P. Crochet 2016-04-05 15:00:59 UTC
Recommended workaround:

If the undercloud is upgraded to OSPd 8 and the admin wishes to continue managing a 7.3 overcloud with the deploy command, the following steps are necessary:

1. echo "OVERCLOUD_RABBITMQ_PASSWORD=guest" >> $HOME/tripleo-overcloud-passwords
2. Point all templates and environments to /usr/share/openstack-tripleo-heat-templates/kilo when running additional deployments.
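The two steps above can be sketched in shell as follows. This is an illustration under assumptions, not a tested procedure: pin_rabbit_guest, build_deploy_cmd, and DEPLOY_CMD are hypothetical names; the environment files are the ones from the original deploy in the description; and the custom files under ~/templates are reused as-is, so any paths inside them that point at the old 7.3 copy would also need repointing per step 2.

```shell
#!/bin/bash
# Sketch of the comment 4 workaround. Assumptions (not from the report): the
# -e files below are those of the original deploy in the description, and the
# custom files under ~/templates are reused unchanged.

KILO_THT=/usr/share/openstack-tripleo-heat-templates/kilo

# Step 1: append the RabbitMQ password line, skipping the append when a line
# for that key already exists so repeated runs do not stack duplicates.
pin_rabbit_guest() {
    local file="$1"
    grep -q '^OVERCLOUD_RABBITMQ_PASSWORD=' "$file" 2>/dev/null ||
        echo "OVERCLOUD_RABBITMQ_PASSWORD=guest" >> "$file"
}

# Step 2: build the deploy command with --templates and the in-tree
# environment files pointing at the given tree. The result goes into the
# global array DEPLOY_CMD so it can be reviewed before it is run.
build_deploy_cmd() {
    local tht="$1"
    DEPLOY_CMD=(openstack overcloud deploy --templates "$tht"
        -e "$tht/environments/network-isolation-v6.yaml"
        -e "$HOME/templates/network-environment-7.3-v6.yaml"
        -e "$tht/environments/storage-environment.yaml"
        -e "$HOME/templates/enable-tls.yaml"
        -e "$HOME/templates/inject-trust-anchor.yaml"
        --control-scale 3
        --compute-scale 1
        --ceph-storage-scale 2
        --ntp-server clock.redhat.com
        --libvirt-type qemu)
}

# Usage (print and check the command, then run it):
#   pin_rabbit_guest "$HOME/tripleo-overcloud-passwords"
#   build_deploy_cmd "$KILO_THT"
#   echo "${DEPLOY_CMD[@]}"
#   "${DEPLOY_CMD[@]}"
```

If the passwords file already contains an OVERCLOUD_RABBITMQ_PASSWORD line with a different value, pin_rabbit_guest leaves it alone; that line would need to be edited by hand.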

Comment 5 Dan Macpherson 2016-04-26 02:02:35 UTC
I included the workaround from comment #4 in section 10.1, "Important Pre-Upgrade Notes":

https://access.redhat.com/documentation/en/red-hat-openstack-platform/8/director-installation-and-usage/chapter-10-upgrading-the-environment

Marius and Brad -- Is there anything else we need to document for this issue? Is the workaround the only part needing documentation?

Comment 6 Marius Cornea 2016-04-26 07:26:17 UTC
Thanks, Dan. It looks good to me; the workaround should be enough to cover the initial report.

Comment 7 Dan Macpherson 2016-05-03 02:11:56 UTC
Cool, I'll close this BZ down, but please feel free to reopen it if further changes are required.