Description of problem:
Rerunning overcloud deploy command from an upgraded undercloud fails and leaves the overcloud in a not functional state.

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-0.8.14-5.el7ost.noarch
openstack-tripleo-heat-templates-kilo-0.8.14-5.el7ost.noarch

How reproducible:

Steps to Reproduce:
1. Deploy 7.3 overcloud, ~/templates/my-overcloud-7.3 is a copy of the templates in /usr/share/openstack-tripleo-heat-templates/

export THT=~/templates/my-overcloud-7.3
openstack overcloud deploy --templates $THT \
  -e $THT/environments/network-isolation-v6.yaml \
  -e ~/templates/network-environment-7.3-v6.yaml \
  -e $THT/environments/storage-environment.yaml \
  -e ~/templates/enable-tls.yaml \
  -e ~/templates/inject-trust-anchor.yaml \
  --control-scale 3 \
  --compute-scale 1 \
  --ceph-storage-scale 2 \
  --ntp-server clock.redhat.com \
  --libvirt-type qemu

2. Upgrade undercloud

3. Rerun the deploy command:

export THT=~/templates/my-overcloud-7.3
openstack overcloud deploy --templates $THT \
  -e $THT/environments/network-isolation-v6.yaml \
  -e ~/templates/network-environment-7.3-v6.yaml \
  -e $THT/environments/storage-environment.yaml \
  -e ~/templates/enable-tls.yaml \
  -e ~/templates/inject-trust-anchor.yaml \
  --control-scale 3 \
  --compute-scale 1 \
  --ceph-storage-scale 2 \
  --ntp-server clock.redhat.com \
  --libvirt-type qemu

Actual results:
The deploy command fails and the overcloud is left in a non functional state.

Expected results:
The deploy command succeeds and the overcloud is accessible.
The problem appears to have the same cause as BZ#1321132, but since I was using a copy of the 7.3 templates, the patch for it wasn't present.
To be clear, this is a mixed-version issue: an upgraded undercloud is being used to manage an existing overcloud. To do this, the deployer uses backed-up copies of the (pre-undercloud-upgrade) templates rather than the newly upgraded openstack-tripleo-heat-templates package installed in /usr/share. However, the undercloud upgrade also updates the tripleo client (i.e. not just the templates), and it is the client that causes the issue seen here.

Mcornea is right, it is the same behaviour and root cause as in BZ#1321132: the tripleoclient now sets random passwords for rabbit, and since the fix from https://review.openstack.org/#/c/298834/ isn't applied, puppet tries and fails to restart the neutron-server service.

I think the workaround is to override the tripleo-passwords-file and use it to set the values to whatever your existing overcloud has. Given that we are using backed-up templates because we don't want to update/upgrade/change our overcloud just yet, it makes sense, to me at least, that we want to maintain the existing passwords.

Alternatively, we could try to backport the fix from https://review.openstack.org/#/c/298834 - though we have the added complication of also having to work out how to make the post-puppet services restart happen. We do it for upgrades by setting the update_identifier like https://review.openstack.org/#/c/297175/ - but it wouldn't happen during the 'normal' stack update attempted here.
Recommended workaround:

If the undercloud is being upgraded to OSPd 8, and the admin wishes to continue to manage a 7.3 overcloud via deploy, the following will be necessary:

1. echo "OVERCLOUD_RABBITMQ_PASSWORD=guest" >> $HOME/tripleo-overcloud-passwords
2. Point all templates and environments to /usr/share/openstack-tripleo-heat-templates/kilo when running additional deployments.
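Step 1 of the workaround appends to the passwords file unconditionally, so repeating it adds duplicate entries. A minimal sketch of a repeatable variant follows. The function name pin_rabbitmq_password is hypothetical and not part of tripleoclient, and the one-KEY=value-per-line layout of the file is assumed from the echo command above. If the file already has an OVERCLOUD_RABBITMQ_PASSWORD entry (for example, a random one written by an earlier failed deploy), this sketch leaves it untouched, so that case still needs a manual check.

```shell
# Hypothetical helper (not part of tripleoclient): pin the RabbitMQ password
# in the passwords file without appending a duplicate entry on reruns.
# Assumes the file holds one KEY=value pair per line.
pin_rabbitmq_password() {
    passwords_file="${1:-$HOME/tripleo-overcloud-passwords}"
    password="${2:-guest}"

    # Create the file if it does not exist yet, so grep has something to read.
    touch "$passwords_file"

    # Append only when no entry for this key is present; an existing entry
    # (whatever its value) is left as-is.
    if ! grep -q '^OVERCLOUD_RABBITMQ_PASSWORD=' "$passwords_file"; then
        echo "OVERCLOUD_RABBITMQ_PASSWORD=$password" >> "$passwords_file"
    fi
}
```

Calling pin_rabbitmq_password with no arguments targets $HOME/tripleo-overcloud-passwords with the value guest, matching step 1. For step 2, the deploy command from the report would presumably be rerun with THT=/usr/share/openstack-tripleo-heat-templates/kilo; whether every environment file referenced in the original command exists under that directory is not confirmed here.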
I included the workaround from comment #4 in "10.1. Important Pre-Upgrade Notes":

https://access.redhat.com/documentation/en/red-hat-openstack-platform/8/director-installation-and-usage/chapter-10-upgrading-the-environment

Marius and Brad -- Is there anything else we need to document for this issue? Is the workaround the only part needing documentation?
Thanks, Dan. It looks good to me, the workaround should be enough to cover the initial report.
Cool, I'll close this BZ down, but please feel free to reopen it if further changes are required.