Description of problem:
When scaling out an updated overcloud, the cluster gets restarted, which brings down the control plane for a few minutes (the issue has been described in BZ#1287812).

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-0.8.6-94.el7ost.noarch

Steps to Reproduce:
1. Deploy 7.1 using the 7.1 templates:

openstack overcloud deploy \
  --templates ~/templates/my-overcloud \
  --control-scale 3 --compute-scale 1 --ceph-storage-scale 3 \
  --ntp-server clock.redhat.com \
  --libvirt-type qemu \
  -e ~/templates/my-overcloud/environments/network-isolation.yaml \
  -e ~/templates/network-environment.yaml \
  -e ~/templates/firstboot-environment.yaml \
  -e ~/templates/ceph.yaml

2. Update the undercloud to 7.2 and run the update procedure to 7.2 with the 7.2 templates:

/usr/bin/yes '' | openstack overcloud update stack overcloud -i \
  --templates ~/templates/my-overcloud \
  -e ~/templates/my-overcloud/overcloud-resource-registry-puppet.yaml \
  -e ~/templates/my-overcloud/environments/network-isolation.yaml \
  -e ~/templates/network-environment.yaml \
  -e ~/templates/firstboot-environment.yaml \
  -e ~/templates/ceph.yaml \
  -e ~/templates/my-overcloud/environments/updates/update-from-vip.yaml \
  -e ~/templates/ctrlport.yaml

Wait for the update to complete.

3. Scale out with an additional compute node:

openstack overcloud deploy \
  --templates ~/templates/my-overcloud \
  --control-scale 3 --compute-scale 2 --ceph-storage-scale 3 \
  --ntp-server clock.redhat.com \
  --libvirt-type qemu \
  -e ~/templates/my-overcloud/overcloud-resource-registry-puppet.yaml \
  -e ~/templates/my-overcloud/environments/network-isolation.yaml \
  -e ~/templates/network-environment.yaml \
  -e ~/templates/firstboot-environment.yaml \
  -e ~/templates/ceph.yaml \
  -e ~/templates/my-overcloud/environments/updates/update-from-vip.yaml \
  -e ~/templates/ctrlport.yaml

Actual results:
During the scale-out, the cluster gets restarted, which brings down all the APIs exposed via HAProxy for a few minutes.

Expected results:
The APIs remain available while the compute node is being added.
Puppet will restart services even during a scale-out attempt due to configuration changes. There is currently no synchronization in place to ensure that this happens on one controller node at a time, so outages like the one described are likely to happen. Moving to OSP 8 as something to consider.
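Until such synchronization exists in the templates, the underlying idea can be sketched as serializing restarts by hand, one controller at a time. This is only an illustration, not part of the templates: the controller host names and the run_on_node stand-in are assumptions, and the real restart command (shown in the comment) would go through ssh and pcs.

```shell
#!/bin/sh
# Sketch: apply cluster service restarts to one controller at a time,
# instead of letting Puppet restart all controllers simultaneously.
# run_on_node is a stand-in for the real remote call, e.g.:
#   ssh heat-admin@"$1" 'sudo pcs cluster stop --wait && sudo pcs cluster start --wait'
run_on_node() {
    echo "restarting cluster services on $1"
}

# Controller names are assumptions for illustration.
for node in overcloud-controller-0 overcloud-controller-1 overcloud-controller-2; do
    # Each node finishes (and would rejoin the cluster) before the next starts,
    # so HAProxy always has at least two healthy backends.
    run_on_node "$node"
done
```

Serializing the restarts is what keeps the HAProxy-fronted APIs reachable throughout: at any moment two of the three controllers remain in service.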
RFE, removing blocker flag.
Summary of the request: when scaling out or down, ensure that OpenStack services are not interrupted and that changes happen only on the node being scaled, not on all the nodes.
*** This bug has been marked as a duplicate of bug 1395308 ***