Bug 1292225

Summary:	[RFE] Non-disruptive scaling-out operations
Product:	Red Hat OpenStack	Reporter:	Marius Cornea <mcornea>
Component:	rhosp-director	Assignee:	Hugh Brock <hbrock>
Status:	CLOSED DUPLICATE	QA Contact:	Shai Revivo <srevivo>
Severity:	urgent	Docs Contact:
Priority:	high
Version:	7.0 (Kilo)	CC:	ggillies, jcoufal, jslagle, mburns, morazi, oblaut, rhel-osp-director-maint, tvignaud
Target Milestone:	Upstream M1	Keywords:	FutureFeature, Triaged
Target Release:	12.0 (Pike)
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Enhancement
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2016-12-14 21:03:12 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Marius Cornea 2015-12-16 19:08:09 UTC

Description of problem:
When scaling out an updated overcloud, the cluster is restarted, which brings down the control plane for a few minutes (the issue is described in BZ#1287812).

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-0.8.6-94.el7ost.noarch


Steps to Reproduce:
1. Deploy 7.1 using the 7.1 templates:
openstack overcloud deploy \
    --templates ~/templates/my-overcloud \
    --control-scale 3 --compute-scale 1 --ceph-storage-scale 3 \
    --ntp-server clock.redhat.com \
    --libvirt-type qemu \
    -e ~/templates/my-overcloud/environments/network-isolation.yaml \
    -e ~/templates/network-environment.yaml \
    -e ~/templates/firstboot-environment.yaml \
    -e ~/templates/ceph.yaml 

2. Update the undercloud to 7.2, then run the overcloud update procedure to 7.2 using the 7.2 templates:
/usr/bin/yes '' | openstack overcloud update stack overcloud -i \
         --templates ~/templates/my-overcloud \
         -e ~/templates/my-overcloud/overcloud-resource-registry-puppet.yaml \
         -e ~/templates/my-overcloud/environments/network-isolation.yaml \
         -e ~/templates/network-environment.yaml \
         -e ~/templates/firstboot-environment.yaml \
         -e ~/templates/ceph.yaml \
         -e ~/templates/my-overcloud/environments/updates/update-from-vip.yaml \
         -e ~/templates/ctrlport.yaml

Wait for the update to complete.

3. Scale out by adding one compute node (--compute-scale goes from 1 to 2):

openstack overcloud deploy \
    --templates ~/templates/my-overcloud \
    --control-scale 3 --compute-scale 2 --ceph-storage-scale 3 \
    --ntp-server clock.redhat.com \
    --libvirt-type qemu \
    -e ~/templates/my-overcloud/overcloud-resource-registry-puppet.yaml \
    -e ~/templates/my-overcloud/environments/network-isolation.yaml \
    -e ~/templates/network-environment.yaml \
    -e ~/templates/firstboot-environment.yaml \
    -e ~/templates/ceph.yaml \
    -e ~/templates/my-overcloud/environments/updates/update-from-vip.yaml \
    -e ~/templates/ctrlport.yaml

Actual results:
During the scale-out, the cluster is restarted, which makes all APIs exposed through HAProxy unavailable for a few minutes.

Expected results:
The APIs remain available while a compute node is being added.
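To put a number on "a few minutes", one option is to probe an API behind HAProxy from the undercloud while step 3 runs. The script below is a minimal sketch, assuming bash and curl are available; the endpoint address, the script name `probe-api.sh`, and the `run` argument are hypothetical and not part of the reproduction above. It logs a timestamp only when the endpoint flips between UP and DOWN.

```shell
#!/usr/bin/env bash
# Hypothetical availability probe (probe-api.sh): poll one HAProxy-fronted API
# while `openstack overcloud deploy` runs in another terminal.
# ASSUMPTION: ENDPOINT is a placeholder; point it at an unauthenticated URL on
# the public VIP in your environment (e.g. the Keystone root on port 5000).

ENDPOINT="${ENDPOINT:-http://192.0.2.10:5000/}"   # hypothetical VIP address
INTERVAL="${INTERVAL:-2}"                         # seconds between probes

# Print UP if the endpoint returns an HTTP status below 400. Connection errors,
# timeouts, and 4xx/5xx (HAProxy answers 503 when no backend is available)
# make curl -f exit non-zero, which is reported as DOWN.
probe_once() {
    if curl -sf -o /dev/null --max-time 5 "$1"; then
        echo UP
    else
        echo DOWN
    fi
}

# Log each UP/DOWN transition with a UTC timestamp until interrupted (Ctrl-C).
monitor() {
    local last="" state
    while true; do
        state=$(probe_once "$ENDPOINT")
        if [ "$state" != "$last" ]; then
            echo "$(date -u +%H:%M:%S) $ENDPOINT $state"
            last=$state
        fi
        sleep "$INTERVAL"
    done
}

# Start the polling loop only when invoked with the "run" argument.
if [ "${1:-}" = "run" ]; then
    monitor
fi
```

Start it before step 3, for example `ENDPOINT=http://<public VIP>:5000/ bash probe-api.sh run`, and stop it once the deploy finishes; the DOWN and following UP timestamps bracket the outage. Treating HTTP errors as DOWN matters here, because during the restart HAProxy itself stays reachable and answers with 503.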

Comment 1 James Slagle 2016-01-28 14:49:40 UTC

Puppet restarts services when their configuration changes, even during a scale-out. There is currently no synchronization in place to ensure this happens on one controller node at a time, so outages like the one you describe are likely.

Moving to OSP 8 as something to consider.

Comment 3 Hugh Brock 2016-02-05 12:30:11 UTC

RFE, removing blocker flag.

Comment 6 Jaromir Coufal 2016-12-14 19:36:45 UTC

Summary of the request:

When scaling out or down, ensure that OpenStack services are not interrupted and that changes are applied only on the node being scaled, not on all nodes.

Comment 7 Jaromir Coufal 2016-12-14 21:03:12 UTC


*** This bug has been marked as a duplicate of bug 1395308 ***