Bug 1292225

Summary: [RFE] Non-disruptive scaling-out operations
Product: Red Hat OpenStack Reporter: Marius Cornea <mcornea>
Component: rhosp-directorAssignee: Hugh Brock <hbrock>
Status: CLOSED DUPLICATE QA Contact: Shai Revivo <srevivo>
Severity: urgent Docs Contact:
Priority: high    
Version: 7.0 (Kilo)CC: ggillies, jcoufal, jslagle, mburns, morazi, oblaut, rhel-osp-director-maint, tvignaud
Target Milestone: Upstream M1Keywords: FutureFeature, Triaged
Target Release: 12.0 (Pike)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-12-14 21:03:12 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Marius Cornea 2015-12-16 19:08:09 UTC
Description of problem:
When scaling out an updated overcloud the cluster gets restarted and brings down the control plane for a few minutes (the issue has been described in BZ#1287812)

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-0.8.6-94.el7ost.noarch


Steps to Reproduce:
1. Deploy 7.1 by using 7.1 templates:
openstack overcloud deploy \
    --templates ~/templates/my-overcloud \
    --control-scale 3 --compute-scale 1 --ceph-storage-scale 3 \
    --ntp-server clock.redhat.com \
    --libvirt-type qemu \
    -e ~/templates/my-overcloud/environments/network-isolation.yaml \
    -e ~/templates/network-environment.yaml \
    -e ~/templates/firstboot-environment.yaml \
    -e ~/templates/ceph.yaml 

2. Update the undercloud to 7.2 and run the update procedure to 7.2 with 7.2 templates:
/usr/bin/yes '' | openstack overcloud update stack overcloud -i \
         --templates ~/templates/my-overcloud \
         -e ~/templates/my-overcloud/overcloud-resource-registry-puppet.yaml \
         -e ~/templates/my-overcloud/environments/network-isolation.yaml \
         -e ~/templates/network-environment.yaml \
         -e ~/templates/firstboot-environment.yaml \
         -e ~/templates/ceph.yaml \
         -e ~/templates/my-overcloud/environments/updates/update-from-vip.yaml \
         -e ~/templates/ctrlport.yaml

Wait for the update to complete

3. Scale out with an additional node:

openstack overcloud deploy \
    --templates ~/templates/my-overcloud \
    --control-scale 3 --compute-scale 2 --ceph-storage-scale 3 \
    --ntp-server clock.redhat.com \
    --libvirt-type qemu \
    -e ~/templates/my-overcloud/overcloud-resource-registry-puppet.yaml \
    -e ~/templates/my-overcloud/environments/network-isolation.yaml \
    -e ~/templates/network-environment.yaml \
    -e ~/templates/firstboot-environment.yaml \
    -e ~/templates/ceph.yaml \
    -e ~/templates/my-overcloud/environments/updates/update-from-vip.yaml \
    -e ~/templates/ctrlport.yaml

Actual results:
During the scale out the cluster gets restarted which brings down all the APIs exposed via HAProxy for a few minutes.

Expected results:
The APIs are available when adding a compute node.

Comment 1 James Slagle 2016-01-28 14:49:40 UTC
puppet will restart services even during a scale out attempt due to configuration changes. there is currently no synchronization in place to make sure that happens on one controller node at a time, so outages as you describe are likely to happen.

moving to osp8 as something to consider.

Comment 3 Hugh Brock 2016-02-05 12:30:11 UTC
RFE, removing blocker flag.

Comment 6 Jaromir Coufal 2016-12-14 19:36:45 UTC
Summary of the request:

When scaling out/down, assure that OpenStack services are not interrupted and that changes happen only on the node which is being scaled (not on all the nodes).

Comment 7 Jaromir Coufal 2016-12-14 21:03:12 UTC

*** This bug has been marked as a duplicate of bug 1395308 ***