Bug 1292225 - [RFE] Non-disruptive scaling-out operations
Summary: [RFE] Non-disruptive scaling-out operations
Keywords:
Status: CLOSED DUPLICATE of bug 1395308
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: Upstream M1
Target Release: 12.0 (Pike)
Assignee: Hugh Brock
QA Contact: Shai Revivo
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-12-16 19:08 UTC by Marius Cornea
Modified: 2016-12-14 21:03 UTC
CC: 8 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-12-14 21:03:12 UTC
Target Upstream Version:
Embargoed:



Description Marius Cornea 2015-12-16 19:08:09 UTC
Description of problem:
When scaling out an updated overcloud, the cluster gets restarted, which brings down the control plane for a few minutes (this issue has been described in BZ#1287812).

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-0.8.6-94.el7ost.noarch


Steps to Reproduce:
1. Deploy 7.1 by using 7.1 templates:
openstack overcloud deploy \
    --templates ~/templates/my-overcloud \
    --control-scale 3 --compute-scale 1 --ceph-storage-scale 3 \
    --ntp-server clock.redhat.com \
    --libvirt-type qemu \
    -e ~/templates/my-overcloud/environments/network-isolation.yaml \
    -e ~/templates/network-environment.yaml \
    -e ~/templates/firstboot-environment.yaml \
    -e ~/templates/ceph.yaml 

2. Update the undercloud to 7.2 and run the update procedure to 7.2 with 7.2 templates:
/usr/bin/yes '' | openstack overcloud update stack overcloud -i \
         --templates ~/templates/my-overcloud \
         -e ~/templates/my-overcloud/overcloud-resource-registry-puppet.yaml \
         -e ~/templates/my-overcloud/environments/network-isolation.yaml \
         -e ~/templates/network-environment.yaml \
         -e ~/templates/firstboot-environment.yaml \
         -e ~/templates/ceph.yaml \
         -e ~/templates/my-overcloud/environments/updates/update-from-vip.yaml \
         -e ~/templates/ctrlport.yaml

Wait for the update to complete

3. Scale out with an additional node:

openstack overcloud deploy \
    --templates ~/templates/my-overcloud \
    --control-scale 3 --compute-scale 2 --ceph-storage-scale 3 \
    --ntp-server clock.redhat.com \
    --libvirt-type qemu \
    -e ~/templates/my-overcloud/overcloud-resource-registry-puppet.yaml \
    -e ~/templates/my-overcloud/environments/network-isolation.yaml \
    -e ~/templates/network-environment.yaml \
    -e ~/templates/firstboot-environment.yaml \
    -e ~/templates/ceph.yaml \
    -e ~/templates/my-overcloud/environments/updates/update-from-vip.yaml \
    -e ~/templates/ctrlport.yaml

Actual results:
During the scale-out, the cluster gets restarted, which brings down all the APIs exposed via HAProxy for a few minutes.

Expected results:
The APIs remain available while adding a compute node.
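One way to check this expectation (or to observe the outage from the actual results) is to probe an API endpoint through the HAProxy VIP while the scale-out runs in another terminal. A minimal sketch; the VIP address, port, and sample count below are placeholders for this environment, not values from the report:

```shell
#!/bin/sh
# Probe an API endpoint through the HAProxy VIP while the scale-out runs;
# a run of DOWN lines in the log marks the control-plane outage.
VIP="${VIP:-192.0.2.10}"        # placeholder control-plane VIP
PORT="${PORT:-5000}"            # Keystone public API port (assumed)
SAMPLES="${SAMPLES:-3}"         # how many one-second samples to take

probe() {
    # Print UP if http://$1:$2/ answers within 2 seconds, DOWN otherwise.
    if curl -s -o /dev/null --max-time 2 "http://$1:$2/"; then
        echo UP
    else
        echo DOWN
    fi
}

i=1
while [ "$i" -le "$SAMPLES" ]; do
    printf '%s %s\n' "$(date -u +%H:%M:%S)" "$(probe "$VIP" "$PORT")"
    sleep 1
    i=$((i + 1))
done
```

Running this in a loop for the duration of the `openstack overcloud deploy` scale-out gives a timestamped record of exactly when the APIs dropped and recovered.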

Comment 1 James Slagle 2016-01-28 14:49:40 UTC
Puppet will restart services even during a scale-out attempt due to configuration changes. There is currently no synchronization in place to make sure that happens on one controller node at a time, so outages such as you describe are likely to happen.

Moving to OSP 8 as something to consider.
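The missing synchronization described above amounts to applying disruptive changes strictly one controller at a time, so HAProxy always has healthy backends. A rough illustration of that sequencing; the node names are placeholders, the real per-node steps (e.g. `pcs cluster standby`/`unstandby` to drain and restore a node) are stubbed out as echoes, and this is not how the director actually behaves today (that is this RFE):

```shell
#!/bin/sh
# Illustration only: serialize a disruptive update across controllers
# instead of restarting services on all of them at once.
CONTROLLERS="overcloud-controller-0 overcloud-controller-1 overcloud-controller-2"

update_one() {
    # $1 = controller hostname. In a real run each step would be an ssh
    # call, e.g. "sudo pcs cluster standby $1" to drain resources off the
    # node, then the config change, then "sudo pcs cluster unstandby $1".
    echo "draining $1"
    echo "updating $1"
    echo "restoring $1"
}

# The key property: strictly one node at a time, so the APIs exposed
# via HAProxy stay reachable throughout.
for node in $CONTROLLERS; do
    update_one "$node"
done
```

The loop body is sequential on purpose; parallelizing it is exactly what causes the outage reported in this bug.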

Comment 3 Hugh Brock 2016-02-05 12:30:11 UTC
RFE, removing blocker flag.

Comment 6 Jaromir Coufal 2016-12-14 19:36:45 UTC
Summary of the request:

When scaling out or down, ensure that OpenStack services are not interrupted and that changes happen only on the node being scaled (not on all the nodes).

Comment 7 Jaromir Coufal 2016-12-14 21:03:12 UTC

*** This bug has been marked as a duplicate of bug 1395308 ***

