Bug 1460267 - RFE : Cluster Elasticity - we need a breathing cluster that adds nodes / removes nodes depending on demand
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RFE
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.1.0
Assignee: Brenton Leanhardt
QA Contact: Xiaoli Tian
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-06-09 14:10 UTC by Lutz Lange
Modified: 2021-06-10 12:25 UTC
CC List: 14 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-04-18 18:22:35 UTC
Target Upstream Version:
Embargoed:



Description Lutz Lange 2017-06-09 14:10:51 UTC
Description of problem:
Currently, OpenShift clusters are rather static in their infrastructure. They can be extended, but the usual, documented procedure takes about 20 minutes and has to be triggered manually. I am not sure how long scaling down actually takes or what the best way to do it is.

Running OpenShift clusters under the CCSP program, or running OpenShift on top of public clouds such as AWS, Azure, or Google Compute Engine, incurs costs tied to the cluster infrastructure.

In the CCSP world, partners pay for every application node that is active in the cluster. They would like to pay only for the nodes they need, and only when they need the resources. What they ask for is the ability to extend and shrink the cluster quickly.

If you run on top of a public cloud, every minute or hour that a node is up has to be paid for. That cost comes not only from Red Hat subscriptions but also from infrastructure fees.

A documented and tested infrastructure elasticity feature would help a lot here. Lately I am often asked for a "breathing" OpenShift cluster that grows and shrinks with demand.

Version-Release number of selected component (if applicable):
all

Actual results:
Hard to achieve. It can be done today, but it requires a lot of testing and manual implementation for each infrastructure.

Expected results (brainstorming):
A reference architecture or best-practice document. Ansible add_node and remove_node playbooks. A guide on how to speed up adding and removing nodes.

An implemented use case with CloudForms, in which CloudForms watches the resources in OpenShift and triggers node addition or removal via these Ansible playbooks.
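The watch-and-trigger loop described above could make its decision step along the lines of the following Python sketch. Everything in it is an assumption and should not be read as a CloudForms or openshift-ansible API. The assumed pieces are the utilization metric (requested CPU divided by allocatable CPU), the 80% / 50% thresholds, the minimum and maximum node counts, the inventory name `hosts`, and the playbook file names `add_node.yml` / `remove_node.yml`. The sketch uses two thresholds (hysteresis) so that the cluster does not flap between adding and removing a node.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClusterState:
    nodes: int             # application nodes currently in the cluster
    requested_cpu: float   # total CPU requested by pods, in cores (assumed metric)
    cpu_per_node: float    # allocatable CPU per application node, in cores


def decide(state: ClusterState, scale_up_at: float = 0.80,
           scale_down_at: float = 0.50, min_nodes: int = 2,
           max_nodes: int = 20) -> int:
    """Return +1 to add a node, -1 to remove one, 0 to do nothing.

    Thresholds and node limits are hypothetical defaults.
    """
    capacity = state.nodes * state.cpu_per_node
    if capacity <= 0:
        # No capacity at all: grow if allowed.
        return 1 if state.nodes < max_nodes else 0
    utilization = state.requested_cpu / capacity
    if utilization > scale_up_at and state.nodes < max_nodes:
        return 1
    if utilization < scale_down_at and state.nodes > min_nodes:
        # Only shrink if the remaining nodes stay below the scale-up
        # threshold, so the next check does not immediately add a node back.
        after = state.requested_cpu / ((state.nodes - 1) * state.cpu_per_node)
        if after < scale_up_at:
            return -1
    return 0


def playbook_command(decision: int, inventory: str = "hosts") -> Optional[List[str]]:
    """Map a decision to an ansible-playbook command line (not executed here).

    add_node.yml / remove_node.yml are hypothetical playbook names.
    """
    if decision == 0:
        return None
    playbook = "add_node.yml" if decision > 0 else "remove_node.yml"
    return ["ansible-playbook", "-i", inventory, playbook]
```

For example, with 4 nodes of 4 cores each and 14 cores requested (87.5% utilization), `decide` returns `1`, and `playbook_command(1)` yields the command for the add-node playbook. A real integration would also have to drain and evacuate a node before the remove playbook runs.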

A document on how to prepare nodes and design the cluster to make this work.

Additional info:
This would have a very positive effect on OpenShift usage in public clouds and on installations and offerings by service providers.

