Bug 1460267 - RFE : Cluster Elasticity - we need a breathing cluster that adds nodes / removes nodes depending on demand
Status: NEW
Product: OpenShift Container Platform
Classification: Red Hat
Component: RFE
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.7.0
Assigned To: Brenton Leanhardt
QA Contact: Xiaoli Tian
Depends On:
Reported: 2017-06-09 10:10 EDT by Lutz Lange
Modified: 2017-08-22 16:21 EDT
CC: 6 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments: None
Description Lutz Lange 2017-06-09 10:10:51 EDT
Description of problem:
Currently, OpenShift clusters are fairly static in their infrastructure. They can be extended, but the usual, documented way of doing so takes about 20 minutes and has to be triggered manually. I'm not sure how long scaling down actually takes, or what the best way to do it is.

Running OpenShift clusters under the CCSP program, or on public clouds such as AWS, Azure, or Google Compute Engine, incurs costs tied to the cluster infrastructure.

In the CCSP world, partners pay for every application node that is active in the cluster. They would like to pay only for the nodes they need, when they need them. What they are asking for is the ability to grow and shrink the cluster quickly.

On public clouds, every minute or hour that a node is up has to be paid for, not only in Red Hat subscriptions but also in infrastructure fees.
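The cost argument can be made concrete by comparing billed node-hours. This minimal Python sketch assumes a hypothetical 24-hour demand profile and a three-node floor (illustration values, not measurements from any cluster). A static cluster is sized for the peak around the clock, while an elastic one runs only what each hour needs; multiplying node-hours by a per-node hourly price (subscription plus infrastructure) gives the cost.

```python
# Hypothetical hourly node demand over one day (illustration values only).
HOURLY_DEMAND = [3] * 8 + [10] * 10 + [5] * 6  # 24 hourly samples


def static_node_hours(demand):
    """A static cluster is sized for peak demand and runs that size every hour."""
    return max(demand) * len(demand)


def elastic_node_hours(demand, minimum=3):
    """An elastic cluster runs what each hour needs, but never fewer than `minimum` nodes."""
    return sum(max(d, minimum) for d in demand)


if __name__ == "__main__":
    static = static_node_hours(HOURLY_DEMAND)
    elastic = elastic_node_hours(HOURLY_DEMAND)
    print(f"static: {static} node-hours, elastic: {elastic} node-hours")
    # prints "static: 240 node-hours, elastic: 154 node-hours"
```

With these made-up numbers the elastic cluster bills 86 fewer node-hours per day; real savings depend on the actual demand curve and on how quickly nodes can be added and removed.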

A documented and tested infrastructure elasticity feature would help a lot here. Lately I have been asked frequently for a breathing OpenShift cluster that grows and shrinks with demand.

Version-Release number of selected component (if applicable):

Actual results:
This is a hard feature to achieve. It can be done today, but it requires a lot of testing and a manual implementation for each infrastructure.

Expected results (brainstorming):
A reference architecture / best-practices document. Ansible add_node and remove_node playbooks. A guide on how to speed up adding and removing nodes.
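One thing a remove_node playbook would have to get right is ordering: workloads should be moved off a node before the node object is deleted and its machine deprovisioned. This minimal Python sketch only builds the command sequence; it does not run anything. `oc adm drain` and `oc delete node` are existing `oc` commands (check the drain flags against your `oc` version), while the `remove_node.yml` playbook, the `hosts` inventory path, and the `target_node` variable are hypothetical, since the playbook is only proposed here.

```python
from typing import List


def scale_down_steps(node: str, inventory: str = "hosts") -> List[List[str]]:
    """Build (not execute) the ordered commands for retiring one application node."""
    return [
        # 1. Mark the node unschedulable and evict its pods so they are
        #    rescheduled onto the remaining nodes.
        ["oc", "adm", "drain", node, "--ignore-daemonsets", "--delete-local-data"],
        # 2. Remove the node object from the cluster.
        ["oc", "delete", "node", node],
        # 3. Deprovision the machine. Hypothetical: remove_node.yml and the
        #    target_node variable stand in for the playbook this RFE proposes.
        ["ansible-playbook", "-i", inventory, "remove_node.yml",
         "-e", f"target_node={node}"],
    ]


if __name__ == "__main__":
    for cmd in scale_down_steps("node3.example.com"):
        print(" ".join(cmd))
```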

An implemented use case with CloudForms, in which CloudForms watches the resources in OpenShift and triggers node addition or removal via these Ansible playbooks.
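The CloudForms use case amounts to a control loop: observe cluster resources, decide, then call the add or remove playbook. The RFE does not say which metric to watch, so this minimal Python sketch assumes a simple threshold policy on requested CPU versus allocatable CPU. The `ClusterSnapshot` shape, the 80%/40% thresholds, and the 3 to 20 node bounds are hypothetical illustration values, not anything CloudForms or OpenShift defines.

```python
from dataclasses import dataclass


@dataclass
class ClusterSnapshot:
    """Aggregate figures a watcher such as CloudForms might collect (hypothetical shape)."""
    nodes: int                   # schedulable application nodes
    requested_millicores: int    # CPU requested by all pods
    allocatable_millicores: int  # CPU allocatable across all nodes (must be > 0)


# Hypothetical policy values; real ones would need tuning per cluster.
SCALE_UP_AT = 0.80    # add a node at or above 80% of allocatable CPU requested
SCALE_DOWN_AT = 0.40  # remove a node at or below 40%
MIN_NODES = 3
MAX_NODES = 20


def decide(s: ClusterSnapshot) -> str:
    """Return 'add', 'remove', or 'none' for one evaluation of the loop."""
    utilization = s.requested_millicores / s.allocatable_millicores
    if utilization >= SCALE_UP_AT and s.nodes < MAX_NODES:
        return "add"      # would trigger the proposed add_node playbook
    if utilization <= SCALE_DOWN_AT and s.nodes > MIN_NODES:
        return "remove"   # would trigger the proposed remove_node playbook
    return "none"


if __name__ == "__main__":
    print(decide(ClusterSnapshot(nodes=4, requested_millicores=13000,
                                 allocatable_millicores=16000)))  # prints "add"
```

The gap between the two thresholds acts as hysteresis: with at least four nodes, removing one after utilization falls to 40% leaves utilization at most about 53%, well under 80%, so the loop does not immediately add the node back.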

A document on how to prepare nodes and design a cluster so that this works.

Additional info:
This would have a very positive effect on OpenShift usage in public clouds and on installations and offerings by service providers.
