Bug 1465501

Summary: [RFE] Provide an OpenShift application node cluster object
Product: Red Hat CloudForms Management Engine
Reporter: Peter McGowan <pmcgowan>
Component: Providers
Assignee: Loic Avenel <lavenel>
Status: CLOSED WONTFIX
QA Contact: Dave Johnson <dajohnso>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 5.8.0
CC: fdupont, fsimonce, gblomqui, jfrey, jhardy, jmarc, ncatling, obarenbo, pmcgowan, slopez
Target Milestone: GA
Keywords: FutureFeature
Target Release: cfme-future
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-07-01 18:38:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: Container Management
Target Upstream Version:
Embargoed:
Bug Depends On: 1490131
Bug Blocks:
Attachments:
Screenshot from 2017-06-30 18-10-56.png (flags: none)
Cluster utilization alert screenshot (flags: none)

Description Peter McGowan 2017-06-27 14:20:44 UTC
Description of problem:
We can currently see individual OpenShift nodes in the WebUI, but we have no concept of which nodes make up the application node cluster.

If we could have a way of modelling OpenShift nodes as a cluster (rolling up node C&U metrics into cluster metrics), we could detect utilization (and trends) on this cluster as a single entity. We could then make scaling decisions such as deploying additional nodes, or scaling back nodes if the cluster was idle. Scaling back is particularly important when running OpenShift in a public cloud.

We would also need a way of generating a control alert based on real-time performance of this node cluster.
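
The rollup described above can be sketched roughly as follows. This is only an illustration, not CloudForms code: the NodeSample fields and the rollup() helper are hypothetical names, and the sketch assumes a capacity-weighted rollup (total used divided by total capacity) rather than a plain average of per-node percentages.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class NodeSample:
    """One C&U sample for a single app node (all fields are hypothetical)."""
    name: str
    cpu_cores: float       # total cores on the node
    cpu_used_cores: float  # cores in use at sample time
    mem_mb: float          # total memory on the node
    mem_used_mb: float     # memory in use at sample time


def rollup(samples: Iterable[NodeSample]) -> dict:
    """Aggregate node samples into cluster-level utilization percentages."""
    samples = list(samples)
    total_cpu = sum(s.cpu_cores for s in samples)
    total_mem = sum(s.mem_mb for s in samples)
    if total_cpu == 0 or total_mem == 0:
        raise ValueError("cluster has no capacity to roll up")
    return {
        "nodes": len(samples),
        # Weighting by capacity means a large node counts for more
        # than a small one, which a plain average would not do.
        "cpu_pct": 100.0 * sum(s.cpu_used_cores for s in samples) / total_cpu,
        "mem_pct": 100.0 * sum(s.mem_used_mb for s in samples) / total_mem,
    }
```

Only the samples of the nodes that belong to the app node cluster would be passed in, so the result describes that group as a single entity.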

Comment 2 Peter McGowan 2017-06-27 14:22:14 UTC
*** Bug 1465499 has been marked as a duplicate of this bug. ***

Comment 3 Federico Simoncelli 2017-06-27 22:24:02 UTC
(In reply to Peter McGowan from comment #0)
> If we could have a way of modelling OpenShift nodes as a cluster, (rolling
> up node C&U metrics into cluster metrics),

Peter, we do have rollups for the cluster. If you go into the C&U of the provider you should see the graphs.
Each provider is a cluster, so I don't think we need a new object to represent it (or at least we should have a good reason to introduce one).

Is that enough for this RFE?

Comment 4 Peter McGowan 2017-06-28 08:04:03 UTC
Hi Federico

I only see this for the underlying provider (e.g. VMware) cluster, not the cluster of OpenShift app nodes (which may be VMs running on the underlying hardware).

To be able to make scaling decisions for the OpenShift app nodes, we need to be able to see utilization stats for the group of app nodes (possibly VMs) that make up the app node cluster.

Comment 5 Federico Simoncelli 2017-06-30 17:17:20 UTC
Created attachment 1293313 [details]
Screenshot from 2017-06-30 18-10-56.png

(In reply to Peter McGowan from comment #4)
> Hi Federico
> 
> I only see this for the underlying provider (e.g. VMware) cluster, not the
> cluster of OpenShift app nodes (which may be VMs running on the underlying
> hardware).

Peter, the screenshot I am attaching shows you the OpenShift C&U of the entire provider (cluster).

As mentioned above it's in the OpenShift Provider page under "Monitoring => Utilization".
Let me know if you can't find it.

Comment 6 Federico Simoncelli 2017-06-30 17:19:05 UTC
(In reply to Peter McGowan from comment #4)
> To be able to make scaling decisions for the OpenShift app nodes, we need to
> be able to see utilization stats for the group of app nodes (possibly VMs)
> that make up the app node cluster.

This feature (nodes elasticity) is currently scheduled as an OpenShift feature.

Comment 7 Peter McGowan 2017-07-03 09:58:09 UTC
Created attachment 1293787 [details]
Cluster utilization alert screenshot

It would be useful to be able to create a cluster utilization alert, as shown in the screenshot. This doesn't seem to be possible as the OpenShift cluster doesn't appear in the list of Cluster / Deployment Roles (presumably because it's not of type ems_cluster).

Comment 8 Loic Avenel 2017-07-03 12:20:38 UTC
(In reply to Peter McGowan from comment #7)
> Created attachment 1293787 [details]
> Cluster utilization alert screenshot
> 
> It would be useful to be able to create a cluster utilization alert, as
> shown in the screenshot. This doesn't seem to be possible as the OpenShift
> cluster doesn't appear in the list of Cluster / Deployment Roles (presumably
> because it's not of type ems_cluster).

Peter, to clarify, you want to use the current CloudForms Alert mechanism to generate an Alert and react to it?

Comment 9 Peter McGowan 2017-07-03 12:27:09 UTC
Loic, that's correct. 

Specifically, I'd like to be able to create an alert on real-time performance utilization of an OpenShift cluster. This alert might be used to scale out (or scale back) the cluster, but might also be useful to send into an external monitoring system.
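
As a rough illustration of the kind of alert meant here, the sketch below raises an alert only when cluster utilization stays above a threshold across a window of consecutive real-time samples. The UtilizationAlert class, its parameters and the example values are all hypothetical and are not part of the CloudForms alert mechanism.

```python
from collections import deque


class UtilizationAlert:
    """Fires when the mean of the last `window` samples exceeds a threshold."""

    def __init__(self, threshold_pct: float, window: int):
        if window < 1:
            raise ValueError("window must be at least 1")
        self.threshold_pct = threshold_pct
        # deque(maxlen=...) silently drops the oldest sample once full.
        self.samples = deque(maxlen=window)

    def observe(self, cluster_cpu_pct: float) -> bool:
        """Record one cluster-level sample and report whether to alert."""
        self.samples.append(cluster_cpu_pct)
        if len(self.samples) < self.samples.maxlen:
            return False  # not enough history yet to judge a sustained load
        mean = sum(self.samples) / len(self.samples)
        return mean > self.threshold_pct
```

Averaging over a window rather than reacting to a single sample is a design choice that avoids alerting on short spikes; the resulting boolean could equally drive a scaling action or be forwarded to an external monitoring system.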

Comment 10 Fabien Dupont 2017-09-05 20:03:41 UTC
(In reply to Federico Simoncelli from comment #6)
> (In reply to Peter McGowan from comment #4)
> > To be able to make scaling decisions for the OpenShift app nodes, we need to
> > be able to see utilization stats for the group of app nodes (possibly VMs)
> > that make up the app node cluster.
> 
> This feature (nodes elasticity) is currently scheduled as an OpenShift
> feature.

Federico,

Do you have a pointer to that feature? I would like to understand the underlying technical implementation, to see how we could leverage it from CloudForms.

Thanks.

Comment 11 Federico Simoncelli 2017-09-05 21:40:46 UTC
(In reply to Fabien Dupont from comment #10)
> (In reply to Federico Simoncelli from comment #6)
> > (In reply to Peter McGowan from comment #4)
> > > To be able to make scaling decisions for the OpenShift app nodes, we need to
> > > be able to see utilization stats for the group of app nodes (possibly VMs)
> > > that make up the app node cluster.
> > 
> > This feature (nodes elasticity) is currently scheduled as an OpenShift
> > feature.
> 
> Federico,
> 
> Do you have any pointer to that feature ? I would like to understand the
> underlying technical implementation, to see how we could leverage it from
> CloudForms.

Hi Fabien,
  you can find the documentation here:

https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler

Comment 12 Fabien Dupont 2017-09-07 17:10:17 UTC
Thank you, Federico. My first remark is that it only covers AWS and GCE. Hence, it doesn't take into account other infrastructures, such as Microsoft Azure, OpenStack and virtualization platforms. We are able to manage all these environments. IMHO, it would be of great value to give our customers the ability to (auto) scale their OpenShift platforms in a consistent way across all platforms.

To achieve that, we have to identify the relevant metrics: CPU, memory, storage I/O, network I/O, pods per node (correlated to overcommit), etc. We also have to identify the relevant events, for example a container that cannot be deployed because the nodes matching its node selector are full. Together these give a clear status of the whole platform on which to base a decision.

Then, we can trigger the automation to act on that decision: scale the nodes out or in, possibly relabel empty nodes, run the scaleup.yml playbook, evacuate and delete nodes, etc.
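
To make the metrics-plus-events idea a bit more concrete, here is a minimal sketch of such a decision step. Everything in it is an assumption for illustration: the Action names, the decide() function, the use of a count of unschedulable pods as the event signal, and the 80%/20% thresholds.

```python
from enum import Enum


class Action(Enum):
    SCALE_OUT = "scale_out"
    SCALE_IN = "scale_in"
    NONE = "none"


def decide(cpu_pct: float, mem_pct: float, unschedulable_pods: int,
           high: float = 80.0, low: float = 20.0) -> Action:
    """Pick a scaling action from cluster metrics and scheduling events."""
    busiest = max(cpu_pct, mem_pct)
    # A pod that cannot be placed is treated as a stronger signal than
    # any utilization figure, so it is checked first.
    if unschedulable_pods > 0 or busiest > high:
        return Action.SCALE_OUT
    if busiest < low:
        return Action.SCALE_IN
    return Action.NONE
```

The returned action would then be mapped to the automation for the underlying platform, for example running the scaleup.yml playbook on scale out, or evacuating and deleting nodes on scale in.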