Bug 1324906

Summary: Looking at Openshift metrics at a higher level.
Product: OpenShift Container Platform
Reporter: William Henry <whenry>
Component: RFE
Assignee: Diógenes Rettori <drettori>
Status: CLOSED DEFERRED
QA Contact: Xiaoli Tian <xtian>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.2.0
CC: aos-bugs, erich, jforrest, jokerman, miburman, mmccomas, myllynen
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-03-12 21:40:55 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description William Henry 2016-04-07 14:37:20 UTC
Description of problem:

In large-scale clusters it is often very difficult to be guided towards the right areas of focus. Looking at metrics on a pod-by-pod or container-by-container basis is not practical. The overall "health" of the system needs to be assessed from the top down.

What are the higher-level metrics we should be looking at? A better question is: why are we gathering these metrics? And how are they derived from lower-level data? Answering these questions can help us develop very powerful GUIs that can be easily navigated to find the appropriate data and take the appropriate actions (manual or automated).

Always assume that OpenShift is going to scale massively. If there were 2000 nodes in a cluster two years from now, how would a user navigate? What types of users? Cluster health users? Project users? Organization users in a multi-tenant environment?
   

Additional info:
Check with Matt Farrellee's team.

Comment 1 Jeff Cantrill 2016-04-08 13:02:47 UTC
This is really an RFE that needs to be fleshed out and prioritized.

Comment 2 Diógenes Rettori 2016-05-04 18:45:29 UTC
Card created for this: https://trello.com/c/RCEgBltZ

Comment 3 Eric Rich 2016-05-05 21:23:03 UTC
(In reply to Diógenes Rettori from comment #2)
> Card created for this: https://trello.com/c/RCEgBltZ

Is this something CFME should / could do?

Comment 4 Michael Burman 2016-05-12 19:41:29 UTC
The Trello card and the BZ seem to differ slightly in their explanations; however, I'll try to answer anyway.

Hawkular-Metrics allows combining several time series and computing aggregated statistics over them. The time series can be selected with tag queries; in the case of OpenShift/Kubernetes, the tags are the labels associated with the metrics.
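To make tag-based selection concrete, here is a minimal in-memory sketch in Python. It does not use the Hawkular-Metrics REST API; the data layout, the tag names, and the `select`/`aggregate` helpers are hypothetical. A query maps each label to a regular expression, and all series whose labels match are pooled into one set of statistics.

```python
import re
from statistics import mean

# Hypothetical in-memory model: each series is a dict of tags (labels) plus a
# list of (timestamp, value) samples. This is NOT the Hawkular-Metrics API.
SERIES = [
    {"tags": {"namespace": "shop", "pod": "web-1", "metric": "memory/usage"},
     "points": [(0, 100.0), (60, 120.0)]},
    {"tags": {"namespace": "shop", "pod": "web-2", "metric": "memory/usage"},
     "points": [(0, 80.0), (60, 90.0)]},
    {"tags": {"namespace": "blog", "pod": "db-1", "metric": "memory/usage"},
     "points": [(0, 300.0), (60, 310.0)]},
]


def select(series, query):
    """Return the series whose tags match every label -> regex pair in query."""
    return [
        s for s in series
        if all(k in s["tags"] and re.fullmatch(pattern, s["tags"][k])
               for k, pattern in query.items())
    ]


def aggregate(series):
    """Pool all samples of the selected series into summary statistics."""
    values = [v for s in series for _, v in s["points"]]
    if not values:
        return None  # nothing matched the query
    return {"count": len(values), "min": min(values),
            "max": max(values), "avg": mean(values)}


stats = aggregate(select(SERIES, {"namespace": "shop", "metric": "memory/usage"}))
print(stats)
```

Because the query values are regular expressions, a pattern such as `web-.*` on the `pod` label selects a whole family of pods without listing them individually.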

With the current structure of the data, everything up to project-level metrics (using the namespace as the primary search criterion and the metric type, such as memory/usage, as the secondary one) could be aggregated together. Another example would be all pods running the same container (a specific version or all versions).
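A similar hypothetical sketch shows the two groupings described above: summing `memory/usage` per timestamp across all pods in a namespace to get one project-level series, and grouping pods by container image, either per version or across all versions. The `rollup` helper and the `namespace`/`image`/`metric` tag names are illustrative assumptions, not the labels OpenShift actually attaches to its metrics.

```python
from collections import defaultdict

# Hypothetical samples; tag names are illustrative assumptions only.
SERIES = [
    {"tags": {"namespace": "shop", "pod": "web-1", "image": "web:1.0",
              "metric": "memory/usage"}, "points": [(0, 100.0), (60, 120.0)]},
    {"tags": {"namespace": "shop", "pod": "web-2", "image": "web:1.1",
              "metric": "memory/usage"}, "points": [(0, 80.0), (60, 90.0)]},
    {"tags": {"namespace": "blog", "pod": "db-1", "image": "db:9.5",
              "metric": "memory/usage"}, "points": [(0, 300.0), (60, 310.0)]},
]


def rollup(series, group_by, metric):
    """Sum one metric per timestamp across series sharing a group key.

    Returns {group_key: {timestamp: summed_value}}.
    """
    result = defaultdict(lambda: defaultdict(float))
    for s in series:
        if s["tags"].get("metric") != metric:
            continue  # secondary criterion: only the requested metric type
        key = group_by(s["tags"])
        for ts, value in s["points"]:
            result[key][ts] += value
    # Convert nested defaultdicts to plain dicts for readability.
    return {k: dict(v) for k, v in result.items()}


# Project-level view: namespace as the primary grouping criterion.
by_project = rollup(SERIES, lambda t: t["namespace"], "memory/usage")

# Same container, all versions: naive split that assumes the image name
# contains no registry port (e.g. "registry:5000/web:1.0" would break it).
by_image = rollup(SERIES, lambda t: t["image"].split(":")[0], "memory/usage")

# Same container, per version: keep the full image reference as the key.
by_version = rollup(SERIES, lambda t: t["image"], "memory/usage")
print(by_project)
```

Summing suits capacity-style questions such as "how much memory does this project use?"; for questions like "which pod is worst?", replacing the sum with a max would be the natural change.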

Comment 5 Jessica Forrester 2016-07-05 13:22:49 UTC
We already had a card tracking this on the UI Trello boards; consolidating the cards.