Bug 1324906 - Looking at Openshift metrics at a higher level.
Summary: Looking at Openshift metrics at a higher level.
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RFE
Version: 3.2.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Diógenes Rettori
QA Contact: Xiaoli Tian
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-04-07 14:37 UTC by William Henry
Modified: 2018-03-12 21:40 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-03-12 21:40:55 UTC
Target Upstream Version:


Description William Henry 2016-04-07 14:37:20 UTC
Description of problem:

In large-scale clusters it is often very difficult to be guided towards the right areas of focus. Looking at metrics pod by pod or container by container is not reasonable. The overall "health" of the system needs to be looked at from the top down.

What are the higher-level metrics we should be looking at? A better question is: why are we gathering these metrics? And how are they driven by lower-level data? Answering these questions can help us develop very powerful GUIs that can be easily navigated to find the appropriate data and take the appropriate actions (manual or automated).

Always assume that OpenShift is going to scale massively. If there were 2000 nodes in a cluster two years from now, how would a user navigate? What types of users would there be? Cluster health users? Project users? Organization users in a multi-tenant environment?
   

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
Check with Matt Farrellee's team.

Comment 1 Jeff Cantrill 2016-04-08 13:02:47 UTC
This is really an RFE that needs to be fleshed out and prioritized.

Comment 2 Diógenes Rettori 2016-05-04 18:45:29 UTC
Card created for this: https://trello.com/c/RCEgBltZ

Comment 3 Eric Rich 2016-05-05 21:23:03 UTC
(In reply to Diógenes Rettori from comment #2)
> Card created for this: https://trello.com/c/RCEgBltZ

Is this something CFME should / could do?

Comment 4 Michael Burman 2016-05-12 19:41:29 UTC
The Trello card and the BZ seem to differ slightly in explanation; however, I'll try to answer anyway.

Hawkular-Metrics allows combining several time series and computing aggregated statistics from them. The time series are selected with tag queries; in the case of OpenShift/Kubernetes, that means with the labels that are associated with the metrics.

With the current structure of the data, everything up to project-level metrics (using the namespace as the search criterion and the type of metric, such as memory/usage, as the secondary one) could be aggregated together. Another example could be all pods with the same container (by version or all versions).
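
To make the idea concrete, here is a minimal sketch of label-based selection followed by aggregation. It is plain Python over in-memory data and does not call Hawkular-Metrics; the tag names (namespace, pod, metric), the sample values, and the helper functions select and aggregate_sum are all hypothetical and only illustrate the roll-up from pod-level series to a project-level total.

```python
from collections import defaultdict

# Hypothetical sample data: each series has labels (tags) and
# (timestamp_seconds, value) points. Not real Hawkular-Metrics output.
SERIES = [
    {"tags": {"namespace": "shop", "pod": "web-1", "metric": "memory/usage"},
     "points": [(0, 100), (60, 120)]},
    {"tags": {"namespace": "shop", "pod": "web-2", "metric": "memory/usage"},
     "points": [(0, 200), (60, 180)]},
    {"tags": {"namespace": "blog", "pod": "db-1", "metric": "memory/usage"},
     "points": [(0, 50), (60, 55)]},
    {"tags": {"namespace": "shop", "pod": "web-1", "metric": "cpu/usage"},
     "points": [(0, 5), (60, 7)]},
]


def select(series, **wanted):
    """Keep only the series whose tags match every requested key/value."""
    return [s for s in series
            if all(s["tags"].get(k) == v for k, v in wanted.items())]


def aggregate_sum(series):
    """Sum the values of all given series per timestamp."""
    totals = defaultdict(float)
    for s in series:
        for ts, value in s["points"]:
            totals[ts] += value
    return dict(sorted(totals.items()))


# Project-level memory usage: namespace is the primary criterion,
# the metric type the secondary one.
project_memory = aggregate_sum(
    select(SERIES, namespace="shop", metric="memory/usage")
)
print(project_memory)
```

The same two steps would cover the "all pods with the same container" case by selecting on a container label instead of the namespace, assuming such a label is attached to the series.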

Comment 5 Jessica Forrester 2016-07-05 13:22:49 UTC
We already had a card tracking this on the UI Trello boards; consolidating the cards.

