Bug 1469243 - [RFE] C&U rollups / NOR / Right-Size values don't accurately reflect realtime data
Summary: [RFE] C&U rollups / NOR / Right-Size values don't accurately reflect realtime...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: C&U Capacity and Utilization
Version: 5.8.0
Hardware: All
OS: All
high
medium
Target Milestone: GA
Target Release: cfme-future
Assignee: Gregg Tanzillo
QA Contact: Tasos Papaioannou
URL:
Whiteboard: c&u:NOR
Depends On:
Blocks:
Reported: 2017-07-10 17:23 UTC by Tasos Papaioannou
Modified: 2019-09-18 02:04 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-18 02:04:48 UTC
Category: ---
Cloudforms Team: ---
Target Upstream Version:
Embargoed:


Attachments
Realtime memory usage histogram (11.26 KB, image/png)
2017-07-10 17:23 UTC, Tasos Papaioannou

Description Tasos Papaioannou 2017-07-10 17:23:35 UTC
Created attachment 1295899
Realtime memory usage histogram

Description of problem:

Hourly and daily rollups store the arithmetic mean of memory usage (mem_usage_absolute_average) and CPU usage (cpu_usagemhz_rate_average and cpu_usage_rate_average). The NOR (normal operating range) high and low values are then calculated from the daily rollup averages as their mean +/- one sample standard deviation.

These values are not accurate measures of hourly or daily usage, especially when the data are not symmetrically distributed. For example, the attached metrics-realtime-hist-20170703-20170710.png shows the distribution of one week of realtime memory usage captured for a VM. The distribution is bounded below by the minimum value of 0 and has a long upper tail (the maximum is 14.99), so the mean (1.67) is pulled above the median and mode (both 0.99).

The daily rollups have the following distribution for mem_usage_absolute_average:

daily avg min = 1.55
daily avg max = 1.72
daily avg avg = 1.65
daily avg stddev_samp = 0.05
low  = avg - stddev_samp = 1.60
high = avg + stddev_samp = 1.70
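The NOR calculation described above can be sketched with Python's standard `statistics` module. This is a minimal illustration, not the C&U code itself: `nor_range` is a hypothetical helper, and the input is assumed to be the list of daily `mem_usage_absolute_average` values for one VM.

```python
import statistics


def nor_range(daily_averages):
    """Return (low, high) for a normal operating range computed as
    mean -/+ one sample standard deviation of the daily rollup averages."""
    values = list(daily_averages)
    if len(values) < 2:
        # The sample standard deviation (n - 1 denominator) needs two values.
        raise ValueError("need at least two daily averages")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation, like SQL stddev_samp
    return mean - sd, mean + sd
```

With the hypothetical daily averages `[1.60, 1.65, 1.70]` (mean 1.65, sample standard deviation 0.05), `nor_range` returns approximately `(1.60, 1.70)`, the same low/high pair reported above.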

Compare these values to the percentiles calculated below from the realtime values:

mean  median  min  10%  20%   30%   40%   50%   60%   70%   80%   90%   max
1.67  0.99    0    0    0.99  0.99  0.99  0.99  1.99  1.99  2.99  2.99  14.99

median = 0.99
60th percentile = 1.99
70th percentile = 1.99
80th percentile = 2.99
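A summary in the layout of the table above can be produced from exported realtime samples with the standard `statistics` module. This is a sketch under assumptions: the report does not say which percentile definition was used, so the `method="inclusive"` interpolation is an assumption, and `realtime_summary` is a hypothetical helper name.

```python
import statistics


def realtime_summary(samples):
    """Summarize realtime usage samples in the layout of the table above:
    mean, median, min, 10th through 90th percentiles, max."""
    data = sorted(samples)
    if len(data) < 2:
        raise ValueError("need at least two samples")
    # Nine cut points splitting the sorted data into ten equal-sized groups;
    # "inclusive" treats the data as the whole population (assumed method).
    deciles = statistics.quantiles(data, n=10, method="inclusive")
    summary = {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "min": data[0],
    }
    for pct, value in zip(range(10, 100, 10), deciles):
        summary[f"{pct}%"] = value
    summary["max"] = data[-1]
    return summary
```

Fed a week of realtime memory samples (for example, the realtime `mem_usage_absolute_average` values, assumed to be exported to a list), this gives the mean, median and deciles to set beside the daily-rollup NOR values. With heavily tied data such as these, other percentile definitions would mostly return the same observed values.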

Both the calculated 'low' value (1.60) and 'high' value (1.70) fall between the 50th percentile (0.99) and the 60th percentile (1.99) of actual realtime memory usage. Because the 60th percentile already exceeds the 'high' value, the 'conservative' right-size recommendation based on 'high' would actually be quite aggressive, leaving the available memory below the expected memory requirements more than 40% of the time. Calculations for CPU usage show similarly skewed estimates.
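The exceedance claim can be checked directly by counting how many realtime samples lie above the NOR 'high' value. A minimal sketch: `fraction_exceeding` is a hypothetical helper, and treating an allocation sized exactly to the threshold as "sufficient up to that value" is a simplifying assumption.

```python
def fraction_exceeding(samples, threshold):
    """Share of samples strictly above threshold: how often actual usage
    would exceed an allocation sized exactly to that threshold."""
    values = list(samples)
    if not values:
        raise ValueError("need at least one sample")
    return sum(1 for v in values if v > threshold) / len(values)
```

For example, with the hypothetical samples `[0, 0.99, 0.99, 1.99, 2.99]` and the 'high' value 1.70, two of five samples exceed the threshold, so the function returns `0.4`.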

Instead of the mean and standard deviation, the median (50th percentile) together with high and low percentiles (for example, the 85th and 15th) would be more representative of actual usage.
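The proposed alternative can be sketched as a replacement for the mean +/- stddev_samp calculation: take the median as the typical value and the 15th/85th percentiles as low/high. This sketch assumes the nearest-rank percentile definition (one of several in common use) and uses hypothetical helper names; it is not how C&U currently computes NOR.

```python
def percentile_nearest_rank(sorted_data, pct):
    """Nearest-rank percentile: the smallest observed value with at least
    pct percent of the data at or below it. pct is an integer in 1..100."""
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    n = len(sorted_data)
    rank = (pct * n + 99) // 100  # integer ceil(pct * n / 100), 1-based
    return sorted_data[rank - 1]


def percentile_nor(samples, low_pct=15, high_pct=85):
    """Return (low, typical, high) from realtime samples using percentiles
    instead of mean +/- one sample standard deviation."""
    data = sorted(samples)
    if not data:
        raise ValueError("need at least one sample")
    return (
        percentile_nearest_rank(data, low_pct),
        percentile_nearest_rank(data, 50),
        percentile_nearest_rank(data, high_pct),
    )
```

Nearest-rank always returns an observed value, which is consistent with the table above, where every percentile is an actual sample value; an interpolating definition would be an equally reasonable choice. For the samples 1 through 20, `percentile_nor` returns `(3, 10, 17)`.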

Version-Release number of selected component (if applicable):

5.8.1.0

How reproducible:

100%

Steps to Reproduce:
1.) Collect C&U data for a VM over several days.
2.) Compare the realtime C&U data with the NOR / Right-Size values.

Actual results:

Avg/Max/High/Low values shown for NOR / Right-size do not reflect actual realtime usage values.

Expected results:

Avg/Max/High/Low values shown for NOR / Right-size reflect actual realtime usage values.

Additional info:

