Bug 1349309 - Lower interval collection rate (to improve monitoring for System Dashboard)
Summary: Lower interval collection rate (to improve monitoring for System Dashboard)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine-dwh
Classification: oVirt
Component: ETL
Version: 4.0.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-4.0.2
Target Release: 4.0.2
Assignee: Shirly Radco
QA Contact: Lukas Svaty
URL:
Whiteboard:
Duplicates: 1108144
Depends On: 1344935
Blocks: 1360313
Reported: 2016-06-23 08:15 UTC by Shirly Radco
Modified: 2016-08-12 14:21 UTC
CC List: 8 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
The default sampling interval has been decreased from 1 minute to 20 seconds to provide more accurate calculations for the new dashboards.
Clone Of:
Environment:
Last Closed: 2016-08-12 14:21:51 UTC
oVirt Team: Metrics
Embargoed:
rule-engine: ovirt-4.0.z+
lsvaty: testing_plan_complete-
ylavi: planning_ack+
sradco: devel_ack+
lsvaty: testing_ack+


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 60850 0 master MERGED history: lower sampling interval 2016-07-26 07:59:23 UTC
oVirt gerrit 60887 0 ovirt-engine-dwh-4.0 MERGED history: lower sampling interval 2016-07-26 08:00:55 UTC

Description Shirly Radco 2016-06-23 08:15:35 UTC
Description of problem:
To improve the granularity of the monitoring used by the System Dashboard, we would like to lower the sampling interval from 1 minute.

Related bug: https://bugzilla.redhat.com/show_bug.cgi?id=1306626

Lowering the interval also requires a change to the engine heartbeat.

Steps to Reproduce:
1. Install engine+dwh
2. In debug mode, verify that the time each sampling cycle takes is less than the interval.
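Step 2 boils down to a simple timing comparison. The following is a minimal sketch, assuming a hypothetical `collect_samples()` callable that stands in for one dwh sampling pass. It does not reproduce the dwh's debug-mode output. It only illustrates the check being performed.

```python
import time
from typing import Callable, Tuple


def sampling_fits_interval(collect_samples: Callable[[], None],
                           interval_seconds: float) -> Tuple[bool, float]:
    """Run one sampling cycle and report whether it finished within the interval.

    `collect_samples` is a hypothetical stand-in for a single dwh sampling pass.
    """
    start = time.monotonic()  # monotonic clock is unaffected by wall-clock changes
    collect_samples()
    elapsed = time.monotonic() - start
    # The cycle must complete before the next one is due.
    return elapsed < interval_seconds, elapsed
```

For example, `sampling_fits_interval(my_cycle, 20)` would return `(True, elapsed)` when a hypothetical `my_cycle` finishes in under 20 seconds.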

Comment 1 Shirly Radco 2016-06-23 08:17:17 UTC
What should the new collection interval be? 20 or 15 seconds?

Comment 2 Oved Ourfali 2016-06-23 08:41:02 UTC
I'm again not sure that's needed. Leaving needinfo on Dary. If it is changed, we must make sure cfme isn't broken in any way.

Comment 3 Shirly Radco 2016-06-23 10:29:51 UTC
We need to lower the interval for better accuracy in the dashboards.
cfme, afaik, collects our samples and divides them by 4 in order to match the vmware 20-second interval.
So we need to decide whether to align with vmware at 20 seconds or to get better accuracy at 15 seconds.
A fix for cfme will be required in either case.

Comment 4 Oved Ourfali 2016-06-23 18:03:54 UTC
So what happens if the released cfme works with 4.0? Will it just fail?
We should test that carefully and then decide whether to change it.

Anyway, I wouldn't do that for 4.0, but only for 4.1.
Let's discuss next week.

Comment 5 Yaniv Lavi 2016-06-26 13:32:51 UTC
Please look into changing this in 4.0 without breaking CFME collection. This is critical for having valuable monitoring info.

Comment 6 Shirly Radco 2016-07-03 14:12:22 UTC
2 options for implementation:
1. Adding to History_configurations the Interval between samples for cfme ease of use.
2. Have cfme read the minutes in status directly from the db - each sample holds the "minutes in status" column that represents the time between samples.

From the dwh perspective, the 2nd option, which does not involve changes to the dwh, is preferred.
An update to ovirt-metrics in cfme is required in either case.

Please advise on how to proceed with this.
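Option 2, seen from the consumer side, means that a collector does not assume one sample per minute. Instead, it reads each record's `minutes_in_status` value and converts it back to a sample length. This is a minimal sketch: the record layout (a plain mapping) and the helper names are hypothetical. Only the `minutes_in_status` column name comes from this bug.

```python
from typing import Iterable, List, Mapping


def sample_lengths_seconds(records: Iterable[Mapping[str, float]]) -> List[float]:
    """Convert each record's minutes_in_status into a sample length in seconds.

    Hypothetical helper: the records stand in for rows of a dwh samples table.
    """
    return [round(r["minutes_in_status"] * 60, 1) for r in records]


def total_minutes(records: Iterable[Mapping[str, float]]) -> float:
    """Sum the time covered by the samples without assuming a fixed interval."""
    return sum(r["minutes_in_status"] for r in records)
```

With a 20-second interval, a stored 0.33 converts back to 19.8 seconds rather than 20. This illustrates why a seconds-based value is more precise.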

Comment 7 Oved Ourfali 2016-07-04 05:11:09 UTC
(In reply to Shirly Radco from comment #6)
> 2 options for implementation:
> 1. Adding to History_configurations the Interval between samples for cfme
> ease of use.
> 2. Have cfme read the minutes in status directly from the db - each sample
> holds the "minutes in status" column that represents the time between
> samples.
> 

Can you give an example of option #2: how it looks today, and how it will look with a 20-second interval?

Comment 8 Shirly Radco 2016-07-04 07:30:13 UTC
In ovirt-engine-dwhd.conf.in, the default is set to:

# Samples Collection Interleave in Seconds
DWH_SAMPLING=60

Currently, each record in the samples table has a "minutes_in_status" value; in the default setup it is equal to 1.00 (1 minute).

For 20 seconds, this will be 0.33.
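The relationship described in this comment is a conversion from the `DWH_SAMPLING` setting (in seconds) to the per-record `minutes_in_status` value. This is a minimal sketch. The parser handles only simple `KEY=VALUE` lines and is hypothetical. The two-decimal rounding is an assumption chosen to match the 1.00 and 0.33 figures above. Neither reflects the dwh's actual code.

```python
def read_sampling_seconds(conf_text: str, default: int = 60) -> int:
    """Return DWH_SAMPLING from ovirt-engine-dwhd.conf-style text.

    Hypothetical parser: skips blank lines and '#' comments and
    tolerates optional double quotes around the value.
    """
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "DWH_SAMPLING":
            return int(value.strip().strip('"'))
    return default


def minutes_in_status(sampling_seconds: int) -> float:
    # One record covers one sampling interval, expressed in minutes.
    return round(sampling_seconds / 60, 2)


conf = """
# Samples Collection Interleave in Seconds
DWH_SAMPLING=60
"""
print(minutes_in_status(read_sampling_seconds(conf)))  # 1.0
print(minutes_in_status(20))                           # 0.33
```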

Comment 9 Shirly Radco 2016-07-04 07:38:18 UTC
When lowering the sampling interval, we will change this to seconds_in_status for more accurate calculation.

Comment 10 Oved Ourfali 2016-07-04 09:18:21 UTC
(In reply to Shirly Radco from comment #9)
> When lowering the sampling interval we will change this to seconds_in_status
> for more accurate calculation.

So you'll have both?
CFME also works with older versions.

Comment 11 Shirly Radco 2016-07-04 09:35:31 UTC
I can maintain 'minutes_in_status' as a calculated column.
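Comments 9-11 propose keeping both values: the exact `seconds_in_status` is stored, and `minutes_in_status` is derived from it so that older CFME versions keep working. This is a minimal sketch using a hypothetical `Sample` class. In the database, the derived value would more likely be a computed column or view, which this sketch does not model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical samples-table record that stores the exact duration."""
    seconds_in_status: int

    @property
    def minutes_in_status(self) -> float:
        # Derived for backward compatibility with consumers that read minutes.
        return round(self.seconds_in_status / 60, 2)


print(Sample(60).minutes_in_status)  # 1.0
print(Sample(20).minutes_in_status)  # 0.33
```

Three 20-second samples sum to exactly 60 seconds, but their derived minutes sum to only 0.99. This is the accuracy gap that motivates storing seconds.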

Comment 12 Lukas Svaty 2016-07-29 08:28:53 UTC
verified in ovirt-engine-dwh-4.0.2-1.el7ev.noarch

Comment 13 Yaniv Lavi 2016-08-03 15:49:42 UTC
*** Bug 1108144 has been marked as a duplicate of this bug. ***

