Bug 1349309

Summary: Lower interval collection rate (to improve monitoring for System Dashboard)
Product: [oVirt] ovirt-engine-dwh
Reporter: Shirly Radco <sradco>
Component: ETL
Assignee: Shirly Radco <sradco>
Status: CLOSED CURRENTRELEASE
QA Contact: Lukas Svaty <lsvaty>
Severity: high
Docs Contact:
Priority: high
Version: 4.0.0
CC: asegundo, bgraveno, bugs, lsvaty, oourfali, sbonazzo, sradco, ylavi
Target Milestone: ovirt-4.0.2
Keywords: Improvement
Target Release: 4.0.2
Flags: rule-engine: ovirt-4.0.z+
lsvaty: testing_plan_complete-
ylavi: planning_ack+
sradco: devel_ack+
lsvaty: testing_ack+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
The sampling interval default time has been decreased from 1 minute to 20 seconds to provide more accurate calculations for the new dashboards.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-08-12 14:21:51 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Metrics
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1344935
Bug Blocks: 1360313

Description Shirly Radco 2016-06-23 08:15:35 UTC
Description of problem:
In order to improve granularity of the monitoring used for System Dashboard we would like to lower the sampling interval from 1 minute.

Related bug: https://bugzilla.redhat.com/show_bug.cgi?id=1306626

In order to lower the interval, a change to engine heartbeat is also required.

Steps to Reproduce:
1. Install engine+dwh
2. In debug mode, verify that the time each sampling run takes is less than the interval.
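
The check in step 2 can be sketched as below. This is only an illustration: the function name and the duration values are hypothetical, and the durations are assumed to have been read manually from the debug output rather than through any dwh API.

```python
from typing import Iterable


def sampling_fits_interval(durations_seconds: Iterable[float],
                           interval_seconds: float) -> bool:
    """Return True if every observed sampling run finished within the interval.

    durations_seconds: run times collected by hand from debug logs (assumed input).
    interval_seconds: the configured sampling interval, e.g. DWH_SAMPLING.
    """
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return all(d < interval_seconds for d in durations_seconds)


# Hypothetical measurements, not taken from a real system.
observed = [3.2, 4.1, 2.9]
print(sampling_fits_interval(observed, 20))
```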

Comment 1 Shirly Radco 2016-06-23 08:17:17 UTC
What is the new required collection interval? 20 or 15 seconds?

Comment 2 Oved Ourfali 2016-06-23 08:41:02 UTC
Again, I'm not sure that's needed. Leaving needinfo on Dary. If it is changed, we must make sure cfme isn't broken in any way.

Comment 3 Shirly Radco 2016-06-23 10:29:51 UTC
We need to lower the interval for better accuracy in the dashboards.
AFAIK, cfme collects our samples and divides them by 4 in order to match the vmware 20-second interval.
So we need to decide whether to align with vmware on 20 seconds or get better accuracy at 15 seconds.
A fix for cfme will be required in both cases.

Comment 4 Oved Ourfali 2016-06-23 18:03:54 UTC
So what happens if the released cfme works with 4.0? Will it just fail?
We should carefully test that, and then decide whether to change that.

Anyway, I wouldn't do that for 4.0, but only for 4.1.
Let's discuss next week.

Comment 5 Yaniv Lavi 2016-06-26 13:32:51 UTC
Please look into changing this in 4.0 without breaking CFME collection. This is critical to have valuable monitoring info.

Comment 6 Shirly Radco 2016-07-03 14:12:22 UTC
2 options for implementation:
1. Adding to History_configurations the Interval between samples for cfme ease of use.
2. Have cfme read the minutes in status directly from the db - each sample holds the "minutes in status" column that represents the time between samples.

From the dwh perspective, the 2nd option, which does not involve changes to the dwh, is preferred.
An update to ovirt-metrics in cfme is required in both cases.

Please update on how to proceed with this.

Comment 7 Oved Ourfali 2016-07-04 05:11:09 UTC
(In reply to Shirly Radco from comment #6)
> 2 options for implementation:
> 1. Adding to History_configurations the Interval between samples for cfme
> ease of use.
> 2. Have cfme read the minutes in status directly from the db - each sample
> holds the "minutes in status" column that represents the time between
> samples.
> 

Can you give an example of option #2: what it looks like today, and what it will look like with a 20-second interval?

Comment 8 Shirly Radco 2016-07-04 07:30:13 UTC
In ovirt-engine-dwhd.conf.in the default is set to

# Samples Collection Interleave in Seconds
DWH_SAMPLING=60

Currently, each record in the samples table has "minutes_in_status"; for the default setup it is equal to 1.00 (1 minute).

For 20 seconds this will be 0.33.
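
The arithmetic behind these figures can be sketched as follows. The helper name is hypothetical, and rounding to two decimals is an assumption made only to match the 1.00 and 0.33 values quoted above; it is not taken from the dwh code.

```python
def minutes_in_status(sampling_seconds: int) -> float:
    """Convert a sampling interval in seconds to minutes, rounded to 2 decimals.

    Rounding to 2 places is an assumption chosen to reproduce the quoted values.
    """
    if sampling_seconds <= 0:
        raise ValueError("sampling interval must be positive")
    return round(sampling_seconds / 60, 2)


# Intervals discussed in this bug: the current default and the two candidates.
for interval in (60, 20, 15):
    print(interval, minutes_in_status(interval))
```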

Comment 9 Shirly Radco 2016-07-04 07:38:18 UTC
When lowering the sampling interval we will change this to seconds_in_status for more accurate calculation.

Comment 10 Oved Ourfali 2016-07-04 09:18:21 UTC
(In reply to Shirly Radco from comment #9)
> When lowering the sampling interval we will change this to seconds_in_status
> for more accurate calculation.

So you'll have both?
CFME also works with older versions.

Comment 11 Shirly Radco 2016-07-04 09:35:31 UTC
I can maintain a calculated column of 'minutes_in_status'
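
One way to picture a calculated 'minutes_in_status' alongside a stored 'seconds_in_status' is the sketch below. It is illustrative only: the Sample class is hypothetical and models a single row in memory, whereas the actual change would live in the dwh database schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical in-memory model of one samples-table row."""

    seconds_in_status: int  # stored value, time between samples in seconds

    @property
    def minutes_in_status(self) -> float:
        # Derived value kept for consumers that still read minutes.
        return self.seconds_in_status / 60


old_default = Sample(seconds_in_status=60)
new_default = Sample(seconds_in_status=20)
print(old_default.minutes_in_status, new_default.minutes_in_status)
```

Deriving the minutes value from the seconds value keeps a single source of truth, so the two columns cannot drift apart.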

Comment 12 Lukas Svaty 2016-07-29 08:28:53 UTC
verified in ovirt-engine-dwh-4.0.2-1.el7ev.noarch

Comment 13 Yaniv Lavi 2016-08-03 15:49:42 UTC
*** Bug 1108144 has been marked as a duplicate of this bug. ***