Bug 1349309 - Lower interval collection rate (to improve monitoring for System Dashboard)
Status: CLOSED CURRENTRELEASE
Product: ovirt-engine-dwh
Classification: oVirt
Component: ETL
Version: 4.0.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-4.0.2
Target Release: 4.0.2
Assigned To: Shirly Radco
QA Contact: Lukas Svaty
Keywords: Improvement
Duplicates: 1108144
Depends On: 1344935
Blocks: 1360313

Reported: 2016-06-23 04:15 EDT by Shirly Radco
Modified: 2016-08-12 10:21 EDT

See Also:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
The sampling interval default time has been decreased from 1 minute to 20 seconds to provide more accurate calculations for the new dashboards.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-08-12 10:21:51 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Metrics
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt-4.0.z+
lsvaty: testing_plan_complete-
ylavi: planning_ack+
sradco: devel_ack+
lsvaty: testing_ack+


External Trackers
Tracker       ID     Priority              Status  Summary                           Last Updated
oVirt gerrit  60850  master                MERGED  history: lower sampling interval  2016-07-26 03:59 EDT
oVirt gerrit  60887  ovirt-engine-dwh-4.0  MERGED  history: lower sampling interval  2016-07-26 04:00 EDT

Description Shirly Radco 2016-06-23 04:15:35 EDT
Description of problem:
To improve the granularity of the monitoring used for the System Dashboard, we would like to lower the sampling interval from its current 1 minute.

Related bug: https://bugzilla.redhat.com/show_bug.cgi?id=1306626

Lowering the interval also requires a change to the engine heartbeat.

Steps to Reproduce:
1. Install engine and DWH.
2. In debug mode, verify that the time each sampling run takes is less than the sampling interval.
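
The check in step 2 can be sketched as follows. This is only an illustration in Python, not the DWH code: `sampling_fits_interval` and `fake_sample` are hypothetical names, and the real sampling run would be whatever the ETL executes on each cycle.

```python
import time

def sampling_fits_interval(sample_once, interval_seconds):
    """Time one sampling run and report whether it finished within the interval."""
    start = time.monotonic()
    sample_once()
    elapsed = time.monotonic() - start
    return elapsed, elapsed < interval_seconds

def fake_sample():
    # Hypothetical stand-in for a real sampling run.
    time.sleep(0.05)

elapsed, ok = sampling_fits_interval(fake_sample, 20)
print(f"sampling took {elapsed:.2f}s, fits a 20s interval: {ok}")
```

If a run takes longer than the interval, consecutive samples would overlap, which is the condition this step is meant to rule out.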
Comment 1 Shirly Radco 2016-06-23 04:17:17 EDT
What is the new required collection interval: 20 or 15 seconds?
Comment 2 Oved Ourfali 2016-06-23 04:41:02 EDT
Again, I'm not sure this is needed. Leaving needinfo on Dary. If it is changed, we must make sure CFME isn't broken in any way.
Comment 3 Shirly Radco 2016-06-23 06:29:51 EDT
We need to lower the interval for better accuracy in the dashboards.
As far as I know, CFME collects our samples and divides them by 4 to match VMware's 20-second interval.
So we need to decide whether to align with VMware at 20 seconds or get better accuracy at 15 seconds.
A fix for CFME will be required in both cases.
Comment 4 Oved Ourfali 2016-06-23 14:03:54 EDT
So what happens if the released CFME works with 4.0? Will it just fail?
We should test that carefully, and then decide whether to change it.

Anyway, I wouldn't do that for 4.0, but only for 4.1.
Let's discuss next week.
Comment 5 Yaniv Lavi 2016-06-26 09:32:51 EDT
Please look into changing this in 4.0 without breaking CFME collection. This is critical to have valuable monitoring info.
Comment 6 Shirly Radco 2016-07-03 10:12:22 EDT
Two options for implementation:
1. Add the interval between samples to History_configurations, for CFME's ease of use.
2. Have CFME read the minutes in status directly from the database. Each sample holds the "minutes in status" column, which represents the time between samples.

From the DWH perspective, the second option is preferred because it does not involve changes to the DWH.
An update to ovirt-metrics in CFME is required in both cases.

Please advise on how to proceed.
Comment 7 Oved Ourfali 2016-07-04 01:11:09 EDT
(In reply to Shirly Radco from comment #6)
> 2 options for implementation:
> 1. Adding to History_configurations the Interval between samples for cfme
> ease of use.
> 2. Have cfme read the minutes in status directly from the db - each sample
> holds the "minutes in status" column that represents the time between
> samples.
> 

Can you give an example of option #2: what it looks like today, and what it will look like with a 20-second interval?
Comment 8 Shirly Radco 2016-07-04 03:30:13 EDT
In ovirt-engine-dwhd.conf.in the default is set to

# Samples Collection Interleave in Seconds
DWH_SAMPLING=60

Currently, each record in the samples table has a "minutes_in_status" value. With the default setup it is equal to 1.00 (1 minute).

For a 20-second interval this will be 0.33.
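
The arithmetic behind these values, as a small Python sketch. The function names are hypothetical, and rounding to two decimals is an assumption based on the 1.00 and 0.33 figures quoted here, not a statement about how the DWH actually stores the column.

```python
def minutes_in_status(sampling_seconds):
    """Convert a sampling interval in seconds to minutes, rounded to two decimals (assumed precision)."""
    return round(sampling_seconds / 60, 2)

def interval_seconds(minutes):
    """Recover the interval in whole seconds from a rounded minutes_in_status value."""
    return round(minutes * 60)

for seconds in (60, 20, 15):
    minutes = minutes_in_status(seconds)
    print(seconds, minutes, interval_seconds(minutes))
```

Note that 0.33 minutes is 19.8 seconds, so a reader of the column has to round back to whole seconds to recover the 20-second interval.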
Comment 9 Shirly Radco 2016-07-04 03:38:18 EDT
When lowering the sampling interval, we will change this column to seconds_in_status for more accurate calculations.
Comment 10 Oved Ourfali 2016-07-04 05:18:21 EDT
(In reply to Shirly Radco from comment #9)
> When lowering the sampling interval we will change this to seconds_in_status
> for more accurate calculation.

So you'll have both? CFME also works with older versions.
Comment 11 Shirly Radco 2016-07-04 05:35:31 EDT
I can maintain a calculated column, 'minutes_in_status'.
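
A rough Python sketch of that idea: store seconds_in_status and derive minutes_in_status from it for older readers. The `Sample` class is hypothetical and only models the relationship between the two values; in the DWH itself this would be a column in the samples tables.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    # Stored value: time between samples, in seconds.
    seconds_in_status: int

    @property
    def minutes_in_status(self) -> float:
        # Calculated value kept for consumers that still expect minutes.
        return self.seconds_in_status / 60

sample = Sample(seconds_in_status=20)
print(sample.seconds_in_status, round(sample.minutes_in_status, 2))
```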
Comment 12 Lukas Svaty 2016-07-29 04:28:53 EDT
Verified in ovirt-engine-dwh-4.0.2-1.el7ev.noarch
Comment 13 Yaniv Lavi 2016-08-03 11:49:42 EDT
*** Bug 1108144 has been marked as a duplicate of this bug. ***
