Bug 1478859 - [downstream clone] DWH sampling is too high - switch back to 60s
Summary: [downstream clone] DWH sampling is too high - switch back to 60s
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine-dwh
Version: 4.1.5
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: high
Target Milestone: ovirt-4.1.6
Target Release: ---
Assignee: Shirly Radco
QA Contact: Lukas Svaty
URL:
Whiteboard:
Depends On: 1395608 1490272
Blocks: 1398553
Reported: 2017-08-07 10:44 UTC by Shirly Radco
Modified: 2021-05-01 16:50 UTC

Fixed In Version: ovirt-engine-dwh-4.1.7
Doc Type: Bug Fix
Doc Text:
Cause: The DWH sampling rate was 20 seconds. Consequence: This created load on PostgreSQL, especially when the history database is hosted with the engine database, and produced warning messages in the DWH when the engine heartbeat was not updated within the required interval. Fix: The sampling interval was moved back to 60 seconds. Result: The warning messages are gone and there is less stress on the database.
Clone Of: 1395608
Environment:
Last Closed: 2017-10-16 10:10:40 UTC
oVirt Team: Metrics
Target Upstream Version:
Embargoed:



Links
System        ID     Private  Priority  Status  Summary                                       Last Updated
oVirt gerrit  81373  0        'None'    MERGED  history: update etl sampling interval to 60s  2020-06-05 21:18:59 UTC
oVirt gerrit  81374  0        'None'    MERGED  history: update etl sampling interval to 60s  2020-06-05 21:18:59 UTC

Comment 2 Yaniv Lavi 2017-08-07 15:39:25 UTC
We will not change the DWH sampling mid-stream unless it causes a performance regression even when on a remote host. Nacking this for now.

Comment 3 RHEL Program Management 2017-08-07 15:42:42 UTC
Product Management has reviewed and declined this request.
You may appeal this decision by reopening this request.

Comment 6 Lukas Svaty 2017-09-11 10:15:58 UTC
[root@pm-rh40 ~]# engine-config -g DwhHeartBeatInterval
DwhHeartBeatInterval: 15 version: general
[root@pm-rh40 ~]# rpm -q ovirt-engine-dwh
ovirt-engine-dwh-4.1.6.1-2.el7ev.noarch

Comment 7 Yaniv Kaul 2017-09-11 10:30:31 UTC
(In reply to Lukas Svaty from comment #6)
> [root@pm-rh40 ~]# engine-config -g DwhHeartBeatInterval
> DwhHeartBeatInterval: 15 version: general
> [root@pm-rh40 ~]# rpm -q ovirt-engine-dwh
> ovirt-engine-dwh-4.1.6.1-2.el7ev.noarch

Is that the right parameter? I thought it was DWH_SAMPLING in ovirt-engine-dwhd.conf

However, it's a good question why we need the heartbeat every 15 secs if we move to a 60-sec collection interval.

Comment 8 Lukas Svaty 2017-09-11 10:38:15 UTC
Ah, my mistake, I did not read the bug correctly.

[root@pm-rh40 ~]# grep SAMPLING /usr/share/ovirt-engine-dwh/services/ovirt-engine-dwhd/ovirt-engine-dwhd.conf
DWH_SAMPLING=60

Moving to ON_QA, as I would like to check the service as well once BZ#1490272 is unblocked. Shirly, please check as well.

AFAIK we don't have any engine-config values for this. 

Moving needinfo to Shirly on whether we want to change the heartbeat as well.
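
The grep check in this comment can also be scripted. The following is a minimal sketch, assuming the conf file consists of plain KEY=VALUE lines, as the grep output suggests. `read_sampling_interval` is a hypothetical helper name used for illustration, not part of ovirt-engine-dwh.

```python
def read_sampling_interval(conf_text, key="DWH_SAMPLING"):
    """Return the integer value of `key` from KEY=VALUE text, or None if absent.

    Assumption: the file uses simple KEY=VALUE lines and '#' comments.
    If the key appears more than once, the last occurrence wins.
    """
    value = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, rest = line.partition("=")
        if sep and name.strip() == key:
            value = int(rest.strip().strip('"'))
    return value


# Example input mirroring the grep output quoted in this comment.
sample_conf = "DWH_SAMPLING=60\n"
interval = read_sampling_interval(sample_conf)
```

On a real system the text would be read from the conf file path shown in the grep command.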

Comment 9 Lukas Svaty 2017-09-12 15:42:59 UTC
verified in ovirt-engine-dwh-4.1.7-1.el7ev.noarch

[root@pm-rh40 ~]# vim /etc/ovirt-engine-dwh/ovirt-engine-dwhd.conf.d/logging.conf
[root@pm-rh40 ~]# service ovirt-engine-dwhd restart && tail -f /var/log/ovirt-
2017-09-12 17:38:14|ETL Service Stopped
2017-09-12 17:38:16|ETL Service Started... omitted output
2017-09-12 17:39:00|ZltQkz|IgH59r|MDVNSt|1257|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|_FvEy8LzqEeCaj-T1n0SCFw|4.1|Default||begin||
2017-09-12 17:39:00 Statistics sync ended. Duration: 847 milliseconds 
2017-09-12 17:40:00|ZltQkz|IgH59r|MDVNSt|1257|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|_FvEy8LzqEeCaj-T1n0SCFw|4.1|Default||end|success|60001
2017-09-12 17:40:00|jgIWHe|IgH59r|MDVNSt|1257|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|_FvEy8LzqEeCaj-T1n0SCFw|4.1|Default||begin||
2017-09-12 17:40:00 Statistics sync ended. Duration: 356 milliseconds 
2017-09-12 17:41:00|jgIWHe|IgH59r|MDVNSt|1257|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|_FvEy8LzqEeCaj-T1n0SCFw|4.1|Default||end|success|60002
2017-09-12 17:41:00|ovCTrq|IgH59r|MDVNSt|1257|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|_FvEy8LzqEeCaj-T1n0SCFw|4.1|Default||begin||
2017-09-12 17:41:00 Statistics sync ended. Duration: 283 milliseconds
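
The 60-second cadence in the log above can be confirmed programmatically. The sketch below assumes that the last pipe-delimited field of a SampleTimeKeepingJob "end" line is the job duration in milliseconds; that reading is inferred from the begin and end timestamps being one minute apart and is not taken from DWH documentation. `job_durations_ms` is a hypothetical helper name.

```python
def job_durations_ms(log_lines, job="SampleTimeKeepingJob"):
    """Collect the trailing numeric field of successful 'end' lines for `job`.

    Assumption: fields are pipe-delimited, with the job name at index 6,
    the phase ('begin' or 'end') at index 11, the result at index 12 and
    the duration in milliseconds at index 13.
    """
    durations = []
    for line in log_lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 14 or fields[6] != job:
            continue  # skips free-text lines such as "Statistics sync ended"
        if fields[11] == "end" and fields[12] == "success" and fields[13].isdigit():
            durations.append(int(fields[13]))
    return durations


# Lines copied from the log output in this comment.
LOG = [
    "2017-09-12 17:39:00|ZltQkz|IgH59r|MDVNSt|1257|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|_FvEy8LzqEeCaj-T1n0SCFw|4.1|Default||begin||",
    "2017-09-12 17:39:00 Statistics sync ended. Duration: 847 milliseconds",
    "2017-09-12 17:40:00|ZltQkz|IgH59r|MDVNSt|1257|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|_FvEy8LzqEeCaj-T1n0SCFw|4.1|Default||end|success|60001",
    "2017-09-12 17:41:00|jgIWHe|IgH59r|MDVNSt|1257|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|_FvEy8LzqEeCaj-T1n0SCFw|4.1|Default||end|success|60002",
]

durations = job_durations_ms(LOG)
# Allow one second of slack around the expected 60 000 ms interval.
within_tolerance = all(abs(d - 60_000) < 1_000 for d in durations)
```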

Comment 10 Lukas Svaty 2017-09-12 15:43:39 UTC
Re-adding the needinfo, as Shirly removed it while setting Fixed In Version.

Comment 11 Shirly Radco 2017-09-12 17:49:33 UTC
If we change the engine heartbeat, we can't support a 20-second interval.
Not sure if we want it dynamic when setting the dwh interval. Yaniv?

Comment 12 Lukas Svaty 2017-09-13 06:19:00 UTC
Per the bug, the sampling interval was changed to 60 seconds, and per comment #6 the heartbeat is at 15 seconds. Where does 20 seconds come from?

[root@pm-rh40 ~]# engine-config -g DwhHeartBeatInterval
DwhHeartBeatInterval: 15 version: general

or am I missing something?

Comment 13 Yaniv Lavi 2017-09-18 17:20:30 UTC
(In reply to Shirly Radco from comment #11)
> If we change the engine heartbeat, we can't support a 20-second interval.
> Not sure if we want it dynamic when setting the dwh interval. Yaniv?

The heartbeat is on the engine side to know that the metrics are current. 
As long as it is lower than the collection interval, we should be ok.

Comment 14 Shirly Radco 2017-09-26 07:35:32 UTC
The dwh checks that the heartbeat timestamp is later than the last sampling/error timestamp in dwh_history_timekeeping in the engine db.

The default now is 60 sec.
If the user chooses to set the interval to 20 sec and we change DwhHeartBeatInterval back to 30 sec, then the dwh will not collect the data, since the heartbeat is not lower than 20 sec.
Should we move DwhHeartBeatInterval back to 30 seconds?
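
The constraint discussed here can be illustrated with a short sketch. It models only what this thread states: the dwh requires the heartbeat timestamp to be later than the last sampling timestamp, and (per comment 13) the heartbeat interval should be lower than the collection interval. Both function names are hypothetical, and the real check in ovirt-engine-dwh may differ.

```python
from datetime import datetime, timedelta


def heartbeat_is_fresh(heartbeat_ts, last_sample_ts):
    """True if the engine heartbeat is later than the last sampling timestamp."""
    return heartbeat_ts > last_sample_ts


def intervals_compatible(heartbeat_interval_s, sampling_interval_s):
    """True if the heartbeat interval is lower than the sampling interval."""
    return heartbeat_interval_s < sampling_interval_s


# Interval combinations mentioned in this bug.
current_default = intervals_compatible(15, 60)   # heartbeat 15 s, sampling 60 s
user_sets_20 = intervals_compatible(15, 20)      # heartbeat 15 s, sampling 20 s
proposed_30 = intervals_compatible(30, 20)       # heartbeat 30 s, sampling 20 s

# Timestamp form of the same idea: with a 30 s heartbeat and a 20 s sampling
# interval, the heartbeat can still be older than the previous sample.
last_sample = datetime(2017, 9, 26, 7, 0, 20)
stale_heartbeat = last_sample - timedelta(seconds=10)
fresh_heartbeat = last_sample + timedelta(seconds=5)
```

With these inputs, only the 30-second heartbeat paired with a 20-second sampling interval fails the check, which is the case raised in this comment.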

Comment 16 Yaniv Lavi 2017-10-29 17:05:29 UTC
(In reply to Shirly Radco from comment #14)
> The dwh checks that the heartbeat timestamp is later than the last
> sampling/error timestamp in dwh_history_timekeeping in the engine db.
> 
> The default now is 60 sec.
> If the user chooses to set the interval to 20 sec and we change
> DwhHeartBeatInterval back to 30 sec, then the dwh will not collect the data,
> since the heartbeat is not lower than 20 sec.
> Should we move DwhHeartBeatInterval back to 30 seconds?

no

