Bug 1478859 - [downstream clone] DWH sampling is too high - switch back to 60s
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine-dwh
Version: 4.1.5
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: high
Target Milestone: ovirt-4.1.6
Target Release: ---
Assigned To: Shirly Radco
QA Contact: Lukas Svaty
Keywords: Performance, Regression, Reopened, ZStream
Depends On: 1395608 1490272
Blocks: 1398553
Reported: 2017-08-07 06:44 EDT by Shirly Radco
Modified: 2017-10-29 13:05 EDT

See Also:
Fixed In Version: ovirt-engine-dwh-4.1.7
Doc Type: Bug Fix
Doc Text:
Cause: The DWH sampling interval was 20 seconds. Consequence: This created load on PostgreSQL, particularly when the history database is hosted with the engine database, and produced warning messages in the DWH when the engine heartbeat was not updated within the required interval. Fix: The sampling interval was moved back to 60 seconds. Result: The warning messages are gone and there is less stress on the database.
Story Points: ---
Clone Of: 1395608
Environment:
Last Closed: 2017-10-16 06:10:40 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Metrics
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
oVirt gerrit | 81373 | master | POST | history: update etl sampling interval to 60s | 2017-09-03 04:51 EDT
oVirt gerrit | 81374 | ovirt-engine-dwh-4.1 | POST | history: update etl sampling interval to 60s | 2017-09-03 04:59 EDT

Comment 2 Yaniv Lavi 2017-08-07 11:39:25 EDT
We will not change the DWH sampling mid-stream unless it causes performance regression even when on a remote host. Nacking this for now.
Comment 3 RHEL Product and Program Management 2017-08-07 11:42:42 EDT
Product Management has reviewed and declined this request.
You may appeal this decision by reopening this request.
Comment 6 Lukas Svaty 2017-09-11 06:15:58 EDT
[root@pm-rh40 ~]# engine-config -g DwhHeartBeatInterval
DwhHeartBeatInterval: 15 version: general
[root@pm-rh40 ~]# rpm -q ovirt-engine-dwh
ovirt-engine-dwh-4.1.6.1-2.el7ev.noarch
Comment 7 Yaniv Kaul 2017-09-11 06:30:31 EDT
(In reply to Lukas Svaty from comment #6)
> [root@pm-rh40 ~]# engine-config -g DwhHeartBeatInterval
> DwhHeartBeatInterval: 15 version: general
> [root@pm-rh40 ~]# rpm -q ovirt-engine-dwh
> ovirt-engine-dwh-4.1.6.1-2.el7ev.noarch

Is that the right parameter? I thought it was DWH_SAMPLING in ovirt-engine-dwhd.conf

However, it's a good question why we need the heartbeat every 15 seconds if we move to a 60-second collection interval.
Comment 8 Lukas Svaty 2017-09-11 06:38:15 EDT
Ah, my mistake, I did not read the bug correctly.

[root@pm-rh40 ~]# grep SAMPLING /usr/share/ovirt-engine-dwh/services/ovirt-engine-dwhd/ovirt-engine-dwhd.conf
DWH_SAMPLING=60

Moving to ON_QA, as I would like to check the service as well once BZ#1490272 is unblocked. Shirly, please check as well.

AFAIK we don't have any engine-config values for this.

Moving needinfo to Shirly in case we want to change the heartbeat as well.
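
For reference, the two values quoted in comments 6 and 8 can be compared with a small shell sketch. It assumes only the output formats shown above (`DWH_SAMPLING=60` in the conf file and `DwhHeartBeatInterval: 15 version: general` from engine-config). The function names are hypothetical and are not part of ovirt-engine-dwh.

```shell
#!/bin/sh
# Hypothetical helpers; they only parse text in the formats quoted in this bug.

# Extract the number from a line such as "DWH_SAMPLING=60" read on stdin.
parse_sampling() {
    sed -n 's/^DWH_SAMPLING=\([0-9][0-9]*\).*$/\1/p' | head -n 1
}

# Extract the number from "DwhHeartBeatInterval: 15 version: general" on stdin.
parse_heartbeat() {
    sed -n 's/^DwhHeartBeatInterval: \([0-9][0-9]*\) .*$/\1/p' | head -n 1
}

# Succeed (exit 0) when the heartbeat interval ($1) is shorter than the
# sampling interval ($2), both in seconds.
heartbeat_ok() {
    [ "$1" -lt "$2" ]
}
```

On an engine host the inputs would presumably come from `engine-config -g DwhHeartBeatInterval | parse_heartbeat` and `parse_sampling < /usr/share/ovirt-engine-dwh/services/ovirt-engine-dwhd/ovirt-engine-dwhd.conf`.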
Comment 9 Lukas Svaty 2017-09-12 11:42:59 EDT
verified in ovirt-engine-dwh-4.1.7-1.el7ev.noarch

[root@pm-rh40 ~]# vim /etc/ovirt-engine-dwh/ovirt-engine-dwhd.conf.d/logging.conf
[root@pm-rh40 ~]# service ovirt-engine-dwhd restart && tail -f /var/log/ovirt-
2017-09-12 17:38:14|ETL Service Stopped
2017-09-12 17:38:16|ETL Service Started... omitted output
2017-09-12 17:39:00|ZltQkz|IgH59r|MDVNSt|1257|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|_FvEy8LzqEeCaj-T1n0SCFw|4.1|Default||begin||
2017-09-12 17:39:00 Statistics sync ended. Duration: 847 milliseconds 
2017-09-12 17:40:00|ZltQkz|IgH59r|MDVNSt|1257|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|_FvEy8LzqEeCaj-T1n0SCFw|4.1|Default||end|success|60001
2017-09-12 17:40:00|jgIWHe|IgH59r|MDVNSt|1257|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|_FvEy8LzqEeCaj-T1n0SCFw|4.1|Default||begin||
2017-09-12 17:40:00 Statistics sync ended. Duration: 356 milliseconds 
2017-09-12 17:41:00|jgIWHe|IgH59r|MDVNSt|1257|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|_FvEy8LzqEeCaj-T1n0SCFw|4.1|Default||end|success|60002
2017-09-12 17:41:00|ovCTrq|IgH59r|MDVNSt|1257|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|_FvEy8LzqEeCaj-T1n0SCFw|4.1|Default||begin||
2017-09-12 17:41:00 Statistics sync ended. Duration: 283 milliseconds
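
The same check can be scripted instead of read off the log by eye. The sketch below assumes the pipe-delimited layout shown above, and in particular that the last field of an `end|success` line for `SampleTimeKeepingJob` is the run duration in milliseconds. The function name `check_sampling` is hypothetical.

```shell
#!/bin/sh
# check_sampling EXPECTED_SECONDS TOLERANCE_MS
# Reads dwhd log lines on stdin. Assumption: field 14 of an "end|success"
# SampleTimeKeepingJob line is the duration of that run in milliseconds.
check_sampling() {
    awk -F'|' -v expected_ms="$(( $1 * 1000 ))" -v tol="$2" '
        $7 == "SampleTimeKeepingJob" && $12 == "end" && $13 == "success" {
            seen++
            diff = $14 - expected_ms
            if (diff < 0) diff = -diff
            if (diff > tol) bad++
        }
        END {
            printf "%d runs checked, %d outside tolerance\n", seen, bad
            # Fail when nothing was checked or any run was off.
            exit ((seen == 0 || bad > 0) ? 1 : 0)
        }'
}
```

Piping the log into `check_sampling 60 100` would accept runs between 59.9 and 60.1 seconds. Lines without pipe-separated fields, such as the "Statistics sync ended" messages, are ignored.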
Comment 10 Lukas Svaty 2017-09-12 11:43:39 EDT
Re-adding the needinfo, as Shirly removed it while setting Fixed In Version.
Comment 11 Shirly Radco 2017-09-12 13:49:33 EDT
If we change the engine heartbeat we can't support a 20-second interval.
Not sure if we want it to be dynamic when setting the dwh interval. Yaniv?
Comment 12 Lukas Svaty 2017-09-13 02:19:00 EDT
Per the bug, the sampling interval was changed to 60 seconds, and per comment #6 the heartbeat is at 15 seconds. Where does 20 seconds come from?

[root@pm-rh40 ~]# engine-config -g DwhHeartBeatInterval
DwhHeartBeatInterval: 15 version: general

or am I missing something?
Comment 13 Yaniv Lavi 2017-09-18 13:20:30 EDT
(In reply to Shirly Radco from comment #11)
> If we change the engine heartbeat we can't support a 20-second interval.
> Not sure if we want it to be dynamic when setting the dwh interval. Yaniv?

The heartbeat is on the engine side to know that the metrics are current. 
As long as it is lower than the collection interval, we should be ok.
Comment 14 Shirly Radco 2017-09-26 03:35:32 EDT
The dwh checks that the heartbeat timestamp is later than the last sampling/error timestamp in dwh_history_timekeeping in engine db.

The default now is 60 sec.
If the user chooses to set the interval to 20 sec and we change DwhHeartBeatInterval back to 30 sec, then the dwh will not collect the data, since the heartbeat is not lower than 20 sec.
Should we move DwhHeartBeatInterval back to 30 seconds?
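
The interaction described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the dwh implementation: heartbeats are assumed to land exactly on multiples of the heartbeat interval, samples on multiples of the sampling interval, and a sample is assumed to be collected only when the newest heartbeat is strictly later than the previous sample time. The function name `skipped_samples` is hypothetical.

```shell
#!/bin/sh
# skipped_samples SAMPLING_SEC HEARTBEAT_SEC N
# Toy model (an assumption, not the real dwh logic): count how many of the
# first N samples find no heartbeat newer than the previous sample.
skipped_samples() {
    sampling=$1
    heartbeat=$2
    n=$3
    skipped=0
    k=1
    while [ "$k" -le "$n" ]; do
        now=$(( k * sampling ))
        previous=$(( now - sampling ))
        # Newest heartbeat at or before "now" (integer division rounds down).
        last_hb=$(( now / heartbeat * heartbeat ))
        if [ "$last_hb" -le "$previous" ]; then
            skipped=$(( skipped + 1 ))
        fi
        k=$(( k + 1 ))
    done
    echo "$skipped"
}
```

In this model `skipped_samples 20 30 6` prints 2 (a 20-second sampling interval with a 30-second heartbeat misses one sample in three), while `skipped_samples 60 15 6` prints 0, which matches the current defaults.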
Comment 16 Yaniv Lavi 2017-10-29 13:05:29 EDT
(In reply to Shirly Radco from comment #14)
> The dwh checks that the heartbeat timestamp is later than the last
> sampling/error timestamp in dwh_history_timekeeping in engine db.
> 
> The default now is 60 sec.
> If the user chooses to set the interval to 20 sec and we change
> DwhHeartBeatInterval back to 30 sec, then the dwh will not collect the data,
> since the heartbeat is not lower than 20 sec.
> Should we move DwhHeartBeatInterval back to 30 seconds?

no
