Bug 1478859
Summary: | [downstream clone] DWH sampling is too high - switch back to 60s | ||
---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Shirly Radco <sradco> |
Component: | ovirt-engine-dwh | Assignee: | Shirly Radco <sradco> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Lukas Svaty <lsvaty> |
Severity: | high | Docs Contact: | |
Priority: | low | ||
Version: | 4.1.5 | CC: | amarchuk, bugs, eberman, emarcian, guchen, lsurette, lsvaty, lveyde, michal.skrivanek, mwest, pstehlik, rbalakri, rgolan, Rhev-m-bugs, sradco, srevivo, ykaul, ylavi |
Target Milestone: | ovirt-4.1.6 | Keywords: | Performance, Regression, Reopened, ZStream |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | ovirt-engine-dwh-4.1.7 | Doc Type: | Bug Fix |
Doc Text: |
Cause:
The DWH sampling interval was 20 seconds.
Consequence:
This created load on PostgreSQL, especially when the history database is hosted alongside the engine database.
It also produced warning messages in DWH when the engine heartbeat was not updated within the required interval.
Fix:
The sampling interval was moved back to 60 seconds.
Result:
The warning messages are gone, and there is less stress on the database.
|
Story Points: | --- |
Clone Of: | 1395608 | Environment: | |
Last Closed: | 2017-10-16 10:10:40 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | Metrics | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1395608, 1490272 | ||
Bug Blocks: | 1398553 |
Comment 2
Yaniv Lavi
2017-08-07 15:39:25 UTC
Product Management has reviewed and declined this request. You may appeal this decision by reopening this request.

[root@pm-rh40 ~]# engine-config -g DwhHeartBeatInterval
DwhHeartBeatInterval: 15 version: general
[root@pm-rh40 ~]# rpm -q ovirt-engine-dwh
ovirt-engine-dwh-4.1.6.1-2.el7ev.noarch

(In reply to Lukas Svaty from comment #6)
> [root@pm-rh40 ~]# engine-config -g DwhHeartBeatInterval
> DwhHeartBeatInterval: 15 version: general
> [root@pm-rh40 ~]# rpm -q ovirt-engine-dwh
> ovirt-engine-dwh-4.1.6.1-2.el7ev.noarch

Is that the right parameter? I thought it was DWH_SAMPLING in ovirt-engine-dwhd.conf.
However, it's a good question why we need the heartbeat every 15 sec if we move to a 60 sec collection interval.

Ah, my mistake, I did not read the bug correctly.
[root@pm-rh40 ~]# grep SAMPLING /usr/share/ovirt-engine-dwh/services/ovirt-engine-dwhd/ovirt-engine-dwhd.conf
DWH_SAMPLING=60

Moving to ON_QA, as I would like to check the service as well when BZ#1490272 is unblocked. Shirly, please check as well.

AFAIK we don't have any engine-config values for this. Moving the needinfo to Shirly in case we want to change the heartbeat as well.

Verified in ovirt-engine-dwh-4.1.7-1.el7ev.noarch
[root@pm-rh40 ~]# vim /etc/ovirt-engine-dwh/ovirt-engine-dwhd.conf.d/logging.conf
[root@pm-rh40 ~]# service ovirt-engine-dwhd restart && tail -f /var/log/ovirt-
2017-09-12 17:38:14|ETL Service Stopped
2017-09-12 17:38:16|ETL Service Started
... omitted output
2017-09-12 17:39:00|ZltQkz|IgH59r|MDVNSt|1257|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|_FvEy8LzqEeCaj-T1n0SCFw|4.1|Default||begin||
2017-09-12 17:39:00 Statistics sync ended. Duration: 847 milliseconds
2017-09-12 17:40:00|ZltQkz|IgH59r|MDVNSt|1257|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|_FvEy8LzqEeCaj-T1n0SCFw|4.1|Default||end|success|60001
2017-09-12 17:40:00|jgIWHe|IgH59r|MDVNSt|1257|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|_FvEy8LzqEeCaj-T1n0SCFw|4.1|Default||begin||
2017-09-12 17:40:00 Statistics sync ended. Duration: 356 milliseconds
2017-09-12 17:41:00|jgIWHe|IgH59r|MDVNSt|1257|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|_FvEy8LzqEeCaj-T1n0SCFw|4.1|Default||end|success|60002
2017-09-12 17:41:00|ovCTrq|IgH59r|MDVNSt|1257|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|_FvEy8LzqEeCaj-T1n0SCFw|4.1|Default||begin||
2017-09-12 17:41:00 Statistics sync ended. Duration: 283 milliseconds

Re-adding the needinfo, as Shirly removed it while setting Fixed In Version.

If we change the engine heartbeat, we can't support a 20 second interval.
Not sure if we want it to be dynamic when setting the dwh interval. Yaniv?

Per the bug, the sampling interval was changed to 60 seconds, and per comment #6 the heartbeat is 15 seconds. Where does 20 seconds come from?
[root@pm-rh40 ~]# engine-config -g DwhHeartBeatInterval
DwhHeartBeatInterval: 15 version: general
Or am I missing something?

(In reply to Shirly Radco from comment #11)
> If we change the engine heartbeat, we can't support a 20 second interval.
> Not sure if we want it to be dynamic when setting the dwh interval. Yaniv?

The heartbeat is on the engine side, so that we know the metrics are current. As long as it is lower than the collection interval, we should be OK.

The dwh checks that the heartbeat timestamp is later than the last sampling/error timestamp in dwh_history_timekeeping in the engine db.

The default now is 60 sec.
If the user chooses to set the interval to 20 sec and we change DwhHeartBeatInterval back to 30 sec, then the dwh will not collect the data, since the heartbeat is not lower than 20 sec.
Should we move DwhHeartBeatInterval back to 30 seconds?

(In reply to Shirly Radco from comment #14)
> The dwh checks that the heartbeat timestamp is later than the last
> sampling/error timestamp in dwh_history_timekeeping in the engine db.
>
> The default now is 60 sec.
> If the user chooses to set the interval to 20 sec and we change
> DwhHeartBeatInterval back to 30 sec, then the dwh will not collect the data,
> since the heartbeat is not lower than 20 sec.
> Should we move DwhHeartBeatInterval back to 30 seconds?

No.
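The verification comment above judges the sampling interval by eye from the ovirt-engine-dwhd log. As a minimal sketch, the same check can be scripted, assuming the pipe-delimited layout seen in that excerpt: timestamp in the first field, job name in the seventh, `begin`/`end` in the twelfth, and a trailing number on `end` records that we assume is a duration in milliseconds. These field positions and the function names `sampling_gaps` and `job_durations_ms` are hypothetical, inferred from the pasted lines rather than from a documented log format.

```python
from datetime import datetime
from typing import Iterable, Iterator, List

# Excerpt copied from the verification comment (ovirt-engine-dwh-4.1.7-1.el7ev).
SAMPLE_LOG = """\
2017-09-12 17:39:00|ZltQkz|IgH59r|MDVNSt|1257|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|_FvEy8LzqEeCaj-T1n0SCFw|4.1|Default||begin||
2017-09-12 17:39:00 Statistics sync ended. Duration: 847 milliseconds
2017-09-12 17:40:00|ZltQkz|IgH59r|MDVNSt|1257|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|_FvEy8LzqEeCaj-T1n0SCFw|4.1|Default||end|success|60001
2017-09-12 17:40:00|jgIWHe|IgH59r|MDVNSt|1257|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|_FvEy8LzqEeCaj-T1n0SCFw|4.1|Default||begin||
2017-09-12 17:40:00 Statistics sync ended. Duration: 356 milliseconds
2017-09-12 17:41:00|jgIWHe|IgH59r|MDVNSt|1257|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|_FvEy8LzqEeCaj-T1n0SCFw|4.1|Default||end|success|60002
2017-09-12 17:41:00|ovCTrq|IgH59r|MDVNSt|1257|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|_FvEy8LzqEeCaj-T1n0SCFw|4.1|Default||begin||
2017-09-12 17:41:00 Statistics sync ended. Duration: 283 milliseconds
"""

# Hypothetical field positions, inferred from the excerpt above.
TIME, JOB, PHASE, DURATION = 0, 6, 11, 13


def _job_records(lines: Iterable[str], phase: str) -> Iterator[List[str]]:
    """Yield the split fields of SampleTimeKeepingJob records in the given phase."""
    for line in lines:
        fields = line.split("|")
        # Lines such as "Statistics sync ended" have no pipes and are skipped.
        if len(fields) > DURATION and fields[JOB] == "SampleTimeKeepingJob" and fields[PHASE] == phase:
            yield fields


def sampling_gaps(lines: Iterable[str]) -> List[float]:
    """Seconds between consecutive 'begin' records of the timekeeping job."""
    starts = [datetime.strptime(f[TIME], "%Y-%m-%d %H:%M:%S") for f in _job_records(lines, "begin")]
    return [(b - a).total_seconds() for a, b in zip(starts, starts[1:])]


def job_durations_ms(lines: Iterable[str]) -> List[int]:
    """Trailing value of 'end' records, assumed to be a duration in milliseconds."""
    return [int(f[DURATION]) for f in _job_records(lines, "end")]


if __name__ == "__main__":
    lines = SAMPLE_LOG.splitlines()
    print(sampling_gaps(lines))     # [60.0, 60.0]
    print(job_durations_ms(lines))  # [60001, 60002]
```

On this excerpt both measures come out at about 60 seconds, which agrees with DWH_SAMPLING=60.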
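The constraint discussed in comments #11 to #14 (the engine heartbeat must be more frequent than DWH sampling) can be illustrated with a toy simulation. This is a hypothetical model, not the ovirt-engine-dwh implementation. It assumes the engine writes its heartbeat every `heartbeat_s` seconds starting at t = 0 and that DWH starts a cycle every `sampling_s` seconds. It also assumes, as comment #14 describes, that a cycle collects data only if the latest heartbeat is strictly later than the previous sampling/error timestamp, and that this timestamp advances on every cycle. `stale_cycles` is an illustrative name.

```python
from typing import List


def stale_cycles(heartbeat_s: int, sampling_s: int, cycles: int) -> List[int]:
    """Return start times (s) of DWH cycles that would see a stale heartbeat.

    Hypothetical model: heartbeats at t = 0, heartbeat_s, 2 * heartbeat_s, ...;
    sampling cycles at t = sampling_s, 2 * sampling_s, ...
    """
    stale = []
    last_timestamp = 0  # last sampling/error timestamp recorded by DWH
    for n in range(1, cycles + 1):
        now = n * sampling_s
        # Most recent heartbeat written at or before `now`.
        latest_heartbeat = (now // heartbeat_s) * heartbeat_s
        if latest_heartbeat <= last_timestamp:
            stale.append(now)  # heartbeat not updated since the previous cycle
        last_timestamp = now  # a sample or an error is recorded either way
    return stale


if __name__ == "__main__":
    print(stale_cycles(15, 60, 10))  # [] : 15 s heartbeat, 60 s default sampling
    print(stale_cycles(15, 20, 10))  # [] : 15 s heartbeat still covers 20 s sampling
    print(stale_cycles(30, 20, 6))   # [20, 80] : 30 s heartbeat, 20 s sampling
```

In this model any heartbeat shorter than the sampling interval always leaves a fresh timestamp for the next cycle, while a 30 sec heartbeat against 20 sec sampling misses the cycles at 20 s and 80 s out of the six simulated, which is the situation Shirly describes in comment #14.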