Bug 1347281 - [scale] Remove foreign keys from history database for better etl performance on scale
Summary: [scale] Remove foreign keys from history database for better etl performance ...
Alias: None
Product: ovirt-engine-dwh
Classification: oVirt
Component: Database
Version: 4.0.0
Hardware: Unspecified
OS: Unspecified
high vote
Target Milestone: ovirt-4.0.1
: 4.0.1
Assignee: Shirly Radco
QA Contact: mlehrer
Depends On:
Blocks: 1353189
Reported: 2016-06-16 12:41 UTC by Shirly Radco
Modified: 2016-08-23 03:12 UTC

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Removed foreign keys from the history database to provide better sampling scale performance.
Clone Of:
Last Closed: 2016-08-12 14:11:46 UTC
oVirt Team: Metrics
rule-engine: ovirt-4.0.z+
ylavi: planning_ack+
sradco: devel_ack+
pstehlik: testing_ack+

Attachments
sample query duration values (10.89 KB, text/plain)
2016-08-09 08:29 UTC, mlehrer

System ID Private Priority Status Summary Last Updated
oVirt gerrit 59453 0 master MERGED history: dropped all history database foreign keys 2016-06-21 12:19:23 UTC

Description Shirly Radco 2016-06-16 12:41:01 UTC
Description of problem:
In scale environments the sampling process took around 25s.
We want to lower it below 15s in order to run the sampling every 15s.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Install engine + dwh and set up a scale environment.
2. Run dwh in debug mode with the sampling interval set to 15.
3. Check the sampling time in the dwh log file.

Actual results:
The sampling takes around 25 seconds.

Expected results:
Should be under 15s.

Additional info:

Comment 1 mlehrer 2016-08-09 08:29:21 UTC
Created attachment 1189126
sample query duration values

Comment 2 mlehrer 2016-08-09 08:30:56 UTC
Tested on 4.0.2-1
DWH (App & DB) on same tier as Engine App

hosts: 541
  vms: 6322

Standard disk used for Database.

Without PostgreSQL tuning, the sample query finishes in 30-39s.
With tuned PostgreSQL, the sample query time drops to 5-6s.
Some sample query degradation occurs during delete jobs, but only by a few additional seconds; most queries still finish in under 15s.
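
As a rough illustration of how "under 15s for most queries" could be checked against a list of measured durations, here is a minimal Python sketch. The function name `summarize_durations` and the example values are hypothetical; they are not taken from the attachment or from the dwh log.

```python
def summarize_durations(durations_s, threshold_s=15.0):
    """Return (sample count, worst duration, share of samples under threshold)."""
    if not durations_s:
        raise ValueError("no durations given")
    # Count samples that finished strictly under the threshold.
    under = sum(1 for d in durations_s if d < threshold_s)
    return len(durations_s), max(durations_s), under / len(durations_s)


if __name__ == "__main__":
    # Hypothetical durations in seconds, for illustration only.
    example = [5.2, 5.9, 6.1, 8.4, 16.3]
    count, worst, share = summarize_durations(example)
    print(f"{count} samples, worst {worst:.1f}s, {share:.0%} under 15s")
```

The 15s default mirrors the target sampling interval from the description; any other threshold can be passed in.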

Although we tuned several settings, to get the sample query to return in under 15s please set:
checkpoint_segments = 128  
checkpoint_completion_target = 0.9
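
To confirm that a given postgresql.conf carries the two values above, a small check along the following lines might help. This is a sketch under assumptions: the helper names `parse_conf` and `deviations` are hypothetical, the parser only handles simple `key = value` lines with `#` comments, and `checkpoint_segments` exists only in PostgreSQL releases before 9.5.

```python
# Values recommended in this comment; stored as floats for numeric comparison.
RECOMMENDED = {"checkpoint_segments": 128.0, "checkpoint_completion_target": 0.9}


def parse_conf(text):
    """Parse simple 'key = value' lines, ignoring '#' comments and blank lines."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip("'")
    return settings


def deviations(text):
    """Return {setting: current raw value or None} for settings that differ."""
    current = parse_conf(text)
    result = {}
    for key, wanted in RECOMMENDED.items():
        raw = current.get(key)
        try:
            ok = raw is not None and float(raw) == wanted
        except ValueError:
            ok = False
        if not ok:
            result[key] = raw
    return result
```

An empty dictionary from `deviations(open(path).read())` would mean both settings match; `path` here stands for wherever the instance keeps its postgresql.conf.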

Further information available: https://mojo.redhat.com/docs/DOC-1089988
The last sample values were parsed and are attached.
