Bug 1347281

Summary: [scale] Remove foreign keys from history database for better etl performance on scale
Product: [oVirt] ovirt-engine-dwh Reporter: Shirly Radco <sradco>
Component: Database Assignee: Shirly Radco <sradco>
Status: CLOSED CURRENTRELEASE QA Contact: mlehrer
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.0.0 CC: bgraveno, bugs, gklein, lsvaty, mlehrer, sbonazzo, ylavi
Target Milestone: ovirt-4.0.1 Keywords: ZStream
Target Release: 4.0.1 Flags: rule-engine: ovirt-4.0.z+
ylavi: planning_ack+
sradco: devel_ack+
pstehlik: testing_ack+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Enhancement
Doc Text:
Removed foreign keys from the history database to provide better sampling scale performance.
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-08-12 14:11:46 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Metrics RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1353189    
Attachments:
Description Flags
sample query duration values none

Description Shirly Radco 2016-06-16 12:41:01 UTC
Description of problem:
In scale environments the sampling process took around 25s.
We want to lower it below 15s in order to run the sampling every 15s.

Version-Release number of selected component (if applicable):
4.0

How reproducible:


Steps to Reproduce:
1. Install engine + dwh and set up a scale environment.
2. Run dwh in debug mode with the sampling interval set to 15.
3. Check the sampling time in the dwh log file.

Actual results:
The sampling takes around 25 seconds 

Expected results:
Should be under 15s.

Additional info:

Comment 1 mlehrer 2016-08-09 08:29:21 UTC
Created attachment 1189126 [details]
sample query duration values

Comment 2 mlehrer 2016-08-09 08:30:56 UTC
Tested on 4.0.2-1
DWH (App & DB) on same tier as Engine App

Dataset:
hosts: 541
  vms: 6322

Standard disk used for Database.


Without PostgreSQL tuning, the sample query finishes in 30-39s.
With tuned PostgreSQL, the sample query time drops to 5-6s.
Some sample query degradation occurs during delete jobs, but only by a few additional seconds, and most queries still finish in under 15s.

Although we tuned a few settings, to see the sample query return in under 15s please set:
checkpoint_segments = 128  
checkpoint_completion_target = 0.9
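
As a rough aid for sizing the database disk under these two settings, the sketch below estimates how much WAL is written between checkpoints and the usual upper bound on WAL segment files. It assumes the default 16 MiB WAL segment size and the bound given in the PostgreSQL 9.x documentation for releases that still have checkpoint_segments (before 9.5). Neither assumption comes from this report, and the helper names are illustrative only.

```python
# Rough WAL footprint estimate for the checkpoint settings suggested above.
# Assumptions (not from this bug report):
#   - default WAL segment size of 16 MiB
#   - the "normal" upper bound from the PostgreSQL 9.x docs (pre-9.5):
#     (2 + checkpoint_completion_target) * checkpoint_segments + 1 files
WAL_SEGMENT_MIB = 16


def wal_between_checkpoints_mib(checkpoint_segments):
    """WAL volume (MiB) that triggers a checkpoint when no timeout fires first."""
    return checkpoint_segments * WAL_SEGMENT_MIB


def max_wal_files(checkpoint_segments, checkpoint_completion_target):
    """Usual upper bound on the number of WAL segment files kept on disk."""
    return int((2 + checkpoint_completion_target) * checkpoint_segments + 1)


if __name__ == "__main__":
    segments = 128   # checkpoint_segments from the comment above
    target = 0.9     # checkpoint_completion_target from the comment above

    files = max_wal_files(segments, target)
    print("WAL between checkpoints (MiB):", wal_between_checkpoints_mib(segments))
    print("Max WAL files:", files)
    print("Max WAL on disk (MiB):", files * WAL_SEGMENT_MIB)
```

Under these assumptions, the settings allow about 2 GiB of WAL between checkpoints and up to roughly 5.8 GiB of WAL files on disk, so the database volume should have that much headroom.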

Further information available: https://mojo.redhat.com/docs/DOC-1089988
The last sample query duration values were parsed and are attached.