Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1345947

Summary: [scale] - dashboard landing page generate inefficient query - COALESCE(AVG(memory_size_mb)
Product: [oVirt] ovirt-engine-dashboard Reporter: Eldad Marciano <emarcian>
Component: CoreAssignee: Nobody <nobody>
Status: CLOSED DEFERRED QA Contact: Eldad Marciano <emarcian>
Severity: high Docs Contact:
Priority: high    
Version: unspecifiedCC: amureini, bugs, emarcian, gklein, mperina, oourfali, sradco, ylavi
Target Milestone: ---Flags: rule-engine: planning_ack+
rule-engine: devel_ack+
pnovotny: testing_ack+
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-12-20 07:09:28 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Metrics RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1365443    
Bug Blocks:    

Description Eldad Marciano 2016-06-13 13:24:05 UTC
Description of problem:
dashboard landing generate the following long query

exec time: 35.1s

SQL:
SELECT vm_name AS name, COALESCE(AVG(memory_size_mb), 0) AS memory_size_mb, COALESCE(AVG(memory_usage_percent), 0) AS memory_usage_percent, MAX(previous_memory_percent) AS previous_memory_percent FROM v4_0_statistics_vms_resources_usage_samples samples INNER JOIN v4_0_configuration_history_vms vms ON samples.vm_id = vms.vm_id LEFT JOIN (SELECT samples.vm_id, COALESCE(AVG(memory_usage_percent), 0) AS previous_memory_percent FROM v4_0_statistics_vms_resources_usage_samples samples INNER JOIN v4_0_configuration_history_vms vms ON samples.vm_id = vms.vm_id WHERE samples.vm_status = 1 AND history_datetime >= (CURRENT_TIMESTAMP - INTERVAL '10 minute') AND history_datetime < (CURRENT_TIMESTAMP - INTERVAL '5 minute') AND vms.history_id IN (SELECT MAX(a.history_id) FROM v4_0_configuration_history_vms a GROUP BY a.vm_id)GROUP BY samples.vm_id) AS previous_trend ON vms.vm_id = previous_trend.vm_id WHERE samples.vm_status = 1 AND history_datetime >= (CURRENT_TIMESTAMP - INTERVAL '5 minute') AND history_datetime < CURRENT_TIMESTAMP AND vms.history_id IN(SELECT MAX(a.history_id) FROM v4_0_configuration_history_vms a GROUP BY a.vm_id) GROUP BY vm_name ORDER BY memory_usage_percent DESC LIMIT 10



Version-Release number of selected component (if applicable):
4.0.7

How reproducible:
100%

Steps to Reproduce:
1. large scale env (39 hosts, 3K vms)
2. fetch the dashboard.
3

Actual results:
very long response time and sometimes the UI get stuck 

Expected results:
fast response time, stable UI

Additional info:

Comment 2 Eldad Marciano 2016-06-14 10:20:07 UTC
when breaking down the query seems like the biggest part is scanning datetime by history_datetime index 

see the following:
v4_0_statistics_vms_resources_usage_samples datetime index storing the most updated data last 
so order by datetime desc might improve the performance.


filter the timestamp first and than doing other filters might improve the performance as well.

Comment 3 Eldad Marciano 2016-06-14 11:56:20 UTC
notice - there is double usage in "v4_0_statistics_vms_resources_usage_samples"
hit this view once might reduce the response time.

Comment 4 Martin Perina 2016-07-04 13:11:18 UTC
Patch is included in ovirt-engine-4.0.1, moving to ON_QA

Comment 5 Pavel Novotny 2016-07-07 08:22:58 UTC
Moving verification to Scale team, as they have the appropriate environment for this.

Comment 6 Eldad Marciano 2016-08-09 11:39:18 UTC
verified failed his query runs for 7 sec.
LOG:  duration: 7026.370 ms  execute <unnamed>: SELECT vm_name AS name, COALESCE(AVG(memory_size_mb), 0) AS memory_size_mb, COALESCE(AVG(memory_usage_percent
), 0) AS memory_usage_percent, MAX(previous_memory_percent) AS previous_memory_percent FROM v4_0_statistics_vms_resources_usage_samples samples INNER JOIN v4
_0_configuration_history_vms vms ON samples.vm_id = vms.vm_id LEFT JOIN (SELECT samples.vm_id, COALESCE(AVG(memory_usage_percent), 0) AS previous_memory_perc
ent FROM v4_0_statistics_vms_resources_usage_samples samples INNER JOIN v4_0_configuration_history_vms vms ON samples.vm_id = vms.vm_id WHERE samples.vm_stat
us = 1 AND samples.history_id IN (SELECT MAX(history_id) FROM v4_0_statistics_vms_resources_usage_samples AS a WHERE history_datetime >= CURRENT_TIMESTAMP - 
INTERVAL '10 minute' AND history_datetime <= CURRENT_TIMESTAMP - INTERVAL '5 minute' GROUP BY a.vm_id) GROUP BY samples.vm_id) AS previous_trend ON vms.vm_id
 = previous_trend.vm_id WHERE samples.vm_status = 1 AND samples.history_id IN ( SELECT MAX(history_id) FROM v4_0_statistics_vms_resources_usage_samples AS a 
WHERE history_datetime >= CURRENT_TIMESTAMP - INTERVAL '5 minute' AND history_datetime <= CURRENT_TIMESTAMP GROUP BY a.vm_id ) GROUP BY vm_name ORDER BY memo
ry_usage_percent DESC LIMIT 10

Comment 7 Oved Ourfali 2016-08-10 12:27:46 UTC
I don't think we can improve this further. 
Are you running with the original number of objects reported in the bug description? 
How long does it take the dashboard to load?

Comment 8 Eldad Marciano 2016-08-10 19:59:07 UTC
it takes around 40sec to load.
we should re test it with 3.1K vms

Comment 9 Oved Ourfali 2016-11-09 10:02:59 UTC
As I understand it is a lot of DWH efforts, it will be fixed by caching.
Moving to future for now.

Comment 10 Oved Ourfali 2016-11-23 09:20:44 UTC
Moving to DWH for their decision.
For now we will focus on caching on the UX side.

Comment 11 Shirly Radco 2016-12-20 07:09:28 UTC
This issue was addressed by caching the queries every 5 minutes.

Comment 12 Oved Ourfali 2016-12-20 07:13:10 UTC
I must admit I think the root cause should be fixed, as this pre-fetching can cause load on the DB if the query is inefficient.