Description of problem:
Dashboard landing generates the following long-running query.

exec time: 35.1s
SQL:
SELECT vm_name AS name,
       COALESCE(AVG(memory_size_mb), 0) AS memory_size_mb,
       COALESCE(AVG(memory_usage_percent), 0) AS memory_usage_percent,
       MAX(previous_memory_percent) AS previous_memory_percent
FROM v4_0_statistics_vms_resources_usage_samples samples
INNER JOIN v4_0_configuration_history_vms vms ON samples.vm_id = vms.vm_id
LEFT JOIN (SELECT samples.vm_id,
                  COALESCE(AVG(memory_usage_percent), 0) AS previous_memory_percent
           FROM v4_0_statistics_vms_resources_usage_samples samples
           INNER JOIN v4_0_configuration_history_vms vms ON samples.vm_id = vms.vm_id
           WHERE samples.vm_status = 1
             AND history_datetime >= (CURRENT_TIMESTAMP - INTERVAL '10 minute')
             AND history_datetime < (CURRENT_TIMESTAMP - INTERVAL '5 minute')
             AND vms.history_id IN (SELECT MAX(a.history_id)
                                    FROM v4_0_configuration_history_vms a
                                    GROUP BY a.vm_id)
           GROUP BY samples.vm_id) AS previous_trend ON vms.vm_id = previous_trend.vm_id
WHERE samples.vm_status = 1
  AND history_datetime >= (CURRENT_TIMESTAMP - INTERVAL '5 minute')
  AND history_datetime < CURRENT_TIMESTAMP
  AND vms.history_id IN (SELECT MAX(a.history_id)
                         FROM v4_0_configuration_history_vms a
                         GROUP BY a.vm_id)
GROUP BY vm_name
ORDER BY memory_usage_percent DESC
LIMIT 10

Version-Release number of selected component (if applicable):
4.0.7

How reproducible:
100%

Steps to Reproduce:
1. Set up a large-scale environment (39 hosts, 3K VMs).
2. Fetch the dashboard.

Actual results:
Very long response time; sometimes the UI gets stuck.

Expected results:
Fast response time, stable UI.

Additional info:
When breaking down the query, the biggest cost appears to be scanning by the history_datetime index on v4_0_statistics_vms_resources_usage_samples. That index stores the most recent data last, so ordering by datetime DESC might improve performance. Filtering on the timestamp first, and only then applying the other filters, might improve performance as well; see the sketch below.
Notice: v4_0_statistics_vms_resources_usage_samples is used twice (once for the current window, once for the previous one); hitting this view only once might reduce the response time, e.g. as sketched below.
Patch is included in ovirt-engine-4.0.1, moving to ON_QA
Moving verification to Scale team, as they have the appropriate environment for this.
Verification failed; the query runs for 7 sec:

LOG: duration: 7026.370 ms execute <unnamed>:
SELECT vm_name AS name,
       COALESCE(AVG(memory_size_mb), 0) AS memory_size_mb,
       COALESCE(AVG(memory_usage_percent), 0) AS memory_usage_percent,
       MAX(previous_memory_percent) AS previous_memory_percent
FROM v4_0_statistics_vms_resources_usage_samples samples
INNER JOIN v4_0_configuration_history_vms vms ON samples.vm_id = vms.vm_id
LEFT JOIN (SELECT samples.vm_id,
                  COALESCE(AVG(memory_usage_percent), 0) AS previous_memory_percent
           FROM v4_0_statistics_vms_resources_usage_samples samples
           INNER JOIN v4_0_configuration_history_vms vms ON samples.vm_id = vms.vm_id
           WHERE samples.vm_status = 1
             AND samples.history_id IN (SELECT MAX(history_id)
                                        FROM v4_0_statistics_vms_resources_usage_samples AS a
                                        WHERE history_datetime >= CURRENT_TIMESTAMP - INTERVAL '10 minute'
                                          AND history_datetime <= CURRENT_TIMESTAMP - INTERVAL '5 minute'
                                        GROUP BY a.vm_id)
           GROUP BY samples.vm_id) AS previous_trend ON vms.vm_id = previous_trend.vm_id
WHERE samples.vm_status = 1
  AND samples.history_id IN (SELECT MAX(history_id)
                             FROM v4_0_statistics_vms_resources_usage_samples AS a
                             WHERE history_datetime >= CURRENT_TIMESTAMP - INTERVAL '5 minute'
                               AND history_datetime <= CURRENT_TIMESTAMP
                             GROUP BY a.vm_id)
GROUP BY vm_name
ORDER BY memory_usage_percent DESC
LIMIT 10
I don't think we can improve this further. Are you running with the original number of objects reported in the bug description? How long does it take the dashboard to load?
It takes around 40 sec to load. We should re-test it with 3.1K VMs.
As I understand it, this requires significant DWH effort; it will be addressed by caching. Moving to future for now.
Moving to DWH for their decision. For now we will focus on caching on the UX side.
This issue was addressed by caching the queries every 5 minutes.
I must admit I think the root cause should be fixed, as this pre-fetching can cause load on the DB if the query is inefficient.