Description of problem:
Dashboard landing generates the following long-running query.

exec time: 35.1s
SQL:
SELECT vm_name AS name,
       COALESCE(AVG(memory_size_mb), 0) AS memory_size_mb,
       COALESCE(AVG(memory_usage_percent), 0) AS memory_usage_percent,
       MAX(previous_memory_percent) AS previous_memory_percent
FROM v4_0_statistics_vms_resources_usage_samples samples
INNER JOIN v4_0_configuration_history_vms vms ON samples.vm_id = vms.vm_id
LEFT JOIN (SELECT samples.vm_id,
                  COALESCE(AVG(memory_usage_percent), 0) AS previous_memory_percent
           FROM v4_0_statistics_vms_resources_usage_samples samples
           INNER JOIN v4_0_configuration_history_vms vms ON samples.vm_id = vms.vm_id
           WHERE samples.vm_status = 1
             AND history_datetime >= (CURRENT_TIMESTAMP - INTERVAL '10 minute')
             AND history_datetime < (CURRENT_TIMESTAMP - INTERVAL '5 minute')
             AND vms.history_id IN (SELECT MAX(a.history_id)
                                    FROM v4_0_configuration_history_vms a
                                    GROUP BY a.vm_id)
           GROUP BY samples.vm_id) AS previous_trend ON vms.vm_id = previous_trend.vm_id
WHERE samples.vm_status = 1
  AND history_datetime >= (CURRENT_TIMESTAMP - INTERVAL '5 minute')
  AND history_datetime < CURRENT_TIMESTAMP
  AND vms.history_id IN (SELECT MAX(a.history_id)
                         FROM v4_0_configuration_history_vms a
                         GROUP BY a.vm_id)
GROUP BY vm_name
ORDER BY memory_usage_percent DESC
LIMIT 10

Version-Release number of selected component (if applicable):
4.0.7

How reproducible:
100%

Steps to Reproduce:
1. Set up a large-scale environment (39 hosts, 3K VMs).
2. Fetch the dashboard.

Actual results:
Very long response time; sometimes the UI gets stuck.

Expected results:
Fast response time, stable UI.

Additional info:
When breaking down the query, the biggest cost appears to be scanning by the history_datetime index on v4_0_statistics_vms_resources_usage_samples. That index stores the most recent data last, so ordering by datetime DESC might improve performance. Filtering on the timestamp first, and only then applying the other filters, might improve performance as well; see the sketch below.
Notice: v4_0_statistics_vms_resources_usage_samples is used twice (once for the current window, once for the previous one); hitting this view only once might reduce the response time, e.g. as sketched below.
Patch is included in ovirt-engine-4.0.1, moving to ON_QA
Moving verification to Scale team, as they have the appropriate environment for this.
Verification failed; the query runs for 7 sec:

LOG: duration: 7026.370 ms execute <unnamed>:
SELECT vm_name AS name,
       COALESCE(AVG(memory_size_mb), 0) AS memory_size_mb,
       COALESCE(AVG(memory_usage_percent), 0) AS memory_usage_percent,
       MAX(previous_memory_percent) AS previous_memory_percent
FROM v4_0_statistics_vms_resources_usage_samples samples
INNER JOIN v4_0_configuration_history_vms vms ON samples.vm_id = vms.vm_id
LEFT JOIN (SELECT samples.vm_id,
                  COALESCE(AVG(memory_usage_percent), 0) AS previous_memory_percent
           FROM v4_0_statistics_vms_resources_usage_samples samples
           INNER JOIN v4_0_configuration_history_vms vms ON samples.vm_id = vms.vm_id
           WHERE samples.vm_status = 1
             AND samples.history_id IN (SELECT MAX(history_id)
                                        FROM v4_0_statistics_vms_resources_usage_samples AS a
                                        WHERE history_datetime >= CURRENT_TIMESTAMP - INTERVAL '10 minute'
                                          AND history_datetime <= CURRENT_TIMESTAMP - INTERVAL '5 minute'
                                        GROUP BY a.vm_id)
           GROUP BY samples.vm_id) AS previous_trend ON vms.vm_id = previous_trend.vm_id
WHERE samples.vm_status = 1
  AND samples.history_id IN (SELECT MAX(history_id)
                             FROM v4_0_statistics_vms_resources_usage_samples AS a
                             WHERE history_datetime >= CURRENT_TIMESTAMP - INTERVAL '5 minute'
                               AND history_datetime <= CURRENT_TIMESTAMP
                             GROUP BY a.vm_id)
GROUP BY vm_name
ORDER BY memory_usage_percent DESC
LIMIT 10
I don't think we can improve this further. Are you running with the original number of objects reported in the bug description? How long does it take the dashboard to load?
It takes around 40 sec to load. We should re-test it with 3.1K VMs.
As I understand it, this requires significant DWH effort; it will be addressed by caching. Moving to future for now.
Moving to DWH for their decision. For now we will focus on caching on the UX side.
This issue was addressed by caching the queries every 5 minutes.
I must admit I think the root cause should be fixed, as this pre-fetching can cause load on the DB if the query is inefficient.