Bug 1345934

Summary: [scale] - dashboard landing page generates inefficient query - history.cpu_per_socket
Product: [oVirt] ovirt-engine
Reporter: Eldad Marciano <emarcian>
Component: General
Assignee: Alexander Wels <awels>
Status: CLOSED CURRENTRELEASE
QA Contact: Eldad Marciano <emarcian>
Severity: high
Docs Contact:
Priority: high
Version: 4.0.0
CC: amureini, awels, bugs, gklein, oourfali
Target Milestone: ovirt-4.0.1
Flags: rule-engine: ovirt-4.0.z+
       rule-engine: planning_ack+
       rule-engine: devel_ack+
       pnovotny: testing_ack+
Target Release: 4.0.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-08-12 14:11:52 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: UX
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Eldad Marciano 2016-06-13 13:10:24 UTC
Description of problem:
When browsing to the landing page, it takes a very long time to fetch the dashboard.


The query is:
SELECT
    SUM(COALESCE(history.cpu_per_socket, 0) * COALESCE(history.number_of_sockets, 0)) AS cpu_total_vms,
    SUM(CASE WHEN samples.vm_status = 1
             THEN COALESCE(history.cpu_per_socket, 0) * COALESCE(history.number_of_sockets, 0)
             ELSE 0 END) AS cpu_used_vms,
    CAST(SUM(COALESCE(history.memory_size_mb, 0.00)) AS float) AS mem_total_vms,
    CAST(SUM(CASE WHEN samples.vm_status = 1
                  THEN COALESCE(history.memory_size_mb, 0.00)
                  ELSE 0.00 END) AS float) AS mem_used_vms
FROM v4_0_statistics_vms_resources_usage_samples samples
INNER JOIN v4_0_configuration_history_vms history
        ON history.history_id = samples.vm_configuration_version
INNER JOIN v4_0_latest_configuration_vms a
        ON a.vm_id = samples.vm_id
WHERE history_datetime >= CURRENT_TIMESTAMP - INTERVAL '5 minute'
  AND history_datetime <= CURRENT_TIMESTAMP
  AND samples.history_id IN (
      SELECT MAX(history_id)
      FROM v4_0_statistics_vms_resources_usage_samples b
      GROUP BY vm_id
  )
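
For anyone profiling this, a minimal diagnostic sketch (assuming direct psql access to the ovirt_engine_history PostgreSQL database) is to time the "latest sample per VM" subquery on its own with EXPLAIN ANALYZE, since it has to aggregate the whole samples set before the outer joins can filter on it:

-- Diagnostic sketch only, assuming psql access to ovirt_engine_history.
-- The subquery used in the IN (...) clause aggregates every sample row to
-- find the newest history_id per VM; running it under EXPLAIN shows how
-- much of the total runtime it accounts for on a large setup.
EXPLAIN (ANALYZE, BUFFERS)
SELECT MAX(history_id)
FROM v4_0_statistics_vms_resources_usage_samples
GROUP BY vm_id;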


Version-Release number of selected component (if applicable):
4.0.7

How reproducible:
100%

Steps to Reproduce:
1. Use a large-scale environment (39 hosts, 3K VMs).
2. Load the dashboard landing page.

Actual results:
Very long response time; sometimes the UI gets stuck.

Expected results:
Fast response time and a stable UI.

Additional info:

Comment 1 Eldad Marciano 2016-06-13 13:11:01 UTC
I forgot to mention that this is the heaviest query; it takes about 132 seconds to complete.

Comment 2 Yaniv Kaul 2016-06-13 14:39:23 UTC
Reducing severity - this takes a long time only at high scale. I assume it works in a sane manner on a smaller-scale setup.

Comment 3 Alexander Wels 2016-06-13 14:43:25 UTC
I am analyzing the queries now; it looks like we might need indexes on some of the tables we are using.
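
For reference, a sketch of the kind of indexes this could mean. The table name and index names below are guesses based on the join and filter columns in the reported query (assuming the v4_0 samples view reads from a vm_samples_history base table); they are not the indexes that were actually added:

-- Illustrative only; table and index names are assumptions, not the shipped fix.
-- An index on the sampling timestamp supports the 5-minute window filter, and a
-- composite (vm_id, history_id) index supports the MAX(history_id) GROUP BY vm_id lookup.
CREATE INDEX idx_vm_samples_history_datetime
    ON vm_samples_history (history_datetime);
CREATE INDEX idx_vm_samples_vm_id_history_id
    ON vm_samples_history (vm_id, history_id);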

Comment 4 Alexander Wels 2016-06-13 19:40:20 UTC
Got the query down to about 20 seconds. Not sure what else to do to improve it.
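
In case it helps further tuning, one common alternative for the "latest row per VM" pattern in PostgreSQL is DISTINCT ON, which can replace the IN (SELECT MAX(history_id) ... GROUP BY vm_id) filter. This is only a sketch of that formulation, not the change that was merged:

-- Sketch of an alternative "latest sample per VM" formulation (illustrative only).
-- Joining against this result set instead of the IN (SELECT MAX(...)) filter lets
-- the planner pick the newest sample per VM in a single ordered pass.
SELECT DISTINCT ON (vm_id) vm_id, history_id
FROM v4_0_statistics_vms_resources_usage_samples
ORDER BY vm_id, history_id DESC;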

Comment 5 Pavel Novotny 2016-07-07 08:23:07 UTC
Moving verification to the Scale team, as they have the appropriate environment for this.

Comment 6 Eldad Marciano 2016-08-09 11:45:26 UTC
Verified. Execution time is 2 seconds.