Bug 1811866

Summary: [Scale] Webadmin clusters list view response time is too long because of excessive amount of qos related sql queries
Product: Red Hat Enterprise Virtualization Manager Reporter: eraviv
Component: ovirt-engine Assignee: eraviv
Status: CLOSED DEFERRED QA Contact: mlehrer
Severity: medium Docs Contact:
Priority: medium    
Version: 4.2.0 CC: dagur, dholler, mburman, michal.skrivanek, mlehrer, mperina, mtessun
Target Milestone: --- Keywords: Performance
Target Release: --- Flags: dagur: needinfo+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ovirt-engine 4.4.0-26 b5b5c99ca2f Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-05-20 09:44:25 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Network RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1766815    

Description eraviv 2020-03-10 01:59:21 UTC
Description of problem:

The Webadmin clusters view and REST GET /clusters both respond too slowly
because of an excessive number of QoS-related SQL queries.

Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Create a 95-host environment with 300 VLANs on one host interface (non-management)
2. Setup:
- No REST calls
- VdsRefreshRate=default
- Host in UP status
3. Let the engine run idle for 15 minutes after startup
4. Measure the load time of:
- webadmin clusters list view page
- 'curl <engine>/ovirt-engine/api/clusters'
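The REST measurement in step 4 could also be scripted. The following is only a minimal sketch: the helper names `time_call` and `fetch_clusters` and the example host are hypothetical, authentication and TLS settings are left out (a real engine will require them), and the `opener` parameter exists only so the timing logic can be exercised without a live engine.

```python
import time
import urllib.request


def time_call(fn):
    """Run fn() and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def fetch_clusters(base_url, opener=urllib.request.urlopen):
    """GET <base_url>/ovirt-engine/api/clusters and return the raw response body.

    Assumption: no credentials are attached here; add them as your setup requires.
    """
    url = base_url.rstrip("/") + "/ovirt-engine/api/clusters"
    request = urllib.request.Request(url, headers={"Accept": "application/xml"})
    with opener(request) as response:
        return response.read()


# Example usage (hypothetical host name):
# body, seconds = time_call(lambda: fetch_clusters("https://engine.example.com"))
# print(f"GET /clusters took {seconds:.1f}s, {len(body)} bytes")
```

Timing the whole call, including reading the body, matches what a user waiting on `curl` would observe.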

Actual results:
Webadmin freezes; the REST call takes many minutes to return

Expected results:
no more than a few seconds to complete both flows

Additional info:
https://docs.google.com/document/d/1-Mv2JjDkV2gKiXsomVKCtMf5r4TZbxjiFRtbygBd-fU/edit?usp=sharing

Comment 1 eraviv 2020-03-10 07:40:49 UTC
Verifying the REST API is redundant here because it was already done in BZ#1769463.

Comment 4 mlehrer 2020-05-13 17:28:31 UTC
#Env: 150 Hosts with 100 networks
4.4.0-0.31.master.el8
HE environment with 200 nested hosts of which 150 hosts have 100 networks per host.
DWH is separated and the JVM is set to 4G, with the engine configured with 200 pool connections / 250 DB connections.



# /usr/share/ovirt-engine/dbscripts/engine-psql.sh -c "SELECT count(*) from vds_interface_view"
 count
-------
 15825
(1 row)


Flow: in the UI, open Compute and click the Clusters view, with 150 hosts and 100 networks per host.
Current status: takes very long and has a substantial impact on engine performance in 4.4.

The slowness is caused by excessive queries issued by GenericApiGWTService requests, both on initial page load and in the background.
A single GenericApiGWTService request can take 35s, of which 29s is spent issuing the 10 unique SQL queries shown below, which are executed many times within that one request.



+--------------------------------------------------------------------------+------------+-------------------+----------------------+-----------------+
|               /ovirt-engine/webadmin/GenericApiGWTService                | Total (ms) | Execution count   | Time per exec (ms)   | Rows returned   |
+--------------------------------------------------------------------------+------------+-------------------+----------------------+-----------------+
|                                                                          |            |                   |                      |                 |
| select * from getvdsinterfacebyname(?, ?)                                |   5,838.50 |            15,310 |                 0.38 |               1 |
| select * from getnameserversbydnsresolverconfigurationid(?)              |   5,186.10 |            15,512 |                 0.33 |               1 |
| select * from getnetworkattachmentwithqosbynicidandnetworkid(?, ?)       |   5,050.30 |            15,512 |                 0.33 |               1 |
| select * from getnetwork_clusterbycluster_idandbynetwork_id(?, ?)        |   4,606.60 |            15,512 |                  0.3 |               1 |
| select * from getqosbyqosid(?)                                           |   4,519.30 |            15,310 |                  0.3 |               0 |
| select * from getdnsresolverconfigurationbydnsresolverconfigurationid(?) |   4,333.70 |            15,512 |                 0.28 |               1 |
| select * from getinterfaceswithqosbyclusterid(?)                         |      161.7 |                 5 |                 32.3 |        3,144.80 |
| select * from getallnetworkbyclusterid(?, ?, ?)                          |       12.9 |                 5 |                  2.6 |             104 |
| select * from getclusterbyclusterid(?, ?, ?)                             |        3.6 |                 5 |                 0.72 |               1 |
| select * from getallqosbyqostype(?)                                      |        1.4 |                 5 |                 0.29 |               0 |
+--------------------------------------------------------------------------+------------+-------------------+----------------------+-----------------+
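As a quick sanity check on the table, the figures can be tallied to see where the time goes. This sketch assumes the time columns are in milliseconds, which is what makes the sum line up with the 29s quoted above; the function name and the `hot_threshold` cut-off of 1000 executions are arbitrary choices made for illustration.

```python
# (query, total_time, execution_count) copied from the table above.
# Assumption: times are in milliseconds.
QUERIES = [
    ("getvdsinterfacebyname", 5838.50, 15310),
    ("getnameserversbydnsresolverconfigurationid", 5186.10, 15512),
    ("getnetworkattachmentwithqosbynicidandnetworkid", 5050.30, 15512),
    ("getnetwork_clusterbycluster_idandbynetwork_id", 4606.60, 15512),
    ("getqosbyqosid", 4519.30, 15310),
    ("getdnsresolverconfigurationbydnsresolverconfigurationid", 4333.70, 15512),
    ("getinterfaceswithqosbyclusterid", 161.7, 5),
    ("getallnetworkbyclusterid", 12.9, 5),
    ("getclusterbyclusterid", 3.6, 5),
    ("getallqosbyqostype", 1.4, 5),
]


def summarize(queries, hot_threshold=1000):
    """Return (total_ms, hot_ms, hot_share); 'hot' means executed >= hot_threshold times."""
    total = sum(t for _, t, _ in queries)
    hot = sum(t for _, t, n in queries if n >= hot_threshold)
    return total, hot, hot / total


total_ms, hot_ms, hot_share = summarize(QUERIES)
print(f"total: {total_ms / 1000:.1f}s, repeated lookups: {hot_ms / 1000:.1f}s ({hot_share:.1%})")
```

Under that assumption, the six queries executed roughly 15,000 times each account for about 99% of the roughly 29.7s total, even though each individual execution takes well under a millisecond.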


When “Compute => Clusters” is first opened, 3 GenericApiGWTService requests are initiated at the same time per cluster. While the Clusters view stays open in the browser, we see constant background polling: one GenericApiGWTService request is issued per minute for as long as the view is open.
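The execution counts in this comment (a handful of per-cluster queries next to roughly 15,000 single-row lookups) look consistent with what is usually called an N+1 query pattern, where each row from one query triggers further lookups. The sketch below is purely illustrative and is not the engine's code: it uses an in-memory SQLite database with hypothetical `iface` and `qos` tables to contrast per-row lookups with a single joined query.

```python
import sqlite3


def build_db(n_interfaces):
    """Create a throwaway in-memory database with hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE qos (id INTEGER PRIMARY KEY, rate INTEGER)")
    conn.execute("CREATE TABLE iface (id INTEGER PRIMARY KEY, qos_id INTEGER)")
    conn.executemany("INSERT INTO qos VALUES (?, ?)",
                     [(i, i * 10) for i in range(n_interfaces)])
    conn.executemany("INSERT INTO iface VALUES (?, ?)",
                     [(i, i) for i in range(n_interfaces)])
    return conn


def load_per_row(conn):
    """N+1 style: one query for the interfaces, then one lookup per interface."""
    queries = 1
    interfaces = conn.execute("SELECT id, qos_id FROM iface ORDER BY id").fetchall()
    result = {}
    for iface_id, qos_id in interfaces:
        row = conn.execute("SELECT rate FROM qos WHERE id = ?", (qos_id,)).fetchone()
        queries += 1
        result[iface_id] = row[0] if row else None
    return result, queries


def load_joined(conn):
    """Same data fetched with a single joined query."""
    rows = conn.execute(
        "SELECT i.id, q.rate FROM iface i "
        "LEFT JOIN qos q ON q.id = i.qos_id ORDER BY i.id"
    ).fetchall()
    return dict(rows), 1
```

Both functions return the same mapping, but the per-row variant issues one query per interface plus one, so its cost grows with the number of interfaces, while the joined variant stays at a single round trip.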

A detailed breakdown is available here: https://docs.google.com/document/d/1WxPytn65q--ax6hTQstp6v4VO5tkkahh-S5WR8xgSFA/edit?usp=sharing

Comment 6 Sandro Bonazzola 2020-05-18 14:46:43 UTC
Moved to 4.4.1, since this is not marked as a blocker for 4.4.0 and we are preparing to GA.

Comment 8 Daniel Gur 2020-05-27 08:41:06 UTC
New bug - maybe relevant to this one - Bug 1839660 - [UI] Cluster edit , takes too long even without changes while submitting OK