Bug 1858638

Summary: [Scale] Poor Performing VM search by cluster_name and partial host name
Product: [oVirt] ovirt-engine Reporter: mlehrer
Component: Search-BackendAssignee: Eli Mesika <emesika>
Status: CLOSED CURRENTRELEASE QA Contact: Tzahi Ashkenazi <tashkena>
Severity: high Docs Contact:
Priority: high    
Version: 4.4.1.8CC: bugs, emesika, mperina
Target Milestone: ovirt-4.4.3Keywords: Performance
Target Release: 4.4.3.2Flags: pm-rhel: ovirt-4.4+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ovirt-engine-4.4.3.2 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-11-11 06:41:26 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
explain analyze output, vmstat, iostat, for manual execution and via UI none

Description mlehrer 2020-07-19 20:17:14 UTC
Created attachment 1701680 [details]
explain analyze output, vmstat, iostat, for manual execution and via UI

Description of problem:

In Webadmin, goto VMs tab.
Perform search: 'Cluster = Clustername and Host = firstpartofhostname*'
The CPU goes very high, and the query take more than several minutes to respond.  Negatively impacts postgres and engine performance.



Version-Release number of selected component (if applicable):

RHV-4.4.1-11
VDSM-4.40.22-1

How reproducible:
Reproduces

Steps to Reproduce:
1. Search for vms by Cluster = clustername and host = shortfqdn*
2. Wait for search to complete


Actual results:
Search takes more than 10 minutes to complete.


Expected results:
Less than 500ms

Additional info:
In addition to the poor query performance it should be noted the long running query impacts postgres which then impacts the ovirt-engine.  This matters because 1 user should not be able to degrade the  performance of the engine by executing 1 query.  

Attached are samples from running this query isolated in psql cli to generate the explain analyze output and also from the UI which shows a stronger impact on engine resources due to refresh rates defaulting to 5s and the aggressiveness of GWTservice polling.

Comment 2 Eli Mesika 2020-07-21 02:46:35 UTC
Please provide :

1) Exact search expression used 
2) Number of Clusters 
3) Number of Hosts 
4) Number of VMs 
5) Is the engine DB local or remote 
6) Do we have active DWH on the same machine of engine DB?

Can I get the machine/s details in private for analysis ?

Comment 7 Tzahi Ashkenazi 2020-10-01 14:21:03 UTC
tested on :
      Engine :  rhev-red01  
      rhv-release-4.4.3-5-001.noarch

On Cluster - L0_Group_0  with 1350 VMs ,135 vms per host 

with the following query from the UI: Compute > virtual machines 
     Cluster = L0_Group_0 and Host = f01-h05*

the query was completed for around 2 seconds from the UI , no high memory or cpu consumption  was occurred 

from glowroot :

   SELECT * FROM (SELECT * FROM vms WHERE ( vm_guid IN (SELECT distinct vms_with_tags.vm_guid FROM vms_with_tags LEFT OUTER JOIN vds_with_tags ON vms_with_tags.run_on_vds=vds_with_tags.vds_id WHERE ( 
   vms_with_tags.cluster_name ILIKE 'L0\_Group\_0' AND ( vds_with_tags.cluster_name ILIKE '%f01-h05%%' OR vds_with_tags.cpu_model ILIKE '%f01-h05%%' OR vds_with_tags.free_text_comment ILIKE '%f01-h05%%' OR 
   vds_with_tags.host_name ILIKE '%f01-h05%%' OR vds_with_tags.software_version ILIKE '%f01-h05%%' OR vds_with_tags.storage_pool_name ILIKE '%f01-h05%%' OR vds_with_tags.tag_name ILIKE '%f01-h05%%' OR 
   vds_with_tags.vds_name ILIKE '%f01-h05%%' ) ))) ORDER BY vm_name ASC ) as T1 OFFSET (1 -1) LIMIT 100

total time (ms):  1309ms

Comment 8 Sandro Bonazzola 2020-11-11 06:41:26 UTC
This bugzilla is included in oVirt 4.4.3 release, published on November 10th 2020.

Since the problem described in this bug report should be resolved in oVirt 4.4.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.