Bug 1029106

Summary: getallfromvmtemplates stored in procedure execution takes long time making VM creation take long time when having more than 80 templates
Product: Red Hat Enterprise Virtualization Manager Reporter: Tomas Dosek <tdosek>
Component: ovirt-engineAssignee: Liran Zelkha <lzelkha>
Status: CLOSED ERRATA QA Contact: Pavel Novotny <pnovotny>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 3.2.0CC: acathrow, bazulay, emesika, iheim, lpeer, luvilla, lyarwood, lzelkha, mavital, michal.skrivanek, Rhev-m-bugs, rvdwees, tdosek, ukar, yeylon
Target Milestone: ---   
Target Release: 3.3.0   
Hardware: All   
OS: All   
Whiteboard: virt
Fixed In Version: is27 Doc Type: Bug Fix
Doc Text:
Retrieving a list of virtual machine templates required a long time when there was a large number of templates, which also resulted in a long time to create virtual machines from templates. This update improves the performance of the search query for templates by a factor of 5, so creating virtual machines from templates is now faster.
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-01-21 17:39:06 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1049022    

Description Tomas Dosek 2013-11-11 16:50:21 UTC
Description of problem:
Customers scenario describes large system load being generated when using api
to create VM from a template. Even showing the templates in Webadmin portal takes noticeable amount of time.

Description of function we use to list all templates:

engine=# \df+ getallfromvmtemplates 
List of functions
-[ RECORD 1 ]----------+------------------------------------------------------------------------------------------------
Schema                 | public
Name                   | getallfromvmtemplates
Data type of result    | SETOF vm_templates_view
Data type of parameter | v_user_id uuid, v_is_filtered boolean
Type                   | normal
Volatility             | volatile
Owner                  | postgres
Language               | plpgsql
Source code            | 
                       : BEGIN
                       :       RETURN QUERY SELECT vm_templates.*
                       :       FROM vm_templates_view vm_templates
                       :       WHERE (NOT v_is_filtered OR EXISTS (SELECT 1
                       :                                           FROM   user_vm_template_permissions_view
                       :                                           WHERE  user_id = v_user_id AND entity_id = vmt_guid))
                       :       ORDER BY name;
                       : END; 
Description            | 

Taking a look at explain analyze of the query (run using user that has no template, except the default blank on, assigned and filtered, in environment that has only 2 templates). It takes 20ms to get the result:

engine=# EXPLAIN ANALYZE VERBOSE SELECT * FROM getallfromvmtemplates('c1736ed2-ef83-40b0-9222-dbcbb24538a9','t');
                                                                                          QUERY PLAN                                                                                                 
 Function Scan on getallfromvmtemplates  (cost=0.00..260.00 rows=1000 width=3805) (actual time=19.920..19.920 rows=1 loops=1)
   Output: vmt_guid, name, mem_size_mb, os, creation_date, child_count, num_of_sockets, cpu_per_socket, num_of_cpus, description, vds_group_id, domain, num_of_monitors, allow_console_reconnect, sta
tus, usb_policy, time_zone, is_auto_suspend, fail_back, vds_group_name, vm_type, hypervisor_type, operation_mode, nice_level, storage_pool_id, storage_pool_name, quota_enforcement_type, default_boo
t_sequence, default_display_type, priority, auto_startup, is_stateless, iso_path, origin, initrd_url, kernel_url, kernel_params, quota_id, quota_name, migration_support, dedicated_vm_for_vds
 Total runtime: 20.015 ms
(3 rows)

Version-Release number of selected component (if applicable):

How reproducible:
100 %

Steps to Reproduce:
1. Have large amount of templates
2. Navigate to templates tab

Actual results:
Takes noticeable amount of time and on database level generates quite noticeable load

Expected results:
Should not take so long and should not generate such load.

Comment 1 Liran Zelkha 2013-11-12 09:32:00 UTC
Hi Tomas,

1. How many templates do the customer have?
2. How many VMs? 
3. Can you provide 3 jstacks of the engine while running the customer scenario?
4. Is it a dedicated DB or an internal one?

Comment 3 Liran Zelkha 2013-11-13 11:59:26 UTC
Hi Tomas,

A very interesting usecase. Wild guess - the constant insert/delete flows might hurt Postgres Analyzer performance.
Can you run "ANALYZE vm_static" on the database and see if the performance is improved?

Comment 4 Tomas Dosek 2013-11-13 12:33:59 UTC
Hi Liran,

Here's the output of my testing environment

engine=# analyze verbose vm_static;
INFO:  analyzing "public.vm_static"
INFO:  "vm_static": scanned 7 of 7 pages, containing 197 live rows and 0 dead rows; 197 rows in sample, 197 estimated total rows

engine=# explain ANALYZE select * from getallfromvmtemplates('c1736ed2-ef83-40b0-9222-dbcbb24538a9','t');
                                                         QUERY PLAN                                                         
 Function Scan on getallfromvmtemplates  (cost=0.00..260.00 rows=1000 width=3805) (actual time=9.884..9.884 rows=1 loops=1)
 Total runtime: 9.948 ms
(2 rows)

Apparently collecting fresh stats and refreshing the catalog statistics works quite well here. (Still 10 ms for finding the result given the amount of data - 8 users, 2 templates, no rights given to the user on any template - that quite a lot)

But 100 % improvement in speed is quite nice.

The question is whether the performance degradation in customer's scenario is by any chance influenced by massive changes in vm_static and nonconsistent index. (my wild guess is it could be the case).

Comment 5 Liran Zelkha 2013-11-13 14:00:34 UTC
Our current implementation stores templates and vms in the same table. So if they remove 800 rows from the table, and insert new ones - it probably hurt the optimizer. Try to run it on their environment, and tell me the outcome. If this is the problem, we'll need to find a good way to do it periodically.

Comment 10 Liran Zelkha 2013-11-13 20:26:48 UTC
Got it. Will review and get back to you.

Comment 12 Liran Zelkha 2013-11-17 21:10:54 UTC
Hi Tomas,

I've checked. Are you 100% sure we use the function to list all the templates? From a debug session it seems like we're using a dynamic search for that. The query we're running for the System-->Templates tab is:
SELECT * FROM (SELECT * FROM vm_templates_view WHERE ( vmt_guid IN (SELECT vm_templates_storage_domain.vmt_guid FROM  vm_templates_storage_domain  ))  ORDER BY name ASC ) as T1 OFFSET (1 -1) LIMIT 100.

Now, this is of course not optimal, and it so happens that I have a patch that is being tested that can be extended to solve the templates issue. 

If not, and the function is called, please detail the scenario again, as I can't reproduce this error.


Comment 14 Liran Zelkha 2013-11-19 08:11:25 UTC
Hi Tomas,

I apologize for the ping-pong. Can you confirm the customer uses the following API - /ovirt-engine/api/templates
If so, it uses a dynamic search query. 
Also, which user is customer uses to authenticate? Is it an admin user?

Comment 17 Michal Skrivanek 2013-11-26 16:12:42 UTC
is this going to be relevant for all customers? or is there any risk?

Comment 19 Liran Zelkha 2013-11-26 19:36:51 UTC
Always a risk. This patch will solve this specific scenario (API of getting a list of templates based on a data center)

Comment 22 Liran Zelkha 2013-12-03 07:14:48 UTC
Hi Ron,

I can't confirm permissions will make a difference in this case. From my tests it seems like there is an issue with our query generation. I'm working on the patch.

Comment 24 Pavel Novotny 2013-12-13 20:49:07 UTC
Verified in rhevm-3.3.0-0.40.rc.el6ev.noarch (is27).

I used RHEVM with 700 VMs (from template) and 80 templates (I added and removed them several times in a row).
The Templates tab in webadmin then took cca 3 seconds to load.
Executing the query from comment 15 ( /api/templates?search=datacenter...) took about 4-5 seconds. I used admin@internal as well as an IPA user, but it didn't make any difference. The system load on RHEVM during the query execution was also within acceptable limits - in my case ~50% CPU load (on a weaker HW than is officialy recommended in docs).

Comment 25 errata-xmlrpc 2014-01-21 17:39:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.