Bug 1029106 - getallfromvmtemplates stored in procedure execution takes long time making VM creation take long time when having more than 80 templates
getallfromvmtemplates stored in procedure execution takes long time making VM...
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine (Show other bugs)
All All
urgent Severity urgent
: ---
: 3.3.0
Assigned To: Liran Zelkha
Pavel Novotny
Depends On:
Blocks: 3.3rc1
  Show dependency treegraph
Reported: 2013-11-11 11:50 EST by Tomas Dosek
Modified: 2014-07-25 00:13 EDT (History)
15 users (show)

See Also:
Fixed In Version: is27
Doc Type: Bug Fix
Doc Text:
Retrieving a list of virtual machine templates required a long time when there was a large number of templates, which also resulted in a long time to create virtual machines from templates. This update improves the performance of the search query for templates by a factor of 5, so creating virtual machines from templates is now faster.
Story Points: ---
Clone Of:
Last Closed: 2014-01-21 12:39:06 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)

External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 21714 None None None Never
oVirt gerrit 22228 None None None Never
oVirt gerrit 22246 None None None Never

  None (edit)
Description Tomas Dosek 2013-11-11 11:50:21 EST
Description of problem:
Customers scenario describes large system load being generated when using api
to create VM from a template. Even showing the templates in Webadmin portal takes noticeable amount of time.

Description of function we use to list all templates:

engine=# \df+ getallfromvmtemplates 
List of functions
-[ RECORD 1 ]----------+------------------------------------------------------------------------------------------------
Schema                 | public
Name                   | getallfromvmtemplates
Data type of result    | SETOF vm_templates_view
Data type of parameter | v_user_id uuid, v_is_filtered boolean
Type                   | normal
Volatility             | volatile
Owner                  | postgres
Language               | plpgsql
Source code            | 
                       : BEGIN
                       :       RETURN QUERY SELECT vm_templates.*
                       :       FROM vm_templates_view vm_templates
                       :       WHERE (NOT v_is_filtered OR EXISTS (SELECT 1
                       :                                           FROM   user_vm_template_permissions_view
                       :                                           WHERE  user_id = v_user_id AND entity_id = vmt_guid))
                       :       ORDER BY name;
                       : END; 
Description            | 

Taking a look at explain analyze of the query (run using user that has no template, except the default blank on, assigned and filtered, in environment that has only 2 templates). It takes 20ms to get the result:

engine=# EXPLAIN ANALYZE VERBOSE SELECT * FROM getallfromvmtemplates('c1736ed2-ef83-40b0-9222-dbcbb24538a9','t');
                                                                                          QUERY PLAN                                                                                                 
 Function Scan on getallfromvmtemplates  (cost=0.00..260.00 rows=1000 width=3805) (actual time=19.920..19.920 rows=1 loops=1)
   Output: vmt_guid, name, mem_size_mb, os, creation_date, child_count, num_of_sockets, cpu_per_socket, num_of_cpus, description, vds_group_id, domain, num_of_monitors, allow_console_reconnect, sta
tus, usb_policy, time_zone, is_auto_suspend, fail_back, vds_group_name, vm_type, hypervisor_type, operation_mode, nice_level, storage_pool_id, storage_pool_name, quota_enforcement_type, default_boo
t_sequence, default_display_type, priority, auto_startup, is_stateless, iso_path, origin, initrd_url, kernel_url, kernel_params, quota_id, quota_name, migration_support, dedicated_vm_for_vds
 Total runtime: 20.015 ms
(3 rows)

Version-Release number of selected component (if applicable):

How reproducible:
100 %

Steps to Reproduce:
1. Have large amount of templates
2. Navigate to templates tab

Actual results:
Takes noticeable amount of time and on database level generates quite noticeable load

Expected results:
Should not take so long and should not generate such load.
Comment 1 Liran Zelkha 2013-11-12 04:32:00 EST
Hi Tomas,

1. How many templates do the customer have?
2. How many VMs? 
3. Can you provide 3 jstacks of the engine while running the customer scenario?
4. Is it a dedicated DB or an internal one?
Comment 3 Liran Zelkha 2013-11-13 06:59:26 EST
Hi Tomas,

A very interesting usecase. Wild guess - the constant insert/delete flows might hurt Postgres Analyzer performance.
Can you run "ANALYZE vm_static" on the database and see if the performance is improved?
Comment 4 Tomas Dosek 2013-11-13 07:33:59 EST
Hi Liran,

Here's the output of my testing environment

engine=# analyze verbose vm_static;
INFO:  analyzing "public.vm_static"
INFO:  "vm_static": scanned 7 of 7 pages, containing 197 live rows and 0 dead rows; 197 rows in sample, 197 estimated total rows

engine=# explain ANALYZE select * from getallfromvmtemplates('c1736ed2-ef83-40b0-9222-dbcbb24538a9','t');
                                                         QUERY PLAN                                                         
 Function Scan on getallfromvmtemplates  (cost=0.00..260.00 rows=1000 width=3805) (actual time=9.884..9.884 rows=1 loops=1)
 Total runtime: 9.948 ms
(2 rows)

Apparently collecting fresh stats and refreshing the catalog statistics works quite well here. (Still 10 ms for finding the result given the amount of data - 8 users, 2 templates, no rights given to the user on any template - that quite a lot)

But 100 % improvement in speed is quite nice.

The question is whether the performance degradation in customer's scenario is by any chance influenced by massive changes in vm_static and nonconsistent index. (my wild guess is it could be the case).
Comment 5 Liran Zelkha 2013-11-13 09:00:34 EST
Our current implementation stores templates and vms in the same table. So if they remove 800 rows from the table, and insert new ones - it probably hurt the optimizer. Try to run it on their environment, and tell me the outcome. If this is the problem, we'll need to find a good way to do it periodically.
Comment 10 Liran Zelkha 2013-11-13 15:26:48 EST
Got it. Will review and get back to you.
Comment 12 Liran Zelkha 2013-11-17 16:10:54 EST
Hi Tomas,

I've checked. Are you 100% sure we use the function to list all the templates? From a debug session it seems like we're using a dynamic search for that. The query we're running for the System-->Templates tab is:
SELECT * FROM (SELECT * FROM vm_templates_view WHERE ( vmt_guid IN (SELECT vm_templates_storage_domain.vmt_guid FROM  vm_templates_storage_domain  ))  ORDER BY name ASC ) as T1 OFFSET (1 -1) LIMIT 100.

Now, this is of course not optimal, and it so happens that I have a patch that is being tested that can be extended to solve the templates issue. 

If not, and the function is called, please detail the scenario again, as I can't reproduce this error.

Comment 14 Liran Zelkha 2013-11-19 03:11:25 EST
Hi Tomas,

I apologize for the ping-pong. Can you confirm the customer uses the following API - /ovirt-engine/api/templates
If so, it uses a dynamic search query. 
Also, which user is customer uses to authenticate? Is it an admin user?
Comment 17 Michal Skrivanek 2013-11-26 11:12:42 EST
is this going to be relevant for all customers? or is there any risk?
Comment 19 Liran Zelkha 2013-11-26 14:36:51 EST
Always a risk. This patch will solve this specific scenario (API of getting a list of templates based on a data center)
Comment 22 Liran Zelkha 2013-12-03 02:14:48 EST
Hi Ron,

I can't confirm permissions will make a difference in this case. From my tests it seems like there is an issue with our query generation. I'm working on the patch.
Comment 24 Pavel Novotny 2013-12-13 15:49:07 EST
Verified in rhevm-3.3.0-0.40.rc.el6ev.noarch (is27).

I used RHEVM with 700 VMs (from template) and 80 templates (I added and removed them several times in a row).
The Templates tab in webadmin then took cca 3 seconds to load.
Executing the query from comment 15 ( /api/templates?search=datacenter...) took about 4-5 seconds. I used admin@internal as well as an IPA user, but it didn't make any difference. The system load on RHEVM during the query execution was also within acceptable limits - in my case ~50% CPU load (on a weaker HW than is officialy recommended in docs).
Comment 25 errata-xmlrpc 2014-01-21 12:39:06 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.


Note You need to log in before you can comment on or make changes to this bug.