Bug 1308478

Summary: [SCALE] Create new VM in webadmin portal shows only spinning ring.
Product: Red Hat Enterprise Virtualization Manager Reporter: Roman Hodain <rhodain>
Component: ovirt-engineAssignee: Shmuel Melamud <smelamud>
Status: CLOSED CURRENTRELEASE QA Contact: Eldad Marciano <emarcian>
Severity: high Docs Contact:
Priority: high    
Version: 3.5.7CC: emarcian, gklein, lsurette, mgoldboi, michal.skrivanek, rbalakri, Rhev-m-bugs, rhodain, sbonazzo, srevivo, tjelinek, ykaul
Target Milestone: ovirt-4.0.0-betaKeywords: Reopened
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: virt
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-05-10 11:57:29 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Virt RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
explain of the SQL command none

Description Roman Hodain 2016-02-15 10:34:04 UTC
Description of problem:
When clicking on "New VM" in the "Virtual machines" the pop up show only a spinning circle for many minutes. 

Version-Release number of selected component (if applicable):
RHEV-M 3.5.7

How reproducible:
100%

Steps to Reproduce:
1. Create 1000 templates
2. Create a new VM

Actual results:
Then "New VM" window shows the spinning circle for minutes.

Expected results:
The pop up shows the form immediately.

Additional info:
Webadmin calls GetVmTemplatesByStoragePoolIdQuery in order to populate the form. It is very expensive operation.

It gets the list of templates by GetVmTemplatesByStoragePoolId stored procedure which is very fast, but then it tries to get the related disks for each of the templates and that is very expensive. The query is started for each of the templates. 

    select * from  getdisksvmguid('UUID', 'f' NULL, 'f')

In this case it will be called 1000 times. It is very expensive as it involves view all_disks_including_snapshots which is complicated on its own. Here is an example of one of the SQL commands:

SELECT all_disks_including_snapshots.*
   FROM all_disks_including_snapshots
   LEFT JOIN vm_device ON vm_device.device_id = all_disks_including_snapshots.image_group_id AND (NOT 'f' OR is_plugged)
   WHERE vm_device.vm_id = '03c6b305-9a14-4038-a613-b8adde9e291b'
   AND ((vm_device.snapshot_id IS NULL AND all_disks_including_snapshots.active IS NOT FALSE)
       OR vm_device.snapshot_id = all_disks_including_snapshots.vm_snapshot_id)
   AND (NOT 'f' OR EXISTS (SELECT 1
                  FROM   user_disk_permissions_view
                  WHERE  user_id is NULL AND entity_id = all_disks_including_snapshots.disk_id));

The execution may take more than one second.
We should load the information about the disks on demand or optimise the SQL.

Comment 1 Roman Hodain 2016-02-15 10:39:40 UTC
Created attachment 1127228 [details]
explain of the SQL command

Comment 3 Michal Skrivanek 2016-02-16 07:56:13 UTC
we don't need disks before you choose a particular template, right?

Comment 4 Tomas Jelinek 2016-02-16 09:49:52 UTC
No, we don't need it in advance - it is enough to have it only for the selected one.

Comment 5 RHEL Program Management 2016-04-03 07:45:17 UTC
Product Management has reviewed and declined this request.
You may appeal this decision by reopening this request.

Comment 6 Eldad Marciano 2016-04-06 08:00:20 UTC
this bug still reproduced in 3.6.5.1, the spining ring at least not running forever.
the dialog box became available after ~20-30 sec.
and seems like
getdisksvmguid has been called several time serially while the dialog loading.

any chance to execute them in parallel? 

moreover each one of the execution takes around 5 sec.

is getstorage_domains_by_storagepoolid executed by this flow as well?

further profiler session required.

here we have the pg log while we hit the diablog box for new templates.
LOG:  duration: 5995.970 ms  execute <unnamed>: select * from  getdisksvmguid($1, $2, $3, $4)
LOG:  duration: 5857.623 ms  execute <unnamed>: select * from  getstorage_domains_by_storagepoolid($1, $2, $3)
LOG:  duration: 5905.632 ms  execute <unnamed>: select * from  getdisksvmguid($1, $2, $3, $4)
LOG:  duration: 5607.681 ms  execute <unnamed>: select * from  getdisksvmguid($1, $2, $3, $4)
LOG:  duration: 5619.016 ms  execute <unnamed>: select * from  getdisksvmguid($1, $2, $3, $4)
LOG:  duration: 5926.344 ms  execute <unnamed>: select * from  getdisksvmguid($1, $2, $3, $4)
LOG:  duration: 5957.711 ms  execute <unnamed>: select * from  getdisksvmguid($1, $2, $3, $4)
LOG:  duration: 5897.029 ms  execute <unnamed>: select * from  getstorage_domains_by_storagepoolid($1, $2, $3)
LOG:  duration: 5942.826 ms  execute <unnamed>: select * from  getdisksvmguid($1, $2, $3, $4)
LOG:  duration: 5644.935 ms  execute <unnamed>: select * from  getdisksvmguid($1, $2, $3, $4)
LOG:  duration: 5976.697 ms  execute <unnamed>: select * from  getdisksvmguid($1, $2, $3, $4)
LOG:  duration: 5897.564 ms  execute <unnamed>: select * from  getdisksvmguid($1, $2, $3, $4)
LOG:  duration: 5877.607 ms  execute S_31: select * from  getstorage_domains_by_storagepoolid($1, $2, $3)
LOG:  duration: 6021.145 ms  execute <unnamed>: select * from  getdisksvmguid($1, $2, $3, $4)
LOG:  duration: 6105.581 ms  execute <unnamed>: select * from  getstorage_domains_by_storage_pool_id_with_permitted_action($1, $2, $3)
LOG:  duration: 6477.493 ms  execute S_29: select * from  getstorage_domains_by_storagepoolid($1, $2, $3)
LOG:  duration: 5572.327 ms  execute S_32: select * from  getstorage_domains_by_storagepoolid($1, $2, $3)

Comment 7 Yaniv Kaul 2016-04-06 08:05:16 UTC
(In reply to Eldad Marciano from comment #6)
> this bug still reproduced in 3.6.5.1, the spining ring at least not running
> forever.
> the dialog box became available after ~20-30 sec.

How many VMs, disks, templates, storage domains, ... ?

> and seems like
> getdisksvmguid has been called several time serially while the dialog
> loading.
> 
> any chance to execute them in parallel? 

Why? If it is a high scale env., I'm OK with 20-30 seconds. What is the storage of the DB? Just a plain disk? Move it to something better.

Comment 8 Eldad Marciano 2016-04-06 09:44:10 UTC
(In reply to Yaniv Kaul from comment #7)
> (In reply to Eldad Marciano from comment #6)
> > this bug still reproduced in 3.6.5.1, the spining ring at least not running
> > forever.
> > the dialog box became available after ~20-30 sec.
> 
> How many VMs, disks, templates, storage domains, ... ?

overall:
vms 10K
disks 29.8K
templates 400
storage domains 13

for this particular data center (real):
vms 3K
disks 1 disk per vm 3K 
templates 400
storage domains 12

> 
> > and seems like
> > getdisksvmguid has been called several time serially while the dialog
> > loading.
> > 
> > any chance to execute them in parallel? 
> 
> Why? If it is a high scale env., I'm OK with 20-30 seconds. What is the
> storage of the DB? Just a plain disk? Move it to something better.
Seems like it's inefficent query, we probably can make it faster, by optimize it, without any hardware changes.

I need to check with Allon. seems like we already optimize this query.

Comment 9 Michal Skrivanek 2016-04-06 11:01:45 UTC
let's keep it for further investigation for a while. But technically I think it improved enough (no "minutes" anymore) so Roman, I guess you can report that back. What do you think?

Comment 10 Roman Hodain 2016-04-20 05:55:15 UTC
(In reply to Michal Skrivanek from comment #9)
> let's keep it for further investigation for a while. But technically I think
> it improved enough (no "minutes" anymore) so Roman, I guess you can report
> that back. What do you think?

It was communicated and we are waiting for response.

Comment 12 Yaniv Lavi 2016-05-09 11:06:22 UTC
oVirt 4.0 Alpha has been released, moving to oVirt 4.0 Beta target.

Comment 16 Michal Skrivanek 2016-05-10 11:57:29 UTC
Seems good enough in 3.6.5