Bug 1302752

Summary: [scale] - getdisksvmguid: inefficient query hurts performance
Product: [oVirt] ovirt-engine
Reporter: Eldad Marciano <emarcian>
Component: Database.Core
Assignee: Allon Mureinik <amureini>
Status: CLOSED CURRENTRELEASE
QA Contact: Eldad Marciano <emarcian>
Severity: high
Docs Contact:
Priority: high
Version: 3.6.2
CC: amureini, bugs, emarcian, emesika, gklein, sbonazzo, tnisan
Target Milestone: ovirt-4.0.4
Keywords: Performance
Target Release: 4.0.4
Flags: amureini: ovirt-4.0.z?, gklein: blocker?, rule-engine: planning_ack?, rule-engine: devel_ack+, rule-engine: testing_ack+
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-09-26 12:37:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1142762
Bug Blocks:

Description Eldad Marciano 2016-01-28 14:32:43 UTC
Description of problem:
engine=> EXPLAIN ANALYZE select * from  getdisksvmguid('fa464094-a656-49dc-8384-465c5308cff7', 't', NULL, 'f');
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 Function Scan on getdisksvmguid  (cost=0.00..260.00 rows=1000 width=3170) (actual time=6643.344..6643.345 rows=1 loops=1)
 Total runtime: 6643.390 ms


Version-Release number of selected component (if applicable):
3.6.2

How reproducible:
100%

Steps to Reproduce:
1. Load the engine (500 hosts, 7K VMs with 3 disks each).

Actual results:
Slow query with high CPU usage.

Expected results:
Stable CPU usage and a faster query.

Additional info:

Comment 1 Eldad Marciano 2016-01-28 15:19:32 UTC
Removing the "GROUP BY" from all_disks_including_snapshots saves ~5 sec (6643 ms down to 1736 ms).

With that change the same number of rows is returned and the query runs much faster:
"Function Scan on getdisksvmguid  (cost=0.00..260.00 rows=1000 width=3170) (actual time=1735.764..1735.764 rows=1 loops=1)"
"Total runtime: 1735.820 ms"



But it still needs some improvement,
because it takes too much CPU when run in bulk.

Comment 2 Allon Mureinik 2016-01-31 19:13:20 UTC
(In reply to Eldad Marciano from comment #1)
> Removing the "GROUP BY" from all_disks_including_snapshots saves ~5 sec
> (6643 ms down to 1736 ms).
> 
> With that change the same number of rows is returned and the query runs
> much faster:
> "Function Scan on getdisksvmguid  (cost=0.00..260.00 rows=1000 width=3170)
> (actual time=1735.764..1735.764 rows=1 loops=1)"
> "Total runtime: 1735.820 ms"
> 
> 
> 
> But it still needs some improvement,
> because it takes too much CPU when run in bulk.

Removing the GROUP BY is wrong - it will return multiple entries for templates that have multiple copies.
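
The duplication described above can be reproduced with a minimal sketch. It uses an in-memory SQLite database and a hypothetical two-table schema (images and image_storage_domain_map are illustrative names here, not the actual definition of all_disks_including_snapshots), and it assumes that "multiple copies" means one image mapped to more than one storage domain:

```python
import sqlite3

# Hypothetical, heavily simplified schema: NOT the real oVirt tables or view.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE images (image_guid TEXT PRIMARY KEY, disk_alias TEXT);
    CREATE TABLE image_storage_domain_map (image_guid TEXT, storage_domain_id TEXT);

    -- A VM disk that lives on a single storage domain.
    INSERT INTO images VALUES ('img-vm', 'vm_disk');
    INSERT INTO image_storage_domain_map VALUES ('img-vm', 'sd-1');

    -- A template disk with copies on two storage domains (assumed scenario).
    INSERT INTO images VALUES ('img-tmpl', 'template_disk');
    INSERT INTO image_storage_domain_map VALUES ('img-tmpl', 'sd-1');
    INSERT INTO image_storage_domain_map VALUES ('img-tmpl', 'sd-2');
""")

JOIN = """
    FROM images i
    JOIN image_storage_domain_map m ON m.image_guid = i.image_guid
"""

# Without GROUP BY: one row per (image, storage domain) pair,
# so the template disk shows up twice.
ungrouped = conn.execute(
    "SELECT i.image_guid, m.storage_domain_id" + JOIN + "ORDER BY 1, 2"
).fetchall()

# With GROUP BY: one row per image, copies collapsed into a count.
grouped = conn.execute(
    "SELECT i.image_guid, COUNT(*) AS copies" + JOIN
    + "GROUP BY i.image_guid ORDER BY 1"
).fetchall()

print("ungrouped:", ungrouped)
print("grouped:  ", grouped)
```

In this toy data set the ungrouped query returns three rows for two disks, while the grouped query returns one row per disk. A VM whose disks each have a single copy would return the same row count either way, which may be why the test in comment 1 saw no difference.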

Comment 3 Eldad Marciano 2016-04-21 10:20:57 UTC
This bug has a big impact on performance.

Any chance to fix it for the next release?

Comment 4 Allon Mureinik 2016-04-21 12:05:46 UTC
(In reply to Eldad Marciano from comment #3)
> This bug has a big impact on performance.
> 
> Any chance to fix it for the next release?

This mainly depends on the RFE in bug 1142762. Once that's done, we could have a better estimation.

Comment 5 Sandro Bonazzola 2016-05-02 09:53:44 UTC
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has been already released and bug is not ON_QA.

Comment 6 Yaniv Lavi 2016-05-23 13:16:07 UTC
oVirt 4.0 beta has been released, moving to RC milestone.

Comment 7 Yaniv Lavi 2016-05-23 13:21:54 UTC
oVirt 4.0 beta has been released, moving to RC milestone.

Comment 8 Allon Mureinik 2016-07-17 10:53:55 UTC
This bug was reported on 3.6, and probably existed long beforehand. This is not a blocker.

Comment 9 Yaniv Kaul 2016-07-17 13:43:23 UTC
(In reply to Allon Mureinik from comment #8)
> This bug was reported on 3.6, and probably existed long beforehand. This is
> not a blocker.

Eldad - do we have the latest on 4.0? Just for comparison? (we've changed so much - platform, Postgres, JBoss, Java, disk refactoring in engine... I hope we are in the same or better performance than 3.6!)

Comment 10 Eldad Marciano 2016-07-20 14:10:10 UTC
We are about to start with 4.0 very soon.
Anyway, it seems that the getdisksvmguid SP hasn't changed since we discovered this problem, so we will probably face it in 4.0 as well.
We'll update as soon as we have some results.

Comment 12 Eldad Marciano 2016-08-10 19:50:24 UTC
With the patch https://gerrit.ovirt.org/#/c/62044/2
this SP runs in ~200 ms.

Comment 13 Allon Mureinik 2016-08-11 12:29:59 UTC
(In reply to Eldad Marciano from comment #0)
> Description of problem:
> engine=> EXPLAIN ANALYZE select * from 
> getdisksvmguid('fa464094-a656-49dc-8384-465c5308cff7', 't', NULL, 'f');
>                                                         QUERY PLAN          
> 
> -----------------------------------------------------------------------------
> ----------------------------------------------
>  Function Scan on getdisksvmguid  (cost=0.00..260.00 rows=1000 width=3170)
> (actual time=6643.344..6643.345 rows=1 loops=1)
>  Total runtime: 6643.390 ms

(In reply to Eldad Marciano from comment #12)
> With the patch https://gerrit.ovirt.org/#/c/62044/2
> this SP runs in ~200 ms.

So it improves by a factor of ~33, which sounds like a pretty decent improvement to me.
Thanks Eldad!
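
For reference, the arithmetic behind the figures quoted in this bug can be checked with a short sketch. The timings are taken from the description, comment 1 and comment 12; the ~200 ms value is approximate, so the resulting factor is approximate as well:

```python
# Timings quoted in this bug, in milliseconds.
original_ms = 6643.390          # "Total runtime" from the description
without_group_by_ms = 1735.820  # "Total runtime" from comment 1
patched_ms = 200.0              # approximate figure from comment 12

# Time saved by dropping the GROUP BY (a change rejected as incorrect in comment 2).
saved_ms = original_ms - without_group_by_ms

# Speedup factors relative to the original run.
speedup_without_group_by = original_ms / without_group_by_ms
speedup_patch = original_ms / patched_ms

print(f"Saved by removing GROUP BY: {saved_ms:.0f} ms")
print(f"Speedup without GROUP BY:   {speedup_without_group_by:.1f}x")
print(f"Speedup with the patch:     ~{speedup_patch:.0f}x")
```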

Comment 14 Allon Mureinik 2016-08-11 12:40:12 UTC
Retargeting to 4.0.4 based on this comment.