Bug 1417471 - [scale] - SP getdisksvmguid produces an inefficient query
Summary: [scale] - SP getdisksvmguid produces an inefficient query
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Backend.Core
Version: 4.1.0.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ovirt-4.2.0
Target Release: ---
Assignee: Allon Mureinik
QA Contact: eberman
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-01-29 16:56 UTC by Eldad Marciano
Modified: 2019-04-28 08:37 UTC
CC: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-06-22 13:40:45 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.2+
ykaul: exception+



Description Eldad Marciano 2017-01-29 16:56:29 UTC
Description of problem:
As part of the scale-out scenario, we observed response-time degradation, caused mainly by increased CPU time on the DB machine.

The response time of the SP getdisksvmguid is affected by the CPU load on that machine.

Once the engine has many disks, DB CPU usage rises, and so does the response time of getdisksvmguid.

Many getdisksvmguid queries were detected during the ramp-up, and most of them had long response times once roughly 10K or more disks were registered.


We also found that ovirt-engine-dwh consumes a significant share of the CPU.
Once the ovirt-engine-dwhd daemon was stopped, DB CPU utilization went down and the query ran faster (execution time dropped from ~8 sec to ~1 sec).


Some fio results for the local disk of the DB machine:

random-read: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
fio-2.1.10
Starting 1 process
random-read: Laying out IO file(s) (1 file(s) / 128MB)
Jobs: 1 (f=1): [r] [-.-% done] [60371KB/0KB/0KB /s] [15.1K/0/0 iops] [eta 00m:00s]
random-read: (groupid=0, jobs=1): err= 0: pid=41185: Sun Jan 29 16:16:12 2017
  read : io=131072KB, bw=60681KB/s, iops=15170, runt=  2160msec
    clat (usec): min=53, max=18245, avg=61.92, stdev=102.75
     lat (usec): min=54, max=18245, avg=62.17, stdev=102.75
    clat percentiles (usec):
     |  1.00th=[   55],  5.00th=[   56], 10.00th=[   56], 20.00th=[   56],
     | 30.00th=[   57], 40.00th=[   57], 50.00th=[   58], 60.00th=[   59],
     | 70.00th=[   60], 80.00th=[   63], 90.00th=[   73], 95.00th=[   78],
     | 99.00th=[  100], 99.50th=[  106], 99.90th=[  117], 99.95th=[  129],
     | 99.99th=[  233]
    bw (KB  /s): min=58944, max=61680, per=100.00%, avg=60790.00, stdev=1276.45
    lat (usec) : 100=98.94%, 250=1.05%, 500=0.01%
    lat (msec) : 4=0.01%, 20=0.01%
  cpu          : usr=9.08%, sys=38.68%, ctx=32769, majf=0, minf=36
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=32768/w=0/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: io=131072KB, aggrb=60681KB/s, minb=60681KB/s, maxb=60681KB/s, mint=2160msec, maxt=2160msec

Disk stats (read/write):
    dm-0: ios=30980/98, merge=0/0, ticks=1216/10, in_queue=1226, util=57.02%, aggrios=32768/102, aggrmerge=0/0, aggrticks=1251/11, aggrin_queue=1239, aggrutil=54.13%
  sda: ios=32768/102, merge=0/0, ticks=1251/11, in_queue=1239, util=54.13%

Version-Release number of selected component (if applicable):
4.1.0-8

How reproducible:
100%

Steps to Reproduce:
1. Populate the engine with 250 hosts and 10K VM disks.


Actual results:
Create VM, Start VM, and Add Disk to VM have very long response times,
caused by getdisksvmguid.

Expected results:
fast and stable query response

Additional info:

Comment 1 Yaniv Kaul 2017-01-30 07:38:00 UTC
Please attach relevant logs and other data. 
Please open a separate bug on DWH

Comment 2 Eldad Marciano 2017-01-30 09:05:56 UTC
(In reply to Yaniv Kaul from comment #1)
> Please attach relevant logs and other data. 
> Please open a separate bug on DWH

It seems there are no relevant logs besides the DB logs for tracking the queries.

We can also run a profiling session on the JVM.

But let's start with this.

Regarding the separate bug, rgolan and I decided to address it via https://bugzilla.redhat.com/show_bug.cgi?id=1395608

Comment 4 Tal Nisan 2017-01-30 10:14:04 UTC
Roy, can you have a look?

Comment 5 Roy Golan 2017-01-30 11:56:03 UTC
Under load, with more than 10,000 disks, this query is quite a beast for the planner and may take seconds to complete. It especially affects Create VM, which keeps degrading as the number of disks grows. Migrate VM is also affected.


Before changing the query, we could perhaps use more suitable queries. Examples from RunVmValidator and its internal validators, where I found the usages of the getVmDisks() and getVmDiskImages() methods:

1. Only the disk list size is needed here:
protected ValidationResult validateStoragePoolUp(VM vm, StoragePool storagePool, List<DiskImage> vmImages) {
        if (vmImages.isEmpty() || vm.isAutoStartup()) {


2. ImageHandler only needs the storage domain IDs of the disks:
public static Set<Guid> getAllStorageIdsForImageIds(Collection<DiskImage> images) {
        Set<Guid> domainsIds = new HashSet<>();
        for (DiskImage image : images) {
            domainsIds.addAll(image.getStorageIds());
        }

3. Only a disk image status check is needed here:
private ValidationResult diskImagesNotInStatus(ImageStatus status, EngineMessage failMessage) {



Tal, can't we use some non-view tables for these tasks, or perhaps create a thinner view if we don't already have one? The amount of info is just not proportional to what we use in this flow.
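
As a rough sketch of this "thinner query" idea (the DAO interface and method names below are hypothetical, not existing engine code), each of the three call sites above could ask only for the fields it actually needs instead of materializing full DiskImage objects through getdisksvmguid:

import java.util.Set;
import java.util.UUID;

// Hypothetical thin DAO -- each method maps to a narrow query (or a thin view)
// that returns only what the caller needs.
interface ThinVmDiskDao {
    long countDisks(UUID vmId);               // case 1: only the list size matters
    Set<UUID> getStorageDomainIds(UUID vmId); // case 2: only the storage domain ids
    Set<Integer> getImageStatuses(UUID vmId); // case 3: only the image statuses
}

class ThinDiskQueries {
    private final ThinVmDiskDao dao;

    ThinDiskQueries(ThinVmDiskDao dao) {
        this.dao = dao;
    }

    // 1. validateStoragePoolUp only needs to know whether the VM has any disks.
    boolean vmHasDisks(UUID vmId) {
        return dao.countDisks(vmId) > 0;
    }

    // 2. getAllStorageIdsForImageIds only needs the storage domain ids of the disks.
    Set<UUID> storageDomainsOf(UUID vmId) {
        return dao.getStorageDomainIds(vmId);
    }

    // 3. diskImagesNotInStatus only needs the image statuses, not full DiskImage rows.
    boolean anyDiskInStatus(UUID vmId, int status) {
        return dao.getImageStatuses(vmId).contains(status);
    }
}

Each of these methods could be backed by a dedicated SP or by a much thinner view than the one getdisksvmguid reads today.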

Comment 6 Allon Mureinik 2017-02-08 23:22:24 UTC
Liron, isn't this a dup of bug 1412626?

Comment 7 Liron Aravot 2017-02-22 10:37:09 UTC
The fix for bz 1412626 should improve this scenario as well. On Eldad's DB with dwh enabled, the query time was 50% lower with that change.
Regardless, we should look into improving that query and reducing its usage.

Eldad, if you can, please confirm that the situation has improved.
I believe we can keep this BZ for improving the query / lowering its usage.

Regards,
Liron

Comment 8 Yaniv Kaul 2017-03-02 13:26:43 UTC
Moving to 4.1.3, as it simply waits on scale QE results and only happens on fairly large environments.

Comment 9 Eldad Marciano 2017-04-19 11:23:03 UTC
(In reply to Liron Aravot from comment #7)
> The fix for bz 1412626 should improve this scenario as well. On Eldad's DB
> with dwh enabled, the query time was 50% lower with that change.
> Regardless, we should look into improving that query and reducing its usage.
> 
> Eldad, if you can, please confirm that the situation has improved.
> I believe we can keep this BZ for improving the query / lowering its usage.
> 
> Regards,
> Liron

Eyal, due to other priorities, we need to coordinate this test in order to reproduce it.

Comment 10 Allon Mureinik 2017-04-20 12:24:09 UTC
(In reply to Yaniv Kaul from comment #8)
> Moving to 4.1.3, as it simply waits on scale QE results and only happens on
> fairly large environments.

Moving to ON_QA to validate if there's a justification for continued dev work.

Comment 11 Eldad Marciano 2017-04-24 15:04:10 UTC
Eyal, this bug can be reproduced with vdsmfake and a high-scale environment.
We need to coordinate it according to its severity.
Currently it's medium, so it seems we can push it to 4.2,

or schedule it for 4.1.x.

Comment 12 Eldad Marciano 2017-05-14 13:42:27 UTC
A short update on this bug:
we populated the setup with 6500 disks, divided among ~3000 VMs.

When creating VMs in bulk, we still see high CPU peaks of 40-80%.

We were able to reproduce the issue manually with a simple script [1] that simulates 50 simultaneous SQL calls to getdisksvmguid.
While running this script, we see 80-90% CPU utilization on a 24-core box.

It looks like this query generates high CPU load, especially for simultaneous SQL calls.

[1]
#!/bin/sh
# Usage: <script> <db_user> "<sql>" <parallelism>
# Fires <parallelism> copies of the given SQL statement at PostgreSQL concurrently.

db=$1        # passed to psql as the user name (-U)
sql=$2       # the SQL statement to run, e.g. the getdisksvmguid SP call
parallel=$3  # number of concurrent psql processes

for p in `seq 1 "$parallel"`; do
    nohup psql -U "$db" -c "$sql" > /dev/null 2>&1 & echo "";
done
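
The same concurrent load can also be driven from a small JDBC program instead of parallel psql processes; a minimal sketch follows (the JDBC URL, credentials, and SQL text are placeholders for the environment under test, not part of ovirt-engine or of the script actually used here):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelQueryLoad {
    public static void main(String[] args) throws Exception {
        String url = args[0];         // e.g. jdbc:postgresql://localhost/engine (placeholder)
        String user = args[1];
        String password = args[2];
        String sql = args[3];         // the statement to hammer, e.g. the getdisksvmguid call
        int parallelism = Integer.parseInt(args[4]);

        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        for (int i = 0; i < parallelism; i++) {
            pool.submit(() -> {
                // Each task opens its own connection, mirroring one psql process.
                try (Connection c = DriverManager.getConnection(url, user, password);
                     Statement s = c.createStatement()) {
                    long start = System.nanoTime();
                    s.execute(sql);
                    System.out.printf("query took %d ms%n",
                            TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
    }
}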

Comment 13 Allon Mureinik 2017-05-28 15:01:00 UTC
I wouldn't be surprised if 50 concurrent calls to this SP generate a high load. The question is whether there's a relevant flow that uses the engine that results in such a peak?

Comment 14 Eldad Marciano 2017-06-01 11:10:18 UTC
(In reply to Allon Mureinik from comment #13)
> I wouldn't be surprised if 50 concurrent calls to this SP generate a high
> load. The question is whether there's a relevant flow that uses the engine
> that results in such a peak?

Yes, the flow is creating/starting VMs in bulks of 50.
Considering that creating/starting VMs in bulk is common, we might hit it.
I'm pretty sure some customers are doing it. Moran, any insights?

Comment 15 Eldad Marciano 2017-06-04 22:39:58 UTC
Moran, please advise per comment https://bugzilla.redhat.com/show_bug.cgi?id=1417471#c14

Comment 18 Roy Golan 2017-06-07 10:28:13 UTC
I don't think this bug is high priority, but Run VM time is probably suffering from it. However, that is only at large scale (more than 1000 or 1500 VMs and concurrent VM starts of > 30); for most use cases it is less relevant. Moran, you're welcome to fill in here.

emarcian, eberman, when you come to test this, make sure to:
- run everything through the engine; don't simulate parallelism with a script
- use a good LUN disk for the DB
- check the time it takes to run/create a VM

Comment 19 Roy Golan 2017-06-07 10:29:36 UTC
I doubt this needs an exception flag.

Comment 20 Yaniv Kaul 2017-06-07 11:00:16 UTC
(In reply to Roy Golan from comment #19)
> I doubt this needs an exception flag.

It doesn't, but then it should not be in 4.1.3. It doesn't matter right now, as it is on QA already.

Comment 21 Eldad Marciano 2017-06-12 11:40:02 UTC
(In reply to Roy Golan from comment #18)
> I don't think this bug is high priority, but Run VM time is probably
> suffering from it. However, that is only at large scale (more than 1000 or
> 1500 VMs and concurrent VM starts of > 30); for most use cases it is less
> relevant. Moran, you're welcome to fill in here.
> 
> emarcian, eberman, when you come to test this, make sure to:
> - run everything through the engine; don't simulate parallelism with a script
> - use a good LUN disk for the DB
> - check the time it takes to run/create a VM

Roy, I profiled the engine with a SCSI LUN (flash drive) as the DB disk, and ran everything from the engine.

Topology (using vdsmfake):
12 SDs, 250 hosts, ~1300 VMs, ~3800 disks.


I used JProfiler to measure performance at this scale.

- Batch creation performs well, but only for a few hundred VMs.
- Performance is bad for ~1000+ VMs, but only for batch calls: 22 sec and above.
- A single call is not affected at this scale (currently ~1300 VMs, 3800 disks) and completes within a second.

Bottom line: concurrency + high scale = bigger impact.

When I break down the response time (for a batch call of 50 threads),
I see the "insertnewvm" query taking 10-15 sec
and the SP getdisksvmguid taking 2-4 sec.

It looks like it's all about concurrency at high scale (1000+ VMs, ~3000 disks).

There is more room for improvement, but for now it is a question of priority.
Also, we should consider that we ran PostgreSQL on top of a flash drive, which gives us a boost in handling more concurrency, especially for this SP.
We also tested this on a machine with a stronger CPU than we had in the previous lab.

Comment 22 Roy Golan 2017-06-20 12:55:29 UTC
Based on this comment I think it's fair to push this to 4.2 as medium. It would be nice to use our queries more efficiently by picking the right places to use them, as suggested in comment 5.

Indeed, we expect environments that want to run > 1500 VMs with a large number of disks to at least use an SSD disk for the DB and a decent CPU, so it is no surprise that the numbers are better on the current test setup.

Comment 23 Moran Goldboim 2017-06-22 09:22:14 UTC
(In reply to Roy Golan from comment #22)
> Based on this comment I think it's fair to push this to 4.2 as medium. It
> would be nice to use our queries more efficiently by picking the right
> places to use them, as suggested in comment 5.
> 
> Indeed, we expect environments that want to run > 1500 VMs with a large
> number of disks to at least use an SSD disk for the DB and a decent CPU,
> so it is no surprise that the numbers are better on the current test setup.

pushing to 4.2, with the suggested changes.

Comment 24 Red Hat Bugzilla Rules Engine 2017-06-22 09:22:21 UTC
Target release should be set once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Comment 25 Allon Mureinik 2017-06-22 13:40:45 UTC
(In reply to Roy Golan from comment #22)
> Based on this comment I think it's fair to push this to 4.2 as medium. It
> would be nice to use our queries more efficiently by picking the right
> places to use them, as suggested in comment 5.
Please open a new bug for this; there's way too much going on in this one.
E.g., Eldad's verification steps would be meaningless for such an effort, as they call this SP in a loop, while the required behavior is for RHV to use different queries.

