Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1430285

Summary: [scale] - core.bll.RunVmCommandBase.delay locked the cluster for long time.
Product: [oVirt] ovirt-engine
Reporter: Eldad Marciano <emarcian>
Component: Backend.Core
Assignee: Andrej Krejcir <akrejcir>
Status: CLOSED CURRENTRELEASE
QA Contact: eberman
Severity: low
Docs Contact:
Priority: medium
Version: 4.1.1.2
CC: akrejcir, apinnick, bugs, dfediuck, eberman, michal.skrivanek, msivak, rgolan, stirabos
Target Milestone: ovirt-4.1.4
Keywords: Performance
Target Release: ---
Flags: rule-engine: ovirt-4.1+
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Release Note
Doc Text:
In order to run VMs on hosts with limited memory resources, the cluster scheduling policy should be set to "evenly_distributed", with "maxFreeMemoryForOverUtilized = 99". This configuration enables the engine to schedule VMs on hosts with more free memory, for better distribution.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-07-28 14:20:22 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  50 bulks of "create vms" results (flags: none)
  logs (flags: none)

Description Eldad Marciano 2017-03-08 09:41:14 UTC
Description of problem:
As part of the nested scale-out scenarios, we found that the engine was locked for a long time, especially when running VMs.

We identified the issue by the very long response time when starting VMs (around 1 hour for a start-VM API call).

The code in core.bll.RunVmCommandBase.delay
takes the cluster lock for at least 3 seconds, and in some cases 10 seconds.
The more VMs waiting to start in the queue, the longer the total lock time.

See the following thread dump:
[root@master-vds9 ~]# cat /tmp/jstck.log |grep -c 'SchedulingManager.lockCluster'
94

We have 94 threads waiting for the lock in order to start a VM.
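The contention pattern described above can be illustrated with a minimal sketch (this is hypothetical code, not the actual oVirt implementation; names like ClusterLockDelay are invented): every VM start takes a shared cluster lock and, under memory pressure, sleeps inside the critical section, so N queued VMs serialize into roughly N times the per-VM delay.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the contention pattern, not the real oVirt code:
// each starting VM takes the shared cluster lock and sleeps while holding it.
public class ClusterLockDelay {
    private final ReentrantLock clusterLock = new ReentrantLock();

    /** Simulate one RunVm-style delay of delayMs under the cluster lock. */
    public void delayUnderLock(long delayMs) {
        clusterLock.lock();          // every scheduling decision contends here
        try {
            Thread.sleep(delayMs);   // stands in for the memory back-off wait
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            clusterLock.unlock();
        }
    }

    /** Total wall-clock milliseconds for n queued VMs, each delaying delayMs. */
    public long totalMillis(int n, long delayMs) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            delayUnderLock(delayMs);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

With 94 waiting threads and a 3-10 second delay each, the serialized total easily reaches many minutes, which matches the observed behavior.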


The delay code exists specifically for the extreme case where a host suffers from memory starvation.

This is exactly our case: we are dealing with nested hosts, each with 1.4 GB of memory, and we are trying to run 3 VMs of 300 MB each.

This triggers RunVmCommandBase.delay.

Version-Release number of selected component (if applicable):
rhevm-4.1-1.02

How reproducible:
100%

Steps to Reproduce:
1. Run VMs in bulks of 50, or sequentially add more VMs to run.
2. Make sure those VMs run on top of a poor host (memory-wise).

Actual results:
Very long response time for VM start.

Expected results:
Stable and reasonable response time for VM start.

Additional info:

Comment 3 Michal Skrivanek 2017-03-08 13:10:46 UTC
This is inside the scheduler.
I discussed this with msivak recently; there are some proposals for how to improve it.

Comment 4 Yaniv Kaul 2017-03-08 16:29:57 UTC
Reducing severity, as the scenario is quite a corner case.

Comment 5 Roy Golan 2017-03-09 07:57:22 UTC
First, a workaround:
- Set the config value SchedulerAllowOverBooking to 'True', and optionally tweak SchedulerOverBookingThreshold to an N close to the size of the bulk you want to run.
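A minimal sketch of applying this workaround with the engine-config tool (the threshold value 50 is illustrative, and an engine restart is assumed to be needed for the change to take effect):

```shell
# Allow the scheduler to over-book instead of serializing delayed runs
engine-config -s SchedulerAllowOverBooking=true
# Set the over-booking threshold close to the bulk size (illustrative value)
engine-config -s SchedulerOverBookingThreshold=50
# Restart the engine so the new values are picked up
systemctl restart ovirt-engine
```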

When *over-committing* we can hit a serial delay of delay * H * V, where H is the number of over-committed UP hosts and V is the number of VMs we want to run. Here it summed up to an hour, but even at a smaller scale it could add up to minutes.
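A back-of-the-envelope version of that delay * H * V estimate (the concrete numbers below are illustrative, not taken from the bug's logs):

```java
// Worst-case serialized wait when every run of every VM hits the back-off
// delay on every over-committed host.
public class DelayEstimate {
    /** Worst-case serialized wait in seconds: perVmDelay * hosts * vms. */
    public static long worstCaseSeconds(long perVmDelaySec, int hosts, int vms) {
        return perVmDelaySec * hosts * vms;
    }
}
```

For example, a 10-second delay across 6 over-committed hosts and 60 VMs already gives 10 * 6 * 60 = 3600 seconds, i.e. the hour observed here.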

Comment 6 Doron Fediuck 2017-03-09 10:33:10 UTC
There is a design decision that takes a lock at the cluster level, to ensure we have consistency while making a scheduling decision.
This is by design and will not change. For emergencies or temporary high demand, we included an override option, which should only be used for a short amount of time (Monday morning effect):
Edit cluster -> Scheduling policy -> Optimize for Speed

We've seen this in the past with Bug 1149701. Is there something new now?

Comment 7 Eldad Marciano 2017-03-13 21:46:54 UTC
(In reply to Doron Fediuck from comment #6)
> There is a design decision that takes a lock at the cluster level, to ensure
> we have consistency while making a scheduling decision.
> This is by design and will not change. For emergencies or temporary high
> demand, we included an override option, which should only be used for a
> short amount of time (Monday morning effect):
> Edit cluster -> Scheduling policy -> Optimize for Speed
> 
> We've seen this in the past with Bug 1149701. Is there something new now?

Well, maybe someone from SLA or virt can tell more precisely.
It looks like the same bug, or at least, as you mentioned, the same cluster lock.
But this case is different, and is more related to the RunVmCommandBase.delay code.

The delay method is called in certain cases, as I discussed with Roy. In our case, when a host suffers from memory starvation, RunVmCommand calls the delay method.

Comment 8 Martin Sivák 2017-03-15 11:24:26 UTC
Andrej, can we please refactor this so we wait only once per scheduling run and only when no host was left in the memory policy unit? So if there is at least one usable host returned from that policy unit, we won't wait. Otherwise we wait, refresh host memory information and try again, but only once.
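The proposed refactor can be sketched as follows (a hedged sketch only; the class and method names are invented, not the real oVirt API): filter hosts through the memory policy unit, and only if no host survives, wait and refresh once before retrying.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Sketch of the proposal above: wait at most once per scheduling run,
// and only when the memory policy unit filtered out every host.
public class SchedulingSketch {
    /** Returns hosts passing the memory filter, retrying once after a wait. */
    public static <H> List<H> filterWithSingleRetry(
            List<H> hosts,
            Predicate<H> hasEnoughMemory,
            Runnable refreshAndWait) {       // e.g. sleep + refresh host stats
        List<H> usable = hosts.stream()
                .filter(hasEnoughMemory)
                .collect(Collectors.toList());
        if (!usable.isEmpty()) {
            return usable;                   // at least one host: no wait at all
        }
        refreshAndWait();                    // single back-off, once per run
        return hosts.stream()
                .filter(hasEnoughMemory)
                .collect(Collectors.toList());
    }
}
```

The key property is that the happy path (at least one usable host) never blocks, so the cluster lock is only held for the back-off in the genuinely starved case, and even then only once.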

Comment 11 eberman 2017-07-17 08:19:14 UTC
Created attachment 1299689 [details]
50 bulks of "create vms" results

50 bulks of "create vms" results

Comment 13 eberman 2017-07-17 08:52:14 UTC
Created attachment 1299718 [details]
logs