Bug 1430285 - [scale] - core.bll.RunVmCommandBase.delay locked the cluster for long time.
Status: CLOSED CURRENTRELEASE
Product: ovirt-engine
Classification: oVirt
Component: Backend.Core
Version: 4.1.1.2
Hardware: x86_64 Linux
Priority: medium  Severity: low
Target Milestone: ovirt-4.1.4
Target Release: ---
Assigned To: Andrej Krejcir
QA Contact: eberman
Keywords: Performance
Depends On:
Blocks:
Reported: 2017-03-08 04:41 EST by Eldad Marciano
Modified: 2017-07-28 10:20 EDT (History)
CC List: 9 users

See Also:
Fixed In Version:
Doc Type: Release Note
Doc Text:
In order to run VMs on hosts with limited memory resources, the cluster scheduling policy should be set to "evenly_distributed", with "maxFreeMemoryForOverUtilized = 99". This configuration enables the engine to schedule VMs on hosts with more free memory, for better distribution.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-07-28 10:20:22 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt-4.1+


Attachments
50 bulks of "create vms" results (47.38 KB, image/png), 2017-07-17 04:19 EDT, eberman
logs (1.28 MB, application/zip), 2017-07-17 04:52 EDT, eberman


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 74198 master MERGED core: Optimize waiting for pending memory in MemoryPolicyUnit 2017-05-25 08:31 EDT
oVirt gerrit 74318 master MERGED core: Allow releasing pending resources during scheduling 2017-05-25 08:31 EDT
oVirt gerrit 77331 master MERGED core: PendingResourceManager uses atomic compute methods on maps 2017-05-30 08:48 EDT
oVirt gerrit 77338 ovirt-engine-4.1 MERGED core: Optimize waiting for pending memory in MemoryPolicyUnit 2017-05-30 09:58 EDT
oVirt gerrit 77339 ovirt-engine-4.1 MERGED core: Allow releasing pending resources during scheduling 2017-05-30 09:58 EDT
oVirt gerrit 77552 ovirt-engine-4.1 MERGED core: PendingResourceManager uses atomic compute methods on maps 2017-06-16 04:53 EDT

Description Eldad Marciano 2017-03-08 04:41:14 EST
Description of problem:
As part of the scale-out nested scenarios, we found the engine locked for a long time, especially when starting VMs.

We identified the issue by the very long response time when starting VMs (around 1 hour for a start-VM API call).

The method core.bll.RunVmCommandBase.delay takes the cluster lock for at least 3 seconds, and in some cases 10 seconds.
When more VMs are waiting in the queue to start, the total lock time grows accordingly.

See the following thread dump:
[root@master-vds9 ~]# cat /tmp/jstck.log |grep -c 'SchedulingManager.lockCluster'
94

We have 94 threads waiting for the lock in order to start a VM.


The delay code is specifically for an extreme case, when the host suffers from memory starvation.

This is exactly our case: we are dealing with nested hosts, each host has 1.4 GB of memory, and we are trying to run 3 VMs of 300 MB each.

This triggers RunVmCommandBase.delay.
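The contention described above can be illustrated with a toy model (Python, not the actual engine code): a single lock stands in for the per-cluster scheduling lock, and the delay sleeps while the lock is held, so every other start-VM thread serializes behind it.

```python
import threading
import time

cluster_lock = threading.Lock()  # stands in for the per-cluster scheduling lock

def start_vm(delay=0.05):
    # Toy model of the reported behavior: the delay (sleep) happens
    # while the cluster lock is held, so every other thread queues.
    with cluster_lock:
        time.sleep(delay)

threads = [threading.Thread(target=start_vm) for _ in range(10)]
t0 = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - t0

# Because the sleep happens inside the lock, the 10 "VM starts" serialize:
# total time is roughly 10 * 0.05 s rather than ~0.05 s.
print(round(elapsed, 1))
```

With 94 waiters and multi-second delays instead of 10 waiters and 50 ms, the same serialization produces the hour-long start times reported here.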

Version-Release number of selected component (if applicable):
rhevm-4.1-1.02

How reproducible:
100%

Steps to Reproduce:
1. Run VMs in bulks of 50, or sequentially add more VMs to run.
2. Make sure those VMs run on top of a poor host (memory-wise).

Actual results:
Very long response time for VM start.

Expected results:
Stable and reasonable response time for VM start.

Additional info:
Comment 3 Michal Skrivanek 2017-03-08 08:10:46 EST
This is inside the scheduler.
I discussed this with msivak recently; there are some proposals for how to improve it.
Comment 4 Yaniv Kaul 2017-03-08 11:29:57 EST
Reducing severity, as the scenario is quite a corner case.
Comment 5 Roy Golan 2017-03-09 02:57:22 EST
First, a workaround:
- Set the config value SchedulerAllowOverBooking to 'True', and optionally tweak SchedulerOverBookingThreshold to an N close to the size of the bulk you want to run.

When *over-committing* we can hit a serial delay of delay * H * V, where H is the number of overcommitted UP hosts and V is the number of VMs we want to run. Here it summed up to an hour, but even at smaller scale it could add up to minutes.
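As a rough back-of-the-envelope check of the delay * H * V estimate above (a sketch: the 3-second figure comes from the bug description, the VM count from the jstack output, and the host count is a made-up assumption):

```python
# Rough worst-case estimate of the serialized scheduling delay: delay * H * V,
# per comment 5 (H = overcommitted UP hosts, V = VMs waiting to start).
delay_s = 3   # minimum per-attempt delay reported in the description
hosts = 12    # hypothetical number of overcommitted UP hosts
vms = 94      # threads seen waiting on the cluster lock in the jstack output

total_s = delay_s * hosts * vms
print(total_s, "seconds, i.e. about", round(total_s / 3600, 2), "hours")
```

With these assumed numbers the serialized delay is 3384 seconds, roughly the hour-long lockup observed in the scale run.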
Comment 6 Doron Fediuck 2017-03-09 05:33:10 EST
There is a design decision that takes a lock at the cluster level, to ensure we have consistency while making a scheduling decision.
This is by design and will not change. For emergencies or temporary high demand, we included an override option, which should only be used for a short amount of time (Monday morning effect):
Edit cluster -> Scheduling policy -> Optimize for Speed

We've seen this in the past with Bug 1149701. Is there something new now?
Comment 7 Eldad Marciano 2017-03-13 17:46:54 EDT
(In reply to Doron Fediuck from comment #6)
> There is a design decision that takes a lock at the cluster level, to ensure
> we have consistency while making a scheduling decision.
> This is by design and will not change. For emergencies or temporary high
> demand, we included an override option, which should only be used for a
> short amount of time (Monday morning effect):
> Edit cluster -> Scheduling policy -> Optimize for Speed
> 
> We've seen this in the past with Bug 1149701. Is there something new now?

Well, maybe someone from SLA or virt can tell more precisely.
It looks like the same bug, or at least, as you mention, the same cluster lock.
But this case is different, and more related to the RunVmCommandBase.delay code.

As I discussed with Roy, the delay method is called in certain cases; in our case, when a host suffers from memory starvation, "RunVmCommand" will call the delay method.
Comment 8 Martin Sivák 2017-03-15 07:24:26 EDT
Andrej, can we please refactor this so we wait only once per scheduling run and only when no host was left in the memory policy unit? So if there is at least one usable host returned from that policy unit, we won't wait. Otherwise we wait, refresh host memory information and try again, but only once.
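The refactoring proposed in this comment can be sketched as follows (illustrative Python, not the actual engine code; the names filter_hosts_by_memory, refresh_host_memory, and wait are hypothetical stand-ins for the corresponding engine pieces):

```python
def schedule(hosts, vm, filter_hosts_by_memory, refresh_host_memory, wait):
    # Sketch of comment 8: wait at most once per scheduling run,
    # and only when the memory policy unit filtered out every host.
    usable = filter_hosts_by_memory(hosts, vm)
    if usable:
        return usable               # at least one host fits: no waiting at all
    wait()                          # single delay instead of one per attempt
    refresh_host_memory(hosts)      # re-read pending/free memory information once
    return filter_hosts_by_memory(hosts, vm)  # then retry exactly once

# Tiny usage example with fake hosts (free memory in MB):
hosts = [{"name": "h1", "free": 200}, {"name": "h2", "free": 500}]
fits = lambda hs, vm: [h for h in hs if h["free"] >= vm["mem"]]
result = schedule(hosts, {"mem": 300}, fits, lambda hs: None, lambda: None)
print([h["name"] for h in result])
```

The key design point is that the fast path (a usable host exists) never touches the delay at all, so the cluster lock is only held for the filtering itself.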
Comment 11 eberman 2017-07-17 04:19 EDT
Created attachment 1299689 [details]
50 bulks of "create vms" results

50 bulks of "create vms" results
Comment 13 eberman 2017-07-17 04:52 EDT
Created attachment 1299718 [details]
logs
