Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1149496

Summary: Starting vms simultaneously broke affinity rules
Product: Red Hat Enterprise Virtualization Manager Reporter: Artyom <alukiano>
Component: ovirt-engine Assignee: Roy Golan <rgolan>
Status: CLOSED CURRENTRELEASE QA Contact: Artyom <alukiano>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 3.4.0 CC: dfediuck, gklein, lpeer, lsurette, mavital, rbalakri, Rhev-m-bugs, sherold, srevivo, ykaul
Target Milestone: ovirt-3.6.0-rc3 Keywords: Triaged
Target Release: 3.6.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: 3.6.0-9 Doc Type: Bug Fix
Doc Text:
Cause: The engine VM scheduler kept no record of VMs that were about to start (WaitForLaunch) and belonged to the same affinity group. Consequence: Two VMs starting simultaneously could end up on two different hosts (in the case of a positive affinity group). Fix: Keep track of pending VMs (started by the engine but not yet started in VDSM) and base scheduling decisions on those VMs as well. Result: The two VMs end up on the same host.
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-04-20 01:35:46 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: SLA RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
engine log none

Description Artyom 2014-10-05 13:32:20 UTC
Created attachment 944042 [details]
engine log

Description of problem:
Starting two VMs simultaneously breaks the affinity rules.

Version-Release number of selected component (if applicable):
rhevm-3.4.3-1.1.el6ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Have a RHEV-M setup with at least two hosts. Create two VMs, each with an amount of memory close to a host's total memory (this guarantees the VMs run on different hosts, because of the memory filter).
2. Create a hard positive affinity group and add the two VMs to it.
3. Start the two VMs simultaneously.

Actual results:
The VMs run on different hosts.

Expected results:
One of the VMs fails to run, because all hosts are filtered out (by the memory and affinity filters).

Additional info:

Comment 1 Doron Fediuck 2014-10-07 14:29:11 UTC
We should probably consider a mechanism similar to pending memory for affinity
rules; this means we keep in memory a map of hosts and scheduled VMs which are
not running yet.
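A minimal sketch of such a pending map (illustrative Python with hypothetical names; the real ovirt-engine implementation is in Java and is not shown in this bug): the scheduler records each VM it has placed but that VDSM has not yet reported as running, and consults those placements when filtering hosts.

```python
# Hypothetical sketch of a "pending VMs" map for the scheduler.
# Class and method names are illustrative, not the actual engine API.

class PendingVmTracker:
    """Tracks VMs scheduled to a host but not reported as running yet."""

    def __init__(self):
        self._pending = {}  # host_id -> set of vm_ids

    def add(self, host_id, vm_id):
        # Record the placement as soon as the scheduler picks a host.
        self._pending.setdefault(host_id, set()).add(vm_id)

    def remove(self, vm_id):
        # Called once VDSM reports the VM as up (or the start failed).
        for vms in self._pending.values():
            vms.discard(vm_id)

    def pending_on(self, host_id):
        return frozenset(self._pending.get(host_id, ()))
```

With positive affinity, a filter could then treat a pending group member on host H as if it were already running there, constraining the placement of the next group member accordingly.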

Comment 2 Roy Golan 2014-10-08 08:40:13 UTC
Currently there's a built-in race: the period between the result of filtering and selection and the actual VM creation on VDSM takes a while (it's a network round-trip, after all).

example -
 Run_Vm_1 and Run_Vm_2 are two separate run-VM flows with positive affinity.

Run_Vm_1 asks the scheduler for a host. The scheduler sees VM_2 is down, so it can pick whatever host is eligible.

Run_Vm_1 is now in the process of starting up. The DB will be updated with the run_on_vds field only after the VDSM create() call returns (the VM is still not in Up status).

Run_Vm_2 comes right after and repeats the same process. Chances are the
Run_Vm_1 call is still on the way or hasn't finished yet, so run_on_vds for VM_1 isn't set, and the scheduler picks some eligible host for VM_2.


We might be able to minimize the race by setting the run_on_vds field just before the call to VDSM; in case of an error we would set it back to null.


BTW, according to what Artyom wrote, the scheduler doesn't check whether the other VMs in the affinity group can also run on the designated host (that's how he reproduced it).

So we should check whether all VMs in the affinity group could fit as well; that will narrow the selection.
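As a rough illustration of that check (illustrative Python; the function names and the memory-only model are simplifications, not the engine's actual filter code): before selecting a host for one group member, verify that the host has enough free memory for every VM in the hard positive group, so the whole group can co-locate.

```python
# Illustrative group-fit check: can an entire hard-positive affinity
# group run together on a single host? Memory is the only resource
# modeled here, matching the memory-filter scenario in this bug.

def group_fits_on_host(host_free_mem_mb, group_vm_mem_mb):
    """Return True if the host can hold every VM in the affinity group."""
    return sum(group_vm_mem_mb) <= host_free_mem_mb

def filter_hosts_for_group(hosts_free_mem, group_vm_mem_mb):
    """Keep only hosts on which the whole group could run together."""
    return [host for host, free in hosts_free_mem.items()
            if group_fits_on_host(free, group_vm_mem_mb)]
```

In the reproduction steps above, each VM nearly fills a host, so this check would leave no eligible host and the second VM would fail to start instead of landing elsewhere.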

Comment 3 Doron Fediuck 2014-10-12 14:26:44 UTC
(In reply to Roy Golan from comment #2)
> Currently there's a built-in race: the period between the result of
> filtering and selection and the actual VM creation on VDSM takes a while
> (it's a network round-trip, after all).
> 
> example -
>  Run_Vm_1 and Run_Vm_2 are two separate run-VM flows with positive affinity.
> 
> Run_Vm_1 asks the scheduler for a host. The scheduler sees VM_2 is down,
> so it can pick whatever host is eligible.
> 
> Run_Vm_1 is now in the process of starting up. The DB will be updated with
> the run_on_vds field only after the VDSM create() call returns (the VM is
> still not in Up status).
> 
> Run_Vm_2 comes right after and repeats the same process. Chances are the
> Run_Vm_1 call is still on the way or hasn't finished yet, so run_on_vds
> for VM_1 isn't set, and the scheduler picks some eligible host for VM_2.
> 
> 
> We might be able to minimize the race by setting the run_on_vds field just
> before the call to VDSM; in case of an error we would set it back to null.
> 
> 
> BTW, according to what Artyom wrote, the scheduler doesn't check whether
> the other VMs in the affinity group can also run on the designated host
> (that's how he reproduced it).
> 
> So we should check whether all VMs in the affinity group could fit as
> well; that will narrow the selection.

Another way to prevent the race is mentioned in comment 1:
the same way we allocate memory resources today into pending memory, we
can do with pending affinity. Running a VM which is a member of an
affinity group will then consider pending affinity members which have not
started running yet.
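Concretely (again an illustrative sketch with hypothetical names, not engine code), the positive-affinity filter would take the union of running and pending group members when deciding which hosts remain eligible:

```python
# Illustrative hard-positive-affinity filter that also counts pending
# placements. running_on and pending_on map vm_id -> host_id for group
# members that are already running, or scheduled but not started yet.

def eligible_hosts_positive_affinity(hosts, running_on, pending_on, group_vms):
    """Return the hosts eligible under a hard positive affinity group."""
    placed = {running_on.get(vm) or pending_on.get(vm) for vm in group_vms}
    placed.discard(None)
    if not placed:
        return list(hosts)   # no member placed yet: any host is fine
    if len(placed) > 1:
        return []            # group already split: nothing is valid
    return [h for h in hosts if h in placed]
```

With this, the second simultaneous start in this bug would see the first VM's pending placement and be restricted to (or fail on) that same host, which matches the behavior described in the Doc Text above.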

Comment 4 Eyal Edri 2015-02-25 08:39:46 UTC
3.5.1 is already full of bugs (over 80), and since none of these bugs was marked as urgent for the 3.5.1 release in the tracker bug, moving to 3.5.2.

Comment 5 Roy Golan 2015-06-10 09:33:09 UTC
A pretty big and intrusive fix. This won't make it into a z-stream release.

Comment 6 Eyal Edri 2015-08-13 10:36:49 UTC
Moving old bugs fixed before the oVirt alpha release to fixed in the current beta 2,
3.6.0-9.

Comment 7 Artyom 2015-09-01 11:49:00 UTC
Verified on rhevm-backend-3.6.0-0.12.master.el6.noarch
Ran two VMs simultaneously; one of them failed to run because no valid host existed.

Comment 8 Eyal Edri 2015-11-01 14:26:17 UTC
This bug has both the 3.5.z and 3.6.0 flags; in Bugzilla terms that means it's a clone candidate from 3.6.0 to 3.5.z, i.e. it's pending a clone and wasn't fixed for 3.5.z.

If this isn't the case, please fix the flags accordingly.
If it is the case, please clone the bug to 3.5.7 (3.5.6 was already built).