Bug 1149496
| Field | Value |
|---|---|
| Summary | Starting VMs simultaneously broke affinity rules |
| Product | Red Hat Enterprise Virtualization Manager |
| Component | ovirt-engine |
| Version | 3.4.0 |
| Reporter | Artyom <alukiano> |
| Assignee | Roy Golan <rgolan> |
| QA Contact | Artyom <alukiano> |
| Status | CLOSED CURRENTRELEASE |
| Severity | urgent |
| Priority | unspecified |
| Keywords | Triaged |
| Target Milestone | ovirt-3.6.0-rc3 |
| Target Release | 3.6.0 |
| Hardware | x86_64 |
| OS | Linux |
| Fixed In Version | 3.6.0-9 |
| Doc Type | Bug Fix |
| CC | dfediuck, gklein, lpeer, lsurette, mavital, rbalakri, Rhev-m-bugs, sherold, srevivo, ykaul |
Doc Text:

- **Cause:** The engine's VM scheduler had no record of VMs that were about to start (WaitForLaunch) and belonged to the same affinity group.
- **Consequence:** Two VMs started simultaneously could end up on two different hosts, even when they were members of a positive affinity group.
- **Fix:** Keep track of pending VMs (started by the engine but not yet started in VDSM) and base scheduling decisions on those VMs as well.
- **Result:** The two VMs end up on the same host.
| Story Points | --- |
| Last Closed | 2016-04-20 01:35:46 UTC |
| Type | Bug |
| Regression | --- |
| oVirt Team | SLA |
| Attachments | engine log (attachment 944042) |
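The fix in the Doc Text keeps an in-memory record of VMs that the engine has asked VDSM to start but that are not yet reported as running. A minimal sketch of such a pending-VM registry, in Python with hypothetical names (the actual ovirt-engine implementation is Java and differs in detail):

```python
import threading
from collections import defaultdict

class PendingVmTracker:
    """Tracks VMs that were scheduled to a host but are not yet
    running in VDSM, so later scheduling decisions can see them."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = defaultdict(set)  # host_id -> set of vm_ids

    def add(self, host_id, vm_id):
        # Called right after the scheduler picks a host, before the
        # (slow) network round trip to VDSM's create().
        with self._lock:
            self._pending[host_id].add(vm_id)

    def remove(self, host_id, vm_id):
        # Called when VDSM reports the VM as up, or when create() fails.
        with self._lock:
            self._pending[host_id].discard(vm_id)

    def pending_on(self, host_id):
        with self._lock:
            return set(self._pending[host_id])

    def host_of(self, vm_id):
        # Host a pending VM was scheduled to, or None if not pending.
        with self._lock:
            for host, vms in self._pending.items():
                if vm_id in vms:
                    return host
            return None
```

The key point is that the registry is updated *before* the VDSM call, closing the window in which a concurrently scheduled VM sees no trace of its affinity-group peers.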
---

**Comment 1:**

We should probably consider a mechanism similar to pending memory for affinity rules: keep an in-memory map of hosts and scheduled VMs that are not yet running.

**Comment 2 (Roy Golan):**

Currently there's a built-in race. The period between the filtering-and-selection result and the actual VM creation on VDSM takes a while (it's a network round trip, after all).

Example: Run_Vm_1 and Run_Vm_2 are two separate run-VM flows for VMs with positive affinity.

Run_Vm_1 asks the scheduler for a host. The scheduler sees VM_2 is down, so it can pick any eligible host. Run_Vm_1 is now in the middle of starting up; the database is updated with the run_on_vds field only after the VDSM create() call returns (and the VM is still not in Up status at that point).

Run_Vm_2 comes right after and repeats the same process. Chances are the Run_Vm_1 call is still in flight or hasn't finished, so run_on_vds for VM_1 isn't set, and the scheduler picks some other eligible host for VM_2.

We might be able to minimize the race by setting the run_on_vds field just before the call to VDSM, and setting it back to null on error.

By the way, according to what Artyom wrote, the scheduler doesn't check whether the other VMs in the affinity group can also run on the designated host (that's how he reproduced the bug). We should check that all VMs in the affinity group could fit as well; that would narrow the selection.

---

(In reply to Roy Golan from comment #2)

Another way to prevent the race is mentioned in comment 1: the same way we allocate memory resources into pending memory today, we can do with pending affinity. Running a VM that is a member of an affinity group would then take into account pending affinity members that have not started running yet.

---

3.5.1 is already full with bugs (over 80), and since none of these bugs were marked urgent for the 3.5.1 release in the tracker bug, moving to 3.5.2.

---

A pretty big and intrusive fix; this won't make it to a z-stream release.

---

Moving old bug, fixed before the oVirt alpha release, to fixed in the current beta2, 3.6.0-9.

---

Verified on rhevm-backend-3.6.0-0.12.master.el6.noarch: ran two VMs simultaneously, and one of them failed to run because no valid host exists.

---

This bug has both 3.5.z and 3.6.0 flags; in Bugzilla terms that means it is a clone candidate from 3.6.0 to 3.5.z, i.e. it is pending a clone and was not fixed for 3.5.z. If that isn't the case, please fix the flags accordingly; if it is, please clone the bug to 3.5.7 (3.5.6 was built already).
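The pending-affinity idea discussed in the comments can be sketched as a scheduling filter: when placing a VM with hard positive affinity, candidate hosts are checked against both the running and the pending members of the group. All names here are illustrative, not the actual ovirt-engine API:

```python
def filter_hosts_positive_affinity(candidate_hosts, vm_id, affinity_group,
                                   running_host_of, pending_host_of):
    """Return the hosts allowed for vm_id under hard positive affinity.

    running_host_of / pending_host_of map a VM id to the host it is
    running on / pending on; VMs without a host are simply absent.
    """
    pinned = set()
    for member in affinity_group:
        if member == vm_id:
            continue
        # Pending placements count the same as running ones.
        host = running_host_of.get(member) or pending_host_of.get(member)
        if host is not None:
            pinned.add(host)
    if not pinned:
        return list(candidate_hosts)  # no member placed yet: any host is fine
    if len(pinned) > 1:
        return []                     # group already split: unsatisfiable
    host = pinned.pop()
    return [h for h in candidate_hosts if h == host]
```

With VM_1 pending on host1, filtering for VM_2 returns only host1, so the two VMs land together instead of racing onto different hosts.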
---

Created attachment 944042 [details]: engine log

Description of problem:
Starting two VMs simultaneously broke affinity rules.

Version-Release number of selected component (if applicable):
rhevm-3.4.3-1.1.el6ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Have RHEV-M with at least two hosts. Create two VMs, each with an amount of memory close to a host's total memory (this guarantees the VMs run on different hosts, because of the memory filter).
2. Create a hard positive affinity group and add both VMs to it.
3. Start the two VMs simultaneously.

Actual results:
The VMs run on different hosts.

Expected results:
One of the VMs fails to run, because all hosts are filtered out (memory and affinity).

Additional info:
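The expected result in the reproduction steps can be checked mechanically: once the memory filter and the hard positive affinity constraint are applied together, no host can take the second VM. A toy illustration (host names and memory numbers invented for the example):

```python
def eligible_hosts(hosts_free_mem, vm_mem, required_host=None):
    """Hosts with enough free memory, optionally restricted to a
    single host by a hard positive affinity constraint."""
    ok = [h for h, free in hosts_free_mem.items() if free >= vm_mem]
    if required_host is not None:
        ok = [h for h in ok if h == required_host]
    return ok

# Two hosts, each with just enough memory for one large VM.
free_mem = {"host1": 8, "host2": 8}
vm_mem = 7

# vm1 can go anywhere; it lands on host1 and its memory becomes pending there.
assert eligible_hosts(free_mem, vm_mem) == ["host1", "host2"]
free_mem["host1"] -= vm_mem

# vm2 must follow vm1 (hard positive affinity), but host1 is now full,
# so the combined filters leave no valid host -- the expected failure.
assert eligible_hosts(free_mem, vm_mem, required_host="host1") == []
```

This is exactly the situation the reporter engineered: the memory sizing forces the VMs apart while the affinity group forces them together, so a correct scheduler must fail the second start.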