Description of the problem:
The engine processes VMs in no particular (intended) order when it tries to resolve affinity conflicts, so the VMs are processed in whatever order the DB subsystem returns them. Because the engine migrates only one VM per evaluation cycle, there is a high chance that a single VM that cannot be migrated will block the rest of the VMs in the same affinity group from being migrated.

Version-Release number of selected component (if applicable):
rhvm-4.2.8.2-0.1

How reproducible:
100%

Steps to Reproduce:
1. Create a set of VMs, some with a high amount of RAM and some with a low amount of RAM.
2. Use a hypervisor with slightly less memory than the VMs need. This is just to simulate an environment with more hypervisors in the same affinity group.
3. Observe that the engine does not migrate all the VMs even though it could, because it is blocked by a high-memory VM that it repeatedly tries to migrate to this hypervisor.

Actual results:
There are VMs that could be migrated to the hypervisor, but the engine will not try, as it is endlessly trying to migrate a VM that cannot fit there.

Expected results:
If a VM cannot be migrated, the engine should skip it and try another VM in the list. Ideally, the entire algorithm should also be improved so that there is a higher chance that all the VMs fit when more hypervisors are available.

Additional info:
The VMs are actually sorted by the number of conflicts, but in real life there are only one or two conflicts, and the majority of the VMs have the same number of conflicts (mostly just one). The issue grows with the number of VMs in the affinity group and the diversity of the VMs (small and big VMs).
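The expected behavior can be sketched as follows. This is a minimal illustration, not the actual engine code: the class, record, and method names are hypothetical, and `canFit` is reduced to a plain memory check standing in for a full scheduling run. The point is that the candidate list, sorted by conflict count, is walked until a VM that can actually be placed is found, instead of retrying the same unplaceable VM every cycle.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of "skip the VM that cannot migrate and try the next one".
// None of these names come from the real ovirt-engine sources.
public class AffinityEnforcerSketch {

    record Vm(String name, int conflicts, int memoryMb) {}

    // hostFreeMemoryMb stands in for the scheduler's placement check.
    static Optional<Vm> pickVmToMigrate(List<Vm> candidates, int hostFreeMemoryMb) {
        return candidates.stream()
                // Most conflicts first; stable sort keeps DB order among ties.
                .sorted(Comparator.comparingInt(Vm::conflicts).reversed())
                // Skip VMs that cannot fit instead of blocking on them.
                .filter(vm -> vm.memoryMb() <= hostFreeMemoryMb)
                .findFirst();
    }

    public static void main(String[] args) {
        List<Vm> vms = List.of(
                new Vm("big", 1, 64_000),   // blocks the old behavior forever
                new Vm("small1", 1, 2_000),
                new Vm("small2", 1, 2_000));
        // Host has only 4 GB free: "big" is skipped, a small VM is chosen.
        System.out.println(pickVmToMigrate(vms, 4_000).map(Vm::name).orElse("none"));
        // prints "small1"
    }
}
```

With the reported behavior, the 64 GB VM would be retried every cycle and the small VMs would never move; skipping it lets the enforcer make progress within the same evaluation cycle.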
What is the exact configuration of VMs and hosts? How much memory do the VMs need, and how much do the hosts have? The engine checks whether a VM can be migrated, and if it cannot, it tries to migrate a different VM. This bug may be an edge case where the VM can be migrated, but the best host for it is the one where it is currently running, so it is not moved.
This issue will be solved by some of the patches that solve Bug 1651747.
The other bug is in MODIFIED.
(In reply to Andrej Krejcir from comment #6)
> This issue will be solved by some of the patches that solve Bug 1651747.

Shouldn't this be retargeted to TM 4.3.5, as 1651747 is?
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason: [Found non-acked flags: '{'rhevm-4.3.z': '?'}', ] For more info please contact: rhv-devops
Possible steps to verify:
1. Have 3 VMs (VM1, VM2, VM3) and 2 hosts (Host1, Host2). VM1 and VM2 are running on Host1, VM3 on Host2.
2. Create these VM-to-host affinity groups:
   - positive hard (VM3, Host2)
   - positive soft (VM1, Host1)
   - positive soft (VM2, Host1)
3. Create a positive hard VM affinity group containing all three VMs.
4. Check the engine.log.

Expected results:
The affinity rules enforcer runs every minute by default. It should try to migrate both VM1 and VM2 every time it runs, but they will not migrate. The log should contain "Running command: BalanceVmCommand" for both VMs, no more than a minute apart.

The reason for this setup is that VM1 and VM2 can be migrated, but because of their VM-to-host soft affinity, the scheduler chooses Host1 as the best host for them. As a result, when one of them is not migrated, the affinity rules enforcer tries to migrate the other one.
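The host-selection effect behind this setup can be sketched numerically. This is an illustrative model only, with made-up names and weights, not the real policy-unit code: a soft (weighted) VM-to-host affinity adds a penalty to every host outside the preferred set, so the current host wins the ranking and the VM stays put.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of why soft VM-to-host affinity keeps VM1/VM2 on Host1.
// Names and weights are illustrative, not the actual scheduler internals.
public class SoftAffinityScoreSketch {

    // Lower total score wins, mirroring a "lowest weight is best" ranking.
    static String bestHost(Map<String, Integer> baseScore,
                           Set<String> preferredHosts, int softPenalty) {
        return baseScore.entrySet().stream()
                .min(Comparator.comparingInt(e ->
                        e.getValue()
                        + (preferredHosts.contains(e.getKey()) ? 0 : softPenalty)))
                .map(Map.Entry::getKey)
                .orElseThrow();
    }

    public static void main(String[] args) {
        // Both hosts are otherwise equally good for VM1...
        Map<String, Integer> scores = Map.of("Host1", 10, "Host2", 10);
        // ...but the soft (VM1, Host1) group penalizes Host2,
        // so Host1 is chosen and VM1 stays where it already runs.
        System.out.println(bestHost(scores, Set.of("Host1"), 5));
        // prints "Host1"
    }
}
```

Because neither VM1 nor VM2 ever leaves Host1, the hard VM affinity group stays violated, and the enforcer keeps alternating between the two VMs instead of looping on one of them, which is exactly what the log check above verifies.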
Verified according to the steps described in https://bugzilla.redhat.com/show_bug.cgi?id=1674386#c14

29d8f9d8-23a0-4028-935c-96ac9bba8c27 - VM1 (golden_env_mixed_virtio_0)
932c013e-5f5d-4ea0-8382-b132dea73c33 - VM2 (golden_env_mixed_virtio_1)
a4871027-040a-49fa-b869-673a31270f5f - VM3 (golden_env_mixed_virtio_2)

Once a minute there is a BalanceVmCommand entry in engine.log for both VM1 and VM2:

2019-07-08 18:46:07,259+03 INFO [org.ovirt.engine.core.bll.BalanceVmCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-93) [33e8f107] Running command: BalanceVmCommand internal: true. Entities affected : ID: 29d8f9d8-23a0-4028-935c-96ac9bba8c27
2019-07-08 18:46:07,444+03 INFO [org.ovirt.engine.core.bll.BalanceVmCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-93) [4051b966] Running command: BalanceVmCommand internal: true. Entities affected : ID: 932c013e-5f5d-4ea0-8382-b132dea73c33
2019-07-08 18:46:07,292+03 WARN [org.ovirt.engine.core.bll.scheduling.policyunits.VmAffinityPolicyUnit] (EE-ManagedThreadFactory-engineScheduled-Thread-93) [33e8f107] Invalid affinity situation was detected while scheduling VMs: 'golden_env_mixed_virtio_0' (29d8f9d8-23a0-4028-935c-96ac9bba8c27). VMs belonging to the same positive enforcing affinity groups are running on more than one host.
2019-07-08 18:46:07,450+03 WARN [org.ovirt.engine.core.bll.scheduling.policyunits.VmAffinityPolicyUnit] (EE-ManagedThreadFactory-engineScheduled-Thread-93) [4051b966] Invalid affinity situation was detected while scheduling VMs: 'golden_env_mixed_virtio_1' (932c013e-5f5d-4ea0-8382-b132dea73c33). VMs belonging to the same positive enforcing affinity groups are running on more than one host.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:2431