Bug 1421771 - Affinity enforcement tries to balance HE VM
Summary: Affinity enforcement tries to balance HE VM
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Backend.Core
Version: 4.1.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.1.3
: 4.1.3.2
Assignee: Yanir Quinn
QA Contact: Artyom
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-02-13 16:02 UTC by Artyom
Modified: 2017-07-06 13:08 UTC (History)
11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-07-06 13:08:51 UTC
oVirt Team: SLA
Embargoed:
rule-engine: ovirt-4.1+
mgoldboi: exception+
mgoldboi: planning_ack+
rule-engine: devel_ack+
mavital: testing_ack+


Attachments (Terms of Use)
engine log (5.35 MB, text/plain)
2017-02-13 16:02 UTC, Artyom
no flags Details
screenshot with failed tasks (146.63 KB, image/png)
2017-02-13 16:03 UTC, Artyom
no flags Details


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 72325 0 master MERGED scheduling: Dismiss HE VM in affinity migration 2017-02-16 12:26:16 UTC
oVirt gerrit 72425 0 ovirt-engine-4.1 MERGED scheduling: Dismiss HE VM in affinity migration 2017-02-22 12:23:07 UTC
oVirt gerrit 76710 0 master MERGED core: Nonmigratable VMs are ignored by affinity rules enforcer 2017-05-26 10:04:46 UTC
oVirt gerrit 76772 0 master MERGED scheduler: VmAffinityWeightPolicyUnit prefers hosts with pinned VMs 2017-05-26 10:04:50 UTC
oVirt gerrit 76822 0 master MERGED core: Affinity enforcer checks VMs on all hosts in a group 2017-05-26 10:04:56 UTC
oVirt gerrit 76995 0 master MERGED scheduler: Refactor VM affinity policy units and add tests 2017-05-26 10:04:53 UTC
oVirt gerrit 77390 0 ovirt-engine-4.1 MERGED core: Affinity enforcer checks VMs on all hosts in a group 2017-06-05 14:22:15 UTC
oVirt gerrit 77391 0 ovirt-engine-4.1 MERGED core: Nonmigratable VMs are ignored by affinity rules enforcer 2017-06-05 14:22:09 UTC
oVirt gerrit 77392 0 ovirt-engine-4.1 MERGED scheduler: Refactor VM affinity policy units and add tests 2017-06-05 14:22:03 UTC
oVirt gerrit 77393 0 ovirt-engine-4.1 MERGED scheduler: VmAffinityWeightPolicyUnit prefers hosts with pinned VMs 2017-06-05 14:21:57 UTC

Description Artyom 2017-02-13 16:02:27 UTC
Created attachment 1249923 [details]
engine log

Description of problem:
I have an HE VM running on host_mixed_2 and a non-HE VM running on host_mixed_1. After I added these two VMs to a hard positive affinity group, the engine started to show failed tasks with the message "Balancing VM HostedEngine", so I assume that the affinity enforcement tries to balance the HE VM and migrate it to host_mixed_1.

Version-Release number of selected component (if applicable):
rhevm-4.1.1-0.1.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. HE VM runs on host_2
2. Start the non-HE VM on host_1 (it has more free memory than host_2)
3. Add both VMs to a hard positive affinity group

Actual results:
The engine starts to show failed tasks with the message
"Balancing VM HostedEngine".

Expected results:
I believe we need to exclude the HE VM from the possible VMs for migration, and instead always balance the cluster with the non-HE VMs.

Additional info:

Comment 1 Artyom 2017-02-13 16:03:22 UTC
Created attachment 1249924 [details]
screenshot with failed tasks

Comment 2 Doron Fediuck 2017-02-14 14:03:18 UTC
If we allow affinity for the HE VM we should try to handle this.
That said, we're not going to move the HE VM around. This means
that we will try to align by making the non-HE VMs join the HE VM.
It's up to the admin to ensure there's sufficient capacity for such
a deployment; otherwise, they will get such errors, since the affinity
is broken and cannot be fixed.

Comment 3 Red Hat Bugzilla Rules Engine 2017-02-14 14:03:24 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Comment 4 Martin Sivák 2017-02-14 14:20:27 UTC
> If we allow affinity for the HE VM we should try to handle this.

We should just skip hosted engine VM when AREM (or balancing) is figuring out what VM needs to be migrated.

> That said, we're not going to move the HE VM around. This means
> that we will try to align by making the non-HE VMs join the HE VM.

Exactly.

Comment 5 Yanir Quinn 2017-02-14 14:37:46 UTC
(In reply to Martin Sivák from comment #4)
> > If we allow affinity for the HE VM we should try to handle this.
> 
> We should just skip hosted engine VM when AREM (or balancing) is figuring
> out what VM needs to be migrated.
   
Skip the VM, or dismiss the affinity group it belongs to?

Comment 6 Martin Sivák 2017-02-14 14:40:18 UTC
Do everything as usual; only the step that selects the VM to be migrated needs to skip the hosted engine VM.
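A minimal sketch of that selection step (class, field, and method names here are made up for illustration, not the actual ovirt-engine code): the rest of the enforcement flow is untouched, and only the final pick filters the hosted engine VM out of the candidates.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical model of the AREM candidate-selection step: everything
// else (violation detection, host choice) stays as usual; only the VM
// chosen for migration must never be the hosted engine VM.
class MigrationCandidateSelector {
    static class Vm {
        final String name;
        final boolean hostedEngine;
        Vm(String name, boolean hostedEngine) {
            this.name = name;
            this.hostedEngine = hostedEngine;
        }
    }

    static Optional<Vm> chooseVmToMigrate(List<Vm> candidates) {
        return candidates.stream()
                .filter(vm -> !vm.hostedEngine) // skip the HE VM here
                .findFirst();
    }
}
```

With this in place, a group of {HostedEngine, test_vm} always yields test_vm as the migration candidate, and a group containing only the HE VM yields no candidate at all.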

Comment 7 Eyal Edri 2017-03-05 13:40:23 UTC
Moving back to MODIFIED (maybe even POST?).
This patch is merged on ovirt-4.1 but not on ovirt-4.1.1.z, which is the right branch for the ovirt-4.1.1 milestone.

If this bug is still targeted to 4.1.1, please backport the patch and update the bug status; if not, please move the bug milestone to 4.1.2.

Comment 8 Artyom 2017-04-26 15:00:36 UTC
Checked on rhevm-4.1.2-0.1.el7.noarch

I am not sure that we now have the desired behavior. In my case:
1) My HE VM (VM name HostedEngine) runs on host_1
2) Start an additional VM (VM name test_vm) on host_2
3) Create a new VM-to-VM affinity group (hard, positive) and add the HostedEngine VM and test_vm to the affinity group
4) Wait some time

Result:
Nothing happens; the engine does not try to migrate test_vm to host_1 (which is what I expect it to do). I believe this happens because the AREM chooses the HostedEngine VM for migration every time.

private boolean isVmMigrationValid(Cluster cluster, VM candidateVm) {

        if (candidateVm.isHostedEngine()) {
            log.debug("VM {} is NOT a viable candidate for solving the affinity group violation situation"
                            + " since its a hosted engine VM.",
                    candidateVm.getId());
            return false;
        }
I think we just need to skip the HE VM as a candidate VM for migration before we send the candidate VMs to this method.

Any thoughts?

Comment 9 Yanir Quinn 2017-04-27 07:18:56 UTC
You might be right. What happens when you use VM-to-host affinity?
HE_VM and test_vm with (hard, positive) to host_1?

Comment 10 Artyom 2017-04-27 12:38:44 UTC
I tried it with the host-to-VM affinity, but the result is the same:
1) I have two VMs (HostedEngine and test_vm), both running on host_mixed_1
2) Create new affinity group:
   <affinity_group href="/ovirt-engine/api/clusters/00000002-0002-0002-0002-00000000017a/affinitygroups/d331fa8c-0047-436f-b925-e486aca1bd73" id="d331fa8c-0047-436f-b925-e486aca1bd73">
<name>test_affinity</name>
<link href="/ovirt-engine/api/clusters/00000002-0002-0002-0002-00000000017a/affinitygroups/d331fa8c-0047-436f-b925-e486aca1bd73/vms" rel="vms" />
<enforcing>true</enforcing>
<hosts_rule>
  <enabled>true</enabled>
  <enforcing>true</enforcing>
  <positive>true</positive>
</hosts_rule>
<vms_rule>
  <enabled>false</enabled>
  <enforcing>true</enforcing>
  <positive>false</positive>
</vms_rule>
<cluster href="/ovirt-engine/api/clusters/00000002-0002-0002-0002-00000000017a" id="00000002-0002-0002-0002-00000000017a" />
<hosts>
  <host id="fca7300c-760a-45aa-aa81-a8968fb8abef" /> - host_mixed_2
</hosts>
<vms>
  <vm id="065e3895-555d-43fb-a553-4aa537963c4b" /> - HostedEngine
  <vm id="17779151-67c0-4218-999e-70338ca39dcf" /> - test_vm
</vms>
</affinity_group>

Both VMs stay on the host host_mixed_1.

And in the log I can see only:
2017-04-27 08:28:13,519-04 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-21) [519fb0a5-c741-490c-8883-45b6940b0e83] EVENT_ID: USER_UPDATED_AFFINITY_GROUP(10,352), Correlation ID: 519fb0a5-c741-490c-8883-45b6940b0e83, Call Stack: null, Custom Event ID: -1, Message: Affinity Group test_affinity was updated. (User: admin@internal-authz)
2017-04-27 08:28:32,580-04 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (DefaultQuartzScheduler5) [] Candidate host 'host_mixed_1' ('19d88d3d-7d9f-4d19-8372-b8f1086f309e') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'VmToHostsAffinityGroups' (correlation id: null)
2017-04-27 08:28:32,580-04 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (DefaultQuartzScheduler5) [] Candidate host 'host_mixed_3' ('2c7be430-a43f-4a87-a393-601f999fd3e6') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'VmToHostsAffinityGroups' (correlation id: null)
2017-04-27 08:29:32,880-04 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (DefaultQuartzScheduler5) [] Candidate host 'host_mixed_1' ('19d88d3d-7d9f-4d19-8372-b8f1086f309e') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'VmToHostsAffinityGroups' (correlation id: null)
2017-04-27 08:29:32,880-04 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (DefaultQuartzScheduler5) [] Candidate host 'host_mixed_3' ('2c7be430-a43f-4a87-a393-601f999fd3e6') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'VmToHostsAffinityGroups' (correlation id: null)

Comment 11 Artyom 2017-05-03 12:19:51 UTC
I dug a little deeper into the code and found two problematic functions:
1) protected Guid chooseCandidateHostForMigration(....
returns the best host with VMs for balancing, but when the host runs only the HE VM, there is nothing to balance.

2) private List<Guid> findVmViolatingNegativeAg(...
   ...
       if (firstAssignment.containsKey(host)) {
                violatingVms.add(vm);
                violatingVms.add(firstAssignment.get(host));
            } else {
                firstAssignment.put(host, vm);
            }
   ...
   
   If the first VM on the host is a regular VM, the code will not add it to violatingVms; but if the second VM is the HE VM, the HE VM will be added to violatingVms.
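A compact sketch of the order dependence described above (names and signatures are illustrative, not the real engine code). One possible fix, under the assumption that the HE VM may still occupy a host but must never be reported as a migration candidate, is to keep the first-assignment bookkeeping and only exclude the HE VM from violatingVms:

```java
import java.util.*;

// Illustrative model of findVmViolatingNegativeAg's first-assignment
// bookkeeping: the first VM seen on a host is remembered, and any later
// VM on the same host is flagged together with it. The HE VM still
// counts toward host occupancy, but is never added to violatingVms,
// so the result no longer depends on iteration order.
class NegativeAgCheck {
    // vms maps VM name -> host name; heVms are hosted engine VMs.
    static List<String> findViolatingVms(LinkedHashMap<String, String> vms,
                                         Set<String> heVms) {
        Map<String, String> firstAssignment = new HashMap<>();
        List<String> violatingVms = new ArrayList<>();
        for (Map.Entry<String, String> e : vms.entrySet()) {
            String vm = e.getKey();
            String host = e.getValue();
            if (firstAssignment.containsKey(host)) {
                if (!heVms.contains(vm)) {
                    violatingVms.add(vm);
                }
                String first = firstAssignment.get(host);
                if (!heVms.contains(first)) {
                    violatingVms.add(first);
                }
            } else {
                firstAssignment.put(host, vm);
            }
        }
        return violatingVms;
    }
}
```

Either way the VMs are iterated, only test_vm ends up in violatingVms; the HostedEngine VM never does.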

Comment 12 Red Hat Bugzilla Rules Engine 2017-05-03 12:19:58 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Comment 13 Andrej Krejcir 2017-05-11 15:44:44 UTC
In the new patch, when the hosted engine is in a positive affinity group, the AREM will ignore all VMs in the group that are running on the same host as the hosted engine. It will pick a VM from another host to be migrated.
However, the scheduler may still decide not to migrate the VM, because it is already running on the best host.

Comment 14 Artyom 2017-06-18 11:53:55 UTC
Verified on rhevm-4.1.3.2-0.1.el7.noarch

Comment 15 Red Hat Bugzilla Rules Engine 2017-06-18 11:54:03 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

