Description of problem:
A virtual machine does not migrate after VM-to-VM soft affinity (positive VM polarity) is enabled. If we migrate one VM to another host, the other VM in the affinity group does not follow, even after a very long time.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1) Create a new scheduling policy for the cluster: Administration -> Configure -> Scheduling Policies -> vm_evenly_distributed -> Copy (as Affinity_testing) -> Affinity_testing -> Edit.
Leave Filter Modules as-is, remove all Weight Modules except VmAffinityGroups, set its Factor to 100, disable the Load Balancer, and leave Properties as-is.
2) Select the policy for the cluster under Compute -> Clusters -> Default -> Edit -> Scheduling Policy -> Affinity_testing, and leave everything else at the defaults.
3) Create two VMs under Compute -> Virtual Machines. In the VM view, check that under Affinity Groups both VMs have an affinity group with VM Affinity Rule set to Positive (non-Enforcing) and Host Affinity Rule set to Disabled, with both VMs listed under Virtual Machines.
4) Shut down both VMs, then start them both; note that they start on the same host. Migrate either of them to another host; note that the other VM does not follow, even after a long period of time.
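Steps 1 and 4 rely on the VmAffinityGroups weight module steering where a VM is placed. As a rough illustration only, here is a Python sketch of weighted host selection for a soft positive group. The function names, the penalty definition, and the "lowest weighted sum wins" rule are assumptions made for this example, not the engine's actual code.

```python
def affinity_penalty(host, vm, group_vms, placement):
    """Hypothetical soft positive VM-VM affinity penalty for placing `vm` on
    `host`: the number of other running group members on a different host."""
    return sum(
        1
        for other in group_vms
        if other != vm and other in placement and placement[other] != host
    )


def pick_host(vm, hosts, group_vms, placement, factor=100):
    """Pick a host for `vm`, assuming each weight module's penalty is
    multiplied by its factor, the products are summed, and the lowest total
    wins. Only the affinity module is enabled here (as in step 1), so the
    factor just scales the score. Ties are broken by host name."""

    def total(host):
        return factor * affinity_penalty(host, vm, group_vms, placement)

    return min(hosts, key=lambda h: (total(h), h))
```

Under these assumptions, starting VM2 while VM1 runs on host_mixed_1 selects host_mixed_1, which matches the "start on the same host" observation. Scoring of this kind only runs when a host is being chosen for a VM; with the Load Balancer disabled in step 1, nothing in the sketch re-scores a VM that is already running.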
The other VMs in an affinity group do not migrate, even after one VM from the group has been migrated to another host.
The documentation says: "Soft enforcement - indicates a preference for virtual machines in an affinity group to run on the specified host or hosts in a group when possible."
Therefore, all VMs in an affinity group should run on the same host (or on the hosts in the group) when possible.
This still appears to be reproducible (I encountered it today).
Yanir, are any changes to the default scheduling policy needed to make this work as expected? The policies I looked at today already had the affinity values set to priority 10.
Looking at the code, the component responsible for migrating VMs that break affinity rules (AffinityRulesEnforcementManager, AREM) does not consider soft VM-VM affinity when choosing a VM to migrate. I think this is a bug; AREM should try to correct soft VM-VM affinity too.
Currently, soft VM-VM affinity is used only when starting or migrating a VM, and during balancing. However, balancing chooses which VM to migrate based on CPU and memory load, not on affinity rules. So if the hosts are not overloaded, the VMs will not be migrated even if they break soft affinity.
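To make the gap concrete, here is a minimal Python sketch of how an enforcement pass might select broken groups. The `AffinityGroup` model, `is_broken`, and `groups_to_enforce` are hypothetical names, not AffinityRulesEnforcementManager's real API; the `include_soft` switch models the fix proposed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AffinityGroup:
    """Hypothetical model of a VM-to-VM affinity group."""

    name: str
    vms: frozenset
    positive: bool = True  # positive polarity: members should share a host
    enforcing: bool = False  # False means a soft (non-enforcing) rule


def is_broken(group, placement):
    """True if the group's running members violate its VM polarity."""
    hosts = [placement[vm] for vm in group.vms if vm in placement]
    if group.positive:
        # Positive affinity: all running members should be on one host.
        return len(set(hosts)) > 1
    # Negative affinity: no two running members should share a host.
    return len(set(hosts)) < len(hosts)


def groups_to_enforce(groups, placement, include_soft):
    """Names of broken groups an enforcement pass would try to fix.

    include_soft=False considers only enforcing (hard) groups, so a broken
    soft group is never selected; include_soft=True also picks up soft groups.
    """
    return [
        g.name
        for g in groups
        if is_broken(g, placement) and (g.enforcing or include_soft)
    ]
```

With `include_soft=False`, a split placement such as VM1 on host_mixed_2 and VM2 on host_mixed_1 yields an empty list, so nothing is migrated; with `include_soft=True`, the soft group is reported as broken.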
Can someone help check whether this bug is similar to Bug 1678708?
This bug is about enabling the automatic migration that fixes broken affinity groups for soft VM affinity groups as well.
Bug 1678708, on the other hand, appears to be about soft VM affinity not being applied when starting VMs.
These bugs are different.
I'm testing on ovirt-engine-18.104.22.168-0.1.el7.noarch. According to the BZ, the patch should be included, but I see some unexpected behavior.
Sometimes, when I migrate VM1 (which is in a soft positive VM affinity group), VM2 is balanced to the same new host as expected (because of the affinity rule), so the two VMs are together again.
In most cases, however, I see different behavior:
I migrate VM1 and wait for VM2 to be balanced. Instead, VM1 is balanced back to the source host,
as shown here:
Jun 23, 2019, 3:40:04 PM Migration started (VM: golden_env_mixed_virtio_0, Source: host_mixed_1, Destination: host_mixed_2, User: admin@internal-authz).
Jun 23, 2019, 3:40:27 PM Migration completed (VM: golden_env_mixed_virtio_0, Source: host_mixed_1, Destination: host_mixed_2, Duration: 12 seconds, Total: 22 seconds, Actual downtime: 59ms)
Jun 23, 2019, 3:40:57 PM Migration initiated by system (VM: golden_env_mixed_virtio_0, Source: host_mixed_2, Destination: host_mixed_1, Reason: Affinity rules enforcement)
Jun 23, 2019, 3:41:12 PM Migration completed (VM: golden_env_mixed_virtio_0, Source: host_mixed_2, Destination: host_mixed_1, Duration: 4 seconds, Total: 14 seconds, Actual downtime: 63ms)
I am attaching the engine log with scheduler debug logging enabled. Please see the example starting at this line:
2019-06-23 15:40:04,502+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-3) [5a79e5ef-c2aa-409d-b606-c26b6f778088] EVENT_ID: VM_MIGRATION_START(62), Migration started (VM: golden_env_mixed_virtio_0, Source: host_mixed_1, Destination: host_mixed_2, User: admin@internal-authz).
Two VMs are running on host_mixed_1. I migrate VM golden_env_mixed_virtio_0 to host_mixed_2. The migration succeeds, and then, after 30 seconds, this VM is balanced back, instead of the expected migration of the second VM, golden_env_mixed_virtio_1, to host_mixed_2.
Could you please take a look?
Created attachment 1583695
engine with debug
That is the expected behavior. Affinity enforcement simply tries to migrate VMs so that the affinity is no longer broken.
It does not matter whether VM2 migrates to VM1's host or VM1 migrates back to VM2's host; both outcomes are fine, since either one fixes the affinity.
If the user does not want VM1 to migrate back, its migration mode should be set to "manual migration only".
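The reply above can be illustrated with a short Python sketch that lists every single migration that would reunite a positive affinity group, skipping VMs whose migration mode is "manual migration only". `fixing_moves` and its parameters are hypothetical; the real enforcement logic may choose among the candidates differently.

```python
def fixing_moves(group_vms, placement, manual_only=frozenset()):
    """Return (vm, target_host) migrations that would, on their own, put
    every running member of a positive affinity group on one host.

    group_vms: names of the VMs in the (positive) affinity group.
    placement: mapping of running VM name -> current host.
    manual_only: VMs whose migration mode is "manual migration only";
    automatic enforcement never picks them.
    """
    running = sorted(vm for vm in group_vms if vm in placement)
    if len({placement[vm] for vm in running}) <= 1:
        return []  # already together (or not running): nothing to fix
    moves = []
    for vm in running:
        if vm in manual_only:
            continue
        others = {placement[o] for o in running if o != vm}
        # One migration fixes the group only if all other members already
        # share a single host; `vm` then has to move to that host.
        if len(others) == 1:
            moves.append((vm, others.pop()))
    return moves
```

For the case in the report (VM1 on host_mixed_2, VM2 on host_mixed_1), both moves are returned, so moving VM1 back is as valid as moving VM2 forward. Marking VM1 as manual-only leaves only the VM2 move.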
Verified based on https://bugzilla.redhat.com/show_bug.cgi?id=1651747#c22 and #c24.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.