I've defined a soft negative affinity group for two VMs. My understanding is that if there are at least 2 nodes available in the cluster, the VMs SHOULD start on different nodes. This does not happen: they start on the same node. If I make the group hard it works, but I don't want to make it hard, because if only one node is available in the cluster, one VM will stay down.
Any news on this? Thanks.
Hi, soft affinity only modifies one number in the weight table of the relevant hosts, so if something else was considered more important (CPU load, memory, ...), it is possible for the VMs to end up on the same host. We would need debug logs from the actual run to be certain, though. What you can try is to open the current cluster policy and increase the factor multiplier of the affinity weight module to a bigger number to make it more important, or disable the weight modules for other resources (memory, CPU, ...).
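To illustrate the kind of interaction described above, here is a rough sketch with made-up numbers (not the actual engine code; host names, the memory/affinity scores and the "lower total wins" assumption are all hypothetical):

```python
# Minimal sketch of a factor-weighted scheduler, assuming each weight module
# returns a per-host score where lower is better and the host with the
# smallest sum of factor * score is chosen.  All numbers are hypothetical.

def pick_host(hosts, weight_modules):
    totals = {h: sum(f * score(h) for f, score in weight_modules) for h in hosts}
    return min(totals, key=totals.get), totals

# host_a already runs the other VM of the soft negative affinity group.
free_memory_mb   = {"host_a": 24000, "host_b": 16000}  # a module working in MB
affinity_penalty = {"host_a": 1,     "host_b": 0}      # soft affinity: tiny penalty

weight_modules = [
    (1, lambda h: -free_memory_mb[h]),   # prefer the host with more free memory
    (1, lambda h: affinity_penalty[h]),  # prefer the host without the peer VM
]

best, totals = pick_host(["host_a", "host_b"], weight_modules)
print(best, totals)
# host_a wins (-23999 vs -16000): the MB-scale memory score drowns out the
# affinity penalty of 1, so both VMs can land on the same host.
```

Raising the affinity factor multiplier only helps once the affinity term is big enough to compete with raw magnitudes like these.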
Thanks for the reply. You've touched on another part of oVirt I'm not very satisfied with, and that's cluster policy. oVirt seems to prefer overloading one node before starting VMs on other nodes. I've made a copy and applied the evenly_distributed policy with the values below:
OptimalForEvenDistribution: 1
HA: 1
OptimalForHaReservation: 1
VMAffinityGroups: 10
with properties:
CpuOverCommitDuration: 2
HighUtilization: 50
They still start on the same node (and not the one with the least load). There is nothing interesting in engine.log. How can I enable DEBUG?
What is the CPU load of your hosts? Evenly distributed balancing uses only CPU load to balance VMs. The balancing behaves the way you describe when the VMs are doing mostly nothing and the host sees 0% CPU load. We are introducing additional memory-based load factors to the balancing in 3.6.
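As a rough illustration of that limitation (hypothetical numbers, not the engine's balancing code): a VM that is predicted to add ~0% CPU does not change any host's CPU score, so a CPU-only weight cannot separate the hosts and placement can fall back to the host that already runs the other VM.

```python
# Hypothetical sketch: with CPU-only scoring, an idle VM adds nothing to any
# host's predicted load, so all candidates score (nearly) the same.
host_cpu_load = {"node1": 0.50, "node2": 0.52, "node3": 0.48}
new_vm_predicted_cpu = 0.0  # the VM being started is doing mostly nothing

predicted = {h: load + new_vm_predicted_cpu for h, load in host_cpu_load.items()}
print(predicted)  # differences of a few percent -- effectively a tie, so other
                  # weights (or plain ordering) decide where the VM lands
```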
CPU load is ~ 45-55% for all nodes
Ah... but that actually seems like a correct distribution. A new VM that is not doing anything might not contribute to the load, and so a second VM can end up on the same host. Can you attach an engine.log from the oVirt engine machine? And what does the overloading of a single host look like? Are you mass-starting the VMs (multiple VMs at once)? We need at least some details about the situation and the actions taken to be able to reproduce this as something different from the known limitation of the CPU-based scheduling. Also, try giving the affinity group an even higher factor (100, for example).
Created attachment 1044683 [details]
engine log

Sorry for the delay. I've just upgraded to 3.5.3 (vdsm also latest) and got the same result. VMAffinityGroups was set to 100 (in the policy). Both VMs started on the same node. CPU was around ~50% on all nodes.
Target release should be set once a package build is known to fix the issue. Since this bug has not been modified, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.
This is an automated message. oVirt 3.6.0 RC3 has been released and GA is targeted for next week, Nov 4th 2015. Please review this bug and, if it is not a blocker, postpone it to a later release. All bugs not postponed by the GA release will be automatically re-targeted to
- 3.6.1 if severity >= high
- 4.0 if severity < high
This bug is flagged for 3.6, yet the milestone is set to the 4.0 version; therefore the milestone has been reset. Please set the correct milestone or add the flag.
(In reply to Kapetanakis Giannis from comment #7)
> Created attachment 1044683 [details]
> engine log
>
> Sorry for the delay. I've just upgraded to 3.5.3 (vdsm also latest) and got
> the same result. VMAffinityGroups was set to 100 (in the policy). Both VMs
> started on the same node. CPU was around ~50% on all nodes.

Martin, if the factor is 100 then the soft (best-effort) affinity isn't working right.
The issue is that we do not use normalized numbers for the weight policy units. The memory policy unit uses big numbers (megabytes) that almost always outweigh everything else (affinity uses pretty low numbers). The solution would be to either normalize the weighting or use rank-based weighting, similar to what I did in the oVirt patchset I just attached.
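A minimal sketch of the rank-based idea (my interpretation of the approach, not the code from the attached patchset; the host names and raw scores are made up, and I assume the best raw score gets the highest rank, which matches the "Ranking selector" output later in this bug):

```python
# Rank-based weighting sketch: each weight unit only orders the hosts, and the
# scheduler sums factor * rank, so a unit reporting megabytes can no longer
# drown out a unit reporting 0/1.  All numbers are hypothetical.

def ranks(raw, lower_is_better=True):
    """Turn raw per-host scores into ranks; the best host gets the highest rank."""
    distinct = sorted(set(raw.values()), reverse=lower_is_better)
    return {host: distinct.index(value) + 1 for host, value in raw.items()}

memory_raw   = {"host_a": -24000, "host_b": -16000}  # MB-scale unit (lower = better)
affinity_raw = {"host_a": 1,      "host_b": 0}       # soft negative affinity penalty

units = [
    (1, ranks(memory_raw)),    # factor 1
    (1, ranks(affinity_raw)),  # factor 1
]

totals = {h: sum(f * unit[h] for f, unit in units) for h in memory_raw}
print(totals)  # host_a: 2 + 1 = 3, host_b: 1 + 2 = 3 -- a tie instead of a
               # landslide; raising the affinity factor to 2 already makes
               # host_b win (2 + 2*1 = 4 vs 1 + 2*2 = 5).
```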
Assigning to a placeholder email to stop polluting our lists. We will assign it to a proper person once the bug is prioritized again.
I am moving this to ON_QA with the TestOnly keyword, since https://gerrit.ovirt.org/#/c/67707/ is now merged and will make weighting factors behave much more predictably (there is no need for insanely high values any more). The normalization feature is documented here: http://www.ovirt.org/develop/release-management/features/sla/scheduling-weight-normalization/ The change should make it into the next 4.1 build (this week, according to the current plan).
Verified on rhevm-4.1.0-0.3.beta2.el7.noarch

I have 3 hosts in my environment; one of the hosts has more memory and CPUs.

1) Create a soft negative affinity group and add vm_1, vm_2 and vm_3
2) Start vm_1 and vm_2
3) Start vm_3 with the VmAffinityGroups factor equal to 1 - vm_3 starts on the host with more CPU and memory

Host with more CPU and memory - 353417ed-25a8-4bbd-8940-df481f3b16e3

Ranking selector:
*;factor;246d5c0e-7ad0-4522-95ff-4c7f5069ac8d;;353417ed-25a8-4bbd-8940-df481f3b16e3;;bfa1dbe1-e405-4070-9840-eb4390a42e0a;
98e92667-6161-41fb-b3fa-34f820ccbc4b;1; 2;1; 2;1; 2;1
84e6ddee-ab0d-42dd-82f0-c297779db567;1; 1;1000; 1;1000; 2;1
427aed70-dae3-48ba-8fe9-a902a9d563c8;1; 2;1; 2;1; 2;1
7db4ab05-81ab-42e8-868a-aee2df483edb;1; 1;2; 2;1; 1;2
7f262d70-6cac-11e3-981f-0800200c9a66;1; 2;0; 2;0; 2;0
591cdb81-ba67-45b4-9642-e28f61a97d57;1; 2;10000; 2;10000; 2;10000
4134247a-9c58-4b9a-8593-530bb9e37c59;1; 1;359; 2;1; 0;543

Ranks of the hosts:
246d5c0e-7ad0-4522-95ff-4c7f5069ac8d - 11
353417ed-25a8-4bbd-8940-df481f3b16e3 - 13
bfa1dbe1-e405-4070-9840-eb4390a42e0a - 11

4) Stop vm_3
5) Change the VmAffinityGroups factor to 3
6) Start vm_3 - vm_3 starts on host bfa1dbe1-e405-4070-9840-eb4390a42e0a because of the affinity

a56-410e-b972-982a87ea4289] Ranking selector:
*;factor;246d5c0e-7ad0-4522-95ff-4c7f5069ac8d;;353417ed-25a8-4bbd-8940-df481f3b16e3;;bfa1dbe1-e405-4070-9840-eb4390a42e0a;
98e92667-6161-41fb-b3fa-34f820ccbc4b;1; 2;1; 2;1; 2;1
84e6ddee-ab0d-42dd-82f0-c297779db567;3; 1;1000; 1;1000; 2;1
427aed70-dae3-48ba-8fe9-a902a9d563c8;1; 2;1; 2;1; 2;1
7db4ab05-81ab-42e8-868a-aee2df483edb;1; 1;2; 2;1; 1;2
7f262d70-6cac-11e3-981f-0800200c9a66;1; 2;0; 2;0; 2;0
591cdb81-ba67-45b4-9642-e28f61a97d57;1; 2;10000; 2;10000; 2;10000
4134247a-9c58-4b9a-8593-530bb9e37c59;1; 1;359; 2;1; 0;495

Ranks of the hosts:
246d5c0e-7ad0-4522-95ff-4c7f5069ac8d - 11
353417ed-25a8-4bbd-8940-df481f3b16e3 - 13
bfa1dbe1-e405-4070-9840-eb4390a42e0a - 15
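For anyone else reading these logs, here is how I read the "Ranking selector" format (an assumption inferred from the output itself, not official documentation): the header lists the host IDs, and each following line is "policy unit UUID;factor;" followed by a "rank;raw weight" pair per host. Summing factor * rank per host reproduces the 11/13/11 totals reported for the first run above:

```python
# Small helper (my reading of the log format, not an official parser) that
# extracts the per-host ranks from a "Ranking selector" block and sums
# factor * rank for each host.

def parse_ranking(header, rows):
    parts = [p.strip() for p in header.split(";")]
    hosts = [p for p in parts[2:] if p]                 # skip "*" and "factor"
    table = {}
    for row in rows:
        fields = [f.strip() for f in row.split(";")]
        unit, factor, pairs = fields[0], int(fields[1]), fields[2:]
        table[unit] = (factor, {h: int(pairs[2 * i]) for i, h in enumerate(hosts)})
    return hosts, table

header = "*;factor;246d5c0e-7ad0-4522-95ff-4c7f5069ac8d;;353417ed-25a8-4bbd-8940-df481f3b16e3;;bfa1dbe1-e405-4070-9840-eb4390a42e0a;"
rows = [  # the first run above (VmAffinityGroups factor = 1)
    "98e92667-6161-41fb-b3fa-34f820ccbc4b;1; 2;1; 2;1; 2;1",
    "84e6ddee-ab0d-42dd-82f0-c297779db567;1; 1;1000; 1;1000; 2;1",
    "427aed70-dae3-48ba-8fe9-a902a9d563c8;1; 2;1; 2;1; 2;1",
    "7db4ab05-81ab-42e8-868a-aee2df483edb;1; 1;2; 2;1; 1;2",
    "7f262d70-6cac-11e3-981f-0800200c9a66;1; 2;0; 2;0; 2;0",
    "591cdb81-ba67-45b4-9642-e28f61a97d57;1; 2;10000; 2;10000; 2;10000",
    "4134247a-9c58-4b9a-8593-530bb9e37c59;1; 1;359; 2;1; 0;543",
]

hosts, table = parse_ranking(header, rows)
totals = {h: sum(f * unit_ranks[h] for f, unit_ranks in table.values()) for h in hosts}
print(totals)  # roughly: 246d5c0e-...: 11, 353417ed-...: 13, bfa1dbe1-...: 11
```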