Description of problem:

In getSecondaryDestinations in backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/scheduling/policyunits/PowerSavingBalancePolicyUnit.java, tooMuchMemory is evaluated from HIGH_MEMORY_LIMIT_FOR_UNDER_UTILIZED as follows:

[pseudocode]
notEnoughMemory = defined ? LOW_MEMORY_LIMIT_FOR_OVER_UTILIZED : 0L
tooMuchMemory = defined ? HIGH_MEMORY_LIMIT_FOR_UNDER_UTILIZED : 0L
[/pseudocode]

Then we call:

getNormallyUtilizedMemoryHosts(candidateHosts, notEnoughMemory, tooMuchMemory);

So if there is no defined value for HIGH_MEMORY_LIMIT_FOR_UNDER_UTILIZED, we actually call

protected List<VDS> getNormallyUtilizedMemoryHosts(Collection<VDS> hosts, long minFreeMemory, long maxFreeMemory)

with the following parameters:

getNormallyUtilizedMemoryHosts(candidateHosts, notEnoughMemory, 0);

This means that inside getNormallyUtilizedMemoryHosts, maxFreeMemory is 0:

for (VDS h : hosts) {
    if (h.getMaxSchedulingMemory() >= minFreeMemory && h.getMaxSchedulingMemory() <= maxFreeMemory) {
        result.add(h);
    }
}

So a host will only be considered if:

h.getMaxSchedulingMemory() <= 0

That is, only if it has ZERO or less memory. Since no host has zero or negative memory (let alone scheduling memory), all hosts are filtered out.

I believe the bug is in getSecondarySources:

[pseudocode]
notEnoughMemory = defined ? LOW_MEMORY_LIMIT_FOR_OVER_UTILIZED : 0L
tooMuchMemory = defined ? HIGH_MEMORY_LIMIT_FOR_UNDER_UTILIZED : 0L
[/pseudocode]

should be:

[pseudocode]
notEnoughMemory = defined ? LOW_MEMORY_LIMIT_FOR_OVER_UTILIZED : 0L
tooMuchMemory = defined ? HIGH_MEMORY_LIMIT_FOR_UNDER_UTILIZED : LONG_MAX
[/pseudocode]

Version-Release number of selected component (if applicable):
ovirt-engine-3.6.6. The latest upstream code appears to do the same thing.

How reproducible:
100%

Steps to Reproduce:
1. Prepare an OptimalForPowerSaving policy.
2. Leave MinFreeMemoryForUnderUtilized without a value.

Actual results:
All hosts are filtered out.

Expected results:
Not all hosts are filtered out.
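The filtering behaviour described above can be demonstrated with a minimal, self-contained sketch. Host and the method below are simplified stand-ins for the real VDS class and getNormallyUtilizedMemoryHosts, not the actual oVirt code:

```java
import java.util.ArrayList;
import java.util.List;

public class MemoryFilterSketch {
    // Simplified stand-in for VDS; maxSchedulingMemory is free memory in MB.
    static class Host {
        final String name;
        final long maxSchedulingMemory;
        Host(String name, long maxSchedulingMemory) {
            this.name = name;
            this.maxSchedulingMemory = maxSchedulingMemory;
        }
    }

    // Mirrors the loop in getNormallyUtilizedMemoryHosts: keep hosts whose
    // free memory lies in the closed range [minFreeMemory, maxFreeMemory].
    static List<Host> normallyUtilized(List<Host> hosts, long minFreeMemory, long maxFreeMemory) {
        List<Host> result = new ArrayList<>();
        for (Host h : hosts) {
            if (h.maxSchedulingMemory >= minFreeMemory && h.maxSchedulingMemory <= maxFreeMemory) {
                result.add(h);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Host> hosts = List.of(new Host("host1", 4096), new Host("host2", 8192));

        // Buggy fallback: an undefined HIGH_MEMORY_LIMIT_FOR_UNDER_UTILIZED
        // becomes 0, so no host can satisfy maxSchedulingMemory <= 0.
        System.out.println(normallyUtilized(hosts, 0L, 0L).size());            // prints 0

        // Proposed fallback: Long.MAX_VALUE effectively removes the upper bound.
        System.out.println(normallyUtilized(hosts, 0L, Long.MAX_VALUE).size()); // prints 2
    }
}
```

With maxFreeMemory = 0 every host is rejected; with Long.MAX_VALUE the upper bound is effectively disabled and only the lower bound applies.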
While we can use the fix and change what happens when MinFreeMemoryForUnderUtilized is missing with regard to moving VMs away from over-utilized hosts, the other direction (clearing and shutting down hosts with only a handful of VMs) still won't work.

The feature page [1] actually says that memory balancing is disabled when the memory values are set to 0. We never defined what happens when only one of them is set.

[1] http://old.ovirt.org/Features/Sla/MemoryBasedBalancing

The same issue is present in Equally balanced mode, and there both values have to be set or memory balancing won't work, and we can't really do anything about that.
Hi Martin,

First of all, thank you for looking into this. Please allow me to make some points:

1. If one parameter is left unset, the load balancer still executes, producing logs saying all hosts have been filtered out. Not really desirable, as it can be misleading.

2. Setting a high value for tooMuchMemory seems to do the trick when one of the values is left out. Same thing with 0 for notEnoughMemory.

3. Why don't we have in the RHEV manuals such nice(!) docs as oVirt has for this? Should we file a docs bug?

4. The oVirt documentation states: "The memory based balancing can be disabled by using 0 MB as both high and low thresholds." From what I understand, this means it will be disabled only if BOTH are unset, not just one of them.

Therefore, I believe the solution here would be one of the points below (or a combination of them):

A. Update the product documentation, clearly stating that both parameters must be configured.

B. Effectively disable load balancing, letting the user know a parameter is missing (rather than letting it run and produce misleading logs).

C. Set a high/low default value so balancing still works when one of them is missing.

We have a customer who set an extremely high number for tooMuchMemory (more than his hosts actually have), so hosts are never filtered out as under-utilized on memory. He only cares about over-utilized hosts, and it seems to be working quite well.

Hopefully this makes sense.

Cheers,
Germano
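Options B and C above could be sketched roughly as below. This is a minimal illustration, assuming hypothetical method names; it is not the real engine's configuration plumbing:

```java
public class ThresholdDefaults {
    // Option C: substitute safe sentinel defaults so that a single missing
    // bound cannot collapse the accepted range to [min, 0].
    static long lowOrDefault(Long configured) {
        return configured != null ? configured : 0L;             // no lower bound
    }

    static long highOrDefault(Long configured) {
        return configured != null ? configured : Long.MAX_VALUE; // no upper bound
    }

    // Option B: treat memory balancing as disabled, and say so explicitly,
    // unless both thresholds are configured.
    static boolean memoryBalancingEnabled(Long low, Long high) {
        if (low == null || high == null) {
            System.err.println("Memory balancing disabled: both thresholds must be set");
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(highOrDefault(null) == Long.MAX_VALUE); // prints true
        System.out.println(memoryBalancingEnabled(1024L, null));   // prints false
    }
}
```

Either variant avoids the misleading "all hosts filtered out" behaviour: C keeps balancing working with one threshold, B makes the misconfiguration visible instead of silent.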
Hi Martin, You are right, the other direction would still be a problem, as it would never be underutilized based on memory. But still, how bad would this be? I am afraid hosts could still be considered underutilized based on CPU, migrating VMs off eventually. No?
(In reply to Germano Veit Michel from comment #3)
> Hi Martin,
>
> You are right, the other direction would still be a problem, as it would
> never be underutilized based on memory. But still, how bad would this be? I
> am afraid hosts could still be considered underutilized based on CPU,
> migrating VMs off eventually. No?

Hi Germano,

The approach is:
- Perform balancing (VM migration) based on CPU.
- If no candidate source to migrate from, or candidate destination to migrate to, is found using the CPU-based approach, then the memory-based approach is used.

There is no mixing between these methods (CPU/memory).
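The two-phase selection described above (CPU first, memory only as a fallback, never mixed) can be sketched as follows. The interface and names are invented for illustration and do not correspond to the actual oVirt policy unit API:

```java
import java.util.List;
import java.util.Optional;

public class BalancePhases {
    // Hypothetical strategy: picks a migration source host, or empty if
    // no host qualifies under this strategy's criteria.
    interface Strategy {
        Optional<String> pickSourceHost(List<String> hosts);
    }

    // CPU-based selection runs first; the memory-based strategy is only
    // consulted when the CPU pass finds no candidate. The criteria are
    // never combined.
    static Optional<String> selectSource(List<String> hosts, Strategy cpu, Strategy memory) {
        Optional<String> byCpu = cpu.pickSourceHost(hosts);
        if (byCpu.isPresent()) {
            return byCpu;
        }
        return memory.pickSourceHost(hosts);
    }

    public static void main(String[] args) {
        List<String> hosts = List.of("host1", "host2");
        Strategy cpuFindsNothing = hs -> Optional.empty();
        Strategy memoryPicksFirst = hs -> Optional.of(hs.get(0));
        // CPU pass yields nothing, so the memory pass decides.
        System.out.println(selectSource(hosts, cpuFindsNothing, memoryPicksFirst).get()); // prints host1
    }
}
```

This also shows why a broken memory filter matters: when the CPU pass finds no candidates, the memory fallback is the only remaining path, and if it filters every host out, no balancing happens at all.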
Yanir, thanks for the clarification.
Verified on rhevm-4.0.4.2-0.1.el7ev.noarch:

1) Have two hosts under the engine (both with 16GB).
2) Start on host_1 two VMs with 12GB and one with 1GB.
3) Change the scheduling policy to the power saving policy with MaxFreeMemoryForOverUtilized=10GB.
4) The 1GB VM migrates to host_2.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2016-1967.html