Bug 1354281

Summary: All hosts filtered out when memory underutilized parameter left out
Product: Red Hat Enterprise Virtualization Manager
Reporter: Germano Veit Michel <gveitmic>
Component: ovirt-engine
Assignee: Yanir Quinn <yquinn>
Status: CLOSED ERRATA
QA Contact: Artyom <alukiano>
Severity: medium
Priority: medium
Docs Contact:
Version: 3.6.7
CC: dfediuck, gklein, lsurette, mgoldboi, rbalakri, rgolan, Rhev-m-bugs, sbonazzo, srevivo, trichard, ykaul, ylavi
Target Milestone: ovirt-4.0.4
Keywords: Triaged, ZStream
Target Release: 4.0.4
Hardware: x86_64
OS: Linux
Whiteboard: EasyFix
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Setting only one of the thresholds (high or low) for power saving or evenly distributed memory-based balancing could lead to unexpected results. For example, under power saving load balancing, when the threshold for memory over-utilized hosts was set but the threshold for memory under-utilized hosts was left undefined, the latter defaulted to 0. All hosts were then considered under-utilized and chosen as sources for migration, but no host was chosen as a destination for migration. This has now been changed so that when the threshold for memory under-utilized hosts is undefined, it defaults to Long.MAX_VALUE. Now, when the threshold for memory over-utilized hosts is set and the threshold for memory under-utilized hosts is undefined, only over-utilized hosts are selected as sources for migration, and the destination hosts are those that are not over-utilized.
Story Points: ---
Clone Of:
Clones: 1359767 (view as bug list)
Environment:
Last Closed: 2016-09-28 22:15:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1359767, 1376289

Description Germano Veit Michel 2016-07-11 05:45:24 UTC
Description of problem:

In function getSecondaryDestinations in backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/scheduling/policyunits/PowerSavingBalancePolicyUnit.java we evaluate tooMuchMemory based on HIGH_MEMORY_LIMIT_FOR_UNDER_UTILIZED as follows:

[pseudocode]
notEnoughMemory = defined ? LOW_MEMORY_LIMIT_FOR_OVER_UTILIZED   : 0L
tooMuchMemory   = defined ? HIGH_MEMORY_LIMIT_FOR_UNDER_UTILIZED : 0L
[/pseudocode]

Then we do 

getNormallyUtilizedMemoryHosts(candidateHosts, notEnoughMemory, tooMuchMemory);

That is, if there is no defined value for HIGH_MEMORY_LIMIT_FOR_UNDER_UTILIZED, we actually call 

protected List<VDS> getNormallyUtilizedMemoryHosts(Collection<VDS> hosts, long minFreeMemory, long maxFreeMemory)

with the following parameters:

getNormallyUtilizedMemoryHosts(candidateHosts, notEnoughMemory, 0);

This actually means that in getNormallyUtilizedMemoryHosts, maxFreeMemory is 0.

for (VDS h : hosts) {
    if (h.getMaxSchedulingMemory() >= minFreeMemory
            && h.getMaxSchedulingMemory() <= maxFreeMemory) {
        result.add(h);
    }
}

So, hosts will only be considered if:

h.getMaxSchedulingMemory() <= 0

That is, only hosts with ZERO or less memory qualify. This means all hosts are filtered out, since no host has zero or negative memory (let alone zero scheduling memory).
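To make this concrete, here is a minimal, self-contained sketch of the same comparison (Host is a hypothetical stand-in for VDS, and the memory values are illustrative, not taken from the engine):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Standalone demo of the filter quoted above. Host mimics the one VDS
// accessor the loop uses; everything else is a simplifying assumption.
public class FilterDemo {
    static class Host {
        final String name;
        final long maxSchedulingMemory; // free memory available for scheduling, in MB

        Host(String name, long maxSchedulingMemory) {
            this.name = name;
            this.maxSchedulingMemory = maxSchedulingMemory;
        }

        long getMaxSchedulingMemory() {
            return maxSchedulingMemory;
        }
    }

    // Same comparison as getNormallyUtilizedMemoryHosts in the report
    static List<Host> getNormallyUtilizedMemoryHosts(Collection<Host> hosts,
            long minFreeMemory, long maxFreeMemory) {
        List<Host> result = new ArrayList<>();
        for (Host h : hosts) {
            if (h.getMaxSchedulingMemory() >= minFreeMemory
                    && h.getMaxSchedulingMemory() <= maxFreeMemory) {
                result.add(h);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Host> hosts = Arrays.asList(
                new Host("host_1", 4096L),    // 4 GB schedulable
                new Host("host_2", 12288L));  // 12 GB schedulable

        // HIGH_MEMORY_LIMIT_FOR_UNDER_UTILIZED undefined -> maxFreeMemory arrives as 0:
        List<Host> normal = getNormallyUtilizedMemoryHosts(hosts, 2048L, 0L);
        System.out.println(normal.size()); // prints 0 -- every host filtered out
    }
}

With maxFreeMemory == 0, the upper-bound check can never pass for a real host, so the result list is always empty.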

I believe the bug is in getSecondarySources:

[pseudocode]
notEnoughMemory = defined ? LOW_MEMORY_LIMIT_FOR_OVER_UTILIZED   : 0L
tooMuchMemory   = defined ? HIGH_MEMORY_LIMIT_FOR_UNDER_UTILIZED : 0L
[/pseudocode]

Should be

[pseudocode]
notEnoughMemory = defined ? LOW_MEMORY_LIMIT_FOR_OVER_UTILIZED   : 0L
tooMuchMemory   = defined ? HIGH_MEMORY_LIMIT_FOR_UNDER_UTILIZED : LONG_MAX
[/pseudocode]
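In Java terms, LONG_MAX is Long.MAX_VALUE. A hedged sketch of the proposed default (the nullable-parameter signature is illustrative only; the real code reads these limits from the policy unit's configuration):

// Sketch of the proposed fix: default the upper bound to Long.MAX_VALUE,
// so an unset under-utilization threshold excludes no host.
public class MemoryLimits {
    static long[] limits(Long lowLimitForOverUtilized, Long highLimitForUnderUtilized) {
        long notEnoughMemory = lowLimitForOverUtilized != null ? lowLimitForOverUtilized : 0L;
        long tooMuchMemory = highLimitForUnderUtilized != null
                ? highLimitForUnderUtilized
                : Long.MAX_VALUE; // was 0L, which filtered out every host
        return new long[] { notEnoughMemory, tooMuchMemory };
    }

    public static void main(String[] args) {
        long[] window = limits(2048L, null); // under-utilization threshold unset
        System.out.println(window[1] == Long.MAX_VALUE); // true: upper bound excludes nothing
    }
}

With this default, every host satisfies getMaxSchedulingMemory() <= tooMuchMemory, so only the over-utilization bound actually filters, which matches the intent of leaving the other threshold out.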

Version-Release number of selected component (if applicable):
ovirt-engine-3.6.6
The latest upstream code seems to do the exact same thing.

How reproducible:
100%

Steps to Reproduce:
1. Prepare an OptimalForPowerSaving policy.
2. Leave MinFreeMemoryForUnderUtilized without a value.

Actual results:
All hosts filtered out

Expected results:
Not filtering all hosts

Comment 1 Martin Sivák 2016-07-18 13:58:06 UTC
While we can use the fix and change what happens when MinFreeMemoryForUnderUtilized is missing with regard to moving VMs away from over-utilized hosts, the other direction (hosts with only a handful of VMs that could be cleared and shut down) still won't work.

The feature page [1] actually says that memory balancing is disabled when the memory values are set to 0. We never defined what happens when only one is set.

[1] http://old.ovirt.org/Features/Sla/MemoryBasedBalancing


The same issue is present in the Equally balanced mode; there, both values have to be set or memory balancing won't work, and we can't really do anything about that.

Comment 2 Germano Veit Michel 2016-07-19 01:07:23 UTC
Hi Martin,

First of all thank you for looking into this. Please allow me to make some points.

1. If one parameter is left unset, the Load Balancer still executes, producing logs that say all hosts have been filtered out. This is not really desirable, as it can be misleading.
2. Setting a high value for tooMuchMemory seems to do the trick when that value is left out, just as 0 does for notEnoughMemory.
3. Why don't we have such nice(!) docs for this in the RHEV manuals as oVirt has?
   * Should we file a docs bug?
4. The oVirt documentation states: "The memory based balancing can be disabled by using 0 MB as both high and low thresholds."
   * From what I understand, this means it will be disabled only if BOTH are unset, not just one of them.

Therefore, I believe the solution here would be one of the points below (or a combination of them):

A. Update the product documentation to clearly state that both parameters must be configured.
B. Effectively disable load balancing and let the user know a parameter is missing (rather than letting it run and produce misleading logs).
C. Evaluate setting a high/low default value so that balancing still works when one of the parameters is missing.

We have a customer who set an extremely high value for tooMuchMemory (more than his hosts actually have), so his hosts are never filtered out as under-utilized on memory. He only cares about over-utilization, and it seems to be working quite well.

Hopefully this makes sense.

Cheers,
Germano

Comment 3 Germano Veit Michel 2016-07-19 01:12:04 UTC
Hi Martin,

You are right, the other direction would still be a problem, as it would never be underutilized based on memory. But still, how bad would this be? I am afraid hosts could still be considered underutilized based on CPU, migrating VMs off eventually. No?

Comment 4 Yanir Quinn 2016-07-20 08:24:57 UTC
(In reply to Germano Veit Michel from comment #3)
> Hi Martin,
> 
> You are right, the other direction would still be a problem, as it would
> never be underutilized based on memory. But still, how bad would this be? I
> am afraid hosts could still be considered underutilized based on CPU,
> migrating VMs off eventually. No?

Hi Germano,
The approach is:
- Perform balancing (VM migration) based on CPU.
- If no candidate sources to migrate from, or no candidate destination to migrate to, can be found using the CPU-based approach, then the memory-based approach is used.
There is no mixing of these methods (CPU/memory).
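A hedged sketch of that ordering (the Selector interface and method names are illustrative only, not the engine's API):

import java.util.List;
import java.util.Optional;

// Illustration of the balancing order described above: CPU-based selection
// first, memory-based selection only as a fallback, never mixed.
public class BalanceOrderSketch {
    interface Selector {
        Optional<List<String>> pickSourcesAndDestination();
    }

    static Optional<List<String>> balance(Selector cpuBased, Selector memoryBased) {
        Optional<List<String>> byCpu = cpuBased.pickSourcesAndDestination();
        if (byCpu.isPresent()) {
            return byCpu; // CPU-based approach found candidates; memory is not consulted
        }
        return memoryBased.pickSourcesAndDestination(); // memory-based fallback
    }

    public static void main(String[] args) {
        Selector cpu = Optional::empty;                      // CPU pass finds nothing
        Selector mem = () -> Optional.of(List.of("host_1")); // memory pass finds a source
        System.out.println(balance(cpu, mem)); // Optional[[host_1]] -- memory fallback used
    }
}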

Comment 5 Germano Veit Michel 2016-07-25 00:12:28 UTC
Yanir, thanks for the clarification.

Comment 8 Artyom 2016-09-13 08:55:38 UTC
Verified on rhevm-4.0.4.2-0.1.el7ev.noarch

1) Have two hosts under the engine (both with 16GB).
2) Start two VMs on host_1: one with 12GB and one with 1GB.
3) Change the scheduling policy to the power saving policy with MaxFreeMemoryForOverUtilized=10GB.
4) The 1GB VM migrates to host_2.

Comment 10 errata-xmlrpc 2016-09-28 22:15:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-1967.html