Bug 849157

Summary: Load balancer doesn't take host memory size and utilization into account, causing unnecessary memory crunch
Product: Red Hat Enterprise Virtualization Manager Reporter: David Jaša <djasa>
Component: ovirt-engine Assignee: Nobody's working on this, feel free to take it <nobody>
Status: CLOSED DUPLICATE QA Contact:
Severity: high Docs Contact:
Priority: high    
Version: 3.0.2 CC: dyasny, iheim, lpeer, Rhev-m-bugs, yeylon, ykaul
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2012-08-18 17:44:41 UTC Type: Bug

Description David Jaša 2012-08-17 13:26:17 UTC
Description of problem:
Load balancer doesn't take host memory size and utilization into account

Version-Release number of selected component (if applicable):
spotted in 3.0.2

How reproducible:
didn't try to reproduce

Steps to Reproduce:
1. have a cluster with default "even distribution" policy
2. have two hosts in a cluster with different memory 
  sizes: ram_size(host1) << ram_size(host2)
3. start VMs so that: 
  * ram_size(host1) < total_ram_taken_by VMs < ram_size(host2)
  * all VMs are idle once started save for the one in the next step
  note: VMs should have the maximum possible number of CPU cores while
        still being able to migrate
4. in one VM on host2, start a long-running task that utilizes all available CPUs
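
The memory condition in step 3 can be checked before starting the test. A minimal sketch, assuming sizes are given in MB; the function name and the example numbers are hypothetical and not taken from RHEV-M:

```python
def repro_preconditions_met(host1_mem_mb, host2_mem_mb, vm_mem_mb):
    """Check step 3: ram_size(host1) < total RAM taken by VMs < ram_size(host2).

    vm_mem_mb is a list with the memory size of every VM that will be started.
    """
    total = sum(vm_mem_mb)
    return host1_mem_mb < total < host2_mem_mb


# Example numbers (assumed): an 8 GB host, a 32 GB host, seven 2 GB VMs.
print(repro_preconditions_met(8192, 32768, [2048] * 7))
```

With only three such VMs everything would still fit on host1, so the memory crunch could not be triggered.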
  
Actual results:
1. RHEV-M detects high CPU load on host2
2. RHEV-M migrates idle VMs from host2 to host1
3. step 2 is repeated until host1's memory is fully utilized, but RHEV-M still tries to migrate VMs there
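
The behaviour above can be illustrated with a toy model. This is only a sketch of a CPU-only policy as I understand it from the symptoms, not the actual RHEV-M code; the class names, the 75 % threshold and all sizes are assumptions:

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    mem_mb: int
    cpu_load: float  # percent of host CPU consumed by this VM


@dataclass
class Host:
    name: str
    mem_mb: int
    vms: list = field(default_factory=list)

    def cpu_load(self):
        return sum(vm.cpu_load for vm in self.vms)

    def mem_used(self):
        return sum(vm.mem_mb for vm in self.vms)


def balance_cpu_only(src, dst, high_cpu=75.0, max_rounds=100):
    """Hypothetical CPU-only policy: while src is above the CPU threshold,
    move one idle VM to dst. Memory on dst is never checked."""
    moves = []
    for _ in range(max_rounds):
        if src.cpu_load() <= high_cpu:
            break
        idle = [vm for vm in src.vms if vm.cpu_load < 1.0]
        if not idle:
            break
        vm = idle[0]
        src.vms.remove(vm)
        dst.vms.append(vm)
        moves.append(vm.name)
    return moves


# Assumed setup: host1 has 8 GB, host2 has 32 GB, seven 2 GB VMs on host2.
host1 = Host("host1", mem_mb=8192)
host2 = Host("host2", mem_mb=32768)
host2.vms = [VM(f"idle{i}", 2048, 0.0) for i in range(6)] + [VM("busy", 2048, 95.0)]

moves = balance_cpu_only(host2, host1)
overcommit = host1.mem_used() - host1.mem_mb
print(moves, overcommit)
```

Moving idle VMs never lowers the CPU load of host2 in this model, so the loop only stops when no idle VMs are left, by which point host1 is asked to hold more VM memory than it has.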

Expected results:
While the "actual results" are clearly wrong (they put host1 into a totally unnecessary memory crunch), getting this right seems much more difficult. IMO at least these things should be taken into account:
  * all three main resources (CPU, RAM, network bw) should be considered
  * resource availability and utilization across the cluster
  * level of competition of VMs over a resource

In my particular use case, these two approaches to the situation may also be valid:
  * ignore the CPU load of the host completely because no other VMs want it
  * migrate the CPU-hungry VM to host1 (the one with less memory) and
    migrate some idle VMs from host1 to host2 until a safe RAM
    utilization level on host1 is reached, then stop
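
A rough sketch of how the two approaches could be combined when choosing what to migrate. This is only an illustration, not a proposal for the actual scheduler code; the names and the 0.8 "safe RAM utilization" ratio are assumptions:

```python
from typing import NamedTuple, Optional


class VM(NamedTuple):
    name: str
    mem_mb: int
    cpu_load: float  # percent of host CPU consumed by this VM


def fits(vm, dst_mem_mb, dst_vms, safe_ratio=0.8):
    """True if dst stays at or below the safe RAM level after receiving vm."""
    used = sum(v.mem_mb for v in dst_vms)
    return used + vm.mem_mb <= safe_ratio * dst_mem_mb


def pick_vm_to_migrate(src_vms, dst_mem_mb, dst_vms, safe_ratio=0.8) -> Optional[VM]:
    """Pick the VM whose move relieves the most CPU on the source host
    without pushing the destination above the safe RAM level.

    Returns None when no such VM exists, e.g. when only idle VMs could be
    moved (moving them would not reduce the CPU load at all).
    """
    candidates = [
        vm for vm in src_vms
        if vm.cpu_load > 0 and fits(vm, dst_mem_mb, dst_vms, safe_ratio)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda vm: vm.cpu_load)


src = [VM("idle0", 2048, 0.0), VM("busy", 2048, 95.0)]
print(pick_vm_to_migrate(src, 8192, []))
```

Here the CPU-hungry VM is chosen while host1 still has room, and nothing is migrated once host1 would exceed the assumed safe level or when only idle VMs remain.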


Additional info:
This could probably be reproduced on a homogeneous 2-host cluster as well, if the total memory guaranteed to VMs exceeds the RAM available on a single host.

Comment 1 Itamar Heim 2012-08-18 17:44:41 UTC

*** This bug has been marked as a duplicate of bug 516963 ***