Bug 1632055

Summary: PowerSaving keeps VMs on over-utilized hosts while a host is empty and on.
Product: Red Hat Enterprise Virtualization Manager Reporter: Germano Veit Michel <gveitmic>
Component: ovirt-engine Assignee: Andrej Krejcir <akrejcir>
Status: CLOSED ERRATA QA Contact: Polina <pagranat>
Severity: high Docs Contact:
Priority: high    
Version: 4.2.6 CC: akrejcir, michal.skrivanek, mtessun, pagranat, rbarry, Rhev-m-bugs, sborella
Target Milestone: ovirt-4.3.0 Flags: lsvaty: testing_plan_complete-
Target Release: 4.3.0   
Hardware: x86_64   
OS: Linux   
Whiteboard: verified_upstream
Fixed In Version: ovirt-engine-4.3.0_alpha Doc Type: Bug Fix
Doc Text:
This release updates the Red Hat Virtualization Manager power saving policy to allow VM migration from over-utilized hosts to under-utilized hosts to ensure proper balancing.
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-05-08 12:38:22 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: SLA RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Germano Veit Michel 2018-09-23 23:11:47 UTC
Description of problem:

Once the cluster reaches the following state, balancing produces no migrations and the cluster does not scale up.

Host1:
 - Over-utilized by CPU
Host2:
 - Over-utilized by CPU
Host3:
 - Empty, powered on, 0% CPU usage.

The problem is that Host3 is classified as under-utilized and is therefore never considered as a migration destination for VMs from Host1 or Host2.
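To make the classification concrete, here is a minimal sketch of how a host could be bucketed by the two thresholds. The class, enum, and method names are hypothetical, and the strict comparisons are an assumption; the engine's real logic also takes CpuOverCommitDurationMinutes into account, which is ignored here.

```java
public class UtilizationSketch {
    enum Level { UNDER, NORMAL, OVER }

    // Hypothetical classifier; thresholds are CPU usage percentages.
    // Assumption: a host is under-utilized strictly below lowUtilization
    // and over-utilized strictly above highUtilization.
    static Level classify(int cpuPercent, int lowUtilization, int highUtilization) {
        if (cpuPercent < lowUtilization) {
            return Level.UNDER;
        }
        if (cpuPercent > highUtilization) {
            return Level.OVER;
        }
        return Level.NORMAL;
    }

    public static void main(String[] args) {
        int low = 10;
        int high = 75;
        System.out.println("Host1 at 98%: " + classify(98, low, high));
        System.out.println("Host3 at 0%: " + classify(0, low, high));
        // With the lower threshold at 0, an idle host is no longer under-utilized.
        System.out.println("Host3 at 0%, low=0: " + classify(0, 0, high));
    }
}
```

Under these assumptions an idle host only counts as normally utilized when the lower threshold is 0, which is consistent with the workaround in the reproduction steps.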

Version-Release number of selected component (if applicable):
ovirt-engine-4.2.6.4-1.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Add 3 hosts, set power_saving policy as follows:
   - HighUtilization 75
   - LowUtilization 10
   - HostsInReserve 1
   - CpuOverCommitDurationMinutes 1
2. Load 2 hosts to > 75% CPU usage
3. Observe that no migrations happen; Host3 is always filtered out
4. Drop LowUtilization to 0 and observe that the migrations start

Actual results: VMs stuck on Host1 and Host2.

Expected results: VMs migrated to Host3.

Additional info:
In the function getPrimaryDestinations() in PowerSavingBalancePolicyUnit.java, the list of hosts returned as candidates for migrating VMs is empty, because only normally utilized hosts are considered. An empty host, such as Host3 above, has 0-1% CPU usage, so it is filtered out as under-utilized. VMs are never migrated to it, even though the other 2 hosts reach 100% CPU usage and provide poor SLA to their VMs.

final List<VDS> result = getNormallyUtilizedCPUHosts(cluster,
         candidateHosts,
         highUtilization,
         cpuOverCommitDurationMinutes,
         lowUtilization);  // works for scale down, but not up
return result;
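The effect of that filter can be reproduced in isolation. The sketch below is self-contained and uses a hypothetical Host class in place of the engine's VDS; normallyUtilized() imitates the reported behaviour, while destinationsWithFallback() shows one possible remedy, offered as an assumption and not as the patch that shipped in 4.3.0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DestinationSketch {
    // Hypothetical stand-in for the engine's VDS host object.
    static final class Host {
        final String name;
        final int cpuPercent;

        Host(String name, int cpuPercent) {
            this.name = name;
            this.cpuPercent = cpuPercent;
        }
    }

    // Imitates the reported behaviour: only hosts within [low, high] qualify.
    static List<String> normallyUtilized(List<Host> hosts, int low, int high) {
        List<String> result = new ArrayList<>();
        for (Host h : hosts) {
            if (h.cpuPercent >= low && h.cpuPercent <= high) {
                result.add(h.name);
            }
        }
        return result;
    }

    // Assumed remedy: if no normally utilized host exists, fall back to
    // under-utilized hosts that are already powered on.
    static List<String> destinationsWithFallback(List<Host> hosts, int low, int high) {
        List<String> result = normallyUtilized(hosts, low, high);
        if (!result.isEmpty()) {
            return result;
        }
        for (Host h : hosts) {
            if (h.cpuPercent < low) {
                result.add(h.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Host> hosts = Arrays.asList(
                new Host("Host1", 98), new Host("Host2", 98), new Host("Host3", 0));
        System.out.println("Filter only:   " + normallyUtilized(hosts, 10, 75));
        System.out.println("With fallback: " + destinationsWithFallback(hosts, 10, 75));
    }
}
```

With the thresholds from the reproduction steps, the plain filter returns an empty list, while the fallback variant offers Host3 as a destination.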

Comment 1 Andrej Krejcir 2018-10-15 09:06:25 UTC
What is the target milestone? Do we want to backport this to 4.2?

Comment 3 Polina 2018-10-18 13:43:03 UTC
Verification steps on ovirt-engine-4.3.0-0.0.master.20181016132820.gite60d148.el7.noarch

Steps:
pre-condition:
The environment where the scenario runs has 3 hosts, and the HE VM runs on host1.

The CPU usage is: host1 - 16%, host2 - 6%, host3 - 3% 
1. Select power_saving policy in Cluster/Edit/Scheduling Policy/ and configure:
HighUtilization = 75
LowUtilization = 10
HostsInReserve = 1
CpuOverCommitDurationMinutes = 1
2. Run two VMs on host1, two VMs on host2.
3. Load CPU on host1 and host2 (98 %). Wait.

Result: all four VMs are migrated to the under-utilized host3, with no need to decrease the LowUtilization value to 0.

The HE VM itself remains on host1 and does not migrate, which is expected since the scheduler does not balance the HE VM.

Comment 5 RHV bug bot 2018-12-10 15:13:14 UTC
WARN: Bug status (VERIFIED) wasn't changed but the following should be fixed:

[Found non-acked flags: '{'rhevm-4.3-ga': '?'}', ]

For more info please contact: rhv-devops

Comment 6 RHV bug bot 2019-01-15 23:35:41 UTC
WARN: Bug status (VERIFIED) wasn't changed but the following should be fixed:

[Found non-acked flags: '{'rhevm-4.3-ga': '?'}', ]

For more info please contact: rhv-devops

Comment 8 errata-xmlrpc 2019-05-08 12:38:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:1085

Comment 9 Daniel Gur 2019-08-28 13:12:54 UTC
sync2jira

Comment 10 Daniel Gur 2019-08-28 13:17:06 UTC
sync2jira