1632055 – PowerSaving keeps VMs on over-utilized hosts while a host is empty and on.

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Virtualization Manager
Classification:	Red Hat
Component:	ovirt-engine
Sub Component:
Version:	4.2.6
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	ovirt-4.3.0
Target Release:	4.3.0
Assignee:	Andrej Krejcir
QA Contact:	Polina
Docs Contact:
URL:
Whiteboard:	verified_upstream
Depends On:
Blocks:

Reported:	2018-09-23 23:11 UTC by Germano Veit Michel
Modified:	2023-03-24 14:15 UTC
CC List:	7 users
Fixed In Version:	ovirt-engine-4.3.0_alpha
Doc Type:	Bug Fix
Doc Text:	This release updates the Red Hat Virtualization Manager power saving policy to allow VM migration from over-utilized hosts to under-utilized hosts to ensure proper balancing.
Clone Of:
Environment:
Last Closed:	2019-05-08 12:38:22 UTC
oVirt Team:	SLA
Target Upstream Version:
Embargoed:
Flags:	lsvaty: testing_plan_complete-

Links
System	ID	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHEA-2019:1085	None	None	None	2019-05-08 12:38:39 UTC
oVirt gerrit	94527	master	MERGED	scheduling: PowerSaving balancing prioritizes over-utilized hosts	2021-02-15 18:01:37 UTC
oVirt gerrit	94544	master	MERGED	scheduling: Small cleanup of balancing units	2021-02-15 18:01:37 UTC

Description Germano Veit Michel 2018-09-23 23:11:47 UTC

Description of problem:

Once the cluster reaches the following state, balancing produces no results and the cluster does not scale up.

Host1:
 - Over-utilized by CPU
Host2:
 - Over-utilized by CPU
Host3:
 - Empty, powered on, 0% CPU usage.

The problem is that Host3 is classified as under-utilized and is therefore never considered as a migration destination for VMs from Host1 or Host2.
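A minimal sketch of the classification described above, for illustration only. The class name, method names, and the exact boundary conditions (strictly below the low threshold means under-utilized, strictly above the high threshold means over-utilized) are assumptions and are not taken from ovirt-engine; the over-commit duration is ignored here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not the ovirt-engine implementation.
class UtilizationSketch {
    enum Level { UNDER_UTILIZED, NORMALLY_UTILIZED, OVER_UTILIZED }

    // Assumed boundaries: cpu > high is over-utilized, cpu < low is under-utilized.
    static Level classify(int cpuPercent, int lowUtilization, int highUtilization) {
        if (cpuPercent > highUtilization) {
            return Level.OVER_UTILIZED;
        }
        if (cpuPercent < lowUtilization) {
            return Level.UNDER_UTILIZED;
        }
        return Level.NORMALLY_UTILIZED;
    }

    // Classifies every host in the map, preserving insertion order.
    static Map<String, Level> classifyAll(Map<String, Integer> cpuByHost,
                                          int lowUtilization,
                                          int highUtilization) {
        Map<String, Level> result = new LinkedHashMap<>();
        cpuByHost.forEach((host, cpu) ->
                result.put(host, classify(cpu, lowUtilization, highUtilization)));
        return result;
    }
}
```

Under these assumptions, with thresholds 10 and 75, two hosts at 100% CPU are over-utilized and an idle host at 0% is under-utilized, so no host falls into the normally utilized band. Lowering the low threshold to 0 moves the idle host into that band.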

Version-Release number of selected component (if applicable):
ovirt-engine-4.2.6.4-1.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Add 3 hosts, set power_saving policy as follows:
   - HighUtilization 75
   - LowUtilization 10
   - HostsInReserve 1
   - CpuOverCommitDurationMinutes 1
2. Load 2 hosts to > 75% CPU usage
3. Observe that no migrations happen; Host3 is always filtered out
4. Drop LowUtilization to 0 and see the migrations happen

Actual results: VMs stuck on Host1 and Host2.

Expected results: VMs migrated to Host3.

Additional info:
In getPrimaryDestinations() in PowerSavingBalancePolicyUnit.java, the list of hosts returned as migration candidates is empty because only normally utilized hosts are considered. An empty host such as Host3 above has 0-1% CPU usage, so it is filtered out as under-utilized and VMs are never migrated to it, even though the other 2 hosts reach 100% CPU usage and provide poor SLA to their VMs.

final List<VDS> result = getNormallyUtilizedCPUHosts(cluster,
        candidateHosts,
        highUtilization,
        cpuOverCommitDurationMinutes,
        lowUtilization);  // <- works for scale down, but not up.
return result;
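The effect of the snippet above can be reproduced with a small stand-alone sketch. Everything here is hypothetical: the Host class, the method names, and in particular the fallback to under-utilized hosts are illustrative assumptions, not the code merged for this bug. The over-commit duration is ignored for brevity.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration; not the ovirt-engine implementation.
class DestinationSketch {
    static final class Host {
        final String name;
        final int cpuPercent;

        Host(String name, int cpuPercent) {
            this.name = name;
            this.cpuPercent = cpuPercent;
        }
    }

    // Hosts whose CPU usage lies within [low, high] (assumed boundaries).
    static List<Host> normallyUtilized(List<Host> hosts, int low, int high) {
        return hosts.stream()
                .filter(h -> h.cpuPercent >= low && h.cpuPercent <= high)
                .collect(Collectors.toList());
    }

    // Hosts whose CPU usage is strictly below the low threshold.
    static List<Host> underUtilized(List<Host> hosts, int low) {
        return hosts.stream()
                .filter(h -> h.cpuPercent < low)
                .collect(Collectors.toList());
    }

    static boolean anyOverUtilized(List<Host> hosts, int high) {
        return hosts.stream().anyMatch(h -> h.cpuPercent > high);
    }

    // Behaviour described in the report: only normally utilized hosts qualify.
    static List<Host> destinationsAsReported(List<Host> hosts, int low, int high) {
        return normallyUtilized(hosts, low, high);
    }

    // Assumed alternative: when some host is over-utilized and no normally
    // utilized host exists, fall back to under-utilized hosts.
    static List<Host> destinationsWithFallback(List<Host> hosts, int low, int high) {
        List<Host> primary = normallyUtilized(hosts, low, high);
        if (!primary.isEmpty() || !anyOverUtilized(hosts, high)) {
            return primary;
        }
        return underUtilized(hosts, low);
    }
}
```

With Host1 and Host2 at 100% and Host3 at 0%, and thresholds 10 and 75, destinationsAsReported returns an empty list, which matches the symptom in this report, while destinationsWithFallback returns Host3.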

Comment 1 Andrej Krejcir 2018-10-15 09:06:25 UTC

What is the target milestone? Do we want to backport this to 4.2?

Comment 3 Polina 2018-10-18 13:43:03 UTC

Verification steps on ovirt-engine-4.3.0-0.0.master.20181016132820.gite60d148.el7.noarch

Steps:
Pre-condition:
The environment where the scenario runs has 3 hosts, and the HE VM runs on host1.

The CPU usage is: host1 - 16%, host2 - 6%, host3 - 3%
1. Select power_saving policy in Cluster/Edit/Scheduling Policy/ and configure:
HighUtilization = 75
LowUtilization = 10
HostsInReserve = 1
CpuOverCommitDurationMinutes = 1
2. Run two VMs on host1, two VMs on host2.
3. Load CPU on host1 and host2 (98 %). Wait.

Result: all four VMs are migrated to the under-utilized host3, with no need to decrease the LowUtilization value to 0.

The HE VM itself remains on host1 and does not migrate, which is OK since the scheduler does not balance the HE VM.

Comment 5 RHV bug bot 2018-12-10 15:13:14 UTC

WARN: Bug status (VERIFIED) wasn't changed but the following should be fixed:

[Found non-acked flags: '{'rhevm-4.3-ga': '?'}', ]

For more info please contact: rhv-devops

Comment 6 RHV bug bot 2019-01-15 23:35:41 UTC

WARN: Bug status (VERIFIED) wasn't changed but the following should be fixed:

[Found non-acked flags: '{'rhevm-4.3-ga': '?'}', ]

For more info please contact: rhv-devops

Comment 8 errata-xmlrpc 2019-05-08 12:38:22 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:1085

Comment 9 Daniel Gur 2019-08-28 13:12:54 UTC

sync2jira

Comment 10 Daniel Gur 2019-08-28 13:17:06 UTC

sync2jira
