Description of problem:
Once the following scenario is reached, balancing produces no results and the system does not scale up.

Host1: Over-utilized by CPU
Host2: Over-utilized by CPU
Host3: Empty, powered on, 0% CPU usage

The problem is that Host3 is considered under-utilized and is therefore never considered as a destination for migrating VMs from Host1 or Host2.

Version-Release number of selected component (if applicable):
ovirt-engine-4.2.6.4-1.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Add 3 hosts and set the power_saving policy as follows:
   - HighUtilization 75
   - LowUtilization 10
   - HostsInReserve 1
   - CpuOverCommitDurationMinutes 1
2. Load 2 hosts to > 75% CPU
3. Observe that no migrations happen; Host3 is always filtered out
4. Drop LowUtilization to 0 and see the migrations happen

Actual results:
VMs are stuck on Host1 and Host2.

Expected results:
VMs are migrated to Host3.

Additional info:
In getPrimaryDestinations() in PowerSavingBalancePolicyUnit.java, the list of hosts returned as candidates for migrating VMs is empty. This is because only normally utilized hosts are considered. An empty host, such as Host3 above, has 0-1% CPU usage, so it is filtered out as under-utilized and VMs are never migrated to it, even though the other 2 hosts reach 100% CPU usage and provide poor SLA to their VMs.

    final List<VDS> result = getNormallyUtilizedCPUHosts(cluster,
            candidateHosts,
            highUtilization,
            cpuOverCommitDurationMinutes,
            lowUtilization);  // <- works for scale down, but not up
    return result;
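To make the filtering problem concrete, here is a minimal standalone sketch of the behaviour described above. It is not the engine code: the Host class, both method names, and the inclusive boundary checks are assumptions made for illustration, and the fallback shown is only one possible approach, not necessarily what the actual fix does.

```java
import java.util.ArrayList;
import java.util.List;

public class PowerSavingDestinationSketch {

    /** Hypothetical stand-in for the engine's VDS host object. */
    static final class Host {
        final String name;
        final int cpuUsagePercent;

        Host(String name, int cpuUsagePercent) {
            this.name = name;
            this.cpuUsagePercent = cpuUsagePercent;
        }
    }

    /** Keeps hosts with low <= CPU <= high (boundary handling is an assumption). */
    static List<String> normallyUtilized(List<Host> hosts, int low, int high) {
        List<String> names = new ArrayList<>();
        for (Host h : hosts) {
            if (h.cpuUsagePercent >= low && h.cpuUsagePercent <= high) {
                names.add(h.name);
            }
        }
        return names;
    }

    /**
     * Illustrative fallback only: if no normally utilized host exists,
     * accept under-utilized hosts as destinations so the cluster can scale up.
     */
    static List<String> withUnderUtilizedFallback(List<Host> hosts, int low, int high) {
        List<String> names = normallyUtilized(hosts, low, high);
        if (names.isEmpty()) {
            for (Host h : hosts) {
                if (h.cpuUsagePercent < low) {
                    names.add(h.name);
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Host> hosts = List.of(
                new Host("Host1", 98),
                new Host("Host2", 98),
                new Host("Host3", 0));

        // LowUtilization 10: Host3 is filtered as under-utilized, nothing is left.
        System.out.println(normallyUtilized(hosts, 10, 75));          // []
        // LowUtilization 0: the workaround from step 4, Host3 passes the filter.
        System.out.println(normallyUtilized(hosts, 0, 75));           // [Host3]
        // Fallback sketch with the original LowUtilization 10.
        System.out.println(withUnderUtilizedFallback(hosts, 10, 75)); // [Host3]
    }
}
```

With the thresholds from the reproduction steps, the plain filter returns no candidates, which matches the empty destination list reported above.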
What is the target milestone? Do we want to backport this to 4.2?
Verified on ovirt-engine-4.3.0-0.0.master.20181016132820.gite60d148.el7.noarch

Pre-condition:
The environment where the scenario runs has 3 hosts, and the HE VM runs on host1.
CPU usage: host1 - 16%, host2 - 6%, host3 - 3%

Steps:
1. Select the power_saving policy in Cluster/Edit/Scheduling Policy/ and configure:
   HighUtilization = 75
   LowUtilization = 10
   HostsInReserve = 1
   CpuOverCommitDurationMinutes = 1
2. Run two VMs on host1 and two VMs on host2.
3. Load the CPU on host1 and host2 (98%). Wait.

Result:
All four VMs are migrated to the under-utilized host3, with no need to decrease the LowUtilization value to 0.
The HE VM itself remains on host1 and does not migrate, which is OK since the scheduler does not balance the HE VM.
WARN: Bug status (VERIFIED) wasn't changed but the following should be fixed: [Found non-acked flags: '{'rhevm-4.3-ga': '?'}', ] For more info please contact: rhv-devops
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:1085