Bug 1653752 - Scheduler computes incorrect CPU load of the host where the scheduled VM is running
Summary: Scheduler computes incorrect CPU load of the host where the scheduled VM is running
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Backend.Core
Version: 4.2.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.3.2
Target Release: 4.3.2.1
Assignee: Andrej Krejcir
QA Contact: Liran Rotenberg
URL:
Whiteboard:
Depends On:
Blocks: 1583009 1658653
 
Reported: 2018-11-27 15:08 UTC by Andrej Krejcir
Modified: 2019-04-18 09:16 UTC (History)
CC: 3 users

Fixed In Version: ovirt-engine-4.3.2.1
Doc Type: No Doc Update
Doc Text:
Clone Of:
Cloned to: 1658653
Environment:
Last Closed: 2019-03-19 10:05:23 UTC
oVirt Team: Virt
Embargoed:
rule-engine: ovirt-4.3+


Attachments: None


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 96122 0 master MERGED scheduler: Fix subtracting CPU load from host where VM is running 2020-05-31 07:32:58 UTC
oVirt gerrit 96358 0 ovirt-engine-4.2 MERGED scheduler: Fix subtracting CPU load from host where VM is running 2020-05-31 07:32:57 UTC
oVirt gerrit 97674 0 master MERGED core: Balancing always counts threads as cores. 2020-05-31 07:32:58 UTC

Description Andrej Krejcir 2018-11-27 15:08:30 UTC
Description of problem:
When migrating a VM, the scheduler subtracts the VM's CPU load from the host where it is currently running, so that the host appears as if the VM were not running there.

This calculation is incorrect when the cluster does not have the 'count threads as cores' option set.
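
For illustration, here is a minimal Python sketch of how mixing the two CPU-count units can corrupt the "host load without this VM" figure. This is not the actual ovirt-engine (Java) code; the function name, parameters, and exact formula are assumptions, while the host topology (24 cores, 2 threads per core) and load figures are taken from the reproduction comments below.

```python
# Hypothetical sketch of the unit mismatch; names and formula are invented
# for illustration and are NOT the ovirt-engine implementation.

def load_without_vm(host_load_pct, vm_vcpus, vm_load_pct, host_cpus):
    """Host CPU load (%) as if the given VM were not running on it.

    host_load_pct is reported relative to all logical CPUs (threads);
    host_cpus is the unit the scheduler divides by. If the two disagree,
    the subtraction removes too much and the host looks idle.
    """
    host_busy = host_load_pct / 100.0 * host_cpus   # busy CPUs on the host
    vm_busy = vm_load_pct / 100.0 * vm_vcpus        # busy CPUs owed to the VM
    return max(host_busy - vm_busy, 0.0) / host_cpus * 100.0

HOST_CORES, HOST_THREADS = 24, 48  # 24 cores, 2 threads per core
# A 24-vCPU VM at 80% plus another at 25% put the host at 52.5% of 48 threads.
print(load_without_vm(52.5, 24, 80, HOST_THREADS))  # 12.5 -- consistent units
print(load_without_vm(52.5, 24, 80, HOST_CORES))    # 0.0  -- mixed units
```

With consistent units the host correctly shows the remaining ~12.5% load; with mixed units the subtraction overshoots and the loaded host appears idle, matching the symptom that the current host keeps winning as the best candidate.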

Version-Release number of selected component (if applicable):
4.3 and 4.2

Steps to Reproduce:
1. Unset the 'count threads as cores' option for the cluster.
2. Have 2 hosts (host1, host2) with 2 threads per CPU core.
3. Run 2 VMs (VM1, VM2) on host1, each with the same number of cores as the host but 1 thread per core.
4. Add VM1 and host1 to a positive, enforcing affinity group.
5. Create 100% CPU load on VM2, for example by running python -c 'while True: pass' (one such loop per vCPU; see the sketch after this list).
6. Enable EvenDistribution balancing on the cluster. Set HighUtilization to 50.
7. Wait to see whether VM2 is migrated to host2 by the balancing.
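
Note on step 5: a single 'while True: pass' saturates only one vCPU, so a VM with many vCPUs would report far less than 100% overall load. A minimal sketch using only the Python standard library:

```python
# Spawn one busy-loop process per vCPU so a multi-vCPU VM reaches ~100%
# overall CPU load; a single 'while True: pass' pins only one vCPU.
import multiprocessing

def burn():
    while True:
        pass

if __name__ == '__main__':
    for _ in range(multiprocessing.cpu_count()):
        multiprocessing.Process(target=burn).start()
```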

Actual results:
VM2 is not migrated.
The scheduler incorrectly computes the CPU load of host1, so host1 is considered the best candidate for the migration and the VM never moves.

Expected results:
VM2 migrates to host2.

Comment 2 Liran Rotenberg 2019-01-09 11:58:12 UTC
Verification failed on:
ovirt-engine-4.3.0-0.4.master.20190103151009.git5251adc.el7.noarch

Steps:
1. Unset the 'count threads as cores' option for the cluster.
2. Have 2 hosts (host1, host2) with 2 threads per CPU core.
3. Run 2 VMs (VM1, VM2) on host1, each with the same number of cores as the host but 1 thread per core.
4. Add VM1 and host1 to a positive, enforcing affinity group.
5. Create 100% CPU load on VM2 (for example by running: python -c 'while True: pass').
6. Enable EvenDistribution balancing on the cluster. Set HighUtilization to 50.
7. Wait to see whether VM2 is migrated to host2 by the balancing.

Actual results:
VM2 is not migrated.

Expected results:
VM2 migrates to host2.

Additional info:
I used a hosted engine (HE) environment. At the starting point, all 3 VMs were on host1.
The HE VM has 4 vCPUs, configured as cores.
Each host has 24 cores and 2 threads per core.

The scheduler migrated the HE VM first. The load on the host stayed at 50%.
Nothing further happened afterwards (the busy loop just kept running).
In this state, setting 'count threads as cores' immediately caused VM2 to migrate.
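
As a quick cross-check, the steady 50% follows directly from the topology described above: VM2's 24 fully loaded vCPUs occupy 24 of host1's 48 hardware threads.

```python
# Where the observed 50% comes from (numbers from this comment):
vm2_busy_vcpus = 24          # 24 vCPUs at 100% load
host_threads = 24 * 2        # 24 cores, 2 threads per core
print(vm2_busy_vcpus / host_threads * 100)  # 50.0
```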

Comment 3 Ryan Barry 2019-01-21 14:53:49 UTC
Re-targeting to 4.3.1 since this bug is missing a patch, an acked blocker flag, or both.

Comment 4 Liran Rotenberg 2019-03-06 09:10:53 UTC
Verification failed on:
ovirt-engine-4.3.2-0.1.el7.noarch

Steps:
1. Unset the 'count threads as cores' option for the cluster.
2. Have 2 hosts (host1, host2) with 2 threads per CPU core.
3. Run 2 VMs (VM1, VM2) on host1, each with the same number of cores as the host but 1 thread per core.
4. Add VM1 and host1 to a positive, enforcing affinity group.
5. Create 100% CPU load on VM2 (for example by running: python -c 'while True: pass').
6. Enable EvenDistribution balancing on the cluster. Set HighUtilization to 50.
7. Wait to see whether VM2 is migrated to host2 by the balancing.

Actual results:
VM2 is not migrated.

Expected results:
VM2 migrates to host2.

Additional info:
I used a hosted engine (HE) environment. At the starting point, all 3 VMs were on host1.
The HE VM has 4 vCPUs, configured as cores.
Each host has 24 cores and 2 threads per core.

The scheduler migrated the HE VM first. The load on the host stayed at 50%.
Nothing further happened afterwards (the busy loop just kept running).

Afterwards, I changed the load of VM2 (the VM without host affinity) to 80% and loaded VM1 to 25% (total load on the host > 50%).
The result was the same as above; no migration happened.
Then I set 'count threads as cores' and VM2 immediately migrated away.

Comment 5 Andrej Krejcir 2019-03-06 09:36:40 UTC
The version ovirt-engine-4.3.2-0.1.el7.noarch does not yet contain the patch that fixes this bug.
It was released before the patch was merged into master.

Please verify this on a newer version or on the latest master snapshot.

Comment 6 Liran Rotenberg 2019-03-07 08:10:08 UTC
Verified on:
ovirt-engine-4.3.2.1-0.0.master.20190305140204.git3649df7.el7.noarch

Steps:
1. Unset the 'count threads as cores' option for the cluster.
2. Have 2 hosts (host1, host2) with 2 threads per CPU core.
3. Run 2 VMs (VM1, VM2) on host1, each with the same number of cores as the host but 1 thread per core.
4. Add VM1 and host1 to a positive, enforcing affinity group.
5. Create 100% CPU load on VM2 (for example by running: python -c 'while True: pass').
6. Enable EvenDistribution balancing on the cluster. Set HighUtilization to 50.
7. Wait to see whether VM2 is migrated to host2 by the balancing.

Actual results:
The HE VM (the least loaded VM) migrated first, from host1 to host2; afterwards, VM2 migrated to host2.

Comment 7 Sandro Bonazzola 2019-03-19 10:05:23 UTC
This bug is included in the oVirt 4.3.2 release, published on March 19th 2019.

Since the problem described in this bug report should be
resolved in the oVirt 4.3.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

