Bug 1653752

Summary: Scheduler computes incorrect CPU load of the host where the scheduled VM is running
Product: [oVirt] ovirt-engine Reporter: Andrej Krejcir <akrejcir>
Component: Backend.Core Assignee: Andrej Krejcir <akrejcir>
Status: CLOSED CURRENTRELEASE QA Contact: Liran Rotenberg <lrotenbe>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.2.7 CC: bugs, lrotenbe, rbarry
Target Milestone: ovirt-4.3.2 Keywords: ZStream
Target Release: 4.3.2.1 Flags: rule-engine: ovirt-4.3+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ovirt-engine-4.3.2.1 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1658653 Environment:
Last Closed: 2019-03-19 10:05:23 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Virt RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1583009, 1658653    

Description Andrej Krejcir 2018-11-27 15:08:30 UTC
Description of problem:
When migrating a VM, the scheduler subtracts the CPU load of the VM from the load of the host where it is currently running, so that the host appears as if the VM were not running there.

This calculation is incorrect when the cluster does not have the 'count threads as cores' option set.
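
The report does not give the exact formula used by the engine. The following is only a minimal sketch of the kind of subtraction described above. It assumes the host load is a percentage over all hardware threads and the VM load is a percentage over the VM's own vCPUs. The function name, parameters, clamping at 0 and the numbers are hypothetical, not taken from ovirt-engine. It shows how the result changes when the VM load is rescaled by the number of cores instead of the number of threads.

```python
def host_load_without_vm(host_load_pct, vm_load_pct, vm_cpus, host_cpus):
    """Estimate host CPU load (percent) as if the VM were not running there.

    The VM load is relative to its own vCPUs, so it is rescaled to the
    host CPU count before it is subtracted. The result is clamped at 0
    (an assumption of this sketch).
    """
    vm_share_pct = vm_load_pct * vm_cpus / host_cpus
    return max(0.0, host_load_pct - vm_share_pct)


# Hypothetical host: 4 cores, 2 threads per core, 60% load over all 8 threads.
# Hypothetical VM: 4 vCPUs, 100% busy.
consistent = host_load_without_vm(60.0, 100.0, vm_cpus=4, host_cpus=8)
mismatched = host_load_without_vm(60.0, 100.0, vm_cpus=4, host_cpus=4)
print(consistent)  # 10.0
print(mismatched)  # 0.0
```

With the mismatched divisor the host looks idle once the VM is subtracted, which would make it an attractive migration target even though other load remains on it.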

Version-Release number of selected component (if applicable):
4.3 and 4.2

Steps to Reproduce:
1. Unset the 'count threads as cores' option for the cluster.
2. Have 2 hosts (host1, host2) with 2 threads per CPU core.
3. Run 2 VMs (VM1, VM2) on host1, each with the same number of cores as the host but 1 thread per core.
4. Add VM1 and host1 to an affinity group, positive, enforcing.
5. Create 100% CPU load on VM2. (for example by running: python -c 'while True: pass')
6. Enable EvenDistribution balancing on the cluster. Set HighUtilization to 50.
7. Wait to see whether VM2 is migrated to host2 by the balancing.
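
Note for step 5: a single `while True: pass` loop keeps only one vCPU busy. A possible way to load every vCPU of the VM is to start one such loop per CPU, as sketched below. The helper names `start_burners` and `stop_burners` are made up for this example and are not part of the original reproduction.

```python
import os
import subprocess
import sys


def start_burners(count=None):
    """Start one busy-loop Python process per CPU (or `count` processes)."""
    if count is None:
        count = os.cpu_count() or 1
    cmd = [sys.executable, "-c", "while True: pass"]
    return [subprocess.Popen(cmd) for _ in range(count)]


def stop_burners(procs):
    """Terminate the busy-loop processes and wait for them to exit."""
    for proc in procs:
        proc.terminate()
    for proc in procs:
        proc.wait()
```

Inside VM2, `procs = start_burners()` starts the load and `stop_burners(procs)` ends it. The child processes keep running if the calling script exits without stopping them.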

Actual results:
VM2 is not migrated. 
Scheduler incorrectly computes the CPU load of host1, so the host is considered the best candidate for migration and the VM does not migrate.

Expected results:
VM2 migrates to host2.

Comment 2 Liran Rotenberg 2019-01-09 11:58:12 UTC
Verification failed on:
ovirt-engine-4.3.0-0.4.master.20190103151009.git5251adc.el7.noarch

Steps:
1. Unset the 'count threads as cores' option for the cluster.
2. Have 2 hosts (host1, host2) with 2 threads per CPU core.
3. Run 2 VMs (VM1, VM2) on host1, each with the same number of cores as the host but 1 thread per core.
4. Add VM1 and host1 to an affinity group, positive, enforcing.
5. Create 100% CPU load on VM2. (for example by running: python -c 'while True: pass')
6. Enable EvenDistribution balancing on the cluster. Set HighUtilization to 50.
7. Wait to see whether VM2 is migrated to host2 by the balancing.

Actual results:
VM2 is not migrated.

Expected results:
VM2 migrates to host2.

Additional info:
I used an HE environment. At the start, all 3 VMs were on host1.
The HE VM has 4 vCPUs, configured as cores.
Each host has 24 cores and 2 threads per core.

The scheduler migrated the HE VM first. The load on the host stayed at 50%.
No further migration followed (just a loop).
In this state, setting the 'count threads as cores' option immediately caused VM2 to migrate.

Comment 3 Ryan Barry 2019-01-21 14:53:49 UTC
Re-targeting to 4.3.1 since it is missing a patch, an acked blocker flag, or both

Comment 4 Liran Rotenberg 2019-03-06 09:10:53 UTC
Verification failed on:
ovirt-engine-4.3.2-0.1.el7.noarch

Steps:
1. Unset the 'count threads as cores' option for the cluster.
2. Have 2 hosts (host1, host2) with 2 threads per CPU core.
3. Run 2 VMs (VM1, VM2) on host1, each with the same number of cores as the host but 1 thread per core.
4. Add VM1 and host1 to an affinity group, positive, enforcing.
5. Create 100% CPU load on VM2. (for example by running: python -c 'while True: pass')
6. Enable EvenDistribution balancing on the cluster. Set HighUtilization to 50.
7. Wait to see whether VM2 is migrated to host2 by the balancing.

Actual results:
VM2 is not migrated.

Expected results:
VM2 migrates to host2.

Additional info:
I used an HE environment. At the start, all 3 VMs were on host1.
The HE VM has 4 vCPUs, configured as cores.
Each host has 24 cores and 2 threads per core.

The scheduler migrated the HE VM first. The load on the host stayed at 50%.
No further migration followed (just a loop).

Afterwards, I changed the load of VM2 (which has no affinity to the host) to 80% and loaded VM1 to 25% (total load on the host > 50%).
The result was the same as above: no migration happened.
Then I set the 'count threads as cores' option and VM2 immediately migrated away.
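
The "total on host > 50%" figure can be checked with a short calculation. This is only a sketch under assumptions not stated explicitly in the comment: both VMs have 24 vCPUs (same number of cores as the host, 1 thread per core), the host load is measured over all 48 threads, and no other load is present. The function name is hypothetical.

```python
def host_load_pct(vm_loads, host_threads):
    """Total host CPU load in percent, given (load_pct, vcpus) per VM."""
    return sum(load_pct * vcpus for load_pct, vcpus in vm_loads) / host_threads


HIGH_UTILIZATION = 50  # threshold set in step 6

# VM2 at 80% and VM1 at 25%, 24 vCPUs each (assumed), on a 48-thread host.
total = host_load_pct([(80, 24), (25, 24)], host_threads=48)
print(total)                     # 52.5
print(total > HIGH_UTILIZATION)  # True
```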

Comment 5 Andrej Krejcir 2019-03-06 09:36:40 UTC
The version ovirt-engine-4.3.2-0.1.el7.noarch does not yet contain the patch that fixes this bug.
It was released before the patch was merged to master.

Please verify this on a newer version or on the latest master snapshot.

Comment 6 Liran Rotenberg 2019-03-07 08:10:08 UTC
Verified on:
ovirt-engine-4.3.2.1-0.0.master.20190305140204.git3649df7.el7.noarch

Steps:
1. Unset the 'count threads as cores' option for the cluster.
2. Have 2 hosts (host1, host2) with 2 threads per CPU core.
3. Run 2 VMs (VM1, VM2) on host1, each with the same number of cores as the host but 1 thread per core.
4. Add VM1 and host1 to an affinity group, positive, enforcing.
5. Create 100% CPU load on VM2. (for example by running: python -c 'while True: pass')
6. Enable EvenDistribution balancing on the cluster. Set HighUtilization to 50.
7. Wait to see whether VM2 is migrated to host2 by the balancing.

Actual results:
The HE VM migrated first, from host1 to host2 (it was the least loaded VM); afterwards, VM2 migrated to host2.

Comment 7 Sandro Bonazzola 2019-03-19 10:05:23 UTC
This bug fix is included in the oVirt 4.3.2 release, published on March 19th, 2019.

Since the problem described in this bug report should be resolved in the oVirt 4.3.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.