Bug 1430296 - [downstream clone - 4.0.7] With vm_evenly_distributed cluster scheduling policy, all VMs are migrated to a single host when a host is placed in maintenance mode
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.0.4
Hardware: Unspecified
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ovirt-4.0.7
Assignee: Martin Sivák
QA Contact: Artyom
URL:
Whiteboard:
Depends On: 1411460
Blocks:
 
Reported: 2017-03-08 10:14 UTC by rhev-integ
Modified: 2020-06-11 13:27 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, the 'VM evenly distributed' policy was not properly taking the pending virtual machines (scheduled, but not yet started) into account. Each scheduling run saw the same situation and selected the same host for the virtual machine it was scheduling. Now, the policy also counts pending virtual machines, so proper balancing is applied.
Clone Of: 1411460
Environment:
Last Closed: 2017-03-16 15:35:12 UTC
oVirt Team: SLA
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:0542 0 normal SHIPPED_LIVE Red Hat Virtualization Manager 4.0.7 2017-03-16 19:25:04 UTC
oVirt gerrit 69911 0 None None None 2017-03-08 10:15:20 UTC
oVirt gerrit 69922 0 None None None 2017-03-08 11:42:20 UTC

Description rhev-integ 2017-03-08 10:14:43 UTC
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1411460 +++
======================================================================

Description of problem:

Using the 'vm_evenly_distributed' cluster scheduling policy, when a host is placed in maintenance mode, all VMs are migrated to a single host, after which they are redistributed to balance the cluster.

If no policy is used, the migration distribution appears to be evenly distributed.

So the question is: with the 'vm_evenly_distributed' policy, is this the correct behaviour?


Version-Release number of selected component (if applicable):

- RHV 4.0.4
- RHEL 7.3 hosts
 

How reproducible:

100% in my testing.


Steps to Reproduce:

1. Configure the 'vm_evenly_distributed' cluster policy with the following;

HighVmCount=4
SpmVmGrace=5
MigrationThreshold=2

2. Three hosts, one with 'N' VMs (I had 14), the other two with none.
3. Place the host with 14 VMs into maintenance mode.
4. Observe the initial distribution of VMs (after they have all been migrated, some will be migrated again to balance the load). In my case, all 14 went to one host.

AND;

1. Configure the 'vm_evenly_distributed' cluster policy with the following;

HighVmCount=4
SpmVmGrace=5
MigrationThreshold=2

2. Three hosts, two with 5 VMs one with 4.
3. Place the host with 4 VMs into maintenance mode.
4. Observe the initial distribution of VMs (after they have all been migrated, some will be migrated again to balance the load). In my case, all 4 went to one host.


Actual results:

All VMs are initially migrated to one host.


Expected results:

One might expect that with a 'vm_evenly_distributed' policy, the VMs would be evenly distributed in this scenario.

The policy does kick in after the initial migrations and redistributes the VMs evenly. However, this results in additional migrations and overhead.


Additional info:

(Originally by Gordon Watson)

Comment 3 rhev-integ 2017-03-08 10:14:55 UTC
Thanks for the information Gordon, we indeed have a bug there. The balancing rule works fine except for one small issue: we only count running VMs on destination hosts, but most of the VMs are only starting their migrations there and are therefore missing from the computation.

We have a mechanism to track and count what we call pending VMs, we just somehow forgot to use it here.

(Originally by Martin Sivak)
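The mechanism described in the comment above can be sketched as follows. This is an editor's illustration with hypothetical names, not the actual ovirt-engine scheduler: when pending (scheduled but not yet started) VMs are ignored, every scheduling run sees the same host loads and keeps choosing the same destination.

```python
def pick_host(running, pending, count_pending):
    """Pick the destination host with the fewest VMs.

    running: dict host -> VMs already running there
    pending: dict host -> VMs scheduled there but not yet started
    """
    def load(host):
        return running[host] + (pending[host] if count_pending else 0)
    # min() breaks ties by first occurrence, mimicking a stable host ordering
    return min(running, key=load)

def drain(vm_count, hosts, count_pending):
    """Schedule vm_count VMs off a host entering maintenance."""
    running = {h: 0 for h in hosts}
    pending = {h: 0 for h in hosts}
    for _ in range(vm_count):
        dest = pick_host(running, pending, count_pending)
        # The migration only starts here; the VM is not yet "running"
        # on the destination, so it is invisible unless pending is counted.
        pending[dest] += 1
    return pending

# Buggy behaviour: each run sees identical (all-zero) running counts,
# so every VM is sent to the same host.
print(drain(14, ["host_2", "host_3"], count_pending=False))
# -> {'host_2': 14, 'host_3': 0}

# Fixed behaviour: pending VMs are counted, so the load balances out.
print(drain(14, ["host_2", "host_3"], count_pending=True))
# -> {'host_2': 7, 'host_3': 7}
```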

Comment 6 rhev-integ 2017-03-08 10:15:10 UTC
Verified on rhevm-4.1.0.2-0.2.el7.noarch

Environment has 3 hosts(host_1, host_2, host_3), host_1 is SPM
1) Set scheduling policy to 'vm_evenly_distributed' with parameters
HighVmCount=4
SpmVmGrace=5
MigrationThreshold=2
2) Start 17 VMs
3) host_1 has 2 VMs
   host_2 has 8 VMs
   host_3 has 7 VMs
4) Put host_2 into maintenance
5) host_1 has 6 VMs
   host_3 has 11 VMs

(Originally by Artyom Lukianov)

Comment 8 Artyom 2017-03-09 09:42:46 UTC
rhevm-4.0.7.1-0.1.el7ev.noarch

Environment has 3 hosts(host_1, host_2, host_3), host_2 is SPM
1) Set scheduling policy to 'vm_evenly_distributed' with parameters
HighVmCount=4
SpmVmGrace=5
MigrationThreshold=2
2) Start 17 VMs
3) host_1 has 10 VMs
   host_2 has 0 VMs
   host_3 has 5 VMs
4) Put host_1 into maintenance
5) host_2 has 6 VMs
   host_3 has 11 VMs

Comment 10 errata-xmlrpc 2017-03-16 15:35:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0542.html

