Bug 1551582 - [downstream clone - 4.1.10] Engine tries to balance VMs that are down.
Summary: [downstream clone - 4.1.10] Engine tries to balance VMs that are down.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.1.6
Hardware: x86_64
OS: Linux
Priority: high
Severity: medium
Target Milestone: ovirt-4.1.10
Assignee: Andrej Krejcir
QA Contact: Polina
URL:
Whiteboard:
Depends On: 1537343
Blocks:
 
Reported: 2018-03-05 12:53 UTC by RHV bug bot
Modified: 2021-06-10 15:17 UTC
CC List: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, if a virtual machine with a positive affinity to a host was down, the affinity rules enforcer tried to migrate it, because it was not running on the specified host. When the migration failed, the affinity rules enforcer repeatedly tried to migrate the same virtual machine, ignoring other virtual machines that violated affinity rules. In this release, the affinity rules enforcer ignores virtual machines that are down.
Clone Of: 1537343
Environment:
Last Closed: 2018-03-20 16:37:08 UTC
oVirt Team: SLA
Target Upstream Version:
Embargoed:
lsvaty: testing_plan_complete-




Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 3329091 0 None None None 2018-03-05 12:53:42 UTC
Red Hat Product Errata RHBA-2018:0562 0 None None None 2018-03-20 16:37:55 UTC
oVirt gerrit 87259 0 master MERGED core: Check affinity rules only for running VMs 2018-03-05 12:53:42 UTC
oVirt gerrit 87320 0 ovirt-engine-4.2 MERGED core: Check affinity rules only for running VMs 2018-03-05 12:53:42 UTC
oVirt gerrit 88478 0 ovirt-engine-4.1 MERGED core: Check affinity rules only for running VMs 2018-03-05 16:16:22 UTC

Description RHV bug bot 2018-03-05 12:53:06 UTC
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1537343 +++
======================================================================

Description of problem:

Logs spammed with:

2018-01-23 00:48:05,541Z WARN  [org.ovirt.engine.core.bll.BalanceVmCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-45) [ec22cb2] Validation of action 'BalanceVm' failed for user SYSTEM. Reasons: VAR__ACTION__MIGRATE,VAR__TYPE__VM,ACTION_TYPE_FAILED_VM_IS_NOT_RUNNING  

And the Tasks tab is full of failed BalanceVm entries.

Version-Release number of selected component (if applicable):
rhevm-4.1.6.2-0.1.el7.noarch
rhvm-4.2.0-0.6.el7.noarch (reproduced)

How reproducible:
100%

Steps to Reproduce:
1. Configure positive host affinity for 1 VM that is down
2. Check engine.log and the Tasks tab.

Actual results:
Flooded with errors

Expected results:
Don't balance (migrate) VMs that are down.

Additional info:

Inspecting the code, the problem appears to be in the function getVmToHostsAffinityGroupCandidates, specifically here:

         } else if (!affHosts.contains(vm.getRunOnVds()) && g.isVdsPositive()) {
             // Positive affinity violated
             vmToHostsAffinityMap.put(vm_id,
                     1 + vmToHostsAffinityMap.getOrDefault(vm_id, 0));
         }

affHosts is a Set, and when a VM is down, getRunOnVds() returns null.
affHosts never contains a null element, so the affinity is falsely reported as violated.
Maybe down VMs need to be filtered out to start with?

(Originally by Germano Veit Michel)
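
A minimal sketch of the fix direction suggested by the merged gerrit patches listed above ("core: Check affinity rules only for running VMs"), assuming a simplified, self-contained shape. The class and method names below are illustrative, not the actual engine code, and the import paths are assumed from the ovirt-engine source tree:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    import org.ovirt.engine.core.common.businessentities.VM;
    import org.ovirt.engine.core.common.scheduling.AffinityGroup;
    import org.ovirt.engine.core.compat.Guid;

    // Illustrative helper, not the merged patch: counts positive host
    // affinity violations while skipping VMs that are not running.
    public class AffinityViolationSketch {

        public static Map<Guid, Integer> countViolations(
                List<VM> vms, Set<Guid> affHosts, AffinityGroup g) {
            Map<Guid, Integer> vmToHostsAffinityMap = new HashMap<>();
            for (VM vm : vms) {
                // A down VM has no run-on host: getRunOnVds() returns null,
                // so without this check it would always look like a
                // positive-affinity violation and get queued for migration.
                if (vm.getRunOnVds() == null) {
                    continue;
                }
                if (!affHosts.contains(vm.getRunOnVds()) && g.isVdsPositive()) {
                    // Positive affinity violated by a running VM
                    vmToHostsAffinityMap.merge(vm.getId(), 1, Integer::sum);
                }
            }
            return vmToHostsAffinityMap;
        }
    }

Skipping VMs with getRunOnVds() == null keeps down VMs out of the violation map entirely, so the enforcer never selects them as migration candidates.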

Comment 5 RHV bug bot 2018-03-05 12:53:27 UTC
Verified in rhvm-4.2.2-0.1.el7.noarch.

Created 3 affinity groups for 3 VMs (see the XML below).

Tried these scenarios:
1. Create an affinity group for a powered-off VM. Check engine.log for no migration attempts.
2. Create an affinity group for a running VM, then power it off and check engine.log for no errors.

        <affinity_group>
        <name>af_group3</name>
        <hosts_rule>
            <enabled>true</enabled>
            <enforcing>true</enforcing>
            <positive>true</positive>
        </hosts_rule>
        <vms_rule>
            <enabled>false</enabled>
            <enforcing>true</enforcing>
            <positive>false</positive>
        </vms_rule>
        <hosts>
            <host href="/ovirt-engine/api/hosts/b97f27e5-1307-4dfa-a285-6fad766ebe82" id="b97f27e5-1307-4dfa-a285-6fad766ebe82"/>
        </hosts>
        <vms>
            <vm href="/ovirt-engine/api/vms/8b60e9a2-c834-4f47-a30b-dbb2e6d8f07b" id="8b60e9a2-c834-4f47-a30b-dbb2e6d8f07b"/>
        </vms>
        </affinity_group>
        
        <affinity_group>
        <name>af_group1</name>
        <hosts_rule>
            <enabled>true</enabled>
            <enforcing>false</enforcing>
            <positive>true</positive>
        </hosts_rule>
        <positive>true</positive>
        <vms_rule>
            <enabled>true</enabled>
            <enforcing>false</enforcing>
            <positive>true</positive>
        </vms_rule>
        <hosts>
            <host href="/ovirt-engine/api/hosts/9a50c448-61a1-4085-bfd1-62a6ee0b5525" id="9a50c448-61a1-4085-bfd1-62a6ee0b5525"/>
        </hosts>
        <vms>
            <vm href="/ovirt-engine/api/vms/0cded25e-63ef-43ed-996c-9cfc1934d37a" id="0cded25e-63ef-43ed-996c-9cfc1934d37a"/>
        </vms>
        </affinity_group>
        
        <affinity_group>
        <name>af_group2</name>
        <hosts_rule>
            <enabled>true</enabled>
            <enforcing>false</enforcing>
            <positive>true</positive>
        </hosts_rule>
        <positive>true</positive>
        <vms_rule>
            <enabled>true</enabled>
            <enforcing>false</enforcing>
            <positive>true</positive>
        </vms_rule>
        <hosts>
            <host href="/ovirt-engine/api/hosts/9a50c448-61a1-4085-bfd1-62a6ee0b5525" id="9a50c448-61a1-4085-bfd1-62a6ee0b5525"/>
        </hosts>
        <vms>
            <vm href="/ovirt-engine/api/vms/206a53a3-d04a-4e0c-83c5-f76b9757af40" id="206a53a3-d04a-4e0c-83c5-f76b9757af40"/>
        </vms>
        </affinity_group>

(Originally by Polina Agranat)

Comment 8 Polina 2018-03-14 09:12:22 UTC
Verified on rhv-release-4.1.10-5-001.noarch.

Ran the following scenarios; no errors in engine.log.

Created affinity groups with:
  - two VMs, one Up and one Down, then stopped the running VM.
  - two VMs Down.
Using the following group configurations:
  - VM affinity positive, Host affinity positive, Enforcing true/false.
  - VM affinity negative, Host affinity negative, Enforcing true/false.
  - VM affinity positive, Host affinity negative, Enforcing true/false.
  - VM affinity negative, Host affinity positive, Enforcing true/false.

Comment 11 errata-xmlrpc 2018-03-20 16:37:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0562

Comment 12 Franta Kust 2019-05-16 13:09:23 UTC
BZ<2>Jira Resync

Comment 13 Daniel Gur 2019-08-28 13:15:09 UTC
sync2jira

Comment 14 Daniel Gur 2019-08-28 13:20:11 UTC
sync2jira

