Bug 1537343
| Summary: | engine tries to balance vms that are down | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Germano Veit Michel <gveitmic> |
| Component: | ovirt-engine | Assignee: | Andrej Krejcir <akrejcir> |
| Status: | CLOSED ERRATA | QA Contact: | Polina <pagranat> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.1.6 | CC: | akrejcir, apinnick, bperkins, lsurette, lveyde, mavital, mgoldboi, rbalakri, Rhev-m-bugs, srevivo, ykaul |
| Target Milestone: | ovirt-4.2.2 | Keywords: | ZStream |
| Target Release: | --- | Flags: | lsvaty: testing_plan_complete- |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | ovirt-engine-4.2.2 | Doc Type: | Bug Fix |
| Doc Text: | Previously, if a virtual machine with a strong positive affinity to a host was down, the affinity rules enforcer tried to migrate it, because it was not running on the specified host. When migration failed, the affinity rules enforcer tried repeatedly to migrate the same virtual machine, ignoring other virtual machines that violated affinity. In the current release, the affinity rules enforcer ignores virtual machines that are down. (A minimal sketch of this behavior follows the table.) | Story Points: | --- |
| Clone Of: | | | |
| Cloned To: | 1551582 (view as bug list) | Environment: | |
| Last Closed: | 2018-05-15 17:47:24 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | SLA | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1551582 | | |
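The behavior change described in the Doc Text can be illustrated with a short sketch. This is hypothetical code, not the actual ovirt-engine implementation: the class, record, and method names (`AffinityRulesEnforcerSketch`, `chooseVmToMigrate`, `runOnHostId`, and so on) are invented for illustration; only the filtering idea reflects the fix, namely that down VMs are excluded before a migration candidate is chosen.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch only; these names are not the real ovirt-engine API.
// It models the fix: VMs that are down are filtered out before a migration
// candidate is chosen, so the enforcer no longer loops on a down VM.
class AffinityRulesEnforcerSketch {

    enum VmStatus { UP, DOWN }

    // runOnHostId is null when the VM is not running anywhere.
    record Vm(String id, VmStatus status, String runOnHostId) {}

    // Choose a VM to migrate from a positive host-affinity group whose
    // members must run on requiredHostId.
    Optional<Vm> chooseVmToMigrate(List<Vm> groupVms, String requiredHostId) {
        return groupVms.stream()
                // The fix: a down VM cannot be migrated, so skip it entirely.
                .filter(vm -> vm.status() != VmStatus.DOWN)
                // A running VM violates positive host affinity when it runs
                // somewhere other than the required host.
                .filter(vm -> !requiredHostId.equals(vm.runOnHostId()))
                .findFirst();
    }
}
```

Without such filtering, a down VM (runOnHostId of null) always passed the violation check, was picked repeatedly, and each resulting BalanceVm failed validation with ACTION_TYPE_FAILED_VM_IS_NOT_RUNNING, which is exactly the log flood shown in the description below.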
Verified in rhvm-4.2.2-0.1.el7.noarch.
Created 3 affinity groups for 3 VMs (see XML below).
Tried scenarios:
1. Create an affinity group for a powered-off VM; check in engine.log that there are no migration attempts.
2. Create an affinity group for a running VM, then power the VM off and check that engine.log shows no errors.
```xml
<affinity_group>
<name>af_group3</name>
<hosts_rule>
<enabled>true</enabled>
<enforcing>true</enforcing>
<positive>true</positive>
</hosts_rule>
<vms_rule>
<enabled>false</enabled>
<enforcing>true</enforcing>
<positive>false</positive>
</vms_rule>
<hosts>
<host href="/ovirt-engine/api/hosts/b97f27e5-1307-4dfa-a285-6fad766ebe82" id="b97f27e5-1307-4dfa-a285-6fad766ebe82"/>
</hosts>
<vms>
<vm href="/ovirt-engine/api/vms/8b60e9a2-c834-4f47-a30b-dbb2e6d8f07b" id="8b60e9a2-c834-4f47-a30b-dbb2e6d8f07b"/>
</vms>
</affinity_group>
<affinity_group>
<name>af_group1</name>
<hosts_rule>
<enabled>true</enabled>
<enforcing>false</enforcing>
<positive>true</positive>
</hosts_rule>
<vms_rule>
<enabled>true</enabled>
<enforcing>false</enforcing>
<positive>true</positive>
</vms_rule>
<hosts>
<host href="/ovirt-engine/api/hosts/9a50c448-61a1-4085-bfd1-62a6ee0b5525" id="9a50c448-61a1-4085-bfd1-62a6ee0b5525"/>
</hosts>
<vms>
<vm href="/ovirt-engine/api/vms/0cded25e-63ef-43ed-996c-9cfc1934d37a" id="0cded25e-63ef-43ed-996c-9cfc1934d37a"/>
</vms>
</affinity_group>
<affinity_group>
<name>af_group2</name>
<hosts_rule>
<enabled>true</enabled>
<enforcing>false</enforcing>
<positive>true</positive>
</hosts_rule>
<vms_rule>
<enabled>true</enabled>
<enforcing>false</enforcing>
<positive>true</positive>
</vms_rule>
<hosts>
<host href="/ovirt-engine/api/hosts/9a50c448-61a1-4085-bfd1-62a6ee0b5525" id="9a50c448-61a1-4085-bfd1-62a6ee0b5525"/>
</hosts>
<vms>
<vm href="/ovirt-engine/api/vms/206a53a3-d04a-4e0c-83c5-f76b9757af40" id="206a53a3-d04a-4e0c-83c5-f76b9757af40"/>
</vms>
</affinity_group>
```
INFO: Bug status (VERIFIED) wasn't changed but the following should be fixed: [Tag 'ovirt-engine-4.2.2.4' doesn't contain patch 'https://gerrit.ovirt.org/87320']
gitweb: https://gerrit.ovirt.org/gitweb?p=ovirt-engine.git;a=shortlog;h=refs/tags/ovirt-engine-4.2.2.4
For more info please contact: rhv-devops

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:1488
Description of problem:

Logs are spammed with:

```
2018-01-23 00:48:05,541Z WARN [org.ovirt.engine.core.bll.BalanceVmCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-45) [ec22cb2] Validation of action 'BalanceVm' failed for user SYSTEM. Reasons: VAR__ACTION__MIGRATE,VAR__TYPE__VM,ACTION_TYPE_FAILED_VM_IS_NOT_RUNNING
```

And the Tasks tab is full of failed BalanceVm entries.

Version-Release number of selected component (if applicable):
rhevm-4.1.6.2-0.1.el7.noarch
rhvm-4.2.0-0.6.el7.noarch (reproduced)

How reproducible:
100%

Steps to Reproduce:
1. Configure positive host affinity for 1 VM that is down.
2. Check engine.log and the Tasks tab.

Actual results:
Flooded with errors.

Expected results:
Don't balance (migrate) VMs that are down.

Additional info:
Inspecting the code, it looks like the problem is in the function getVmToHostsAffinityGroupCandidates, specifically here:

```java
} else if (!affHosts.contains(vm.getRunOnVds()) && g.isVdsPositive()) {
    // Positive affinity violated
    vmToHostsAffinityMap.put(vm_id, 1 + vmToHostsAffinityMap.getOrDefault(vm_id, 0));
}
```

affHosts is a Set, and when a VM is down getRunOnVds() returns null. affHosts does not contain a null element, so the affinity is falsely reported as violated. Maybe down VMs need to be filtered out to start with? (A self-contained demonstration follows.)
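To make the faulty condition concrete, here is a standalone demonstration of why a down VM always looks like a violator, and how a null guard would change that. This is hypothetical illustration code, not from the engine; only the quoted condition above is real.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Standalone demonstration of the analysis above; not engine code.
public class DownVmAffinityDemo {
    public static void main(String[] args) {
        // Hosts pinned by the positive host-affinity group.
        Set<UUID> affHosts = new HashSet<>();
        affHosts.add(UUID.randomUUID());

        // A down VM runs on no host, so its run-on-host id is null.
        UUID runOnVds = null;

        // Original condition: Set.contains(null) is false for this set, so
        // every down VM is flagged as violating positive host affinity.
        boolean flagged = !affHosts.contains(runOnVds);
        System.out.println("down VM flagged as violator: " + flagged); // true

        // One possible guard: a VM with no run-on host cannot violate host
        // affinity. (The shipped fix filters down VMs out earlier instead.)
        boolean violated = runOnVds != null && !affHosts.contains(runOnVds);
        System.out.println("with null guard: " + violated); // false
    }
}
```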