Bug 1535175
| Summary: | positive and negative affinity-groups for splitting hosts into two groups could force a migration loop of assigned VMs | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Steffen Froemer <sfroemer> |
| Component: | ovirt-engine | Assignee: | Andrej Krejcir <akrejcir> |
| Status: | CLOSED ERRATA | QA Contact: | Polina <pagranat> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.1.6 | CC: | alukiano, lsurette, mavital, mgoldboi, rbalakri, Rhev-m-bugs, srevivo, ykaul |
| Target Milestone: | ovirt-4.2.4 | Flags: | lsvaty: testing_plan_complete- |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-05-15 17:47:24 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | SLA | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description (Steffen Froemer, 2018-01-16 18:52:41 UTC)
Yes, this is theoretically possible. But soft affinity has a very high priority (99x higher than most of the rules) and it should make a second non-complying host a very unattractive destination. We will check the affinity enforcement logic there to make sure.

Based on my understanding, if a migration is started because of an affinity rule, the only possible migration targets should be the ones derived from the affinity group's ruleset. If those target hosts are not suitable for whatever reason, the migration/balancing action should be aborted. There is no exception for soft or hard affinity groups.

Reproducible on rhvm-4.2.1.4-0.1.el7.noarch.

Environment with 3 hosts (host_1, host_2, host_3):
1) Create a new host-to-VM soft positive affinity group (a sketch using the oVirt Python SDK follows after this comment)
2) Add vm_1 and host_1 to the affinity group
3) Start the VM
4) Create CPU load on the VM
5) Put host_1 into maintenance

The Affinity Rule enforcement manager starts migrating the VM from host_2 to host_3 and back. You can start looking through the log from this line:

2018-01-30 17:53:39,278+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-16) [4ea1366a] EVENT_ID: VM_MIGRATION_START_SYSTEM_INITIATED(67), Migration initiated by system (VM: golden_env_mixed_virtio_0, Source: host_mixed_1, Destination: host_mixed_3, Reason: Host preparing for maintenance).

Created attachment 1388513 [details]
engine log
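For reference, a minimal sketch of reproduction step 1, assuming the ovirt-engine-sdk4 Python bindings (oVirt 4.1+, where affinity groups expose hosts_rule/vms_rule); the engine URL, credentials, and the cluster name 'Default' are placeholders, not details from this report:

```python
# Hypothetical sketch: create the soft (non-enforcing) positive affinity
# group from reproduction step 1 via the oVirt Python SDK.
# URL, credentials, and cluster name are assumptions; adjust as needed.
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://rhvm.example.com/ovirt-engine/api',  # assumed engine URL
    username='admin@internal',
    password='password',
    insecure=True,
)

clusters_service = connection.system_service().clusters_service()
cluster = clusters_service.list(search='name=Default')[0]  # assumed cluster name
affinity_groups_service = clusters_service.cluster_service(cluster.id).affinity_groups_service()

affinity_groups_service.add(
    types.AffinityGroup(
        name='group1',
        # Soft rules: enabled but not enforcing, both positive.
        vms_rule=types.AffinityRule(enabled=True, enforcing=False, positive=True),
        hosts_rule=types.AffinityRule(enabled=True, enforcing=False, positive=True),
    )
)
# Step 2 (adding vm_1 and host_1 to the group) can then be done through the
# group's vms/hosts sub-collections or the Administration Portal; omitted here.

connection.close()
```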
We should probably fix this by ignoring the CPU load of the migrated VM when computing the source load, and by introducing a new policy unit that adds a small penalty for a needed migration. That should create a hysteresis window and prefer a solution where migration is not necessary (a sketch of this idea follows below).

The bug was tested on rhv-release-4.2.3-2-001.noarch and still happens.
Attached logs (engine, vdsm of host 1, 2, 3) and an image of the Events and the VM after host 1 is put into maintenance.
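A minimal, hypothetical sketch of the penalty/hysteresis idea proposed above (this is not the actual ovirt-engine policy-unit code; host loads, the VM load, and the penalty value are made-up example numbers): the migrating VM's own load is excluded when scoring its current host, and every other host gets a small migration penalty, so the VM stays put unless another host is clearly better.

```python
# Hypothetical illustration of the proposed fix; not ovirt-engine code.
MIGRATION_PENALTY = 10  # assumed small constant cost added for any migration


def host_score(host_load, vm_load, is_current_host):
    """Lower score = more attractive placement for the VM."""
    if is_current_host:
        # Ignore the migrating VM's own load when scoring its current host,
        # otherwise the source host always looks overloaded.
        return host_load - vm_load
    # Penalise any move at all; this is what creates the hysteresis window.
    return host_load + MIGRATION_PENALTY


def pick_host(host_loads, current, vm_load):
    """Pick the host with the lowest score for the VM."""
    return min(host_loads, key=lambda h: host_score(host_loads[h], vm_load, h == current))


# Example: the VM (20% load) runs on host_mixed_2 (55% total); host_mixed_3 is at 50%.
host_loads = {'host_mixed_2': 55, 'host_mixed_3': 50}

# Comparing raw loads (55 vs 50) would move the VM to host_mixed_3, after which
# the comparison flips (35 vs 70) and the VM would bounce back: a migration loop.
# With the VM's load excluded from the source (score 35) and a penalty on the
# destination (score 60), the VM stays where it is.
print(pick_host(host_loads, current='host_mixed_2', vm_load=20))  # host_mixed_2
```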
steps for verification:
environment with three hosts - [host_mixed_1, host_mixed_2, host_mixed_3]
1. Create an affinity group on the cluster and add the VM and host_mixed_1 to it:
   <name>group1</name>
   <hosts_rule>
     <enabled>true</enabled>
     <enforcing>false</enforcing>
     <positive>true</positive>
   </hosts_rule>
   <positive>true</positive>
   <vms_rule>
     <enabled>true</enabled>
     <enforcing>false</enforcing>
     <positive>true</positive>
   </vms_rule>
2. Run the VM on host_mixed_1.
3. Create CPU load on the VM with the dd command (dd if=/dev/zero of=/dev/null).
4. Put host_mixed_1 into maintenance (see the sketch after these steps).
Result: the VM is moved to host_mixed_2 and then starts circulating between host_mixed_2 and host_mixed_3.
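A minimal sketch of step 4, again assuming the ovirt-engine-sdk4 Python bindings; the engine URL and credentials are placeholders, not details from this report:

```python
# Hypothetical sketch: put host_mixed_1 into maintenance via the oVirt
# Python SDK (verification step 4). URL/credentials are placeholders.
import ovirtsdk4 as sdk

connection = sdk.Connection(
    url='https://rhvm.example.com/ovirt-engine/api',  # assumed engine URL
    username='admin@internal',
    password='password',
    insecure=True,
)

hosts_service = connection.system_service().hosts_service()
host = hosts_service.list(search='name=host_mixed_1')[0]

# Deactivating the host triggers the system-initiated migrations whose
# looping behaviour this bug describes.
hosts_service.host_service(host.id).deactivate()

connection.close()
```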
Created attachment 1427152 [details]
logs
The bug is solved in rhv-release-4.2.3-4-001.noarch. The verification steps are in 1535175#c10.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:1488