Bug 1080521

Summary: Violating hard constraint positive Affinity rule can prevent fixing the violated rule forever
Product: Red Hat Enterprise Virtualization Manager
Component: ovirt-engine
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
Severity: unspecified
Priority: unspecified
Status: CLOSED CURRENTRELEASE
Target Release: 3.5.0
Fixed In Version: ovirt-3.5.0-beta2
Reporter: Gilad Chaplik <gchaplik>
Assignee: Gilad Chaplik <gchaplik>
QA Contact: Artyom <alukiano>
CC: dfediuck, gchaplik, gklein, iheim, lpeer, mavital, msivak, rbalakri, Rhev-m-bugs, sbonazzo, sherold, yeylon
Keywords: ZStream
Whiteboard: sla
oVirt Team: SLA
Type: Bug
Doc Type: Bug Fix
Clone Of: 1080515
Bug Depends On: 1080515
Bug Blocks: 1122446, 1142923, 1156165
Last Closed: 2015-02-17 17:07:14 UTC

Description Gilad Chaplik 2014-03-25 15:29:29 UTC
+++ This bug was initially created as a clone of Bug #1080515 +++

Description of problem:

I found out that it is possible to get into a state in which an affinity group violation cannot be fixed.

Take a look at the following code snippets:

// Group all hosts for VMs with positive affinity
for (Guid id : allVmIdsPositive) {
    VM runVm = runningVMsMap.get(id);
    if (runVm != null && runVm.getRunOnVds() != null) {
        acceptableHosts.add(runVm.getRunOnVds());
    }
}

In the snippet above, allVmIdsPositive holds the list of VMs that are supposed to run on the same host (positive affinity).

The acceptableHosts set then ends up containing every host that currently runs a VM from the allVmIdsPositive list. The assumption is that this is either a single host, or an empty set if no other VM from the affinity group is running.

The following snippet checks that assumption:

if (acceptableHosts.isEmpty()) {
    acceptableHosts.addAll(hostMap.keySet());
} else if (acceptableHosts.size() == 1 &&
           hostMap.containsKey(acceptableHosts.iterator().next())) {
    hasPositiveConstraint = true;
    // Only one host is allowed for positive affinity, i.e. if the VM
    // is contained in a positive affinity group, it must run on the host
    // that all the other members are running on; if the VMs are spread
    // across hosts, the affinity rule isn't applied.
} else {
    ...
    return null;
}

Now focus on the last else clause. If for any reason there are VMs that

1) belong to the same positive affinity group
2) run on different hosts

then the filter returns null, meaning that no host can be used to run the VM currently being scheduled.

The same scheduling algorithm is used when the user starts a new VM, when the user migrates a VM manually, and when the load-balancing job tries to rebalance the cluster. In all of those cases, every VM belonging to the affinity group is prevented from running or migrating.
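To make the failure mode concrete, here is a minimal, self-contained sketch of the deadlock. It is not the actual ovirt-engine code: the class, the host and VM names, and the filterHosts() signature are invented for illustration; only the branching logic mirrors the snippets quoted above.

import java.util.*;

public class AffinityDeadlockSketch {

    // Each running VM of the positive affinity group, mapped to its host.
    // Both VMs were started while the affinity filter was disabled, so the
    // group is already spread across two hosts.
    static final Map<String, String> RUNNING_GROUP_VMS = Map.of(
            "vm-1", "host-A",
            "vm-2", "host-B");

    // Mirrors the quoted filter: returns the acceptable hosts, or null when
    // the hard positive affinity constraint cannot be satisfied.
    static List<String> filterHosts(List<String> candidateHosts) {
        Set<String> acceptableHosts = new HashSet<>(RUNNING_GROUP_VMS.values());
        if (acceptableHosts.isEmpty()) {
            return candidateHosts;   // no group member running: any host is fine
        } else if (acceptableHosts.size() == 1
                && candidateHosts.contains(acceptableHosts.iterator().next())) {
            return new ArrayList<>(acceptableHosts);
        } else {
            return null;             // group spans 2+ hosts: every host rejected
        }
    }

    public static void main(String[] args) {
        // Start, manual migration, and load balancing all pass through the
        // same filter, so every attempt now yields null and the violation
        // can never be repaired.
        System.out.println(filterHosts(List.of("host-A", "host-B", "host-C"))); // null
    }
}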

Now, how can this happen?

The user is free to change cluster policies, so the affinity filter module can be disabled at first and the VMs started on different hosts before the module is re-enabled.

Version-Release number of selected component (if applicable):

ovirt-engine master as of 25th of Mar 2014, 16:13 CET

Steps to Reproduce:
1. Disable affinity modules from cluster policy
2. Create at least 2 VMs
3. Add all VMs from step 2 to hard constraint positive affinity group
4. Start the VMs on different hosts
5. Enable the affinity modules in cluster policy
6. Try to fix the issue or watch the cluster as it tries to rebalance

Actual results:

The VMs are stuck on their hosts and no VM from the affinity group can be started.

Expected results:

The VMs are automatically rebalanced to run on a single host.

Additional info:

I believe the logic for hard-constraint positive affinity should be changed to the following (a hedged sketch of the idea follows the list):

1) use any host if no VM from the group is running (already the case)
2) keep only the hosts that already run VMs from the group (instead of filtering out all hosts)
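
A sketch of that proposal, written as a drop-in replacement for filterHosts() in the illustration above (same invented names; not necessarily the fix that actually shipped in ovirt-3.5.0-beta2):

static List<String> filterHostsProposed(List<String> candidateHosts) {
    // Proposed behaviour: with no group member running, any host is
    // acceptable (rule 1); otherwise keep only the hosts that already run
    // group members (rule 2), so the scheduler and the balancer can
    // converge the group back onto a single host instead of deadlocking.
    Set<String> hostsRunningGroup = new HashSet<>(RUNNING_GROUP_VMS.values());
    if (hostsRunningGroup.isEmpty()) {
        return candidateHosts;               // rule 1
    }
    List<String> result = new ArrayList<>(candidateHosts);
    result.retainAll(hostsRunningGroup);     // rule 2
    return result;                           // [host-A, host-B] instead of null
}

With the group spread over host-A and host-B, the result is [host-A, host-B] rather than null, so a manual migration (or the balancer) can still move the VMs onto one host.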

Comment 2 Artyom 2014-07-27 08:50:11 UTC
Verified on ovirt-engine-3.5.0-0.0.master.20140722232058.git8e1babc.el6.noarch.
With two VMs of a hard positive affinity group running on different hosts (started while the affinity policy was disabled), starting a third VM from the same group makes the scheduler pick one of the two hosts that already run the group's VMs.
If the VMs are then migrated manually onto one host, the engine starts the other VMs from the affinity group on that host as well.

Comment 3 Eyal Edri 2015-02-17 17:07:14 UTC
RHEV 3.5.0 was released. Closing.