Description of problem:

I found that it is possible to get into a state where an affinity group violation cannot be fixed. Take a look at the following code snippets:

// Group all hosts for VMs with positive affinity
for (Guid id : allVmIdsPositive) {
    VM runVm = runningVMsMap.get(id);
    if (runVm != null && runVm.getRunOnVds() != null) {
        acceptableHosts.add(runVm.getRunOnVds());
    }
}

In the snippet above, allVmIdsPositive holds the list of VMs that are supposed to run on the same host (positive affinity). The acceptableHosts set therefore ends up with all hosts that currently run VMs from the allVmIdsPositive list. The assumption is that this is either a single host, or an empty set if no other VM from the affinity group is running. The following snippet checks that:

if (acceptableHosts.isEmpty()) {
    acceptableHosts.addAll(hostMap.keySet());
} else if (acceptableHosts.size() == 1
        && hostMap.containsKey(acceptableHosts.iterator().next())) {
    hasPositiveConstraint = true;
    // Only one host is allowed for positive affinity, i.e. if the VM
    // contained in a positive affinity group he must run on the host
    // that all the other members are running, if the VMs spread across
    // hosts, the affinity rule isn't applied.
} else {
    ...
    return null;
}

Now focus on the last else clause. If for any reason there are VMs that
1) belong to the same positive affinity group
2) run on different hosts
then the filter returns null, meaning no host at all can be used to run the currently scheduled VM.

The same scheduling algorithm is used when the user starts a new VM, when the user migrates a VM manually, and when the load balancing job tries to rebalance the cluster. In all of those cases, every VM belonging to the affinity group is prevented from running or migrating.

How can this happen? The user is free to change cluster policies, so the Affinity Filter module can be disabled when the VMs are started and enabled afterwards.

Version-Release number of selected component (if applicable):
ovirt-engine master as of 25th of Mar 2014, 16:13 CET

Steps to Reproduce:
1. Disable the affinity modules in the cluster policy
2. Create at least 2 VMs
3. Add all VMs from step 2 to a hard constraint positive affinity group
4. Start the VMs on different hosts
5. Enable the affinity modules in the cluster policy
6. Try to fix the issue, or watch the cluster as it tries to rebalance

Actual results:
The VMs are stuck on their hosts and no VM from the affinity group can be started.

Expected results:
The VMs are automatically rebalanced to run on a single host.

Additional info:
I believe the logic for hard constraint positive affinity should be changed to:
1) use any host if there is no VM from that group running (already there)
2) leave only hosts that already have VMs from that group running (instead of filtering out all hosts)
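For illustration, a minimal sketch of how the branch could look with that change. It reuses the variables from the quoted snippet (acceptableHosts, hostMap, hasPositiveConstraint); the exact placement and return handling are assumptions, this is an idea rather than a tested patch:

// Sketch of the proposed behaviour: instead of rejecting every host
// when group members are already spread across several hosts, keep
// only the hosts that currently run members of the group. The
// scheduler / balancer can then converge the group onto one of them.
if (acceptableHosts.isEmpty()) {
    // No group member is running yet -> any candidate host is fine
    // (same as today)
    acceptableHosts.addAll(hostMap.keySet());
} else {
    hasPositiveConstraint = true;
    // One or more group members are running -> restrict the candidates
    // to the hosts that already run a member, instead of returning null
    acceptableHosts.retainAll(hostMap.keySet());
    if (acceptableHosts.isEmpty()) {
        // None of the hosts running group members is schedulable
        return null;
    }
}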
Moving to 3.4.1 since 3.4.0 has been released
This is an automated message. oVirt 3.4.1 has been released. This issue has been retargeted to 3.5.0 since it has not been marked as a high priority or severity issue; please retarget if needed.
oVirt 3.5 has been released and should include the fix for this issue.