Bug 1572547

Summary: ServerGroupAffinityFilter does not allow host-evacuate
Product: Red Hat OpenStack Reporter: broskos
Component: openstack-nova Assignee: OSP DFG:Compute <osp-dfg-compute>
Status: CLOSED WONTFIX QA Contact: OSP DFG:Compute <osp-dfg-compute>
Severity: high Docs Contact:
Priority: unspecified    
Version: 10.0 (Newton) CC: berrange, dasmith, eglynn, jhakimra, kchamart, sbauza, sferdjao, sgordon, srevivo, vromanso
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-05-03 17:23:57 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description broskos 2018-04-27 09:58:15 UTC
Description of problem:
When an affinity group is created and VMs are associated with this affinity group, they fail to evacuate during nova-evacuate.  The ServerGroupAffinityFilter fails to return the available hosts.


Version-Release number of selected component (if applicable):
RHOSP-10

How reproducible:
Always

Steps to Reproduce:
1. 3 compute nodes, each with a 2-VM instance pair (each pair in its own affinity group)
2. Scheduler has placed a VM pair on each of the 3 computes.
3. Stop nova-compute on one compute
4. Initiate nova-evacuate for the compute on which nova-compute is stopped.

Actual results:
The scheduler fails to place the VM pair on any host; ServerGroupAffinityFilter rejects all remaining hosts.

Expected results:
The instance pair should be migrated to one of the other available computes, with both instances on the same compute.


Additional info:
Log entry showing ServerGroupAffinityFilter not passing any hosts:
2018-04-26 13:22:43.037 601373 DEBUG nova.filters [req-cd3303ff-7b1a-4446-8ee4-2b91ae0e90a2 f45ce2faaa024212a13b84b1ecf088a4 5070892a5ef644d99e015e3c626a0084 - - -] Filtering removed all hosts for the request with instance ID 'e872b26c-5702-4dd5-9b24-da5be5835ce2'. Filter results: [('RetryFilter', [(u'overcloud-controller-0.localdomain', u'overcloud-controller-0.localdomain'), (u'overcloud-compute-1.localdomain', u'overcloud-compute-1.localdomain'), (u'overcloud-compute-2.localdomain', u'overcloud-compute-2.localdomain')]), ('AvailabilityZoneFilter', [(u'overcloud-controller-0.localdomain', u'overcloud-controller-0.localdomain'), (u'overcloud-compute-1.localdomain', u'overcloud-compute-1.localdomain'), (u'overcloud-compute-2.localdomain', u'overcloud-compute-2.localdomain')]), ('RamFilter', [(u'overcloud-controller-0.localdomain', u'overcloud-controller-0.localdomain'), (u'overcloud-compute-1.localdomain', u'overcloud-compute-1.localdomain'), (u'overcloud-compute-2.localdomain', u'overcloud-compute-2.localdomain')]), ('DiskFilter', [(u'overcloud-controller-0.localdomain', u'overcloud-controller-0.localdomain'), (u'overcloud-compute-1.localdomain', u'overcloud-compute-1.localdomain'), (u'overcloud-compute-2.localdomain', u'overcloud-compute-2.localdomain')]), ('ComputeFilter', [(u'overcloud-compute-1.localdomain', u'overcloud-compute-1.localdomain'), (u'overcloud-compute-2.localdomain', u'overcloud-compute-2.localdomain')]), ('ComputeCapabilitiesFilter', [(u'overcloud-compute-1.localdomain', u'overcloud-compute-1.localdomain'), (u'overcloud-compute-2.localdomain', u'overcloud-compute-2.localdomain')]), ('ImagePropertiesFilter', [(u'overcloud-compute-1.localdomain', u'overcloud-compute-1.localdomain'), (u'overcloud-compute-2.localdomain', u'overcloud-compute-2.localdomain')]), ('ServerGroupAntiAffinityFilter', [(u'overcloud-compute-1.localdomain', u'overcloud-compute-1.localdomain'), (u'overcloud-compute-2.localdomain', u'overcloud-compute-2.localdomain')]), 
('ServerGroupAffinityFilter', None)] get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:129
2018-04-26 13:22:43.037 601373 INFO nova.filters [req-cd3303ff-7b1a-4446-8ee4-2b91ae0e90a2 f45ce2faaa024212a13b84b1ecf088a4 5070892a5ef644d99e015e3c626a0084 - - -] Filtering removed all hosts for the request with instance ID 'e872b26c-5702-4dd5-9b24-da5be5835ce2'. Filter results: ['RetryFilter: (start: 3, end: 3)', 'AvailabilityZoneFilter: (start: 3, end: 3)', 'RamFilter: (start: 3, end: 3)', 'DiskFilter: (start: 3, end: 3)', 'ComputeFilter: (start: 3, end: 2)', 'ComputeCapabilitiesFilter: (start: 2, end: 2)', 'ImagePropertiesFilter: (start: 2, end: 2)', 'ServerGroupAntiAffinityFilter: (start: 2, end: 2)', 'ServerGroupAffinityFilter: (start: 2, end: 0)']
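
To spot the offending filter quickly in a line like the INFO entry above, the per-filter host counts can be parsed out. The following is only a minimal sketch: the function names are hypothetical, and the regular expression assumes the 'Name: (start: N, end: M)' layout seen in this particular log.

```python
import re

# Matches entries such as 'ComputeFilter: (start: 3, end: 2)'.
# Assumption: this layout is taken from the INFO line in this report.
FILTER_RE = re.compile(r"(\w+): \(start: (\d+), end: (\d+)\)")


def parse_filter_summary(log_line):
    """Return [(filter_name, hosts_in, hosts_out), ...] in logged order."""
    return [(name, int(start), int(end))
            for name, start, end in FILTER_RE.findall(log_line)]


def filters_that_dropped_hosts(log_line):
    """Keep only the filters that removed at least one host."""
    return [entry for entry in parse_filter_summary(log_line)
            if entry[2] < entry[1]]


# Filter results copied from the INFO log entry in this report.
SAMPLE = (
    "Filter results: ['RetryFilter: (start: 3, end: 3)', "
    "'AvailabilityZoneFilter: (start: 3, end: 3)', "
    "'RamFilter: (start: 3, end: 3)', "
    "'DiskFilter: (start: 3, end: 3)', "
    "'ComputeFilter: (start: 3, end: 2)', "
    "'ComputeCapabilitiesFilter: (start: 2, end: 2)', "
    "'ImagePropertiesFilter: (start: 2, end: 2)', "
    "'ServerGroupAntiAffinityFilter: (start: 2, end: 2)', "
    "'ServerGroupAffinityFilter: (start: 2, end: 0)']"
)

if __name__ == "__main__":
    for name, start, end in filters_that_dropped_hosts(SAMPLE):
        print("{} removed {} host(s)".format(name, start - end))
```

For this log, ComputeFilter drops one host and ServerGroupAffinityFilter drops the remaining two.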

Comment 1 Dan Smith 2018-05-03 17:23:57 UTC
Nova does not (and cannot) move two instances at the same time. It also will not move an instance in response to some action other than an instance move request (e.g. a change in the server group). Thus, there's really no way for nova today to do what you expect (moving both instances), and it is unlikely that this will change.
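
To illustrate why a one-at-a-time evacuation ends with zero hosts, here is a toy model of an affinity check. It is not nova's implementation; it assumes the filter only accepts hosts that already run a member of the group, and the host names and function name are hypothetical.

```python
def affinity_filter_passes(candidate_host, group_hosts):
    """Toy affinity check (assumed behavior, not nova's code)."""
    # A group with no placed members imposes no constraint yet.
    if not group_hosts:
        return True
    # Otherwise only hosts that already run a group member pass.
    return candidate_host in group_hosts


# Hypothetical names: compute-0 is the node whose nova-compute was stopped.
# While the first VM of the pair is evacuated, the second VM is still
# recorded on compute-0, so the group is pinned to the failed host.
group_hosts = {"compute-0"}
candidates = ["compute-1", "compute-2"]  # hosts that survived ComputeFilter

passing = [h for h in candidates if affinity_filter_passes(h, group_hosts)]
print(passing)  # no surviving host satisfies the affinity constraint
```

Under these assumptions, the only host that satisfies affinity is the one being evacuated, which matches the "start: 2, end: 0" result in the log.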

The only realistic potential for improvement here is to allow adding and removing instances from server groups, which would let you break the affinity bond and move one of the instances alone. However, you'd have to have some way to move both of those to the same host before you could put them back in an affinity group. CRUD operations on server groups have been a point of contention upstream in the past and are also not likely to be implemented soon.

So, I'm going to close this as WONTFIX for the above reasons. If you want to actually request implementation of CRUD operations as a workaround, that should be a new RFE bug.

Comment 2 Artom Lifshitz 2018-05-04 15:02:00 UTC
As a workaround, as of microversion 2.29 (included in Newton/OSP10), it is possible for an admin to bypass the scheduler entirely during an evacuate operation by passing a host and the 'force' parameter [1]. Using this, an admin can manually choose a new host and evacuate all instances in the same affinity group to that host. This allows the admin to temporarily "break" affinity to evacuate instances.

[1] https://developer.openstack.org/api-ref/compute/#evacuate-server-evacuate-action
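
A minimal sketch of what such a forced evacuate request could look like, built but not sent. The endpoint URL, token, target host, and second server ID below are placeholders, and the helper name is hypothetical; the body shape and microversion header follow the API reference in [1].

```python
import json


def build_forced_evacuate_request(compute_url, token, server_id, target_host):
    """Return (url, headers, body) for a forced evacuate of one server."""
    url = "{}/servers/{}/action".format(compute_url.rstrip("/"), server_id)
    headers = {
        "Content-Type": "application/json",
        "X-Auth-Token": token,
        # 'force' requires compute API microversion 2.29 or later.
        "X-OpenStack-Nova-API-Version": "2.29",
    }
    # With force=True the named host is used without consulting the scheduler.
    body = json.dumps({"evacuate": {"host": target_host, "force": True}})
    return url, headers, body


if __name__ == "__main__":
    # Placeholder values; substitute the real endpoint, token, and server IDs.
    pair = ["e872b26c-5702-4dd5-9b24-da5be5835ce2", "<second-server-id>"]
    for server_id in pair:
        url, headers, body = build_forced_evacuate_request(
            "http://controller:8774/v2.1", "<admin-token>",
            server_id, "overcloud-compute-1.localdomain")
        print("POST", url, body)
```

Sending both members of the pair to the same target host keeps them together afterwards, even though the affinity filter is never consulted.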