1572547 – ServerGroupAffinityFilter does not allow host-evacuate

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-nova
Sub Component:
Version:	10.0 (Newton)
Hardware:	Unspecified
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	OSP DFG:Compute
QA Contact:	OSP DFG:Compute
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-04-27 09:58 UTC by broskos
Modified:	2023-03-21 18:48 UTC
CC List:	10 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-05-03 17:23:57 UTC
Target Upstream Version:
Embargoed:

Description broskos 2018-04-27 09:58:15 UTC

Description of problem:
When an affinity group is created and VMs are associated with it, those VMs fail to evacuate during nova-evacuate. The ServerGroupAffinityFilter rejects every remaining host instead of returning the available ones.


Version-Release number of selected component (if applicable):
RHOSP-10

How reproducible:
Always

Steps to Reproduce:
1. 3 compute nodes, each with a 2-VM instance pair (each pair in its own affinity group)
2. Scheduler has placed a VM pair on each of the 3 computes.
3. Stop nova-compute on one compute
4. Initiate nova-evacuate for the compute on which nova-compute is stopped.

Actual results:
The scheduler fails to place the VM pair on any host; ServerGroupAffinityFilter filters out all remaining hosts.

Expected results:
The instance pair should be migrated to one of the other available computes, with both instances on the same compute.


Additional info:
Log entry showing ServerGroupAffinityFilter not passing any hosts:
2018-04-26 13:22:43.037 601373 DEBUG nova.filters [req-cd3303ff-7b1a-4446-8ee4-2b91ae0e90a2 f45ce2faaa024212a13b84b1ecf088a4 5070892a5ef644d99e015e3c626a0084 - - -] Filtering removed all hosts for the request with instance ID 'e872b26c-5702-4dd5-9b24-da5be5835ce2'. Filter results: [('RetryFilter', [(u'overcloud-controller-0.localdomain', u'overcloud-controller-0.localdomain'), (u'overcloud-compute-1.localdomain', u'overcloud-compute-1.localdomain'), (u'overcloud-compute-2.localdomain', u'overcloud-compute-2.localdomain')]), ('AvailabilityZoneFilter', [(u'overcloud-controller-0.localdomain', u'overcloud-controller-0.localdomain'), (u'overcloud-compute-1.localdomain', u'overcloud-compute-1.localdomain'), (u'overcloud-compute-2.localdomain', u'overcloud-compute-2.localdomain')]), ('RamFilter', [(u'overcloud-controller-0.localdomain', u'overcloud-controller-0.localdomain'), (u'overcloud-compute-1.localdomain', u'overcloud-compute-1.localdomain'), (u'overcloud-compute-2.localdomain', u'overcloud-compute-2.localdomain')]), ('DiskFilter', [(u'overcloud-controller-0.localdomain', u'overcloud-controller-0.localdomain'), (u'overcloud-compute-1.localdomain', u'overcloud-compute-1.localdomain'), (u'overcloud-compute-2.localdomain', u'overcloud-compute-2.localdomain')]), ('ComputeFilter', [(u'overcloud-compute-1.localdomain', u'overcloud-compute-1.localdomain'), (u'overcloud-compute-2.localdomain', u'overcloud-compute-2.localdomain')]), ('ComputeCapabilitiesFilter', [(u'overcloud-compute-1.localdomain', u'overcloud-compute-1.localdomain'), (u'overcloud-compute-2.localdomain', u'overcloud-compute-2.localdomain')]), ('ImagePropertiesFilter', [(u'overcloud-compute-1.localdomain', u'overcloud-compute-1.localdomain'), (u'overcloud-compute-2.localdomain', u'overcloud-compute-2.localdomain')]), ('ServerGroupAntiAffinityFilter', [(u'overcloud-compute-1.localdomain', u'overcloud-compute-1.localdomain'), (u'overcloud-compute-2.localdomain', u'overcloud-compute-2.localdomain')]), 
('ServerGroupAffinityFilter', None)] get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:129
2018-04-26 13:22:43.037 601373 INFO nova.filters [req-cd3303ff-7b1a-4446-8ee4-2b91ae0e90a2 f45ce2faaa024212a13b84b1ecf088a4 5070892a5ef644d99e015e3c626a0084 - - -] Filtering removed all hosts for the request with instance ID 'e872b26c-5702-4dd5-9b24-da5be5835ce2'. Filter results: ['RetryFilter: (start: 3, end: 3)', 'AvailabilityZoneFilter: (start: 3, end: 3)', 'RamFilter: (start: 3, end: 3)', 'DiskFilter: (start: 3, end: 3)', 'ComputeFilter: (start: 3, end: 2)', 'ComputeCapabilitiesFilter: (start: 2, end: 2)', 'ImagePropertiesFilter: (start: 2, end: 2)', 'ServerGroupAntiAffinityFilter: (start: 2, end: 2)', 'ServerGroupAffinityFilter: (start: 2, end: 0)']
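
For triage, the INFO line above can be scanned to find the filter that dropped the host count to zero. The following is a minimal sketch, assuming the `'Name: (start: N, end: M)'` entry format shown in that line; the function names are hypothetical helpers, not part of nova.

```python
import re

# Matches entries such as 'ComputeFilter: (start: 3, end: 2)' in the INFO line.
FILTER_RE = re.compile(r"(\w+): \(start: (\d+), end: (\d+)\)")


def parse_filter_summary(log_line):
    """Return (filter_name, hosts_in, hosts_out) tuples in logged order."""
    return [(name, int(start), int(end))
            for name, start, end in FILTER_RE.findall(log_line)]


def eliminating_filter(log_line):
    """Return the first filter that left zero hosts, or None if none did."""
    for name, _hosts_in, hosts_out in parse_filter_summary(log_line):
        if hosts_out == 0:
            return name
    return None


# Shortened copy of the filter results from the log entry above.
sample = ("Filter results: ['RetryFilter: (start: 3, end: 3)', "
          "'ComputeFilter: (start: 3, end: 2)', "
          "'ServerGroupAffinityFilter: (start: 2, end: 0)']")
print(eliminating_filter(sample))  # ServerGroupAffinityFilter
```

Run against the full log line, this reports ServerGroupAffinityFilter as the filter that went from 2 hosts to 0.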

Comment 1 Dan Smith 2018-05-03 17:23:57 UTC

Nova does not (and cannot) move two instances at the same time. It also will not move an instance in response to some action other than an instance move request (i.e. a change in the server group). Thus, there's really no way for nova today to do the thing you expect (moving both instances), and it is unlikely that this will change.

The only realistic potential for improvement here is to allow adding and removing instances from server groups, which would let you break the affinity bond and move one of the instances alone. However, you'd have to have some way to move both of those to the same host before you could put them back in an affinity group. CRUD operations on server groups have been a point of contention upstream in the past and are also not likely to be implemented soon.

So, I'm going to close this as WONTFIX for the above reasons. If you want to actually request implementation of CRUD operations as a workaround, that should be a new RFE bug.
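
To illustrate the point about one-at-a-time moves, here is a simplified model of an affinity check. It is a sketch only, not nova's actual filter code, and it assumes the group is treated as still occupying the failed host while the first instance is being evacuated. The name `overcloud-compute-0.localdomain` for the failed host is hypothetical; the two candidate hosts are the ones that passed the earlier filters in the reported log.

```python
def affinity_filter(candidate_hosts, group_hosts):
    """Simplified model: once a group occupies hosts, keep only those hosts."""
    if not group_hosts:
        # An empty group places no restriction on scheduling.
        return list(candidate_hosts)
    return [host for host in candidate_hosts if host in group_hosts]


# Hypothetical name for the compute whose nova-compute service was stopped.
failed_host = "overcloud-compute-0.localdomain"

# Assumption: both group members are still recorded on the failed host.
group_hosts = {failed_host}

# Hosts that survived ComputeFilter in the reported request.
candidates = ["overcloud-compute-1.localdomain",
              "overcloud-compute-2.localdomain"]

print(affinity_filter(candidates, group_hosts))  # []
```

Because the only host that satisfies affinity is the one being evacuated, no candidate passes, which matches the "start: 2, end: 0" result in the report.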

Comment 2 Artom Lifshitz 2018-05-04 15:02:00 UTC

As a workaround, as of microversion 2.29 (included in Newton/OSP10), it is possible for an admin to entirely avoid the scheduler during an evacuate operation by passing a host and the 'force' parameter [1]. Using this, an admin can manually choose a new host and evacuate all instances in the same affinity group to that host. This allows the admin to temporarily "break" affinity to evacuate instances.

[1] https://developer.openstack.org/api-ref/compute/#evacuate-server-evacuate-action
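
A rough sketch of what such a request could look like, assuming the evacuate action described in [1] with the microversion pinned to 2.29. The helper name `build_evacuate_request` is hypothetical, and authentication and the compute endpoint base URL are deliberately left out; the function only builds the path, headers, and JSON body.

```python
import json


def build_evacuate_request(server_id, target_host, microversion="2.29"):
    """Build (path, headers, body) for a forced evacuate to a chosen host."""
    path = "/servers/{}/action".format(server_id)
    headers = {
        "Content-Type": "application/json",
        # Pin the compute API microversion so 'force' is accepted.
        "X-OpenStack-Nova-API-Version": microversion,
    }
    # 'force': True asks nova to skip the scheduler for this evacuate.
    body = {"evacuate": {"host": target_host, "force": True}}
    return path, headers, json.dumps(body)


# Instance ID and host name taken from the log entry in the description.
path, headers, body = build_evacuate_request(
    "e872b26c-5702-4dd5-9b24-da5be5835ce2",
    "overcloud-compute-1.localdomain",
)
print(path)
print(body)
```

Repeating the call with the same target host for each instance in the affinity group would place the whole group on one compute, as the workaround describes.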
