Bug 1152835 - [RFE][nova]: Honor anti-affinity policy on migration and evacuation
Summary: [RFE][nova]: Honor anti-affinity policy on migration and evacuation
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: low
Target Milestone: Upstream M1
Target Release: 7.0 (Kilo)
Assignee: Sylvain Bauza
QA Contact: Sean Toner
URL: https://blueprints.launchpad.net/nova...
Whiteboard: upstream_milestone_kilo-1 upstream_de...
Depends On:
Blocks:
 
Reported: 2014-10-15 04:02 UTC by RHOS Integration
Modified: 2020-03-09 09:43 UTC
15 users

Fixed In Version: openstack-nova-2015.1.0-16.el7ost
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-08-05 13:14:47 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
steps showing what was tested (18.34 KB, text/plain)
2015-05-18 18:17 UTC, Sean Toner


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2015:1548 0 normal SHIPPED_LIVE Red Hat Enterprise Linux OpenStack Platform Enhancement Advisory 2015-08-05 17:07:06 UTC

Description RHOS Integration 2014-10-15 04:02:23 UTC
Cloned from launchpad blueprint https://blueprints.launchpad.net/nova/+spec/anti-affinity-on-migration.

Description:

If you create a server group with an anti-affinity policy, it is only honored during the initial boot. If you do a cold migrate, live migrate, or evacuate where the scheduler is picking the destination, it seems reasonable to expect that the scheduler will continue to honor that policy. However, it does not.

The reason for this has to do with an implementation optimization in the scheduler. It skips all of the group checking if the 'group' hint is not present. Since scheduler hints are only kept around for the initial boot, this doesn't work. One solution would be to persist scheduler hints. However, a shorter term fix specifically for server groups is to always check the database for group membership when the server group filters are enabled.
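The shorter-term fix described above can be sketched roughly as follows. This is an illustrative toy, not Nova's actual code: GROUP_MEMBERS stands in for the server-group table, INSTANCE_HOST for the instance-to-host mapping, and hosts_for_anti_affinity for a filter that always consults stored group membership instead of the 'group' scheduler hint, so migrations and evacuations are filtered too.

```python
# Stand-in for the server-group table: group id -> member instance UUIDs.
GROUP_MEMBERS = {
    "anti-group": {"inst-a", "inst-b"},
}

# Which host each instance currently runs on.
INSTANCE_HOST = {"inst-a": "host1", "inst-b": "host2"}

def hosts_for_anti_affinity(candidate_hosts, instance_id, group_id):
    """Return candidate hosts holding no *other* member of the group.

    Membership comes from the stored table, never from a boot-time
    scheduler hint, so the check also applies on migrate/evacuate.
    """
    members = GROUP_MEMBERS.get(group_id, set()) - {instance_id}
    busy = {INSTANCE_HOST[m] for m in members if m in INSTANCE_HOST}
    return [h for h in candidate_hosts if h not in busy]

# Migrating inst-a may only target hosts not running inst-b.
print(hosts_for_anti_affinity(["host1", "host2", "host3"],
                              "inst-a", "anti-group"))
# -> ['host1', 'host3']
```

The key design point is that the filter runs its database-backed membership check whenever the server group filters are enabled, rather than short-circuiting when the hint is absent.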

Specification URL (additional information):

None

Comment 2 Sean Toner 2015-05-18 18:16:33 UTC
I have tried this on kilo, but it does not appear to be working.  I set up nova.conf to use the ServerGroupAffinityFilter and ServerGroupAntiAffinityFilter, restarted compute services on my 2 compute nodes, created an affinity and anti-affinity policy group, and booted one instance into each respective group.

However, when doing a live migration (without explicitly specifying a host), the migration still succeeded.  I ran into a similar problem during Juno testing, where the issue was that all the compute nodes needed scheduler_default_filters set to include the right filters.  I verified this setting is the same on each compute node, but the live migration still succeeds even though the instance I was trying to migrate is in the anti-affinity server group.

Comment 3 Sean Toner 2015-05-18 18:17:28 UTC
Created attachment 1026801 [details]
steps showing what was tested

Comment 5 Sean Toner 2015-06-09 13:18:31 UTC
I have retested this and it appears to be working now.  However, I have noticed that when running nova live-migration anti-test (where anti-test is the instance booted into the anti-affinity group), there is no message of any kind indicating that the migration failed; indeed, it appears to be successful.

[root@rhel71-kilo-1 ~(keystone_admin)]# nova boot --flavor 1 --image cirros --nic net-id=7cf22ea3-6483-46b9-8b75-ce18b4026ea5 --hint "group=80445252-da4e-43e0-a669-0ff52d383e1c" anti-test

[root@rhel71-kilo-1 ~(keystone_admin)]# nova boot --flavor 1 --image cirros --nic net-id=7cf22ea3-6483-46b9-8b75-ce18b4026ea5 --hint "group=01e9aa8c-f501-411a-a615-07cf48a552b8" aff-test

[root@rhel71-kilo-1 ~(keystone_admin)]# nova server-group-list
+--------------------------------------+------------+--------------------+-------------------------------------------+----------+
| Id                                   | Name       | Policies           | Members                                   | Metadata |
+--------------------------------------+------------+--------------------+-------------------------------------------+----------+
| 80445252-da4e-43e0-a669-0ff52d383e1c | anti-group | [u'anti-affinity'] | [u'd23e5c77-cd0f-4571-a1af-2fec8ad36549'] | {}       |
| 01e9aa8c-f501-411a-a615-07cf48a552b8 | aff-group  | [u'affinity']      | [u'87fb27cf-ed9b-407f-af65-8c6e31326f42'] | {}       |
+--------------------------------------+------------+--------------------+-------------------------------------------+----------+
[root@rhel71-kilo-1 ~(keystone_admin)]# nova hypervisor-list
+----+---------------------------------------+-------+---------+
| ID | Hypervisor hostname                   | State | Status  |
+----+---------------------------------------+-------+---------+
| 1  | rhel71-kilo-1.lab.eng.rdu2.redhat.com | up    | enabled |
| 2  | rhel71-kilo-2.lab.eng.rdu2.redhat.com | up    | enabled |
+----+---------------------------------------+-------+---------+
[root@rhel71-kilo-1 ~(keystone_admin)]# nova live-migration anti-test
[root@rhel71-kilo-1 ~(keystone_admin)]# echo $?
0
[root@rhel71-kilo-1 ~(keystone_admin)]# nova live-migration aff-test
ERROR (BadRequest): No valid host was found. There are not enough hosts available. (HTTP 400) (Request-ID: req-422fa442-812b-4e0b-86ef-7aa5a263d746)
[root@rhel71-kilo-1 ~(keystone_admin)]# echo $?
1
[root@rhel71-kilo-1 ~(keystone_admin)]# 


Even if the anti-affinity policy is being honored, I do not think this is correct behavior.  The scheduler should report in the anti-affinity case that no valid hosts were found, just as it does in the affinity case.

Comment 6 Sean Toner 2015-06-09 13:28:08 UTC
OK, I take back what I said in the above comment.  I ran a couple more migrations:

nova live-migration anti-test

There appears to be a race condition where if you run:

nova show anti-test 

immediately after you run the live-migration command, it will look like the instance is still on the same host (and therefore that the migration failed).  However, if you wait a few seconds and run nova show anti-test again, it will show that the instance has indeed switched hosts.

So there are two problems:

1) the server-group policy is still not honored when performing a live migration on an instance booted with a hint to an affinity group
2) there is a race where the information shown by nova show is not synchronized with the live migration (it takes at least a second or two after the migration completes for the reported host to reflect where the instance really is)
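A simple way to make verification robust against the race in (2) is to poll the reported host until it changes or a timeout expires. This is a sketch, not part of Nova; wait_for_host_change and get_host are illustrative names, where get_host would in practice wrap nova show and extract the OS-EXT-SRV-ATTR:host field.

```python
import itertools
import time

def wait_for_host_change(get_host, original_host, timeout=30.0, interval=1.0):
    """Poll until the instance reports a host other than original_host.

    get_host is any zero-argument callable returning the currently
    reported host name; raises TimeoutError if it never changes.
    """
    deadline = time.monotonic() + timeout
    while True:
        host = get_host()
        if host != original_host:
            return host
        if time.monotonic() >= deadline:
            raise TimeoutError("instance still reports host %r" % original_host)
        time.sleep(interval)

# Simulated 'nova show' output: the reported host lags the migration by a
# couple of polls, as observed in the race above.
reported = itertools.chain(["host1", "host1"], itertools.repeat("host2"))
print(wait_for_host_change(lambda: next(reported), "host1", interval=0.0))
# -> host2
```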

I also have the filters set correctly on all my compute nodes:

[root@rhel71-kilo-1 ~(keystone_admin)]# grep -Hrni AffinityFilter /etc/nova/nova.conf 
/etc/nova/nova.conf:1621:#scheduler_default_filters=RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter
/etc/nova/nova.conf:1622:scheduler_default_filters=RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ImagePropertiesFilter,CoreFilter,NUMATopologyFilter,ServerGroupAffinityFilter,ServerGroupAntiAffinityFilter,AggregateInstanceExtraSpecsFilter


[root@rhel71-kilo-2 ~]# grep -Hrni AffinityFilter /etc/nova/nova.conf 
/etc/nova/nova.conf:1609:#scheduler_default_filters=RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter
/etc/nova/nova.conf:1967:scheduler_default_filters = RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,NUMATopologyFilter,AggregateInstanceExtraSpecsFilter

Comment 7 Sean Toner 2015-06-11 11:52:55 UTC
I think I see what was causing my failures (which is perhaps a different bug).  Somehow, when I ran nova hypervisor-list from my main nova controller, it reported that one of my two hypervisors was down.

[root@rhel71-kilo-1 ~(keystone_admin)]# nova hypervisor-list
+----+---------------------------------------+-------+---------+
| ID | Hypervisor hostname                   | State | Status  |
+----+---------------------------------------+-------+---------+
| 1  | rhel71-kilo-1.lab.eng.rdu2.redhat.com | up    | enabled |
| 2  | rhel71-kilo-2.lab.eng.rdu2.redhat.com | down  | enabled |
+----+---------------------------------------+-------+---------+

However, running openstack-status or openstack-service status nova on the other compute node showed it was up and running.

[root@rhel71-kilo-2 ~]# openstack-status
== Nova services ==
openstack-nova-api:                     inactive  (disabled on boot)
openstack-nova-compute:                 active
openstack-nova-network:                 inactive  (disabled on boot)
openstack-nova-scheduler:               inactive  (disabled on boot)


I rebooted both nodes (not just restarting services as I had done before) and now this feature appears to be working correctly.

[root@rhel71-kilo-1 ~(keystone_admin)]# nova list
+--------------------------------------+------------+--------+------------+-------------+------------------+
| ID                                   | Name       | Status | Task State | Power State | Networks         |
+--------------------------------------+------------+--------+------------+-------------+------------------+
| 48ae094b-243f-4c20-bdcf-88d65f317194 | anti-test  | ACTIVE | -          | Running     | private=10.0.0.8 |
| 63033f1a-dfaa-4a92-9f48-ca2a8498b058 | anti-test2 | ACTIVE | -          | Running     | private=10.0.0.9 |
+--------------------------------------+------------+--------+------------+-------------+------------------+
[root@rhel71-kilo-1 ~(keystone_admin)]# nova server-group-list
+--------------------------------------+------------+--------------------+------------------------------------------------------------------------------------+----------+
| Id                                   | Name       | Policies           | Members                                                                            | Metadata |
+--------------------------------------+------------+--------------------+------------------------------------------------------------------------------------+----------+
| e2edd2ea-514c-46ae-9f70-abdc0764467c | anti-group | [u'anti-affinity'] | [u'63033f1a-dfaa-4a92-9f48-ca2a8498b058', u'48ae094b-243f-4c20-bdcf-88d65f317194'] | {}       |
| 346bfeb0-5e59-442e-8c0c-99f96131fec4 | aff-group  | [u'affinity']      | []                                                                                 | {}       |
+--------------------------------------+------------+--------------------+------------------------------------------------------------------------------------+----------+


[root@rhel71-kilo-1 ~(keystone_admin)]# nova live-migration anti-test
ERROR (BadRequest): No valid host was found. There are not enough hosts available. (HTTP 400) (Request-ID: req-341f8752-d4f4-4421-b427-6fc241f11b57)
[root@rhel71-kilo-1 ~(keystone_admin)]# nova live-migration anti-test2
ERROR (BadRequest): No valid host was found. There are not enough hosts available. (HTTP 400) (Request-ID: req-0d8af5b7-b476-4b67-ab99-050834aa18e6)
[root@rhel71-kilo-1 ~(keystone_admin)]# nova show anti-test
+--------------------------------------+----------------------------------------------------------+
| Property                             | Value                                                    |
+--------------------------------------+----------------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                                   |
| OS-EXT-AZ:availability_zone          | nova                                                     |
| OS-EXT-SRV-ATTR:host                 | rhel71-kilo-1.lab.eng.rdu2.redhat.com                    |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | rhel71-kilo-1.lab.eng.rdu2.redhat.com                    |
| OS-EXT-SRV-ATTR:instance_name        | instance-00000007                                        |
| OS-EXT-STS:power_state               | 1                                                        |
| OS-EXT-STS:task_state                | -                                                        |
| OS-EXT-STS:vm_state                  | active                                                   |
| OS-SRV-USG:launched_at               | 2015-06-10T19:39:55.000000                               |
| OS-SRV-USG:terminated_at             | -                                                        |
| accessIPv4                           |                                                          |
| accessIPv6                           |                                                          |
| config_drive                         |                                                          |
| created                              | 2015-06-10T19:39:31Z                                     |
| flavor                               | m1.tiny (1)                                              |
| hostId                               | 2a08640e0664f97c5c193b9607402413b41c772d0db2adfcd8ef6b8a |
| id                                   | 48ae094b-243f-4c20-bdcf-88d65f317194                     |
| image                                | cirros (87c92a01-df03-46a7-9058-3c7bac3f3b62)            |
| key_name                             | -                                                        |
| metadata                             | {}                                                       |
| name                                 | anti-test                                                |
| os-extended-volumes:volumes_attached | []                                                       |
| private network                      | 10.0.0.8                                                 |
| progress                             | 0                                                        |
| security_groups                      | default                                                  |
| status                               | ACTIVE                                                   |
| tenant_id                            | 5d0f3284b16244a395f68ba8faff2af7                         |
| updated                              | 2015-06-11T11:46:44Z                                     |
| user_id                              | 6e53af38154f44bb8d110cd864290bb6                         |
+--------------------------------------+----------------------------------------------------------+
[root@rhel71-kilo-1 ~(keystone_admin)]# nova show anti-test2
+--------------------------------------+----------------------------------------------------------+
| Property                             | Value                                                    |
+--------------------------------------+----------------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                                   |
| OS-EXT-AZ:availability_zone          | nova                                                     |
| OS-EXT-SRV-ATTR:host                 | rhel71-kilo-2.lab.eng.rdu2.redhat.com                    |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | rhel71-kilo-2.lab.eng.rdu2.redhat.com                    |
| OS-EXT-SRV-ATTR:instance_name        | instance-0000000a                                        |
| OS-EXT-STS:power_state               | 1                                                        |
| OS-EXT-STS:task_state                | -                                                        |
| OS-EXT-STS:vm_state                  | active                                                   |
| OS-SRV-USG:launched_at               | 2015-06-11T11:42:12.000000                               |
| OS-SRV-USG:terminated_at             | -                                                        |
| accessIPv4                           |                                                          |
| accessIPv6                           |                                                          |
| config_drive                         |                                                          |
| created                              | 2015-06-11T11:41:37Z                                     |
| flavor                               | m1.tiny (1)                                              |
| hostId                               | 13ad794381676c3d3a869192d9a339bcba5fac541e35c97f863272a5 |
| id                                   | 63033f1a-dfaa-4a92-9f48-ca2a8498b058                     |
| image                                | cirros (87c92a01-df03-46a7-9058-3c7bac3f3b62)            |
| key_name                             | -                                                        |
| metadata                             | {}                                                       |
| name                                 | anti-test2                                               |
| os-extended-volumes:volumes_attached | []                                                       |
| private network                      | 10.0.0.9                                                 |
| progress                             | 0                                                        |
| security_groups                      | default                                                  |
| status                               | ACTIVE                                                   |
| tenant_id                            | 5d0f3284b16244a395f68ba8faff2af7                         |
| updated                              | 2015-06-11T11:46:49Z                                     |
| user_id                              | 6e53af38154f44bb8d110cd864290bb6                         |
+--------------------------------------+----------------------------------------------------------+

[root@rhel71-kilo-1 ~(keystone_admin)]# nova boot --flavor 1 --image cirros --nic net-id=a298e540-a595-494c-867b-d8ec10e7fd15 --hint group=e2edd2ea-514c-46ae-9f70-abdc0764467c anti-test3
+--------------------------------------+-----------------------------------------------+
| Property                             | Value                                         |
+--------------------------------------+-----------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                        |
| OS-EXT-AZ:availability_zone          | nova                                          |
| OS-EXT-SRV-ATTR:host                 | -                                             |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | -                                             |
| OS-EXT-SRV-ATTR:instance_name        | instance-0000000b                             |
| OS-EXT-STS:power_state               | 0                                             |
| OS-EXT-STS:task_state                | scheduling                                    |
| OS-EXT-STS:vm_state                  | building                                      |
| OS-SRV-USG:launched_at               | -                                             |
| OS-SRV-USG:terminated_at             | -                                             |
| accessIPv4                           |                                               |
| accessIPv6                           |                                               |
| adminPass                            | pVQSeioVU3p7                                  |
| config_drive                         |                                               |
| created                              | 2015-06-11T11:49:23Z                          |
| flavor                               | m1.tiny (1)                                   |
| hostId                               |                                               |
| id                                   | 1b767426-c32a-4500-b594-56f37304850a          |
| image                                | cirros (87c92a01-df03-46a7-9058-3c7bac3f3b62) |
| key_name                             | -                                             |
| metadata                             | {}                                            |
| name                                 | anti-test3                                    |
| os-extended-volumes:volumes_attached | []                                            |
| progress                             | 0                                             |
| security_groups                      | default                                       |
| status                               | BUILD                                         |
| tenant_id                            | 5d0f3284b16244a395f68ba8faff2af7              |
| updated                              | 2015-06-11T11:49:23Z                          |
| user_id                              | 6e53af38154f44bb8d110cd864290bb6              |
+--------------------------------------+-----------------------------------------------+
[root@rhel71-kilo-1 ~(keystone_admin)]# nova list
+--------------------------------------+------------+--------+------------+-------------+------------------+
| ID                                   | Name       | Status | Task State | Power State | Networks         |
+--------------------------------------+------------+--------+------------+-------------+------------------+
| 48ae094b-243f-4c20-bdcf-88d65f317194 | anti-test  | ACTIVE | -          | Running     | private=10.0.0.8 |
| 63033f1a-dfaa-4a92-9f48-ca2a8498b058 | anti-test2 | ACTIVE | -          | Running     | private=10.0.0.9 |
| 1b767426-c32a-4500-b594-56f37304850a | anti-test3 | ERROR  | -          | NOSTATE     |                  |
+--------------------------------------+------------+--------+------------+-------------+------------------+



Both VMs live on different hypervisors, which is correct.  I was also unable to migrate, since each hypervisor already had a VM from that server group running on it.  Trying to boot a 3rd instance properly fails (since there are only 2 compute nodes, a 3rd anti-affinity instance is not possible).


After deleting anti-test2 and anti-test3, I was able to perform a live migration on anti-test, and it is now on the other hypervisor.

[root@rhel71-kilo-1 ~(keystone_admin)]# nova delete anti-test2
Request to delete server anti-test2 has been accepted.
[root@rhel71-kilo-1 ~(keystone_admin)]# nova delete anti-test3
Request to delete server anti-test3 has been accepted.
[root@rhel71-kilo-1 ~(keystone_admin)]# nova list
+--------------------------------------+-----------+--------+------------+-------------+------------------+
| ID                                   | Name      | Status | Task State | Power State | Networks         |
+--------------------------------------+-----------+--------+------------+-------------+------------------+
| 48ae094b-243f-4c20-bdcf-88d65f317194 | anti-test | ACTIVE | -          | Running     | private=10.0.0.8 |
+--------------------------------------+-----------+--------+------------+-------------+------------------+
[root@rhel71-kilo-1 ~(keystone_admin)]# nova server-group-list
+--------------------------------------+------------+--------------------+-------------------------------------------+----------+
| Id                                   | Name       | Policies           | Members                                   | Metadata |
+--------------------------------------+------------+--------------------+-------------------------------------------+----------+
| e2edd2ea-514c-46ae-9f70-abdc0764467c | anti-group | [u'anti-affinity'] | [u'48ae094b-243f-4c20-bdcf-88d65f317194'] | {}       |
| 346bfeb0-5e59-442e-8c0c-99f96131fec4 | aff-group  | [u'affinity']      | []                                        | {}       |
+--------------------------------------+------------+--------------------+-------------------------------------------+----------+
[root@rhel71-kilo-1 ~(keystone_admin)]# nova live-migration anti-test
[root@rhel71-kilo-1 ~(keystone_admin)]# nova show anti-test

[root@rhel71-kilo-1 ~(keystone_admin)]# nova show anti-test
+--------------------------------------+----------------------------------------------------------+
| Property                             | Value                                                    |
+--------------------------------------+----------------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                                   |
| OS-EXT-AZ:availability_zone          | nova                                                     |
| OS-EXT-SRV-ATTR:host                 | rhel71-kilo-2.lab.eng.rdu2.redhat.com                    |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | rhel71-kilo-2.lab.eng.rdu2.redhat.com                    |
| OS-EXT-SRV-ATTR:instance_name        | instance-00000007                                        |
| OS-EXT-STS:power_state               | 1                                                        |
| OS-EXT-STS:task_state                | -                                                        |
| OS-EXT-STS:vm_state                  | active                                                   |
| OS-SRV-USG:launched_at               | 2015-06-10T19:39:55.000000                               |
| OS-SRV-USG:terminated_at             | -                                                        |
| accessIPv4                           |                                                          |
| accessIPv6                           |                                                          |
| config_drive                         |                                                          |
| created                              | 2015-06-10T19:39:31Z                                     |
| flavor                               | m1.tiny (1)                                              |
| hostId                               | 13ad794381676c3d3a869192d9a339bcba5fac541e35c97f863272a5 |
| id                                   | 48ae094b-243f-4c20-bdcf-88d65f317194                     |
| image                                | cirros (87c92a01-df03-46a7-9058-3c7bac3f3b62)            |
| key_name                             | -                                                        |
| metadata                             | {}                                                       |
| name                                 | anti-test                                                |
| os-extended-volumes:volumes_attached | []                                                       |
| private network                      | 10.0.0.8                                                 |
| progress                             | 1                                                        |
| security_groups                      | default                                                  |
| status                               | ACTIVE                                                   |
| tenant_id                            | 5d0f3284b16244a395f68ba8faff2af7                         |
| updated                              | 2015-06-11T11:50:45Z                                     |
| user_id                              | 6e53af38154f44bb8d110cd864290bb6                         |
+--------------------------------------+----------------------------------------------------------+
[root@rhel71-kilo-1 ~(keystone_admin)]#

Comment 9 errata-xmlrpc 2015-08-05 13:14:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1548

