Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1152835

Summary: [RFE][nova]: Honor anti-affinity policy on migration and evacuation
Product: Red Hat OpenStack
Reporter: RHOS Integration <rhos-integ>
Component: openstack-nova
Assignee: Sylvain Bauza <sbauza>
Status: CLOSED ERRATA
QA Contact: Sean Toner <stoner>
Severity: low
Priority: high
Version: unspecified
CC: berrange, dasmith, kchamart, markmc, ndipanov, nlevinki, pablo.iranzo, pbrady, sbauza, sferdjao, sgordon, trerober, tvvcox, vromanso, yeylon
Target Milestone: Upstream M1
Keywords: FutureFeature, Triaged
Target Release: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
URL: https://blueprints.launchpad.net/nova/+spec/anti-affinity-on-migration
Whiteboard: upstream_milestone_kilo-1 upstream_definition_approved upstream_status_implemented
Fixed In Version: openstack-nova-2015.1.0-16.el7ost
Doc Type: Enhancement
Last Closed: 2015-08-05 13:14:47 UTC
Attachments: steps showing what was tested (flags: none)

Description RHOS Integration 2014-10-15 04:02:23 UTC
Cloned from launchpad blueprint https://blueprints.launchpad.net/nova/+spec/anti-affinity-on-migration.

Description:

If you create a server group with an anti-affinity policy, it is only honored during the initial boot. If you do a cold migrate, live migrate, or evacuate where the scheduler is picking the destination, it seems reasonable to expect that the scheduler will continue to honor that policy. However, it does not.

The reason for this has to do with an implementation optimization in the scheduler. It skips all of the group checking if the 'group' hint is not present. Since scheduler hints are only kept around for the initial boot, this doesn't work. One solution would be to persist scheduler hints. However, a shorter term fix specifically for server groups is to always check the database for group membership when the server group filters are enabled.
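The short-term fix described above can be sketched as follows. This is an illustrative outline only, not nova's actual filter code; the helper names (get_group_for_instance, get_hosts_for_group) stand in for the database lookups the real ServerGroupAntiAffinityFilter would perform.

```python
# Sketch of the fix: rather than relying on the transient 'group'
# scheduler hint (which exists only at initial boot), always look up
# group membership in the database when the server group filters are
# enabled. The two callables below are hypothetical stand-ins for
# nova's DB API.

def anti_affinity_host_passes(host, instance_uuid,
                              get_group_for_instance,
                              get_hosts_for_group):
    """Return True if `host` may receive `instance_uuid`."""
    # DB lookup instead of a scheduler hint, so migrations and
    # evacuations are covered too, not just the initial boot.
    group = get_group_for_instance(instance_uuid)
    if group is None or 'anti-affinity' not in group['policies']:
        return True  # instance is not in an anti-affinity group
    # Reject the host if any *other* group member already runs there.
    occupied = get_hosts_for_group(group['id'], exclude=instance_uuid)
    return host not in occupied
```

Because membership is re-read from the database on every scheduling request, a cold migrate, live migrate, or evacuate goes through the same check as the initial boot.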

Specification URL (additional information):

None

Comment 2 Sean Toner 2015-05-18 18:16:33 UTC
I have tried this on kilo, but it does not appear to be working.  I set up nova.conf to use the ServerGroupAffinityFilter and ServerGroupAntiAffinityFilter, restarted compute services on my 2 compute nodes, created an affinity and anti-affinity policy group, and booted one instance into each respective group.

However, when doing a live migration (without explicitly specifying a host), the migration still succeeded.  I ran into a similar problem during juno testing, where the issue was that all the compute nodes needed scheduler_default_filters set to include the right filters.  I ensured this was the same on each compute node, but the live migration still succeeds even though the instance I am trying to migrate belongs to the anti-affinity server group.

Comment 3 Sean Toner 2015-05-18 18:17:28 UTC
Created attachment 1026801 [details]
steps showing what was tested

Comment 5 Sean Toner 2015-06-09 13:18:31 UTC
I have retested this and it appears to be working now.  However, I noticed that when running nova live-migration anti-test (where anti-test is the instance booted into the anti-affinity group), there is no message of any kind indicating the migration failed, and indeed it seems successful.

[root@rhel71-kilo-1 ~(keystone_admin)]# nova boot --flavor 1 --image cirros --nic net-id=7cf22ea3-6483-46b9-8b75-ce18b4026ea5 --hint "group=80445252-da4e-43e0-a669-0ff52d383e1c" anti-test

[root@rhel71-kilo-1 ~(keystone_admin)]# nova boot --flavor 1 --image cirros --nic net-id=7cf22ea3-6483-46b9-8b75-ce18b4026ea5 --hint "group=01e9aa8c-f501-411a-a615-07cf48a552b8" aff-test

[root@rhel71-kilo-1 ~(keystone_admin)]# nova server-group-list
+--------------------------------------+------------+--------------------+-------------------------------------------+----------+
| Id                                   | Name       | Policies           | Members                                   | Metadata |
+--------------------------------------+------------+--------------------+-------------------------------------------+----------+
| 80445252-da4e-43e0-a669-0ff52d383e1c | anti-group | [u'anti-affinity'] | [u'd23e5c77-cd0f-4571-a1af-2fec8ad36549'] | {}       |
| 01e9aa8c-f501-411a-a615-07cf48a552b8 | aff-group  | [u'affinity']      | [u'87fb27cf-ed9b-407f-af65-8c6e31326f42'] | {}       |
+--------------------------------------+------------+--------------------+-------------------------------------------+----------+
[root@rhel71-kilo-1 ~(keystone_admin)]# nova hypervisor-list
+----+---------------------------------------+-------+---------+
| ID | Hypervisor hostname                   | State | Status  |
+----+---------------------------------------+-------+---------+
| 1  | rhel71-kilo-1.lab.eng.rdu2.redhat.com | up    | enabled |
| 2  | rhel71-kilo-2.lab.eng.rdu2.redhat.com | up    | enabled |
+----+---------------------------------------+-------+---------+
[root@rhel71-kilo-1 ~(keystone_admin)]# nova live-migration anti-test
[root@rhel71-kilo-1 ~(keystone_admin)]# echo $?
0
[root@rhel71-kilo-1 ~(keystone_admin)]# nova live-migration aff-test
ERROR (BadRequest): No valid host was found. There are not enough hosts available. (HTTP 400) (Request-ID: req-422fa442-812b-4e0b-86ef-7aa5a263d746)
[root@rhel71-kilo-1 ~(keystone_admin)]# echo $?
1
[root@rhel71-kilo-1 ~(keystone_admin)]# 


I do not think this is correct behavior, even though the anti-affinity policy is being honored.  In the anti-affinity case the scheduler should report that no valid hosts were found, just as it does in the affinity case.

Comment 6 Sean Toner 2015-06-09 13:28:08 UTC
Ok, I take back what I said in the above comment.  I ran the command a couple more times:

nova live-migration anti-test

There appears to be a race condition where if you run:

nova show anti-test 

immediately after you run the live-migration command, it will look like the instance is still on the same host (and that the migration therefore failed).  However, if you wait a few seconds and run nova show anti-test again, it will show that the instance has indeed switched hosts.

So there are two problems:

1) the affinity policy is still not honored when performing a live migration on an instance booted with a hint into an affinity group
2) there is a race where the information shown by nova show is not synchronized with the live migration: it takes a second or two after the migration completes for the reported host to reflect where the instance really is
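A workaround for problem 2 is to poll the instance's reported host instead of trusting the first nova show after the migration. This is a hypothetical helper, not part of nova; `get_host` stands in for whatever fetches the OS-EXT-SRV-ATTR:host field (e.g. via python-novaclient or parsing `nova show`).

```python
import time

def wait_for_host_change(get_host, old_host, timeout=30.0, interval=1.0,
                         sleep=time.sleep):
    """Poll until the instance reports a host other than `old_host`.

    `get_host` is a zero-argument callable returning the instance's
    currently reported host; `sleep` is injectable for testing.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        current = get_host()
        if current != old_host:
            return current  # migration has landed on the new host
        sleep(interval)
    raise TimeoutError('instance still reports host %s' % old_host)
```

With a short timeout this turns the "wait a few seconds and re-run nova show" step into a deterministic check.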

I also have the filters set correctly on all my compute nodes:

[root@rhel71-kilo-1 ~(keystone_admin)]# grep -Hrni AffinityFilter /etc/nova/nova.conf 
/etc/nova/nova.conf:1621:#scheduler_default_filters=RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter
/etc/nova/nova.conf:1622:scheduler_default_filters=RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ImagePropertiesFilter,CoreFilter,NUMATopologyFilter,ServerGroupAffinityFilter,ServerGroupAntiAffinityFilter,AggregateInstanceExtraSpecsFilter


[root@rhel71-kilo-2 ~]# grep -Hrni AffinityFilter /etc/nova/nova.conf 
/etc/nova/nova.conf:1609:#scheduler_default_filters=RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter
/etc/nova/nova.conf:1967:scheduler_default_filters = RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,NUMATopologyFilter,AggregateInstanceExtraSpecsFilter

Comment 7 Sean Toner 2015-06-11 11:52:55 UTC
I think I see what was causing my failures (which is perhaps a different bug).  Somehow, running nova hypervisor-list from my main nova controller reported that one of my two hypervisors was down.

[root@rhel71-kilo-1 ~(keystone_admin)]# nova hypervisor-list
+----+---------------------------------------+-------+---------+
| ID | Hypervisor hostname                   | State | Status  |
+----+---------------------------------------+-------+---------+
| 1  | rhel71-kilo-1.lab.eng.rdu2.redhat.com | up    | enabled |
| 2  | rhel71-kilo-2.lab.eng.rdu2.redhat.com | down  | enabled |
+----+---------------------------------------+-------+---------+

However, running openstack-status or openstack-service status nova on the other compute node showed it was up and running.

[root@rhel71-kilo-2 ~]# openstack-status
== Nova services ==
openstack-nova-api:                     inactive  (disabled on boot)
openstack-nova-compute:                 active
openstack-nova-network:                 inactive  (disabled on boot)
openstack-nova-scheduler:               inactive  (disabled on boot)


I rebooted both nodes (rather than just restarting services as I had done before), and now this feature appears to be working correctly.

[root@rhel71-kilo-1 ~(keystone_admin)]# nova list
+--------------------------------------+------------+--------+------------+-------------+------------------+
| ID                                   | Name       | Status | Task State | Power State | Networks         |
+--------------------------------------+------------+--------+------------+-------------+------------------+
| 48ae094b-243f-4c20-bdcf-88d65f317194 | anti-test  | ACTIVE | -          | Running     | private=10.0.0.8 |
| 63033f1a-dfaa-4a92-9f48-ca2a8498b058 | anti-test2 | ACTIVE | -          | Running     | private=10.0.0.9 |
+--------------------------------------+------------+--------+------------+-------------+------------------+
[root@rhel71-kilo-1 ~(keystone_admin)]# nova server-group-list
+--------------------------------------+------------+--------------------+------------------------------------------------------------------------------------+----------+
| Id                                   | Name       | Policies           | Members                                                                            | Metadata |
+--------------------------------------+------------+--------------------+------------------------------------------------------------------------------------+----------+
| e2edd2ea-514c-46ae-9f70-abdc0764467c | anti-group | [u'anti-affinity'] | [u'63033f1a-dfaa-4a92-9f48-ca2a8498b058', u'48ae094b-243f-4c20-bdcf-88d65f317194'] | {}       |
| 346bfeb0-5e59-442e-8c0c-99f96131fec4 | aff-group  | [u'affinity']      | []                                                                                 | {}       |
+--------------------------------------+------------+--------------------+------------------------------------------------------------------------------------+----------+


[root@rhel71-kilo-1 ~(keystone_admin)]# nova live-migration anti-test
ERROR (BadRequest): No valid host was found. There are not enough hosts available. (HTTP 400) (Request-ID: req-341f8752-d4f4-4421-b427-6fc241f11b57)
[root@rhel71-kilo-1 ~(keystone_admin)]# nova live-migration anti-test2
ERROR (BadRequest): No valid host was found. There are not enough hosts available. (HTTP 400) (Request-ID: req-0d8af5b7-b476-4b67-ab99-050834aa18e6)
[root@rhel71-kilo-1 ~(keystone_admin)]# nova show anti-test
+--------------------------------------+----------------------------------------------------------+
| Property                             | Value                                                    |
+--------------------------------------+----------------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                                   |
| OS-EXT-AZ:availability_zone          | nova                                                     |
| OS-EXT-SRV-ATTR:host                 | rhel71-kilo-1.lab.eng.rdu2.redhat.com                    |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | rhel71-kilo-1.lab.eng.rdu2.redhat.com                    |
| OS-EXT-SRV-ATTR:instance_name        | instance-00000007                                        |
| OS-EXT-STS:power_state               | 1                                                        |
| OS-EXT-STS:task_state                | -                                                        |
| OS-EXT-STS:vm_state                  | active                                                   |
| OS-SRV-USG:launched_at               | 2015-06-10T19:39:55.000000                               |
| OS-SRV-USG:terminated_at             | -                                                        |
| accessIPv4                           |                                                          |
| accessIPv6                           |                                                          |
| config_drive                         |                                                          |
| created                              | 2015-06-10T19:39:31Z                                     |
| flavor                               | m1.tiny (1)                                              |
| hostId                               | 2a08640e0664f97c5c193b9607402413b41c772d0db2adfcd8ef6b8a |
| id                                   | 48ae094b-243f-4c20-bdcf-88d65f317194                     |
| image                                | cirros (87c92a01-df03-46a7-9058-3c7bac3f3b62)            |
| key_name                             | -                                                        |
| metadata                             | {}                                                       |
| name                                 | anti-test                                                |
| os-extended-volumes:volumes_attached | []                                                       |
| private network                      | 10.0.0.8                                                 |
| progress                             | 0                                                        |
| security_groups                      | default                                                  |
| status                               | ACTIVE                                                   |
| tenant_id                            | 5d0f3284b16244a395f68ba8faff2af7                         |
| updated                              | 2015-06-11T11:46:44Z                                     |
| user_id                              | 6e53af38154f44bb8d110cd864290bb6                         |
+--------------------------------------+----------------------------------------------------------+
[root@rhel71-kilo-1 ~(keystone_admin)]# nova show anti-test2
+--------------------------------------+----------------------------------------------------------+
| Property                             | Value                                                    |
+--------------------------------------+----------------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                                   |
| OS-EXT-AZ:availability_zone          | nova                                                     |
| OS-EXT-SRV-ATTR:host                 | rhel71-kilo-2.lab.eng.rdu2.redhat.com                    |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | rhel71-kilo-2.lab.eng.rdu2.redhat.com                    |
| OS-EXT-SRV-ATTR:instance_name        | instance-0000000a                                        |
| OS-EXT-STS:power_state               | 1                                                        |
| OS-EXT-STS:task_state                | -                                                        |
| OS-EXT-STS:vm_state                  | active                                                   |
| OS-SRV-USG:launched_at               | 2015-06-11T11:42:12.000000                               |
| OS-SRV-USG:terminated_at             | -                                                        |
| accessIPv4                           |                                                          |
| accessIPv6                           |                                                          |
| config_drive                         |                                                          |
| created                              | 2015-06-11T11:41:37Z                                     |
| flavor                               | m1.tiny (1)                                              |
| hostId                               | 13ad794381676c3d3a869192d9a339bcba5fac541e35c97f863272a5 |
| id                                   | 63033f1a-dfaa-4a92-9f48-ca2a8498b058                     |
| image                                | cirros (87c92a01-df03-46a7-9058-3c7bac3f3b62)            |
| key_name                             | -                                                        |
| metadata                             | {}                                                       |
| name                                 | anti-test2                                               |
| os-extended-volumes:volumes_attached | []                                                       |
| private network                      | 10.0.0.9                                                 |
| progress                             | 0                                                        |
| security_groups                      | default                                                  |
| status                               | ACTIVE                                                   |
| tenant_id                            | 5d0f3284b16244a395f68ba8faff2af7                         |
| updated                              | 2015-06-11T11:46:49Z                                     |
| user_id                              | 6e53af38154f44bb8d110cd864290bb6                         |
+--------------------------------------+----------------------------------------------------------+

[root@rhel71-kilo-1 ~(keystone_admin)]# nova boot --flavor 1 --image cirros --nic net-id=a298e540-a595-494c-867b-d8ec10e7fd15 --hint group=e2edd2ea-514c-46ae-9f70-abdc0764467c anti-test3
+--------------------------------------+-----------------------------------------------+
| Property                             | Value                                         |
+--------------------------------------+-----------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                        |
| OS-EXT-AZ:availability_zone          | nova                                          |
| OS-EXT-SRV-ATTR:host                 | -                                             |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | -                                             |
| OS-EXT-SRV-ATTR:instance_name        | instance-0000000b                             |
| OS-EXT-STS:power_state               | 0                                             |
| OS-EXT-STS:task_state                | scheduling                                    |
| OS-EXT-STS:vm_state                  | building                                      |
| OS-SRV-USG:launched_at               | -                                             |
| OS-SRV-USG:terminated_at             | -                                             |
| accessIPv4                           |                                               |
| accessIPv6                           |                                               |
| adminPass                            | pVQSeioVU3p7                                  |
| config_drive                         |                                               |
| created                              | 2015-06-11T11:49:23Z                          |
| flavor                               | m1.tiny (1)                                   |
| hostId                               |                                               |
| id                                   | 1b767426-c32a-4500-b594-56f37304850a          |
| image                                | cirros (87c92a01-df03-46a7-9058-3c7bac3f3b62) |
| key_name                             | -                                             |
| metadata                             | {}                                            |
| name                                 | anti-test3                                    |
| os-extended-volumes:volumes_attached | []                                            |
| progress                             | 0                                             |
| security_groups                      | default                                       |
| status                               | BUILD                                         |
| tenant_id                            | 5d0f3284b16244a395f68ba8faff2af7              |
| updated                              | 2015-06-11T11:49:23Z                          |
| user_id                              | 6e53af38154f44bb8d110cd864290bb6              |
+--------------------------------------+-----------------------------------------------+
[root@rhel71-kilo-1 ~(keystone_admin)]# nova list
+--------------------------------------+------------+--------+------------+-------------+------------------+
| ID                                   | Name       | Status | Task State | Power State | Networks         |
+--------------------------------------+------------+--------+------------+-------------+------------------+
| 48ae094b-243f-4c20-bdcf-88d65f317194 | anti-test  | ACTIVE | -          | Running     | private=10.0.0.8 |
| 63033f1a-dfaa-4a92-9f48-ca2a8498b058 | anti-test2 | ACTIVE | -          | Running     | private=10.0.0.9 |
| 1b767426-c32a-4500-b594-56f37304850a | anti-test3 | ERROR  | -          | NOSTATE     |                  |
+--------------------------------------+------------+--------+------------+-------------+------------------+



Both VMs live on different hypervisors, which is correct.  I was also unable to migrate, since each hypervisor already had a VM from that server group running on it.  Trying to boot a third instance into the group properly fails (with only 2 compute nodes, a third anti-affinity member is not possible).


After deleting anti-test2 and anti-test3, I was able to perform a live migration on anti-test, and it is now on the other hypervisor.

[root@rhel71-kilo-1 ~(keystone_admin)]# nova delete anti-test2
Request to delete server anti-test2 has been accepted.
[root@rhel71-kilo-1 ~(keystone_admin)]# nova delete anti-test3
Request to delete server anti-test3 has been accepted.
[root@rhel71-kilo-1 ~(keystone_admin)]# nova list
+--------------------------------------+-----------+--------+------------+-------------+------------------+
| ID                                   | Name      | Status | Task State | Power State | Networks         |
+--------------------------------------+-----------+--------+------------+-------------+------------------+
| 48ae094b-243f-4c20-bdcf-88d65f317194 | anti-test | ACTIVE | -          | Running     | private=10.0.0.8 |
+--------------------------------------+-----------+--------+------------+-------------+------------------+
[root@rhel71-kilo-1 ~(keystone_admin)]# nova server-group-list
+--------------------------------------+------------+--------------------+-------------------------------------------+----------+
| Id                                   | Name       | Policies           | Members                                   | Metadata |
+--------------------------------------+------------+--------------------+-------------------------------------------+----------+
| e2edd2ea-514c-46ae-9f70-abdc0764467c | anti-group | [u'anti-affinity'] | [u'48ae094b-243f-4c20-bdcf-88d65f317194'] | {}       |
| 346bfeb0-5e59-442e-8c0c-99f96131fec4 | aff-group  | [u'affinity']      | []                                        | {}       |
+--------------------------------------+------------+--------------------+-------------------------------------------+----------+
[root@rhel71-kilo-1 ~(keystone_admin)]# nova live-migration anti-test
[root@rhel71-kilo-1 ~(keystone_admin)]# nova show anti-test

[root@rhel71-kilo-1 ~(keystone_admin)]# nova show anti-test
+--------------------------------------+----------------------------------------------------------+
| Property                             | Value                                                    |
+--------------------------------------+----------------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                                   |
| OS-EXT-AZ:availability_zone          | nova                                                     |
| OS-EXT-SRV-ATTR:host                 | rhel71-kilo-2.lab.eng.rdu2.redhat.com                    |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | rhel71-kilo-2.lab.eng.rdu2.redhat.com                    |
| OS-EXT-SRV-ATTR:instance_name        | instance-00000007                                        |
| OS-EXT-STS:power_state               | 1                                                        |
| OS-EXT-STS:task_state                | -                                                        |
| OS-EXT-STS:vm_state                  | active                                                   |
| OS-SRV-USG:launched_at               | 2015-06-10T19:39:55.000000                               |
| OS-SRV-USG:terminated_at             | -                                                        |
| accessIPv4                           |                                                          |
| accessIPv6                           |                                                          |
| config_drive                         |                                                          |
| created                              | 2015-06-10T19:39:31Z                                     |
| flavor                               | m1.tiny (1)                                              |
| hostId                               | 13ad794381676c3d3a869192d9a339bcba5fac541e35c97f863272a5 |
| id                                   | 48ae094b-243f-4c20-bdcf-88d65f317194                     |
| image                                | cirros (87c92a01-df03-46a7-9058-3c7bac3f3b62)            |
| key_name                             | -                                                        |
| metadata                             | {}                                                       |
| name                                 | anti-test                                                |
| os-extended-volumes:volumes_attached | []                                                       |
| private network                      | 10.0.0.8                                                 |
| progress                             | 1                                                        |
| security_groups                      | default                                                  |
| status                               | ACTIVE                                                   |
| tenant_id                            | 5d0f3284b16244a395f68ba8faff2af7                         |
| updated                              | 2015-06-11T11:50:45Z                                     |
| user_id                              | 6e53af38154f44bb8d110cd864290bb6                         |
+--------------------------------------+----------------------------------------------------------+
[root@rhel71-kilo-1 ~(keystone_admin)]#

Comment 9 errata-xmlrpc 2015-08-05 13:14:47 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1548