Bug 1152835
| Field | Value |
|---|---|
| Summary | [RFE][nova]: Honor anti-affinity policy on migration and evacuation |
| Product | Red Hat OpenStack |
| Component | openstack-nova |
| Reporter | RHOS Integration <rhos-integ> |
| Assignee | Sylvain Bauza <sbauza> |
| QA Contact | Sean Toner <stoner> |
| Status | CLOSED ERRATA |
| Severity | low |
| Priority | high |
| Version | unspecified |
| Target Milestone | Upstream M1 |
| Target Release | 7.0 (Kilo) |
| Keywords | FutureFeature, Triaged |
| Hardware | Unspecified |
| OS | Unspecified |
| URL | https://blueprints.launchpad.net/nova/+spec/anti-affinity-on-migration |
| Whiteboard | upstream_milestone_kilo-1 upstream_definition_approved upstream_status_implemented |
| Fixed In Version | openstack-nova-2015.1.0-16.el7ost |
| Doc Type | Enhancement |
| CC | berrange, dasmith, kchamart, markmc, ndipanov, nlevinki, pablo.iranzo, pbrady, sbauza, sferdjao, sgordon, trerober, tvvcox, vromanso, yeylon |
| Last Closed | 2015-08-05 13:14:47 UTC |
Description
RHOS Integration
2014-10-15 04:02:23 UTC
I have tried this on Kilo, but it does not appear to be working. I set up nova.conf to use the ServerGroupAffinityFilter and ServerGroupAntiAffinityFilter, restarted the compute services on my 2 compute nodes, created an affinity and an anti-affinity server group, and booted one instance into each respective group. However, when doing a live migration (without explicitly specifying a host), the migration was still successful.

I ran into a problem with this during Juno testing, and the issue was that all the compute nodes needed to have scheduler_default_filters set to include the right filters. I ensured this was the same on each compute node, but the live migration is still successful even though the instance I was trying to migrate belongs to the anti-affinity server group.

Created attachment 1026801 [details]
steps showing what was tested
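For reference, a minimal sketch of the scheduler filter configuration being described (the filter list below is illustrative, not the exact list from this environment). Note that scheduler_default_filters is read by the nova-scheduler service, so the host(s) running the scheduler must carry both server-group filters:

# /etc/nova/nova.conf (Kilo-era), illustrative excerpt
[DEFAULT]
scheduler_default_filters = RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ServerGroupAffinityFilter,ServerGroupAntiAffinityFilter

# then restart the scheduler to pick up the change (RHEL 7 / OSP service name)
systemctl restart openstack-nova-scheduler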
I have retested this and it appears to be working now. However, I have noticed that when running nova live-migration anti-test (where anti-test is the instance booted into the anti-affinity group), there is no message of any kind indicating the migration failed, and indeed it seems successful.
[root@rhel71-kilo-1 ~(keystone_admin)]# nova boot --flavor 1 --image cirros --nic net-id=7cf22ea3-6483-46b9-8b75-ce18b4026ea5 --hint "group=80445252-da4e-43e0-a669-0ff52d383e1c" anti-test
[root@rhel71-kilo-1 ~(keystone_admin)]# nova boot --flavor 1 --image cirros --nic net-id=7cf22ea3-6483-46b9-8b75-ce18b4026ea5 --hint "group=01e9aa8c-f501-411a-a615-07cf48a552b8" aff-test
[root@rhel71-kilo-1 ~(keystone_admin)]# nova server-group-list
+--------------------------------------+------------+--------------------+-------------------------------------------+----------+
| Id | Name | Policies | Members | Metadata |
+--------------------------------------+------------+--------------------+-------------------------------------------+----------+
| 80445252-da4e-43e0-a669-0ff52d383e1c | anti-group | [u'anti-affinity'] | [u'd23e5c77-cd0f-4571-a1af-2fec8ad36549'] | {} |
| 01e9aa8c-f501-411a-a615-07cf48a552b8 | aff-group | [u'affinity'] | [u'87fb27cf-ed9b-407f-af65-8c6e31326f42'] | {} |
+--------------------------------------+------------+--------------------+-------------------------------------------+----------+
[root@rhel71-kilo-1 ~(keystone_admin)]# nova hypervisor-list
+----+---------------------------------------+-------+---------+
| ID | Hypervisor hostname | State | Status |
+----+---------------------------------------+-------+---------+
| 1 | rhel71-kilo-1.lab.eng.rdu2.redhat.com | up | enabled |
| 2 | rhel71-kilo-2.lab.eng.rdu2.redhat.com | up | enabled |
+----+---------------------------------------+-------+---------+
[root@rhel71-kilo-1 ~(keystone_admin)]# nova live-migration anti-test
[root@rhel71-kilo-1 ~(keystone_admin)]# echo $?
0
[root@rhel71-kilo-1 ~(keystone_admin)]# nova live-migration aff-test
ERROR (BadRequest): No valid host was found. There are not enough hosts available. (HTTP 400) (Request-ID: req-422fa442-812b-4e0b-86ef-7aa5a263d746)
[root@rhel71-kilo-1 ~(keystone_admin)]# echo $?
1
[root@rhel71-kilo-1 ~(keystone_admin)]#
I do not think this is correct behavior, even though the anti-affinity policy is being honored. The scheduler should report in the anti-affinity case that no valid hosts were found, just like it does in the affinity case.
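As an aside, a quick way to confirm what the silent live migration actually did (a sketch using standard, admin-level nova CLI commands; output not captured from this run):

nova show anti-test | grep OS-EXT-SRV-ATTR:host    # which host the instance is on now
nova migration-list                                # migrations recorded by nova
nova instance-action-list anti-test                # per-instance actions and their outcomes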
OK, I take back what I said in the above comment. I ran a couple more live migrations (nova live-migration anti-test). There appears to be a race condition: if you run nova show anti-test immediately after the live-migration command, it will look like the instance is still on the same host (and therefore that the migration failed). However, if you wait a few seconds and run nova show anti-test again, it will show that the instance has indeed switched hosts.

So there are 2 problems:
1) Performing a live migration on an instance booted with a hint to an affinity group is still not being honored.
2) There is a race where the information shown by nova show is not synchronized with respect to the live migration; it takes at least a second or two after the migration has completed for the reported host to reflect where the instance really is.

I also have the filters set correctly on all my compute nodes:

[root@rhel71-kilo-1 ~(keystone_admin)]# grep -Hrni AffinityFilter /etc/nova/nova.conf
/etc/nova/nova.conf:1621:#scheduler_default_filters=RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter
/etc/nova/nova.conf:1622:scheduler_default_filters=RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ImagePropertiesFilter,CoreFilter,NUMATopologyFilter,ServerGroupAffinityFilter,ServerGroupAntiAffinityFilter,AggregateInstanceExtraSpecsFilter

[root@rhel71-kilo-2 ~]# grep -Hrni AffinityFilter /etc/nova/nova.conf
/etc/nova/nova.conf:1609:#scheduler_default_filters=RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter
/etc/nova/nova.conf:1967:scheduler_default_filters = RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,NUMATopologyFilter,AggregateInstanceExtraSpecsFilter

I think I see what was causing my failures (which is perhaps a different bug). Somehow, from my main nova controller, nova hypervisor-list reported that one of my two hypervisors was down.
[root@rhel71-kilo-1 ~(keystone_admin)]# nova hypervisor-list
+----+---------------------------------------+-------+---------+
| ID | Hypervisor hostname | State | Status |
+----+---------------------------------------+-------+---------+
| 1 | rhel71-kilo-1.lab.eng.rdu2.redhat.com | up | enabled |
| 2 | rhel71-kilo-2.lab.eng.rdu2.redhat.com | down | enabled |
+----+---------------------------------------+-------+---------+
However, running openstack-status or openstack-service status nova on the other compute node showed it was up and running.
[root@rhel71-kilo-2 ~]# openstack-status
== Nova services ==
openstack-nova-api: inactive (disabled on boot)
openstack-nova-compute: active
openstack-nova-network: inactive (disabled on boot)
openstack-nova-scheduler: inactive (disabled on boot)
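For what it's worth, a sketch of how the two views can be cross-checked (standard commands, not output from this environment): the State column in nova hypervisor-list reflects the heartbeat the compute service records in the database, so a node can show as down there even while its local process is active, if its periodic reports are not reaching the controller.

# From the controller: state as recorded by nova (heartbeat-based)
nova service-list --binary nova-compute

# On the compute node: local process state only
systemctl status openstack-nova-compute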
I rebooted both nodes (not just restarting services as I had done before) and now this feature appears to be working correctly.
[root@rhel71-kilo-1 ~(keystone_admin)]# nova list
+--------------------------------------+------------+--------+------------+-------------+------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+------------+--------+------------+-------------+------------------+
| 48ae094b-243f-4c20-bdcf-88d65f317194 | anti-test | ACTIVE | - | Running | private=10.0.0.8 |
| 63033f1a-dfaa-4a92-9f48-ca2a8498b058 | anti-test2 | ACTIVE | - | Running | private=10.0.0.9 |
+--------------------------------------+------------+--------+------------+-------------+------------------+
[root@rhel71-kilo-1 ~(keystone_admin)]# nova server-group-list
+--------------------------------------+------------+--------------------+------------------------------------------------------------------------------------+----------+
| Id | Name | Policies | Members | Metadata |
+--------------------------------------+------------+--------------------+------------------------------------------------------------------------------------+----------+
| e2edd2ea-514c-46ae-9f70-abdc0764467c | anti-group | [u'anti-affinity'] | [u'63033f1a-dfaa-4a92-9f48-ca2a8498b058', u'48ae094b-243f-4c20-bdcf-88d65f317194'] | {} |
| 346bfeb0-5e59-442e-8c0c-99f96131fec4 | aff-group | [u'affinity'] | [] | {} |
+--------------------------------------+------------+--------------------+------------------------------------------------------------------------------------+----------+
[root@rhel71-kilo-1 ~(keystone_admin)]# nova live-migration anti-test
ERROR (BadRequest): No valid host was found. There are not enough hosts available. (HTTP 400) (Request-ID: req-341f8752-d4f4-4421-b427-6fc241f11b57)
[root@rhel71-kilo-1 ~(keystone_admin)]# nova live-migration anti-test2
ERROR (BadRequest): No valid host was found. There are not enough hosts available. (HTTP 400) (Request-ID: req-0d8af5b7-b476-4b67-ab99-050834aa18e6)
[root@rhel71-kilo-1 ~(keystone_admin)]# nova show anti-test
+--------------------------------------+----------------------------------------------------------+
| Property | Value |
+--------------------------------------+----------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | nova |
| OS-EXT-SRV-ATTR:host | rhel71-kilo-1.lab.eng.rdu2.redhat.com |
| OS-EXT-SRV-ATTR:hypervisor_hostname | rhel71-kilo-1.lab.eng.rdu2.redhat.com |
| OS-EXT-SRV-ATTR:instance_name | instance-00000007 |
| OS-EXT-STS:power_state | 1 |
| OS-EXT-STS:task_state | - |
| OS-EXT-STS:vm_state | active |
| OS-SRV-USG:launched_at | 2015-06-10T19:39:55.000000 |
| OS-SRV-USG:terminated_at | - |
| accessIPv4 | |
| accessIPv6 | |
| config_drive | |
| created | 2015-06-10T19:39:31Z |
| flavor | m1.tiny (1) |
| hostId | 2a08640e0664f97c5c193b9607402413b41c772d0db2adfcd8ef6b8a |
| id | 48ae094b-243f-4c20-bdcf-88d65f317194 |
| image | cirros (87c92a01-df03-46a7-9058-3c7bac3f3b62) |
| key_name | - |
| metadata | {} |
| name | anti-test |
| os-extended-volumes:volumes_attached | [] |
| private network | 10.0.0.8 |
| progress | 0 |
| security_groups | default |
| status | ACTIVE |
| tenant_id | 5d0f3284b16244a395f68ba8faff2af7 |
| updated | 2015-06-11T11:46:44Z |
| user_id | 6e53af38154f44bb8d110cd864290bb6 |
+--------------------------------------+----------------------------------------------------------+
[root@rhel71-kilo-1 ~(keystone_admin)]# nova show anti-test2
+--------------------------------------+----------------------------------------------------------+
| Property | Value |
+--------------------------------------+----------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | nova |
| OS-EXT-SRV-ATTR:host | rhel71-kilo-2.lab.eng.rdu2.redhat.com |
| OS-EXT-SRV-ATTR:hypervisor_hostname | rhel71-kilo-2.lab.eng.rdu2.redhat.com |
| OS-EXT-SRV-ATTR:instance_name | instance-0000000a |
| OS-EXT-STS:power_state | 1 |
| OS-EXT-STS:task_state | - |
| OS-EXT-STS:vm_state | active |
| OS-SRV-USG:launched_at | 2015-06-11T11:42:12.000000 |
| OS-SRV-USG:terminated_at | - |
| accessIPv4 | |
| accessIPv6 | |
| config_drive | |
| created | 2015-06-11T11:41:37Z |
| flavor | m1.tiny (1) |
| hostId | 13ad794381676c3d3a869192d9a339bcba5fac541e35c97f863272a5 |
| id | 63033f1a-dfaa-4a92-9f48-ca2a8498b058 |
| image | cirros (87c92a01-df03-46a7-9058-3c7bac3f3b62) |
| key_name | - |
| metadata | {} |
| name | anti-test2 |
| os-extended-volumes:volumes_attached | [] |
| private network | 10.0.0.9 |
| progress | 0 |
| security_groups | default |
| status | ACTIVE |
| tenant_id | 5d0f3284b16244a395f68ba8faff2af7 |
| updated | 2015-06-11T11:46:49Z |
| user_id | 6e53af38154f44bb8d110cd864290bb6 |
+--------------------------------------+----------------------------------------------------------+
[root@rhel71-kilo-1 ~(keystone_admin)]# nova boot --flavor 1 --image cirros --nic net-id=a298e540-a595-494c-867b-d8ec10e7fd15 --hint group=e2edd2ea-514c-46ae-9f70-abdc0764467c anti-test3
+--------------------------------------+-----------------------------------------------+
| Property | Value |
+--------------------------------------+-----------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | nova |
| OS-EXT-SRV-ATTR:host | - |
| OS-EXT-SRV-ATTR:hypervisor_hostname | - |
| OS-EXT-SRV-ATTR:instance_name | instance-0000000b |
| OS-EXT-STS:power_state | 0 |
| OS-EXT-STS:task_state | scheduling |
| OS-EXT-STS:vm_state | building |
| OS-SRV-USG:launched_at | - |
| OS-SRV-USG:terminated_at | - |
| accessIPv4 | |
| accessIPv6 | |
| adminPass | pVQSeioVU3p7 |
| config_drive | |
| created | 2015-06-11T11:49:23Z |
| flavor | m1.tiny (1) |
| hostId | |
| id | 1b767426-c32a-4500-b594-56f37304850a |
| image | cirros (87c92a01-df03-46a7-9058-3c7bac3f3b62) |
| key_name | - |
| metadata | {} |
| name | anti-test3 |
| os-extended-volumes:volumes_attached | [] |
| progress | 0 |
| security_groups | default |
| status | BUILD |
| tenant_id | 5d0f3284b16244a395f68ba8faff2af7 |
| updated | 2015-06-11T11:49:23Z |
| user_id | 6e53af38154f44bb8d110cd864290bb6 |
+--------------------------------------+-----------------------------------------------+
[root@rhel71-kilo-1 ~(keystone_admin)]# nova list
+--------------------------------------+------------+--------+------------+-------------+------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+------------+--------+------------+-------------+------------------+
| 48ae094b-243f-4c20-bdcf-88d65f317194 | anti-test | ACTIVE | - | Running | private=10.0.0.8 |
| 63033f1a-dfaa-4a92-9f48-ca2a8498b058 | anti-test2 | ACTIVE | - | Running | private=10.0.0.9 |
| 1b767426-c32a-4500-b594-56f37304850a | anti-test3 | ERROR | - | NOSTATE | |
+--------------------------------------+------------+--------+------------+-------------+------------------+
Both VMs live on different hypervisors, which is correct. I was also unable to migrate, since both hypervisors already had a VM from that server group running on them. Trying to boot a third instance properly fails (since there are only 2 compute nodes, a third anti-affinity member is not possible).
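For completeness, the reason the third boot went to ERROR can be read from the instance's fault field (a sketch; the expected message is the usual NoValidHost error, not output captured from this run):

nova show anti-test3 | grep fault
# expected to show something like: No valid host was found. There are not enough hosts available.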
After deleting anti-test2 and anti-test3, I was able to perform a live migration of anti-test, and it is now on the other hypervisor.
[root@rhel71-kilo-1 ~(keystone_admin)]# nova delete anti-test2
Request to delete server anti-test2 has been accepted.
[root@rhel71-kilo-1 ~(keystone_admin)]# nova delete anti-test3
Request to delete server anti-test3 has been accepted.
[root@rhel71-kilo-1 ~(keystone_admin)]# nova list
+--------------------------------------+-----------+--------+------------+-------------+------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+-----------+--------+------------+-------------+------------------+
| 48ae094b-243f-4c20-bdcf-88d65f317194 | anti-test | ACTIVE | - | Running | private=10.0.0.8 |
+--------------------------------------+-----------+--------+------------+-------------+------------------+
[root@rhel71-kilo-1 ~(keystone_admin)]# nova server-group-list
+--------------------------------------+------------+--------------------+-------------------------------------------+----------+
| Id | Name | Policies | Members | Metadata |
+--------------------------------------+------------+--------------------+-------------------------------------------+----------+
| e2edd2ea-514c-46ae-9f70-abdc0764467c | anti-group | [u'anti-affinity'] | [u'48ae094b-243f-4c20-bdcf-88d65f317194'] | {} |
| 346bfeb0-5e59-442e-8c0c-99f96131fec4 | aff-group | [u'affinity'] | [] | {} |
+--------------------------------------+------------+--------------------+-------------------------------------------+----------+
[root@rhel71-kilo-1 ~(keystone_admin)]# nova live-migration anti-test
[root@rhel71-kilo-1 ~(keystone_admin)]# nova show anti-test
[root@rhel71-kilo-1 ~(keystone_admin)]# nova show anti-test
+--------------------------------------+----------------------------------------------------------+
| Property | Value |
+--------------------------------------+----------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | nova |
| OS-EXT-SRV-ATTR:host | rhel71-kilo-2.lab.eng.rdu2.redhat.com |
| OS-EXT-SRV-ATTR:hypervisor_hostname | rhel71-kilo-2.lab.eng.rdu2.redhat.com |
| OS-EXT-SRV-ATTR:instance_name | instance-00000007 |
| OS-EXT-STS:power_state | 1 |
| OS-EXT-STS:task_state | - |
| OS-EXT-STS:vm_state | active |
| OS-SRV-USG:launched_at | 2015-06-10T19:39:55.000000 |
| OS-SRV-USG:terminated_at | - |
| accessIPv4 | |
| accessIPv6 | |
| config_drive | |
| created | 2015-06-10T19:39:31Z |
| flavor | m1.tiny (1) |
| hostId | 13ad794381676c3d3a869192d9a339bcba5fac541e35c97f863272a5 |
| id | 48ae094b-243f-4c20-bdcf-88d65f317194 |
| image | cirros (87c92a01-df03-46a7-9058-3c7bac3f3b62) |
| key_name | - |
| metadata | {} |
| name | anti-test |
| os-extended-volumes:volumes_attached | [] |
| private network | 10.0.0.8 |
| progress | 1 |
| security_groups | default |
| status | ACTIVE |
| tenant_id | 5d0f3284b16244a395f68ba8faff2af7 |
| updated | 2015-06-11T11:50:45Z |
| user_id | 6e53af38154f44bb8d110cd864290bb6 |
+--------------------------------------+----------------------------------------------------------+
[root@rhel71-kilo-1 ~(keystone_admin)]#
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1548