Bug 1657391 - Host Evacuate fails for CPU Weigher filter due to moving smallest instance first
Summary: Host Evacuate fails for CPU Weigher filter due to moving smallest instance first
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
: ---
Assignee: OSP DFG:Compute
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On:
Blocks: 1469073
 
Reported: 2018-12-07 21:18 UTC by awaugama
Modified: 2023-03-21 19:09 UTC
8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-12-10 15:03:46 UTC
Target Upstream Version:
Embargoed:



Description awaugama 2018-12-07 21:18:01 UTC
Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 awaugama 2018-12-07 21:25:23 UTC
Sorry, I hit submit by accident before filling in the bug details.

When running a host evacuate with 2 instances of different sizes, the larger instance fails to evacuate. This appears to be because the smaller instance is evacuated first and lands on compute-2. That takes the "slot" the larger instance could have used, and no host is left with the 2 free vCPUs it needs.

Layout before Host Evacuate:

Compute-0: 2 instances, one using 2 vCPUs and one using 1 vCPU
Compute-1: 1 vCPU free (4 vCPUs total, 1 existing instance using 3)
Compute-2: 2 vCPUs free (4 vCPUs total, 1 existing instance using 2)

Expected after Host Evacuate:

Compute-0: 0 instances
Compute-1: the 1 vCPU instance from Compute-0, plus the existing 3 vCPU instance
Compute-2: the 2 vCPU instance from Compute-0, plus the existing 2 vCPU instance

Actual Results:

Compute-0: the 2 vCPU instance failed to evacuate and is listed in ERROR state
Compute-1: only the existing 3 vCPU instance
Compute-2: the 1 vCPU instance from Compute-0, plus the existing 2 vCPU instance
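The ordering effect can be reproduced with a small model. The Python sketch below replays the layout above. It is a simplified, hypothetical model and not nova's scheduler code. It assumes no CPU overcommit (allocation ratio 1.0). It considers only a vCPU capacity check plus a CPU-style weigher that prefers the host with the most free vCPUs, and it schedules the instances strictly one after another. The function `place` and the constants are illustrative names only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model of the reported layout: free vCPUs on the surviving
# hosts, assuming a CPU allocation ratio of 1.0 (no overcommit).
FREE_VCPUS: Dict[str, int] = {"compute-1": 1, "compute-2": 2}

# Instances on compute-0 as (name, requested vCPUs).
SMALL = ("compute_0_small_instance", 1)
LARGE = ("compute_0_large_instance", 2)


def place(order: List[Tuple[str, int]], free: Dict[str, int],
          multiplier: float = 1.0) -> Dict[str, Optional[str]]:
    """Schedule instances one at a time, in the given order.

    Hosts without enough free vCPUs are filtered out (a simplified capacity
    filter); among the rest, the host with the highest
    multiplier * free_vcpus wins (a simplified CPU weigher). A positive
    multiplier spreads instances across hosts.
    """
    free = dict(free)  # work on a copy so the caller's layout is untouched
    result: Dict[str, Optional[str]] = {}
    for name, vcpus in order:
        candidates = [host for host, f in free.items() if f >= vcpus]
        if not candidates:
            result[name] = None  # no valid host: this evacuation errors out
            continue
        best = max(candidates, key=lambda host: multiplier * free[host])
        free[best] -= vcpus
        result[name] = best
    return result


# Small instance first with a spreading weigher: the large one has nowhere to go.
print(place([SMALL, LARGE], FREE_VCPUS))
# → {'compute_0_small_instance': 'compute-2', 'compute_0_large_instance': None}

# Large instance first: both instances fit.
print(place([LARGE, SMALL], FREE_VCPUS))
# → {'compute_0_large_instance': 'compute-2', 'compute_0_small_instance': 'compute-1'}
```

Under these assumptions, the outcome depends only on which instance the scheduler handles first. That matches the actual and expected results listed above.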

Logs below:

()[root@compute-0 /]# yum info openstack-nova-common.noarch
Version     : 18.0.3
Release     : 0.20181011032838.d1243fe.el7ost

()[root@compute-0 /]# yum info openstack-nova-compute.noarch
Version     : 18.0.3
Release     : 0.20181011032838.d1243fe.el7ost

()[root@compute-0 /]# yum info openstack-nova-migration.noarch
Version     : 18.0.3
Release     : 0.20181011032838.d1243fe.el7ost

(overcloud) [stack@undercloud-0 ~]$ openstack server list --long
+--------------------------------------+--------------------------+--------+------------+-------------+-------------------+------------+--------------------------------------+-------------+--------------------------------------+-------------------+-----------------------+------------+
| ID                                   | Name                     | Status | Task State | Power State | Networks          | Image Name | Image ID                             | Flavor Name | Flavor ID                            | Availability Zone | Host                  | Properties |
+--------------------------------------+--------------------------+--------+------------+-------------+-------------------+------------+--------------------------------------+-------------+--------------------------------------+-------------------+-----------------------+------------+
| 2fe86736-4073-427e-84da-074dc78705e6 | compute_0_large_instance | ACTIVE | None       | Running     | public=10.0.0.216 | cirros     | 0a411db3-dd1a-41ea-b348-bf15077c749b | use_2_vcpu  | ee083ec8-f154-44a6-a0e1-4b8d32f02561 | nova              | compute-0.localdomain |            |
| 7f741183-b831-4331-a1e2-fa5a630f92a9 | compute_0_small_instance | ACTIVE | None       | Running     | public=10.0.0.213 | cirros     | 0a411db3-dd1a-41ea-b348-bf15077c749b | use_1_vcpu  | 52854d2b-613f-43e8-ab19-93b9b6c1abe0 | nova              | compute-0.localdomain |            |
| 10527e30-7fe4-4ed3-bc18-c0a3bdaa260e | compute_1_use_3_vcpu     | ACTIVE | None       | Running     | public=10.0.0.231 | cirros     | 0a411db3-dd1a-41ea-b348-bf15077c749b | use_3_vcpu  | eb8d60d6-8787-436c-a0a5-eb2f5d1930ec | nova              | compute-1.localdomain |            |
| 2ff31a01-b233-4688-83e6-7ad1e5b8b330 | compute_2_use_2_vcpu     | ACTIVE | None       | Running     | public=10.0.0.218 | cirros     | 0a411db3-dd1a-41ea-b348-bf15077c749b | use_2_vcpu  | ee083ec8-f154-44a6-a0e1-4b8d32f02561 | nova              | compute-2.localdomain |            |
+--------------------------------------+--------------------------+--------+------------+-------------+-------------------+------------+--------------------------------------+-------------+--------------------------------------+-------------------+-----------------------+------------+

(overcloud) [stack@undercloud-0 ~]$ openstack hypervisor show 1 | grep vcpus
| vcpus                | 4 |
| vcpus_used           | 3 |
(overcloud) [stack@undercloud-0 ~]$ openstack hypervisor show 2 | grep vcpus
| vcpus                | 4 |
| vcpus_used           | 3 |
(overcloud) [stack@undercloud-0 ~]$ openstack hypervisor show 3 | grep vcpus
| vcpus                | 4 |
| vcpus_used           | 2 |

(overcloud) [stack@undercloud-0 ~]$ nova service-force-down 4e52c4fb-7bc3-4638-98d5-58a64e85be97
+--------------------------------------+-----------------------+--------------+-------------+
| ID                                   | Host                  | Binary       | Forced down |
+--------------------------------------+-----------------------+--------------+-------------+
| 4e52c4fb-7bc3-4638-98d5-58a64e85be97 | compute-0.localdomain | nova-compute | True        |
+--------------------------------------+-----------------------+--------------+-------------+

(overcloud) [stack@undercloud-0 ~]$ nova host-evacuate compute-0.localdomain
+--------------------------------------+-------------------+---------------+
| Server UUID                          | Evacuate Accepted | Error Message |
+--------------------------------------+-------------------+---------------+
| 7f741183-b831-4331-a1e2-fa5a630f92a9 | True              |               |
| 2fe86736-4073-427e-84da-074dc78705e6 | True              |               |
+--------------------------------------+-------------------+---------------+

(overcloud) [stack@undercloud-0 ~]$ openstack server list --long
+--------------------------------------+--------------------------+--------+------------+-------------+-------------------+------------+--------------------------------------+-------------+--------------------------------------+-------------------+-----------------------+------------+
| ID                                   | Name                     | Status | Task State | Power State | Networks          | Image Name | Image ID                             | Flavor Name | Flavor ID                            | Availability Zone | Host                  | Properties |
+--------------------------------------+--------------------------+--------+------------+-------------+-------------------+------------+--------------------------------------+-------------+--------------------------------------+-------------------+-----------------------+------------+
| 2fe86736-4073-427e-84da-074dc78705e6 | compute_0_large_instance | ERROR  | None       | Running     | public=10.0.0.216 | cirros     | 0a411db3-dd1a-41ea-b348-bf15077c749b | use_2_vcpu  | ee083ec8-f154-44a6-a0e1-4b8d32f02561 | nova              | compute-0.localdomain |            |
| 7f741183-b831-4331-a1e2-fa5a630f92a9 | compute_0_small_instance | ACTIVE | None       | Running     | public=10.0.0.213 | cirros     | 0a411db3-dd1a-41ea-b348-bf15077c749b | use_1_vcpu  | 52854d2b-613f-43e8-ab19-93b9b6c1abe0 | nova              | compute-2.localdomain |            |
| 10527e30-7fe4-4ed3-bc18-c0a3bdaa260e | compute_1_use_3_vcpu     | ACTIVE | None       | Running     | public=10.0.0.231 | cirros     | 0a411db3-dd1a-41ea-b348-bf15077c749b | use_3_vcpu  | eb8d60d6-8787-436c-a0a5-eb2f5d1930ec | nova              | compute-1.localdomain |            |
| 2ff31a01-b233-4688-83e6-7ad1e5b8b330 | compute_2_use_2_vcpu     | ACTIVE | None       | Running     | public=10.0.0.218 | cirros     | 0a411db3-dd1a-41ea-b348-bf15077c749b | use_2_vcpu  | ee083ec8-f154-44a6-a0e1-4b8d32f02561 | nova              | compute-2.localdomain |            |
+--------------------------------------+--------------------------+--------+------------+-------------+-------------------+------------+--------------------------------------+-------------+--------------------------------------+-------------------+-----------------------+------------+

(overcloud) [stack@undercloud-0 ~]$ openstack hypervisor show 1 | grep vcpus
Compute service of compute-0.localdomain is unavailable at this time. (HTTP 400) (Request-ID: req-be965b15-b043-429b-8cb5-b0e6ca109782)
(overcloud) [stack@undercloud-0 ~]$ openstack hypervisor show 2 | grep vcpus
| vcpus                | 4 |
| vcpus_used           | 3 |
(overcloud) [stack@undercloud-0 ~]$ openstack hypervisor show 3 | grep vcpus
| vcpus                | 4 |
| vcpus_used           | 3 |

Comment 2 Stephen Finucane 2018-12-10 15:03:46 UTC
This is behaving as expected. Host evacuate makes no guarantees about the order in which instances are evacuated. You could configure the CPUWeigher with a negative multiplier to cause "stacking" of the instances (so each instance lands on the valid host with the fewest free vCPUs), but this will affect all scheduling operations.
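The following is a minimal, hypothetical Python model of the stacking suggestion. It is not nova code. It ignores the RAM and disk weighers, assumes no CPU overcommit (allocation ratio 1.0), and schedules instances one at a time. In the model, a negative multiplier makes the weigher prefer the valid host with the fewest free vCPUs. In a real deployment the sign is set on the CPUWeigher's multiplier. To our understanding, that is the `cpu_weight_multiplier` option in the `[filter_scheduler]` section of `nova.conf` from Rocky onward, but check the configuration reference for your release. The names `place`, `FREE_VCPUS`, `SMALL` and `LARGE` are illustrative only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: free vCPUs on the surviving hosts from the report,
# assuming no CPU overcommit (allocation ratio 1.0).
FREE_VCPUS: Dict[str, int] = {"compute-1": 1, "compute-2": 2}
SMALL = ("compute_0_small_instance", 1)
LARGE = ("compute_0_large_instance", 2)


def place(order: List[Tuple[str, int]], free: Dict[str, int],
          multiplier: float) -> Dict[str, Optional[str]]:
    """Schedule instances sequentially with a simplified CPU weigher.

    Hosts without enough free vCPUs are filtered out; the remaining host with
    the highest multiplier * free_vcpus wins. multiplier < 0 means stacking.
    """
    free = dict(free)  # copy so repeated calls start from the same layout
    placed: Dict[str, Optional[str]] = {}
    for name, vcpus in order:
        candidates = [host for host, f in free.items() if f >= vcpus]
        if not candidates:
            placed[name] = None  # nothing fits: the evacuation would fail
            continue
        best = max(candidates, key=lambda host: multiplier * free[host])
        free[best] -= vcpus
        placed[name] = best
    return placed


# Stacking gives the same outcome whichever instance is scheduled first.
for order in ([SMALL, LARGE], [LARGE, SMALL]):
    print(place(order, FREE_VCPUS, multiplier=-1.0))
# → {'compute_0_small_instance': 'compute-1', 'compute_0_large_instance': 'compute-2'}
# → {'compute_0_large_instance': 'compute-2', 'compute_0_small_instance': 'compute-1'}
```

With the reported layout, this model places the small instance on compute-1 and the large one on compute-2 in either order, which is the expected result from Comment 1. The trade-off is the one noted above. The multiplier applies to every scheduling request, so hosts are packed densely for all workloads, not only during evacuations.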

