Bug 1657391

Summary: Host Evacuate fails for CPU Weigher filter due to moving smallest instance first
Product: Red Hat OpenStack
Reporter: awaugama
Component: openstack-nova
Assignee: OSP DFG:Compute <osp-dfg-compute>
Status: CLOSED NOTABUG
QA Contact: OSP DFG:Compute <osp-dfg-compute>
Severity: unspecified
Priority: unspecified
Version: 14.0 (Rocky)
CC: dasmith, eglynn, jhakimra, kchamart, sbauza, sgordon, stephenfin, vromanso
Last Closed: 2018-12-10 15:03:46 UTC
Type: Bug
Bug Blocks: 1469073    

Description awaugama 2018-12-07 21:18:01 UTC

Comment 1 awaugama 2018-12-07 21:25:23 UTC
Sorry, I hit submit by accident before filling in the bug.

When running a host evacuate with 2 instances of different sizes, the larger instance fails to evacuate. This seems to be because the smaller instance is evacuated first and takes up the "slot" the larger instance could have used.

Layout before Host Evacuate:

Compute-0: 2 instances, one with 2 vCPUs and one with 1 vCPU
Compute-1: 1 vCPU free (4 vCPUs total, 1 instance using 3 on the node)
Compute-2: 2 vCPUs free (4 vCPUs total, 1 instance using 2 on the node)

Expected after Host Evacuate:

Compute-0: 0 instances
Compute-1: The 1 vCPU instance from Compute-0, plus the 3 vCPU instance that was there before
Compute-2: The 2 vCPU instance from Compute-0, plus the 2 vCPU instance that was there before

Actual Results:

Compute-0: The 2 vCPU instance failed to evacuate and is listed as ERROR
Compute-1: Only the 3 vCPU instance that was there before
Compute-2: The 1 vCPU instance from Compute-0, plus the 2 vCPU instance that was there before
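
The difference between the expected and actual results can be reproduced with a toy model. This is only a sketch, not nova's scheduler: it assumes instances are placed one at a time in the order given, that only vCPUs matter, and that each instance goes to the host with the most free vCPUs. The function name `evacuate` and the labels "small" and "large" are made up for illustration.

```python
def evacuate(instances, free_vcpus):
    """Place (name, vcpus) pairs in order; return {name: host or None}.

    Toy assumption: each instance goes to the fitting host that has
    the most free vCPUs (spreading). None means no host had room.
    """
    free = dict(free_vcpus)  # copy so the caller's layout is untouched
    result = {}
    for name, vcpus in instances:
        fits = [host for host, spare in free.items() if spare >= vcpus]
        if not fits:
            result[name] = None  # evacuation fails for this instance
            continue
        target = max(fits, key=lambda host: free[host])
        free[target] -= vcpus
        result[name] = target
    return result


# Free vCPUs on the two surviving hosts, as in the layout above.
FREE = {"compute-1": 1, "compute-2": 2}

small_first = evacuate([("small", 1), ("large", 2)], FREE)
large_first = evacuate([("large", 2), ("small", 1)], FREE)

print(small_first)  # {'small': 'compute-2', 'large': None}
print(large_first)  # {'large': 'compute-2', 'small': 'compute-1'}
```

With the small instance first, it lands on Compute-2 and leaves 1 free vCPU on each host, so the 2 vCPU instance has nowhere to go. With the large instance first, both fit.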

Logs below:

()[root@compute-0 /]# yum info openstack-nova-common.noarch
Version     : 18.0.3
Release     : 0.20181011032838.d1243fe.el7ost

()[root@compute-0 /]# yum info openstack-nova-compute.noarch
Version     : 18.0.3
Release     : 0.20181011032838.d1243fe.el7ost

()[root@compute-0 /]# yum info openstack-nova-migration.noarch
Version     : 18.0.3
Release     : 0.20181011032838.d1243fe.el7ost

(overcloud) [stack@undercloud-0 ~]$ openstack server list --long
+--------------------------------------+--------------------------+--------+------------+-------------+-------------------+------------+--------------------------------------+-------------+--------------------------------------+-------------------+-----------------------+------------+
| ID                                   | Name                     | Status | Task State | Power State | Networks          | Image Name | Image ID                             | Flavor Name | Flavor ID                            | Availability Zone | Host                  | Properties |
+--------------------------------------+--------------------------+--------+------------+-------------+-------------------+------------+--------------------------------------+-------------+--------------------------------------+-------------------+-----------------------+------------+
| 2fe86736-4073-427e-84da-074dc78705e6 | compute_0_large_instance | ACTIVE | None       | Running     | public=10.0.0.216 | cirros     | 0a411db3-dd1a-41ea-b348-bf15077c749b | use_2_vcpu  | ee083ec8-f154-44a6-a0e1-4b8d32f02561 | nova              | compute-0.localdomain |            |
| 7f741183-b831-4331-a1e2-fa5a630f92a9 | compute_0_small_instance | ACTIVE | None       | Running     | public=10.0.0.213 | cirros     | 0a411db3-dd1a-41ea-b348-bf15077c749b | use_1_vcpu  | 52854d2b-613f-43e8-ab19-93b9b6c1abe0 | nova              | compute-0.localdomain |            |
| 10527e30-7fe4-4ed3-bc18-c0a3bdaa260e | compute_1_use_3_vcpu     | ACTIVE | None       | Running     | public=10.0.0.231 | cirros     | 0a411db3-dd1a-41ea-b348-bf15077c749b | use_3_vcpu  | eb8d60d6-8787-436c-a0a5-eb2f5d1930ec | nova              | compute-1.localdomain |            |
| 2ff31a01-b233-4688-83e6-7ad1e5b8b330 | compute_2_use_2_vcpu     | ACTIVE | None       | Running     | public=10.0.0.218 | cirros     | 0a411db3-dd1a-41ea-b348-bf15077c749b | use_2_vcpu  | ee083ec8-f154-44a6-a0e1-4b8d32f02561 | nova              | compute-2.localdomain |            |
+--------------------------------------+--------------------------+--------+------------+-------------+-------------------+------------+--------------------------------------+-------------+--------------------------------------+-------------------+-----------------------+------------+

(overcloud) [stack@undercloud-0 ~]$ openstack hypervisor show 1 | grep vcpus
| vcpus                | 4 |
| vcpus_used           | 3 |
(overcloud) [stack@undercloud-0 ~]$ openstack hypervisor show 2 | grep vcpus
| vcpus                | 4 |
| vcpus_used           | 3 |
(overcloud) [stack@undercloud-0 ~]$ openstack hypervisor show 3 | grep vcpus
| vcpus                | 4 |
| vcpus_used           | 2 |

(overcloud) [stack@undercloud-0 ~]$ nova service-force-down 4e52c4fb-7bc3-4638-98d5-58a64e85be97
+--------------------------------------+-----------------------+--------------+-------------+
| ID                                   | Host                  | Binary       | Forced down |
+--------------------------------------+-----------------------+--------------+-------------+
| 4e52c4fb-7bc3-4638-98d5-58a64e85be97 | compute-0.localdomain | nova-compute | True        |
+--------------------------------------+-----------------------+--------------+-------------+

(overcloud) [stack@undercloud-0 ~]$ nova host-evacuate compute-0.localdomain
+--------------------------------------+-------------------+---------------+
| Server UUID                          | Evacuate Accepted | Error Message |
+--------------------------------------+-------------------+---------------+
| 7f741183-b831-4331-a1e2-fa5a630f92a9 | True              |               |
| 2fe86736-4073-427e-84da-074dc78705e6 | True              |               |
+--------------------------------------+-------------------+---------------+

(overcloud) [stack@undercloud-0 ~]$ openstack server list --long
+--------------------------------------+--------------------------+--------+------------+-------------+-------------------+------------+--------------------------------------+-------------+--------------------------------------+-------------------+-----------------------+------------+
| ID                                   | Name                     | Status | Task State | Power State | Networks          | Image Name | Image ID                             | Flavor Name | Flavor ID                            | Availability Zone | Host                  | Properties |
+--------------------------------------+--------------------------+--------+------------+-------------+-------------------+------------+--------------------------------------+-------------+--------------------------------------+-------------------+-----------------------+------------+
| 2fe86736-4073-427e-84da-074dc78705e6 | compute_0_large_instance | ERROR  | None       | Running     | public=10.0.0.216 | cirros     | 0a411db3-dd1a-41ea-b348-bf15077c749b | use_2_vcpu  | ee083ec8-f154-44a6-a0e1-4b8d32f02561 | nova              | compute-0.localdomain |            |
| 7f741183-b831-4331-a1e2-fa5a630f92a9 | compute_0_small_instance | ACTIVE | None       | Running     | public=10.0.0.213 | cirros     | 0a411db3-dd1a-41ea-b348-bf15077c749b | use_1_vcpu  | 52854d2b-613f-43e8-ab19-93b9b6c1abe0 | nova              | compute-2.localdomain |            |
| 10527e30-7fe4-4ed3-bc18-c0a3bdaa260e | compute_1_use_3_vcpu     | ACTIVE | None       | Running     | public=10.0.0.231 | cirros     | 0a411db3-dd1a-41ea-b348-bf15077c749b | use_3_vcpu  | eb8d60d6-8787-436c-a0a5-eb2f5d1930ec | nova              | compute-1.localdomain |            |
| 2ff31a01-b233-4688-83e6-7ad1e5b8b330 | compute_2_use_2_vcpu     | ACTIVE | None       | Running     | public=10.0.0.218 | cirros     | 0a411db3-dd1a-41ea-b348-bf15077c749b | use_2_vcpu  | ee083ec8-f154-44a6-a0e1-4b8d32f02561 | nova              | compute-2.localdomain |            |
+--------------------------------------+--------------------------+--------+------------+-------------+-------------------+------------+--------------------------------------+-------------+--------------------------------------+-------------------+-----------------------+------------+

(overcloud) [stack@undercloud-0 ~]$ openstack hypervisor show 1 | grep vcpus
Compute service of compute-0.localdomain is unavailable at this time. (HTTP 400) (Request-ID: req-be965b15-b043-429b-8cb5-b0e6ca109782)
(overcloud) [stack@undercloud-0 ~]$ openstack hypervisor show 2 | grep vcpus
| vcpus                | 4 |
| vcpus_used           | 3 |
(overcloud) [stack@undercloud-0 ~]$ openstack hypervisor show 3 | grep vcpus
| vcpus                | 4 |
| vcpus_used           | 3 |

Comment 2 Stephen Finucane 2018-12-10 15:03:46 UTC
This is behaving as expected. Host evacuate makes no guarantees about the order in which instances are evacuated. You could configure the CPUWeigher with a negative weight multiplier to cause "stacking" of the instances (so each instance goes to the eligible host with the fewest free vCPUs), but this will affect all scheduling operations, not just evacuations.
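
To illustrate the stacking effect, here is a rough sketch of how the sign of the multiplier changes the outcome for the layout in this bug. It is a simplification and not nova code: it assumes only vCPUs are considered, that instances are scheduled one at a time with the small one first, and that a host's weight is simply the multiplier times its free vCPUs (the real weigher normalizes weights, which is ignored here). The `schedule` function is hypothetical; its `cpu_weight_multiplier` parameter is named after the option of the same name in the `[filter_scheduler]` section of nova.conf.

```python
def schedule(instances, free_vcpus, cpu_weight_multiplier=1.0):
    """Place (name, vcpus) pairs in order; return {name: host or None}.

    Toy weigher: weight = cpu_weight_multiplier * free vCPUs.
    A positive multiplier prefers the emptiest host (spreading);
    a negative one prefers the fullest host that still fits (stacking).
    """
    free = dict(free_vcpus)  # work on a copy of the layout
    result = {}
    for name, vcpus in instances:
        fits = [host for host, spare in free.items() if spare >= vcpus]
        if not fits:
            result[name] = None  # no host has enough free vCPUs
            continue
        target = max(fits, key=lambda host: cpu_weight_multiplier * free[host])
        free[target] -= vcpus
        result[name] = target
    return result


FREE = {"compute-1": 1, "compute-2": 2}   # free vCPUs before the evacuation
ORDER = [("small", 1), ("large", 2)]      # the order seen in this bug

spread = schedule(ORDER, FREE, cpu_weight_multiplier=1.0)
stack = schedule(ORDER, FREE, cpu_weight_multiplier=-1.0)

print(spread)  # {'small': 'compute-2', 'large': None}
print(stack)   # {'small': 'compute-1', 'large': 'compute-2'}
```

In this model, stacking lets both instances fit even when the small one is evacuated first, but the same preference would then apply to every other scheduling request, including ordinary boots.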