Bug 1638095 - CPU Weigher Filter overcommits vCPU with Host-Evacuate
Summary: CPU Weigher Filter overcommits vCPU with Host-Evacuate
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: OSP DFG:Compute
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On:
Blocks: 1469073
Reported: 2018-10-10 16:26 UTC by awaugama
Modified: 2023-03-21 19:01 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-11 15:38:42 UTC
Target Upstream Version:
Embargoed:



Description awaugama 2018-10-10 16:26:15 UTC
Description of problem: When running host-evacuate against a compute node hosting a few instances, all of the instances were placed onto the same destination compute node. This led to more vCPUs being used on that node than it had available.


Version-Release number of selected component (if applicable):

()[root@undercloud-0 /]# yum list installed | grep nova
openstack-nova-common.noarch         1:18.0.1-0.20180922200406.03188f7.el7ost
openstack-nova-compute.noarch        1:18.0.1-0.20180922200406.03188f7.el7ost
puppet-nova.noarch                   13.3.1-0.20180917153244.6fdb591.el7ost
python-nova.noarch                   1:18.0.1-0.20180922200406.03188f7.el7ost
python-novajoin.noarch               1.0.19-0.20180828184454.3d58511.el7ost
python2-novaclient.noarch            1:11.0.0-0.20180809174649.f1005ce.el7ost

How reproducible:

Tried twice and had it happen both times

Steps to Reproduce:
1. Configure the system to have the following setup (4 Compute nodes):

Compute Node 0: 4 instances with 1 vCPU each
Compute Node 1: 1 vCPU available
Compute Node 2: 1 vCPU available
Compute Node 3: 2 vCPU available

2. Force down compute node 0 (see the CLI sketch after these steps)

3. Run "nova host-evacuate compute-0.localdomain"
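
A rough CLI sketch of steps 2 and 3, assuming the nova client is used throughout, that compute-0.localdomain is the hypervisor hostname, and that the host is only marked forced-down rather than actually powered off (exact commands may differ in your environment):

  # Mark the nova-compute service on the failed host as forced down so the
  # scheduler stops considering it (only safe if the host really is down or fenced)
  nova service-force-down compute-0.localdomain nova-compute

  # Confirm the service now shows state "down"
  nova service-list --binary nova-compute

  # Evacuate all instances off the failed host (step 3 above)
  nova host-evacuate compute-0.localdomain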

Actual results:

All 4 instances were placed onto Compute Node 3, leading to 6 vCPUs in use rather than the 4 the node actually had (2 were already in use before the evacuate, plus the 4 used by the evacuated instances).


Expected results:

This layout should have occurred:

Compute Node 0: No instances
Compute Node 1: 1 instance 
Compute Node 2: 1 instance
Compute Node 3: 2 instances

Additional info:

I tried running this test using disk instead of vCPU with host-evacuate, and that behaved as expected (the disk weigher put the instances on different compute nodes).

Comment 1 Stephen Finucane 2018-10-11 15:38:27 UTC
I wasn't able to reproduce this, but I think this is actually correct behavior. This is the formula used to decide the available vCPU count:

  vcpus_free = (host_state.vcpus_total * host_state.cpu_allocation_ratio - host_state.vcpus_used)

We don't appear to have a CPU allocation ratio set:

  $ ssh heat-admin@compute-0
  $ sudo crudini --get /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf DEFAULT cpu_allocation_ratio
  Parameter not found: cpu_allocation_ratio

If so, we're going to use the default CPU allocation ratio of 16, meaning the above formula works out as:

  Compute Node 1:
    1 * 16 - 0 => 16

  Compute Node 2:
    1 * 16 - 0 => 16

  Compute Node 3:
    2 * 16 - 2 => 30  (you said there were two instances already here)

As such, the four instances _should_ have gone here: even with all four placed on it (30 - 4 = 26), this host still has more free vCPUs than the other two (16 each), so it keeps winning the weighing. If you want the behavior you're describing, you need to configure '[DEFAULT] cpu_allocation_ratio = 1' or similar.
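
For completeness, a rough sketch of making that change by hand on a compute node, assuming the containerized layout from the crudini check above and a container named nova_compute (an assumption for this environment). On a director-based deployment the supported route is the corresponding TripleO parameter (e.g. NovaCpuAllocationRatio), and hand edits like this can be overwritten on the next deploy:

  $ ssh heat-admin@compute-0
  # Set the ratio in the container-generated config (same file checked above)
  $ sudo crudini --set /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf DEFAULT cpu_allocation_ratio 1.0
  # Restart the containerized nova-compute so it picks up the new value
  # (container name assumed to be nova_compute on this release)
  $ sudo docker restart nova_compute

Repeat on each compute node that should use the stricter ratio.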

