Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1638095

Summary: CPU Weighter Filter overcommits vCPU with Host-Evacuate
Product: Red Hat OpenStack
Reporter: awaugama
Component: openstack-nova
Assignee: OSP DFG:Compute <osp-dfg-compute>
Status: CLOSED NOTABUG
QA Contact: OSP DFG:Compute <osp-dfg-compute>
Severity: unspecified
Priority: unspecified
Docs Contact:
Version: 14.0 (Rocky)
CC: berrange, dasmith, eglynn, jhakimra, kchamart, sbauza, sferdjao, sgordon, stephenfin, vromanso
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-10-11 15:38:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1469073

Description awaugama 2018-10-10 16:26:15 UTC
Description of problem: When running host-evacuate against a compute node hosting a few instances, all of the instances were rescheduled onto the same target compute node. This led to more vCPUs being used on that node than were available.


Version-Release number of selected component (if applicable):

()[root@undercloud-0 /]# yum list installed | grep nova
openstack-nova-common.noarch         1:18.0.1-0.20180922200406.03188f7.el7ost
openstack-nova-compute.noarch        1:18.0.1-0.20180922200406.03188f7.el7ost
puppet-nova.noarch                   13.3.1-0.20180917153244.6fdb591.el7ost
python-nova.noarch                   1:18.0.1-0.20180922200406.03188f7.el7ost
python-novajoin.noarch               1.0.19-0.20180828184454.3d58511.el7ost
python2-novaclient.noarch            1:11.0.0-0.20180809174649.f1005ce.el7ost

How reproducible:

Tried twice and had it happen both times

Steps to Reproduce:
1. Configure the system to have the following setup (4 Compute nodes):

Compute Node 0: 4 instances with 1 vCPU each
Compute Node 1: 1 vCPU available
Compute Node 2: 1 vCPU available
Compute Node 3: 2 vCPU available

2. Force down compute node 0

3. Run "nova host-evacuate compute-0.localdomain"

Actual results:

All 4 instances were put onto Compute Node 3, leaving it with 6 vCPUs in use against the 4 it actually has (2 were already in use before the evacuation, plus the 4 consumed by the evacuated instances).


Expected results:

This layout should have occurred:

Compute Node 0: No instances
Compute Node 1: 1 instance 
Compute Node 2: 1 instance
Compute Node 3: 2 instances

Additional info:

I tried running this test using disk instead of vCPUs with host-evacuate, and that behaved as expected (the disk weigher put the instances on different compute nodes).

Comment 1 Stephen Finucane 2018-10-11 15:38:27 UTC
I wasn't able to reproduce this, but I think this is actually correct behavior. This is the formula used to decide the available vCPU count:

  vcpus_free = (host_state.vcpus_total * host_state.cpu_allocation_ratio - host_state.vcpus_used)
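As a minimal, self-contained sketch of that calculation (HostState here is a hypothetical stand-in for the scheduler's host state object, not Nova's real class):

  from dataclasses import dataclass

  @dataclass
  class HostState:
      # Hypothetical stand-in for the scheduler's per-host state
      vcpus_total: int
      vcpus_used: int
      cpu_allocation_ratio: float

  def vcpus_free(host_state):
      # Overcommit is baked in: capacity is total vCPUs times the
      # allocation ratio; only then is current usage subtracted
      return (host_state.vcpus_total * host_state.cpu_allocation_ratio
              - host_state.vcpus_used)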

We don't appear to have a CPU allocation ratio set:

  $ ssh heat-admin@compute-0
  $ sudo crudini --get /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf DEFAULT cpu_allocation_ratio
  Parameter not found: cpu_allocation_ratio

If so, we're going to use the default CPU allocation ratio of 16, meaning the above formula works out as follows:

  Compute Node 1:
    1 * 16 - 0 => 16

  Compute Node 2:
    1 * 16 - 0 => 16

  Compute Node 3:
    2 * 16 - 2 => 30  (you said there were two instances already here)
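The same arithmetic as a quick, self-contained Python check (the host names and numbers are the ones from this comment, written here as illustrative labels):

  hosts = {
      # name: (vcpus_total, vcpus_used)
      "compute-1": (1, 0),
      "compute-2": (1, 0),
      "compute-3": (2, 2),  # two instances already running here
  }
  ratio = 16  # default cpu_allocation_ratio when none is set

  for name, (total, used) in hosts.items():
      print(name, total * ratio - used)
  # compute-1 16
  # compute-2 16
  # compute-3 30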

As such, the four instances _should_ have gone here: even after placing all four, 30 - 4 = 26 is still greater than the 16 free vCPUs on the other hosts. If you want the behavior you're describing, you need to configure '[DEFAULT] cpu_allocation_ratio = 1' or similar.
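To see why compute node 3 keeps winning, here is a toy simulation of the ranking under those numbers (greedy placement on the host with the most free vCPUs; filters and weight multipliers are simplified away, so this is an illustration rather than the scheduler's actual code path):

  free = {"compute-1": 16, "compute-2": 16, "compute-3": 30}
  for i in range(4):
      target = max(free, key=free.get)
      print("instance", i + 1, "->", target)
      free[target] -= 1
  # compute-3 wins every round: its free count drops from 30 to 26 but
  # never falls below the 16 free vCPUs on the other hosts

With '[DEFAULT] cpu_allocation_ratio = 1', the starting free counts would be the physical values instead, and compute node 3 would no longer dominate the ranking.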