Bug 1638095 - CPU Weigher Filter overcommits vCPU with Host-Evacuate
Summary: CPU Weigher Filter overcommits vCPU with Host-Evacuate
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: OSP DFG:Compute
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On:
Blocks: 1469073
Reported: 2018-10-10 16:26 UTC by awaugama
Modified: 2023-03-21 19:01 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-11 15:38:42 UTC
Target Upstream Version:
Embargoed:



Description awaugama 2018-10-10 16:26:15 UTC
Description of problem: When running host-evacuate against a compute node hosting a few instances, all of the instances were placed onto the same destination compute node. This led to more vCPUs being used on that node than it had available.


Version-Release number of selected component (if applicable):

()[root@undercloud-0 /]# yum list installed | grep nova
openstack-nova-common.noarch         1:18.0.1-0.20180922200406.03188f7.el7ost
openstack-nova-compute.noarch        1:18.0.1-0.20180922200406.03188f7.el7ost
puppet-nova.noarch                   13.3.1-0.20180917153244.6fdb591.el7ost
python-nova.noarch                   1:18.0.1-0.20180922200406.03188f7.el7ost
python-novajoin.noarch               1.0.19-0.20180828184454.3d58511.el7ost
python2-novaclient.noarch            1:11.0.0-0.20180809174649.f1005ce.el7ost

How reproducible:

Tried twice and had it happen both times

Steps to Reproduce:
1. Configure the system to have the following setup (4 Compute nodes):

Compute Node 0: 4 instances with 1 vCPU each
Compute Node 1: 1 vCPU available
Compute Node 2: 1 vCPU available
Compute Node 3: 2 vCPU available

2. Force down compute node 0 (see the CLI sketch after these steps)

3. Run "nova host-evacuate compute-0.localdomain"
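
A rough CLI sketch of steps 2 and 3, assuming the nova client is used throughout, that compute-0.localdomain is the hypervisor hostname, and that the host is only marked forced-down rather than actually powered off (exact commands may differ in your environment):

  # Mark the nova-compute service on the failed host as forced down so the
  # scheduler stops considering it (only safe if the host really is down or fenced)
  nova service-force-down compute-0.localdomain nova-compute

  # Confirm the service now shows state "down"
  nova service-list --binary nova-compute

  # Evacuate all instances off the failed host (step 3 above)
  nova host-evacuate compute-0.localdomain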

Actual results:

All 4 instances were placed onto Compute Node 3, leading to 6 vCPUs in use rather than the 4 the node actually had (2 were already in use before the evacuate, plus the 4 used by the evacuated instances).


Expected results:

This layout should have occurred:

Compute Node 0: No instances
Compute Node 1: 1 instance 
Compute Node 2: 1 instance
Compute Node 3: 2 instances

Additional info:

I tried running this test using disk instead of vCPU with host-evacuate, and that behaved as expected (the disk weigher put the instances on different compute nodes).

Comment 1 Stephen Finucane 2018-10-11 15:38:27 UTC
I wasn't able to reproduce this, but I think this is actually correct behavior. This is the formula used to decide the available vCPU count:

  vcpus_free = (host_state.vcpus_total * host_state.cpu_allocation_ratio - host_state.vcpus_used)

We don't appear to have a CPU allocation ratio set:

  $ ssh heat-admin@compute-0
  $ sudo crudini --get /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf DEFAULT cpu_allocation_ratio
  Parameter not found: cpu_allocation_ratio

If so, we're going to use the default CPU allocation ratio of 16, meaning the above formula works out as:

  Compute Node 1:
    1 * 16 - 0 => 16

  Compute Node 2:
    1 * 16 - 0 => 16

  Compute Node 3:
    2 * 16 - 2 => 30  (you said there were two instances already here)

As such, the four instances _should_ have gone here: even with all four placed on it (30 - 4 = 26), this host still has more free vCPUs than the other two (16 each), so it keeps winning the weighing. If you want the behavior you're describing, you need to configure '[DEFAULT] cpu_allocation_ratio = 1' or similar.
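
For completeness, a rough sketch of making that change by hand on a compute node, assuming the containerized layout from the crudini check above and a container named nova_compute (an assumption for this environment). On a director-based deployment the supported route is the corresponding TripleO parameter (e.g. NovaCpuAllocationRatio), and hand edits like this can be overwritten on the next deploy:

  $ ssh heat-admin@compute-0
  # Set the ratio in the container-generated config (same file checked above)
  $ sudo crudini --set /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf DEFAULT cpu_allocation_ratio 1.0
  # Restart the containerized nova-compute so it picks up the new value
  # (container name assumed to be nova_compute on this release)
  $ sudo docker restart nova_compute

Repeat on each compute node that should use the stricter ratio.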

