Description of problem: When running host-evacuate on a compute node hosting several instances, all of the instances were placed on the same destination compute node. That node ended up with more vCPUs in use than it had available.
Version-Release number of selected component (if applicable):
()[root@undercloud-0 /]# yum list installed | grep nova
openstack-nova-common.noarch 1:18.0.1-0.20180922200406.03188f7.el7ost
openstack-nova-compute.noarch 1:18.0.1-0.20180922200406.03188f7.el7ost
puppet-nova.noarch 13.3.1-0.20180917153244.6fdb591.el7ost
python-nova.noarch 1:18.0.1-0.20180922200406.03188f7.el7ost
python-novajoin.noarch 1.0.19-0.20180828184454.3d58511.el7ost
python2-novaclient.noarch 1:11.0.0-0.20180809174649.f1005ce.el7ost
How reproducible:
Tried twice; it happened both times.
Steps to Reproduce:
1. Configure the system to have the following setup (4 Compute nodes):
Compute Node 0: 4 instances with 1 vCPU each
Compute Node 1: 1 vCPU available
Compute Node 2: 1 vCPU available
Compute Node 3: 2 vCPU available
2. Force down compute node 0
3. Run "nova host-evacuate compute-0.localdomain"
Actual results:
All 4 instances were placed on Compute Node 3, so 6 vCPUs were in use rather than the 4 that were available (2 were in use before the evacuate, plus the 4 used by the evacuated instances).
Expected results:
This layout should have occurred:
Compute Node 0: No instances
Compute Node 1: 1 instance
Compute Node 2: 1 instance
Compute Node 3: 2 instances
Additional info:
I tried running this test with disk instead of vCPU as the constrained resource, and host-evacuate behaved as expected (the disk weigher put the instances on different compute nodes).
I wasn't able to reproduce this, but I think this is actually correct behavior. This is the formula used to compute the free vCPU count:
vcpus_free = (host_state.vcpus_total * host_state.cpu_allocation_ratio - host_state.vcpus_used)
We don't appear to have a CPU allocation ratio set:
$ ssh heat-admin@compute-0
$ sudo crudini --get /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf DEFAULT cpu_allocation_ratio
Parameter not found: cpu_allocation_ratio
In that case the default CPU allocation ratio of 16 is used, so the above formula works out as:
Compute Node 1:
1 * 16 - 0 => 16
Compute Node 2:
1 * 16 - 0 => 16
Compute Node 3:
2 * 16 - 2 => 30 (you said there were two instances already here)
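As such, the four instances _should_ have gone to Compute Node 3: even after all four are placed there, 30 - 4 = 26 free vCPUs is still greater than the 16 on each of the other nodes. If you want the behavior you're describing, you need to configure '[DEFAULT] cpu_allocation_ratio = 1' or similar.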
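To make the arithmetic above easier to check, here is a minimal Python sketch of the same calculation. It is not Nova's scheduler code: the host names, the `HOSTS` table, and the `place` helper are hypothetical, the vCPU totals are the ones assumed in this comment, and "always pick the host with the most free vCPUs" is a simplification of how weighing works.

```python
# Illustrative only: a simplified model of the free-vCPU formula quoted above.

def vcpus_free(vcpus_total, vcpus_used, cpu_allocation_ratio=16.0):
    """Free vCPUs as computed by the formula quoted in this comment."""
    return vcpus_total * cpu_allocation_ratio - vcpus_used


def place(hosts, instance_vcpus, cpu_allocation_ratio=16.0):
    """Place instances one at a time on the host with the most free vCPUs.

    hosts maps a host name to (vcpus_total, vcpus_used).
    Returns the chosen host name for each instance, in order.
    """
    used = {name: vcpus_used for name, (_, vcpus_used) in hosts.items()}
    placements = []
    for vcpus in instance_vcpus:
        free = {
            name: vcpus_free(hosts[name][0], used[name], cpu_allocation_ratio)
            for name in hosts
        }
        best = max(free, key=free.get)
        if free[best] < vcpus:
            raise RuntimeError("no host has enough free vCPUs")
        used[best] += vcpus
        placements.append(best)
    return placements


# Hypothetical labels for Compute Nodes 1-3, using the totals assumed above.
HOSTS = {
    "compute-1": (1, 0),
    "compute-2": (1, 0),
    "compute-3": (2, 2),
}

if __name__ == "__main__":
    for name, (total, in_use) in HOSTS.items():
        print(name, vcpus_free(total, in_use))
    # Four evacuated instances with 1 vCPU each, default ratio of 16.
    print(place(HOSTS, [1, 1, 1, 1]))
```

With the default ratio of 16, every one of the four placements lands on the third host, because its free count never drops below that of the other two.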