Description of problem:
When running host-evacuate on a compute node hosting several instances, all of the instances were placed on the same destination compute node. As a result, that node ended up with more vCPUs in use than it had available.

Version-Release number of selected component (if applicable):
()[root@undercloud-0 /]# yum list installed | grep nova
openstack-nova-common.noarch 1:18.0.1-0.20180922200406.03188f7.el7ost
openstack-nova-compute.noarch 1:18.0.1-0.20180922200406.03188f7.el7ost
puppet-nova.noarch 13.3.1-0.20180917153244.6fdb591.el7ost
python-nova.noarch 1:18.0.1-0.20180922200406.03188f7.el7ost
python-novajoin.noarch 1.0.19-0.20180828184454.3d58511.el7ost
python2-novaclient.noarch 1:11.0.0-0.20180809174649.f1005ce.el7ost

How reproducible:
Tried twice; it happened both times.

Steps to Reproduce:
1. Configure the system with the following setup (4 compute nodes):
   Compute Node 0: 4 instances with 1 vCPU each
   Compute Node 1: 1 vCPU available
   Compute Node 2: 1 vCPU available
   Compute Node 3: 2 vCPU available
2. Force down compute node 0
3. Run "nova host-evacuate compute-0.localdomain"

Actual results:
All 4 instances were put onto Compute Node 3, leaving 6 vCPUs in use rather than the 4 that were available (2 were already in use before the evacuate, plus the 4 used by the evacuated instances).

Expected results:
This layout should have occurred:
   Compute Node 0: No instances
   Compute Node 1: 1 instance
   Compute Node 2: 1 instance
   Compute Node 3: 2 instances

Additional info:
I ran the same test with disk rather than vCPU as the constrained resource, and host-evacuate behaved as expected (the disk weigher put the instances on different compute nodes).
I wasn't able to reproduce this, but I think this is actually correct behavior. This is the formula used to compute the free vCPU count:

    vcpus_free = (host_state.vcpus_total * host_state.cpu_allocation_ratio - host_state.vcpus_used)

We don't appear to have a CPU allocation ratio set:

    $ ssh heat-admin@compute-0
    $ sudo crudini --get /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf DEFAULT cpu_allocation_ratio
    Parameter not found: cpu_allocation_ratio

If so, we're going to use the default CPU allocation ratio of 16, meaning the above formula works out as:

    Compute Node 1: 1 * 16 - 0 => 16
    Compute Node 2: 1 * 16 - 0 => 16
    Compute Node 3: 2 * 16 - 2 => 30 (you said there were two instances already here)

As such, the four instances _should_ have gone to Compute Node 3: even with all four placed there, it still has 30 - 4 = 26 free vCPUs, which is more than the 16 on either of the other nodes. If you want the behavior you're describing, you need to configure '[DEFAULT] cpu_allocation_ratio = 1' or similar.
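To make the arithmetic above easier to check, here is a minimal Python sketch of the vcpus_free formula combined with a simplified "pick the host with the most free vCPUs" loop. This is not Nova's scheduler code: HostState, place and the two host-building helpers are hypothetical names for illustration, and the greedy loop is only a stand-in for the real filter and weigher pipeline. The ratio-1 case additionally assumes Compute Node 3 has 4 physical vCPUs with 2 already in use, which is one reading of the report rather than a confirmed fact.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    """Hypothetical stand-in for a scheduler host record (not Nova's class)."""
    name: str
    vcpus_total: int
    vcpus_used: int
    cpu_allocation_ratio: float = 16.0  # default ratio quoted in the comment


def vcpus_free(host):
    # Same formula as quoted above.
    return host.vcpus_total * host.cpu_allocation_ratio - host.vcpus_used


def place(hosts, n_instances, vcpus_per_instance=1):
    """Greedy model: each instance goes to the host with the most free vCPUs."""
    placements = []
    for _ in range(n_instances):
        candidates = [h for h in hosts if vcpus_free(h) >= vcpus_per_instance]
        if not candidates:
            raise RuntimeError("no host has enough free vCPUs")
        best = max(candidates, key=vcpus_free)  # ties go to the first host listed
        best.vcpus_used += vcpus_per_instance
        placements.append(best.name)
    return placements


def hosts_default_ratio():
    # Totals and usage as used in the comment's calculation, ratio left at 16.
    return [
        HostState("compute-1", vcpus_total=1, vcpus_used=0),
        HostState("compute-2", vcpus_total=1, vcpus_used=0),
        HostState("compute-3", vcpus_total=2, vcpus_used=2),
    ]


def hosts_ratio_one():
    # ASSUMPTION: compute-3 has 4 physical vCPUs with 2 in use (one reading
    # of the report), and cpu_allocation_ratio is set to 1 on every host.
    return [
        HostState("compute-1", 1, 0, cpu_allocation_ratio=1.0),
        HostState("compute-2", 1, 0, cpu_allocation_ratio=1.0),
        HostState("compute-3", 4, 2, cpu_allocation_ratio=1.0),
    ]


print([vcpus_free(h) for h in hosts_default_ratio()])  # [16.0, 16.0, 30.0]
print(place(hosts_default_ratio(), 4))  # all four land on compute-3
print(place(hosts_ratio_one(), 4))      # spread: compute-3, compute-1, compute-2, compute-3
```

Under these assumptions, the default ratio reproduces the reported result (everything lands on Compute Node 3), while a ratio of 1 yields the 1 / 1 / 2 layout the report expected.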