Cloned from launchpad bug 1438238.

Description:

The issue happens when multiple scheduling attempts that request CPU pinning are done in parallel.

2015-03-25T14:18:00.222 controller-0 nova-scheduler err Exception during message handling: Cannot pin/unpin cpus [4] from the following pinned set [3, 4, 5, 6, 7, 8, 9]
2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher Traceback (most recent call last):
2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib64/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 134, in _dispatch_and_reply
2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher     incoming.message))
2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib64/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 177, in _dispatch
2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher     return self._do_dispatch(endpoint, method, ctxt, args)
2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib64/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 123, in _do_dispatch
2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher     result = getattr(endpoint, method)(ctxt, **new_args)
2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib64/python2.7/site-packages/oslo/messaging/rpc/server.py", line 139, in inner
2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher     return func(*args, **kwargs)
2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib64/python2.7/site-packages/nova/scheduler/manager.py", line 86, in select_destinations
2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib64/python2.7/site-packages/nova/scheduler/filter_scheduler.py", line 80, in select_destinations
2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib64/python2.7/site-packages/nova/scheduler/filter_scheduler.py", line 241, in _schedule
2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib64/python2.7/site-packages/nova/scheduler/host_manager.py", line 266, in consume_from_instance
2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib64/python2.7/site-packages/nova/virt/hardware.py", line 1472, in get_host_numa_usage_from_instance
2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib64/python2.7/site-packages/nova/virt/hardware.py", line 1344, in numa_usage_from_instances
2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib64/python2.7/site-packages/nova/objects/numa.py", line 91, in pin_cpus
2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher CPUPinningInvalid: Cannot pin/unpin cpus [4] from the following pinned set [3, 4, 5, 6, 7, 8, 9]
2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher

What is likely happening is:

* nova-scheduler is handling several RPC calls to select_destinations at the same time, in multiple greenthreads
* greenthread 1 runs the NUMATopologyFilter and selects a CPU on a particular compute node, updating host_state.instance_numa_topology
* greenthread 1 then blocks for some reason
* greenthread 2 runs the NUMATopologyFilter and selects the same CPU on the same compute node, updating host_state.instance_numa_topology. (This would also be a problem if a different CPU were selected, since it would overwrite the instance_numa_topology chosen by greenthread 1.)
* greenthread 2 then blocks for some reason
* greenthread 1 gets scheduled and calls consume_from_instance, which consumes the NUMA resources based on what is in host_state.instance_numa_topology
* greenthread 1 completes the scheduling operation
* greenthread 2 gets scheduled and calls consume_from_instance, which again consumes the NUMA resources based on what is in host_state.instance_numa_topology; since those resources were already consumed by greenthread 1, we get the exception above (see the sketch below)

Specification URL (additional information): https://bugs.launchpad.net/nova/+bug/1438238
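A minimal, self-contained sketch of the suspected interleaving. This is not nova's actual code: HostState, pin_cpus, and schedule below are simplified stand-ins for the nova objects named in the traceback, and it assumes only that eventlet is installed.

# Simplified stand-ins (hypothetical names) for nova's host state and
# NUMA pinning objects; two greenthreads share one HostState.
import eventlet

class CPUPinningInvalid(Exception):
    pass

class HostState(object):
    def __init__(self):
        self.pinned_cpus = set()            # CPUs already consumed on this host
        self.instance_numa_topology = None  # shared field, no guard

    def pin_cpus(self, cpus):
        # Like the pin_cpus in the traceback: refuse to pin an already-pinned CPU.
        if self.pinned_cpus & set(cpus):
            raise CPUPinningInvalid(
                "Cannot pin/unpin cpus %s from the following pinned set %s"
                % (sorted(set(cpus)), sorted(self.pinned_cpus)))
        self.pinned_cpus |= set(cpus)

host_state = HostState()

def schedule(request, cpu):
    # Filter pass: pick a CPU and record it on the shared host state.
    host_state.instance_numa_topology = [cpu]
    # The greenthread blocks; eventlet.sleep(0) yields, letting the other
    # request overwrite the shared field or race us to the consume step.
    eventlet.sleep(0)
    # consume_from_instance: pin whatever is in the shared field now.
    host_state.pin_cpus(host_state.instance_numa_topology)
    print("request %d pinned cpus %s" % (request, host_state.instance_numa_topology))

# Both requests pick CPU 4; the second consume raises CPUPinningInvalid.
threads = [eventlet.spawn(schedule, 1, 4), eventlet.spawn(schedule, 2, 4)]
for t in threads:
    try:
        t.wait()
    except CPUPinningInvalid as exc:
        print("race reproduced: %s" % exc)

Running the sketch prints one successful pin followed by the same "Cannot pin/unpin cpus [4] ..." message as the trace above, since the second greenthread consumes resources the first has already taken.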
The fixed code is in openstack-nova-api-2014.2.3-65.el7ost.noarch. Automation passed: https://rhos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/RHOS/view/RHOS6/job/rhos-jenkins-rhos-6.0-puddle-rhel-7.2-multi-node-packstack-neutron-ml2-vxlan-rabbitmq-tempest-git-all/17/
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0500.html