Description of problem:

When spawning an OpenStack instance, this error is received:

2019-03-07 08:07:38.499 3124 WARNING nova.scheduler.utils [req-e577cf31-7a58-420f-8ba5-3f962569ab08 0c90c8d8b42c42e883d2135cc733cac4 8b869a98a43e4fc48001e0ff6d149fe6 - - -] Failed to compute_task_build_instances: local variable 'sibling_set' referenced before assignment
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 133, in _process_incoming
    res = self.dispatcher.dispatch(message)
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 150, in dispatch
    return self._do_dispatch(endpoint, method, ctxt, args)
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 121, in _do_dispatch
    result = func(ctxt, **new_args)
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 199, in inner
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/nova/scheduler/manager.py", line 104, in select_destinations
    dests = self.driver.select_destinations(ctxt, spec_obj)
  File "/usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py", line 53, in select_destinations
    selected_hosts = self._schedule(context, spec_obj)
  File "/usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py", line 113, in _schedule
    spec_obj, index=num)
  File "/usr/lib/python2.7/site-packages/nova/scheduler/host_manager.py", line 576, in get_filtered_hosts
    hosts, spec_obj, index)
  File "/usr/lib/python2.7/site-packages/nova/filters.py", line 89, in get_filtered_objects
    list_objs = list(objs)
  File "/usr/lib/python2.7/site-packages/nova/filters.py", line 44, in filter_all
    if self._filter_one(obj, spec_obj):
  File "/usr/lib/python2.7/site-packages/nova/scheduler/filters/__init__.py", line 44, in _filter_one
    return self.host_passes(obj, spec)
  File "/usr/lib/python2.7/site-packages/nova/scheduler/filters/numa_topology_filter.py", line 123, in host_passes
    pci_stats=host_state.pci_stats))
  File "/usr/lib/python2.7/site-packages/nova/virt/hardware.py", line 1297, in numa_fit_instance_to_host
    host_cell, instance_cell, limits)
  File "/usr/lib/python2.7/site-packages/nova/virt/hardware.py", line 906, in _numa_fit_instance_cell
    host_cell, instance_cell)
  File "/usr/lib/python2.7/site-packages/nova/virt/hardware.py", line 854, in _numa_fit_instance_cell_with_pinning
    max(map(len, host_cell.siblings)))
  File "/usr/lib/python2.7/site-packages/nova/virt/hardware.py", line 805, in _pack_instance_onto_cores
    itertools.chain(*sibling_set)))
UnboundLocalError: local variable 'sibling_set' referenced before assignment
2019-03-07 08:07:38.500 3124 WARNING nova.scheduler.utils [req-e577cf31-7a58-420f-8ba5-3f962569ab08 0c90c8d8b42c42e883d2135cc733cac4 8b869a98a43e4fc48001e0ff6d149fe6 - - -] [instance: 5bca186a-5a36-4b0f-8b7a-f2f3bc168b29] Setting instance to ERROR state.

The error is received when this flavor is used for the VM:

+----------------------------+----------------------------------------------------------------------------------------------------------------------+
| Field                      | Value                                                                                                                |
+----------------------------+----------------------------------------------------------------------------------------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                                                                                                |
| OS-FLV-EXT-DATA:ephemeral  | 0                                                                                                                    |
| access_project_ids         | None                                                                                                                 |
| disk                       | 40                                                                                                                   |
| id                         | 95ccf45e-7d1c-4c80-8207-90d56fdc83ac                                                                                 |
| name                       | vmme-dpdk-esc-prod-ha-mgmt                                                                                           |
| os-flavor-access:is_public | True                                                                                                                 |
| properties                 | aggregate_instance_extra_specs:vmme_vm_type='vmme-prod-ha-mgmt', hw:cpu_policy='dedicated', hw:mem_page_size='large' |
| ram                        | 4096                                                                                                                 |
| rxtx_factor                | 1.0                                                                                                                  |
| swap                       |                                                                                                                      |
| vcpus                      | 2                                                                                                                    |
+----------------------------+----------------------------------------------------------------------------------------------------------------------+

Actual results:
It is not possible to create the instance.

Expected results:
It is possible to create the instance.
I've pushed a patch for this upstream and will update once I receive feedback there. For what it's worth, the issue is caused by the use of 'vcpu_pin_set', which breaks the pinning algorithm. This has been resolved in OSP 13+.
From later discussions, it appears this hotfix has been provided to the customer. The patch included in that hotfix will resolve the exceptions the customer was seeing by catching corner cases that were previously not handled. I've included the commit message verbatim below as I think it does a good job of explaining the issue and resolution. Please let me know if this is not the case.

    Due to how we do claiming of pinned CPUs and related NUMA "things", it's
    possible for claims to race. This raciness is usually not an issue since
    pinning will fail for the losing instance, which will just get
    rescheduled. This does mean that it's possible for an instance to land
    on a host with no CPUs at all though, and this edge case is triggering a
    nasty bug made possible by Python's unusual scoping rules around for
    loops.

        >>> x = 5
        >>> for y in range(x):
        ...     pass
        ...
        >>> print(y)
        4

    'y' would be considered out of scope in the above for most other
    languages (JS and its even dumber scoping rules aside, I guess) and it
    leaves us with situations where the variable might never exist, i.e.
    the bug at hand:

        >>> x = 0
        >>> for y in range(x):
        ...     pass
        ...
        >>> print(y)
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        NameError: name 'y' is not defined

    Resolve this by adding a check to handle the "no CPUs at all" case and
    fail quickly, but also remove the reliance on this quirk of Python.
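To make the failure mode concrete outside of nova, here is a minimal sketch of the same pattern and of the shape of the fix. The function names and arguments are hypothetical stand-ins for nova's `_pack_instance_onto_cores`, not the actual patched code:

```python
import itertools


def pick_cpus_buggy(sibling_sets, threads_wanted):
    # Same shape as the bug: 'sibling_set' is only bound by the for loop,
    # so when 'sibling_sets' is empty (a host with no usable CPUs left)
    # the final line raises UnboundLocalError.
    for sibling_set in sibling_sets:
        if all(len(sibs) >= threads_wanted for sibs in sibling_set):
            break
    return set(itertools.chain(*sibling_set))


def pick_cpus_fixed(sibling_sets, threads_wanted):
    # Shape of the fix: fail fast on the "no CPUs at all" case and stop
    # relying on the loop variable leaking out of the for loop.
    if not sibling_sets:
        return None
    chosen = None
    for sibling_set in sibling_sets:
        if all(len(sibs) >= threads_wanted for sibs in sibling_set):
            chosen = sibling_set
            break
    if chosen is None:
        return None
    return set(itertools.chain(*chosen))
```

With this sketch, `pick_cpus_buggy([], 2)` raises the same UnboundLocalError seen in the scheduler traceback, while `pick_cpus_fixed([], 2)` simply returns None so the caller can reject the host.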
Due to the racy nature of the bug, we haven't been able to reproduce the issue locally in our test beds. However, based on customer feedback on the hotfix, as well as our functional testing, I'm marking this BZ as VERIFIED. If the error persists, feel free to re-open the BZ.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:1715