Description of problem:

When spawning an OpenStack instance, this error is received:

2019-03-07 08:07:38.499 3124 WARNING nova.scheduler.utils [req-e577cf31-7a58-420f-8ba5-3f962569ab08 0c90c8d8b42c42e883d2135cc733cac4 8b869a98a43e4fc48001e0ff6d149fe6 - - -] Failed to compute_task_build_instances: local variable 'sibling_set' referenced before assignment
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 133, in _process_incoming
    res = self.dispatcher.dispatch(message)
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 150, in dispatch
    return self._do_dispatch(endpoint, method, ctxt, args)
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 121, in _do_dispatch
    result = func(ctxt, **new_args)
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 199, in inner
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/nova/scheduler/manager.py", line 104, in select_destinations
    dests = self.driver.select_destinations(ctxt, spec_obj)
  File "/usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py", line 53, in select_destinations
    selected_hosts = self._schedule(context, spec_obj)
  File "/usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py", line 113, in _schedule
    spec_obj, index=num)
  File "/usr/lib/python2.7/site-packages/nova/scheduler/host_manager.py", line 576, in get_filtered_hosts
    hosts, spec_obj, index)
  File "/usr/lib/python2.7/site-packages/nova/filters.py", line 89, in get_filtered_objects
    list_objs = list(objs)
  File "/usr/lib/python2.7/site-packages/nova/filters.py", line 44, in filter_all
    if self._filter_one(obj, spec_obj):
  File "/usr/lib/python2.7/site-packages/nova/scheduler/filters/__init__.py", line 44, in _filter_one
    return self.host_passes(obj, spec)
  File "/usr/lib/python2.7/site-packages/nova/scheduler/filters/numa_topology_filter.py", line 123, in host_passes
    pci_stats=host_state.pci_stats))
  File "/usr/lib/python2.7/site-packages/nova/virt/hardware.py", line 1297, in numa_fit_instance_to_host
    host_cell, instance_cell, limits)
  File "/usr/lib/python2.7/site-packages/nova/virt/hardware.py", line 906, in _numa_fit_instance_cell
    host_cell, instance_cell)
  File "/usr/lib/python2.7/site-packages/nova/virt/hardware.py", line 854, in _numa_fit_instance_cell_with_pinning
    max(map(len, host_cell.siblings)))
  File "/usr/lib/python2.7/site-packages/nova/virt/hardware.py", line 805, in _pack_instance_onto_cores
    itertools.chain(*sibling_set)))
UnboundLocalError: local variable 'sibling_set' referenced before assignment
2019-03-07 08:07:38.500 3124 WARNING nova.scheduler.utils [req-e577cf31-7a58-420f-8ba5-3f962569ab08 0c90c8d8b42c42e883d2135cc733cac4 8b869a98a43e4fc48001e0ff6d149fe6 - - -] [instance: 5bca186a-5a36-4b0f-8b7a-f2f3bc168b29] Setting instance to ERROR state.

The error is received when this flavor is used for the VM:

+----------------------------+----------------------------------------------------------------------------------------------------------------------+
| Field                      | Value                                                                                                                |
+----------------------------+----------------------------------------------------------------------------------------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                                                                                                |
| OS-FLV-EXT-DATA:ephemeral  | 0                                                                                                                    |
| access_project_ids         | None                                                                                                                 |
| disk                       | 40                                                                                                                   |
| id                         | 95ccf45e-7d1c-4c80-8207-90d56fdc83ac                                                                                 |
| name                       | vmme-dpdk-esc-prod-ha-mgmt                                                                                           |
| os-flavor-access:is_public | True                                                                                                                 |
| properties                 | aggregate_instance_extra_specs:vmme_vm_type='vmme-prod-ha-mgmt', hw:cpu_policy='dedicated', hw:mem_page_size='large' |
| ram                        | 4096                                                                                                                 |
| rxtx_factor                | 1.0                                                                                                                  |
| swap                       |                                                                                                                      |
| vcpus                      | 2                                                                                                                    |
+----------------------------+----------------------------------------------------------------------------------------------------------------------+

Actual results:
It is not possible to create the instance.

Expected results:
It is possible to create the instance.
I've pushed a patch for this upstream and will update once I receive feedback there. For what it's worth, the issue is caused by the use of 'vcpu_pin_set', which breaks the pinning algorithm. This has been resolved in OSP 13+.
From later discussions, it appears this hotfix has been provided to the customer. The patch included in that hotfix will resolve the exceptions the customer was seeing by catching corner cases that were previously not handled. I've included the commit message verbatim below as I think it does a good job of explaining the issue and resolution. Please let me know if this is not the case.

    Due to how we do claiming of pinned CPUs and related NUMA "things", it's
    possible for claims to race. This raciness is usually not an issue since
    pinning will fail for the losing instance, which will just get
    rescheduled. This does mean that it's possible for an instance to land
    on a host with no CPUs at all though, and this edge case is triggering a
    nasty bug made possible by Python's unusual scoping rules around for
    loops.

        >>> x = 5
        >>> for y in range(x):
        ...     pass
        ...
        >>> print(y)
        4

    'y' would be considered out of scope in the above for most other
    languages (JS and its even dumber scoping rules aside, I guess) and it
    leaves us with situations where the variable might never exist, i.e.
    the bug at hand:

        >>> x = 0
        >>> for y in range(x):
        ...     pass
        ...
        >>> print(y)
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        NameError: name 'y' is not defined

    Resolve this by adding a check to handle the "no CPUs at all" case and
    fail quickly, but also remove the reliance on this quirk of Python.
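To make the failure mode concrete outside of nova, here is a minimal sketch of the same pattern and of the shape of the fix. The function names and arguments are hypothetical stand-ins for nova's `_pack_instance_onto_cores`, not the actual patched code:

```python
import itertools


def pick_cpus_buggy(sibling_sets, threads_wanted):
    # Same shape as the bug: 'sibling_set' is only bound by the for loop,
    # so when 'sibling_sets' is empty (a host with no usable CPUs left)
    # the final line raises UnboundLocalError.
    for sibling_set in sibling_sets:
        if all(len(sibs) >= threads_wanted for sibs in sibling_set):
            break
    return set(itertools.chain(*sibling_set))


def pick_cpus_fixed(sibling_sets, threads_wanted):
    # Shape of the fix: fail fast on the "no CPUs at all" case and stop
    # relying on the loop variable leaking out of the for loop.
    if not sibling_sets:
        return None
    chosen = None
    for sibling_set in sibling_sets:
        if all(len(sibs) >= threads_wanted for sibs in sibling_set):
            chosen = sibling_set
            break
    if chosen is None:
        return None
    return set(itertools.chain(*chosen))
```

With this sketch, `pick_cpus_buggy([], 2)` raises the same UnboundLocalError seen in the scheduler traceback, while `pick_cpus_fixed([], 2)` simply returns None so the caller can reject the host.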
Due to the racy nature of the bug, we haven't been able to reproduce the issue locally in our test beds. However, based on customer feedback on the hotfix, as well as our functional testing, I'm marking this BZ as VERIFIED. If the error persists, feel free to re-open the BZ.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:1715