Bug 1686511
Summary: | Failed to compute_task_build_instances: local variable 'sibling_set' referenced before assignment | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | ojanas |
Component: | openstack-nova | Assignee: | Stephen Finucane <stephenfin> |
Status: | CLOSED ERRATA | QA Contact: | Joe H. Rahme <jhakimra> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 10.0 (Newton) | CC: | astupnik, dasmith, eglynn, jhakimra, kchamart, lyarwood, mbooth, pamadio, sbauza, sgordon, stephenfin, vromanso |
Target Milestone: | z12 | Keywords: | Triaged, ZStream |
Target Release: | 10.0 (Newton) | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | openstack-nova-14.1.0-46.el7ost | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2019-07-10 09:19:01 UTC | Type: | Bug |
Description
ojanas
2019-03-07 15:31:33 UTC
I've pushed a patch for this upstream and will update once I receive feedback there. For what it's worth, the issue is caused by the use of 'vcpu_pin_set', which is breaking the pinning algorithm. This has been resolved in OSP 13+.

From later discussions, it appears this hotfix has been provided to the customer. The patch included in that hotfix will resolve the exceptions the customer was seeing by catching corner cases that were previously not handled. I've included the commit message verbatim below as I think this does a good job of explaining the issue and resolution. Please let me know if this is not the case.

    Due to how we do claiming of pinned CPUs and related NUMA "things",
    it's possible for claims to race. This raciness is usually not an
    issue since pinning will fail for the losing instance, which will
    just get rescheduled. This does mean that it's possible for an
    instance to land on a host with no CPUs at all though and this edge
    case is triggering a nasty bug made possible by Python's unusual
    scoping rules around for loops.

        >>> x = 5
        >>> for y in range(x):
        ...     pass
        ...
        >>> print(y)
        4

    'y' would be considered out of scope in the above for most other
    languages (JS and its even dumber scoping rules aside, I guess) and
    it leaves us with situations where the variable might never exist,
    i.e. the bug at hand:

        >>> x = 0
        >>> for y in range(x):
        ...     pass
        ...
        >>> print(y)
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        NameError: name 'y' is not defined

    Resolve this by adding a check to handle the "no CPUs at all" case
    and quick fail but also remove the reliance on this quirk of Python.

Due to the racy nature of the bug, we haven't been able to reproduce the issue locally in our test beds. However, based on customer feedback for the hotfix, as well as our functional testing, I'm marking this BZ as VERIFIED. If the error persists, feel free to re-open the BZ.
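For anyone reading this later, here is a minimal sketch of how the same quirk produces the "referenced before assignment" error from the summary when it happens inside a function, and of the kind of guard the commit message describes. The function names, arguments, and return values are hypothetical and chosen for illustration only; this is not the actual nova code or the actual patch.

```python
def pick_threads_buggy(sibling_sets, needed):
    """Hypothetical illustration of the failure mode (not nova code)."""
    for sibling_set in sibling_sets:
        if len(sibling_set) >= needed:
            break
    # If sibling_sets is empty, the loop body never ran, so the name
    # 'sibling_set' was never bound in this function's scope and the
    # next line raises UnboundLocalError.
    return sorted(sibling_set)[:needed]


def pick_threads_fixed(sibling_sets, needed):
    """Hypothetical fix: fail fast and never use the loop variable
    outside the loop."""
    if not sibling_sets:
        # The "no CPUs at all" case: bail out early.
        return None
    for sibling_set in sibling_sets:
        if len(sibling_set) >= needed:
            return sorted(sibling_set)[:needed]
    # No sibling set was large enough.
    return None


if __name__ == "__main__":
    try:
        pick_threads_buggy([], 2)
    except UnboundLocalError as exc:
        print("buggy version failed:", exc)
    print("fixed version returned:", pick_threads_fixed([], 2))
```

Inside a function the error surfaces as UnboundLocalError, which is a subclass of the NameError shown in the interactive session above. The exact wording of the message varies between Python versions.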
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1715