This bug was initially created as a copy of Bug #1774243

I am copying this bug because: when faced with a deployment recovery that involves deleting placement resource providers and recovering instance allocations in placement, having access to the 'nova-manage heal_allocations' CLI would be helpful. The CLI was introduced in the Rocky release (OSP14) and we can backport it to OSP13 (a usage sketch is appended at the end of this report).

Description of problem:

We have a failed compute node. When we try to migrate VMs off of it, we receive the following traceback:

{u'message': u'Failed to create resource provider compute-0', u'code': 500, u'details': u'Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 202, in decorated_function
    return function(self, context, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2920, in rebuild_instance
    migration=migration)
  File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 274, in inner
    return f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 246, in rebuild_claim
    limits=limits, image_meta=image_meta)
  File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 355, in _move_claim
    self._update(elevated, cn)
  File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 904, in _update
    inv_data,
  File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 68, in set_inventory_for_provider
    parent_provider_uuid=parent_provider_uuid,
  File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 37, in __run_method
    return getattr(self.instance, __name)(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 1104, in set_inventory_for_provider
    parent_provider_uuid=parent_provider_uuid)
  File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 673, in _ensure_resource_provider
    name=name or uuid)
ResourceProviderCreationFailed: Failed to create resource provider compute-0
', u'created': u'2019-11-12T22:51:43Z'}

Version-Release number of selected component (if applicable):
RHOSP13 z7. (I'll get the exact nova version and provide it as a comment.)

How reproducible:
Every time we try to migrate in this environment.

Steps to Reproduce:
1. nova evacuate UUID (of an instance on the failed hypervisor)
Actual results:
The evacuation fails with the ResourceProviderCreationFailed traceback shown above ("Failed to create resource provider compute-0").

Expected results:
Nova should be able to find the existing resource provider and migrate the VM.

Additional info:
We fail here:
https://opendev.org/openstack/nova/src/branch/stable/queens/nova/scheduler/client/report.py#L659-L673

So I assume it fails to get / refresh the existing resource provider and falls into the "if not" branch:

    rps_to_refresh = self._get_providers_in_tree(context, uuid)
    if not rps_to_refresh:
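For context, an abridged sketch of that code path (paraphrased from the linked stable/queens report.py, not the verbatim source; exact details may differ across z-streams):

    # Inside SchedulerReportClient._ensure_resource_provider (abridged):
    rps_to_refresh = self._get_providers_in_tree(context, uuid)
    if not rps_to_refresh:
        # GET returned nothing for this uuid, so nova attempts to create
        # the provider. If the create fails (e.g. a name/uuid conflict
        # with a provider placement already knows about), we hit the
        # ResourceProviderCreationFailed seen in the traceback.
        created_rp = self._create_resource_provider(
            context, uuid, name or uuid,
            parent_provider_uuid=parent_provider_uuid)
        if created_rp is None:
            raise exception.ResourceProviderCreationFailed(name=name or uuid)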
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0759
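For reference, a sketch of how the backported command is typically invoked once available (syntax as documented for Rocky's nova-manage; the flags available in the OSP13 backport may differ):

    # Run on a node with nova.conf configured (placement credentials required);
    # heals allocations for instances missing placement allocations:
    nova-manage placement heal_allocations --verbose

    # Optionally limit how many instances are processed per run:
    nova-manage placement heal_allocations --max-count 50 --verbose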