Description of problem:

Post FFU [OSP13 to 16.1], we are neither able to spawn new instances nor to restart old instances.

1. For new instances we get this error in the nova-conductor log, consistent with what is observed in nova-compute as well [1].

2. Galera & RabbitMQ are in good condition.

~~~
Full List of Resources:
  * Container bundle set: galera-bundle [cluster.common.tag/openstack-mariadb:pcmklatest]:
    * galera-bundle-0 (ocf::heartbeat:galera): Master overcloud-controller-0
    * galera-bundle-1 (ocf::heartbeat:galera): Master overcloud-controller-1
    * galera-bundle-2 (ocf::heartbeat:galera): Master overcloud-controller-2
  * Container bundle set: rabbitmq-bundle [cluster.common.tag/openstack-rabbitmq:pcmklatest]:
    * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Started overcloud-controller-0
    * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Stopped overcloud-controller-1
    * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Started overcloud-controller-2
~~~

3. All nova containers are in a healthy condition.
4. The nova-compute service is UP.
5. MySQL-related errors were observed but did not re-appear after the containers were restarted:

~~~
2020-10-20 11:38:41.465 24 ERROR oslo_db.sqlalchemy.engines     result.read()
2020-10-20 11:38:41.465 24 ERROR oslo_db.sqlalchemy.engines   File "/usr/lib/python3.6/site-packages/pymysql/connections.py", line 1349, in read
2020-10-20 11:38:41.465 24 ERROR oslo_db.sqlalchemy.engines     first_packet = self.connection._read_packet()
2020-10-20 11:38:41.465 24 ERROR oslo_db.sqlalchemy.engines   File "/usr/lib/python3.6/site-packages/pymysql/connections.py", line 991, in _read_packet
2020-10-20 11:38:41.465 24 ERROR oslo_db.sqlalchemy.engines     packet_header = self._read_bytes(4)
2020-10-20 11:38:41.465 24 ERROR oslo_db.sqlalchemy.engines   File "/usr/lib/python3.6/site-packages/pymysql/connections.py", line 1037, in _read_bytes
2020-10-20 11:38:41.465 24 ERROR oslo_db.sqlalchemy.engines     CR.CR_SERVER_LOST, "Lost connection to MySQL server during query")
2020-10-20 11:38:41.465 24 ERROR oslo_db.sqlalchemy.engines oslo_db.exception.DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query')
2020-10-20 11:38:41.465 24 ERROR oslo_db.sqlalchemy.engines [SQL: SELECT 1]
2020-10-20 11:38:41.465 24 ERROR oslo_db.sqlalchemy.engines (Background on this error at: http://sqlalche.me/e/e3q8)
2020-10-20 11:38:41.465 24 ERROR oslo_db.sqlalchemy.engines
~~~

[1]
~~~
/var/log/containers/nova/nova-conductor.log:2020-10-20 12:26:33.412 25 ERROR nova.scheduler.utils [req-881321df-b86e-4d8a-a3ed-263a4ce7590a 4c77c40906394a83b43160ad2d9ca766 93d38a65596945f3ae874746a050d8a5 - default default] [instance: 43d28fd4-3205-485d-a5be-5e082c30f579] Error from last host: overcloud-compute-1.localdomain (node overcloud-compute-1.localdomain): ['Traceback (most recent call last):\n', '  File "/usr/lib/python3.6/site-packages/nova/scheduler/client/report.py", line 1329, in catch_all\n    yield\n', '  File "/usr/lib/python3.6/site-packages/nova/scheduler/client/report.py", line 1418, in update_from_provider_tree\n    context, pd.uuid, pd.inventory)\n', '  File "/usr/lib/python3.6/site-packages/nova/scheduler/client/report.py", line 950, in set_inventory_for_provider\n    raise exception.ResourceProviderUpdateFailed(url=url, error=resp.text)\n', 'nova.exception.ResourceProviderUpdateFailed: Failed to update resource provider via URL /resource_providers/5812a7ab-f8e7-4c57-aba8-f283fedfd04b/inventories: {"errors": [{"status": 400, "title": "Bad Request", "detail": "The server could not comply with the request since it is either malformed or otherwise incorrect.\\n\\n JSON does not validate: 0 is less than the minimum of 1 Failed validating \'minimum\' in schema[\'properties\'][\'inventories\'][\'patternProperties\'][\'^[A-Z0-9_]+$\'][\'properties\'][\'total\']: {\'maximum\': 2147483647, \'minimum\': 1, \'type\': \'integer\'} On instance[\'inventories\'][\'DISK_GB\'][\'total\']: 0 ", "code": "placement.undefined_code", "request_id": "req-fba91085-f470-4c85-9183-29b1d7cae026"}]}\n', '\nDuring handling of the above exception, another exception occurred:\n\n', 'Traceback (most recent call last):\n', '  File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2407, in _build_and_run_instance\n    limits):\n', '  File "/usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", line 328, in inner\n    return f(*args, **kwargs)\n', '  File "/usr/lib/python3.6/site-packages/nova/compute/resource_tracker.py", line 232, in instance_claim\n    self._update(elevated, cn)\n', '  File "/usr/lib/python3.6/site-packages/nova/compute/resource_tracker.py", line 1237, in _update\n    self._update_to_placement(context, compute_node, startup)\n', '  File "/usr/lib/python3.6/site-packages/retrying.py", line 68, in wrapped_f\n    return Retrying(*dargs, **dkw).call(f, *args, **kw)\n', '  File "/usr/lib/python3.6/site-packages/retrying.py", line 223, in call\n    return attempt.get(self._wrap_exception)\n', '  File "/usr/lib/python3.6/site-packages/retrying.py", line 261, in get\n    six.reraise(self.value[0], self.value[1], self.value[2])\n', '  File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise\n    raise value\n', '  File "/usr/lib/python3.6/site-packages/retrying.py", line 217, in call\n    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)\n', '  File "/usr/lib/python3.6/site-packages/nova/compute/resource_tracker.py", line 1215, in _update_to_placement\n    allocations=allocs)\n', '  File "/usr/lib/python3.6/site-packages/nova/scheduler/client/report.py", line 1421, in update_from_provider_tree\n    self.set_traits_for_provider(context, pd.uuid, pd.traits)\n', '  File "/usr/lib64/python3.6/contextlib.py", line 99, in __exit__\n    self.gen.throw(type, value, traceback)\n', '  File "/usr/lib/python3.6/site-packages/nova/scheduler/client/report.py", line 1341, in catch_all\n    raise exception.ResourceProviderSyncFailed()\n', 'nova.exception.ResourceProviderSyncFailed: Failed to synchronize the placement service with resource provider information supplied by the compute host.\n', '\nDuring handling of the above exception, another exception occurred:\n\n', 'Traceback (most recent call last):\n', '  File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2161, in _do_build_and_run_instance\n    filter_properties, request_spec)\n', '  File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2537, in _build_and_run_instance\n    instance_uuid=instance.uuid, reason=six.text_type(e))\n', 'nova.exception.RescheduledException: Build of instance 43d28fd4-3205-485d-a5be-5e082c30f579 was re-scheduled: Failed to synchronize the placement service with resource provider information supplied by the compute host.\n']

nova-compute

2020-10-20 11:32:45.484 7 ERROR nova.compute.manager nova.exception.ResourceProviderUpdateFailed: Failed to update resource provider via URL /resource_providers/5812a7ab-f8e7-4c57-aba8-f283fedfd04b/inventories: {"errors": [{"status": 400, "title": "Bad Request", "detail": "The server could not comply with the request since it is either malformed or otherwise incorrect.\n\n JSON does not validate: 0 is less than the minimum of 1 Failed validating 'minimum' in schema['properties']['inventories']['patternProperties']['^[A-Z0-9_]+$']['properties']['total']: {'maximum': 2147483647, 'minimum': 1, 'type': 'integer'} On instance['inventories']['DISK_GB']['total']: 0 ", "code": "placement.undefined_code", "request_id": "req-7fc88053-8e43-4c18-9810-73ad05c3a993"}]}
2020-10-20 11:32:45.484 7 ERROR nova.compute.manager
2020-10-20 11:32:45.484 7 ERROR nova.compute.manager During handling of the above exception, another exception occurred:
2020-10-20 11:32:45.484 7 ERROR nova.compute.manager
2020-10-20 11:32:45.484 7 ERROR nova.compute.manager Traceback (most recent call last):
2020-10-20 11:32:45.484 7 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 8740, in _update_available_resource_for_node
2020-10-20 11:32:45.484 7 ERROR nova.compute.manager     startup=startup)
2020-10-20 11:32:45.484 7 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/nova/compute/resource_tracker.py", line 887, in update_available_resource
2020-10-20 11:32:45.484 7 ERROR nova.compute.manager     self._update_available_resource(context, resources, startup=startup)
2020-10-20 11:32:45.484 7 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", line 328, in inner
2020-10-20 11:32:45.484 7 ERROR nova.compute.manager     return f(*args, **kwargs)
2020-10-20 11:32:45.484 7 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/nova/compute/resource_tracker.py", line 972, in _update_available_resource
2020-10-20 11:32:45.484 7 ERROR nova.compute.manager     self._update(context, cn, startup=startup)
2020-10-20 11:32:45.484 7 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/nova/compute/resource_tracker.py", line 1237, in _update
2020-10-20 11:32:45.484 7 ERROR nova.compute.manager     self._update_to_placement(context, compute_node, startup)
2020-10-20 11:32:45.484 7 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/retrying.py", line 68, in wrapped_f
2020-10-20 11:32:45.484 7 ERROR nova.compute.manager     return Retrying(*dargs, **dkw).call(f, *args, **kw)
2020-10-20 11:32:45.484 7 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/retrying.py", line 223, in call
2020-10-20 11:32:45.484 7 ERROR nova.compute.manager     return attempt.get(self._wrap_exception)
2020-10-20 11:32:45.484 7 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/retrying.py", line 261, in get
2020-10-20 11:32:45.484 7 ERROR nova.compute.manager     six.reraise(self.value[0], self.value[1], self.value[2])
2020-10-20 11:32:45.484 7 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2020-10-20 11:32:45.484 7 ERROR nova.compute.manager     raise value
2020-10-20 11:32:45.484 7 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/retrying.py", line 217, in call
2020-10-20 11:32:45.484 7 ERROR nova.compute.manager     attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
2020-10-20 11:32:45.484 7 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/nova/compute/resource_tracker.py", line 1215, in _update_to_placement
2020-10-20 11:32:45.484 7 ERROR nova.compute.manager     allocations=allocs)
2020-10-20 11:32:45.484 7 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/nova/scheduler/client/report.py", line 1421, in update_from_provider_tree
2020-10-20 11:32:45.484 7 ERROR nova.compute.manager     self.set_traits_for_provider(context, pd.uuid, pd.traits)
2020-10-20 11:32:45.484 7 ERROR nova.compute.manager   File "/usr/lib64/python3.6/contextlib.py", line 99, in __exit__
2020-10-20 11:32:45.484 7 ERROR nova.compute.manager     self.gen.throw(type, value, traceback)
2020-10-20 11:32:45.484 7 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/nova/scheduler/client/report.py", line 1341, in catch_all
2020-10-20 11:32:45.484 7 ERROR nova.compute.manager     raise exception.ResourceProviderSyncFailed()
2020-10-20 11:32:45.484 7 ERROR nova.compute.manager nova.exception.ResourceProviderSyncFailed: Failed to synchronize the placement service with resource provider information supplied by the compute host.
2020-10-20 11:32:45.484 7 ERROR nova.compute.manager
~~~

Version-Release number of selected component (if applicable):
OSP16.1

How reproducible:
100%

Steps to Reproduce:
1.
2.
3.

Actual results:
Not able to spawn new instances or restart old ones.

Expected results:
Should be able to spawn new instances and restart existing ones.

Additional info:
Let me know if any logs are required.
(In reply to Ravi Singh from comment #0)
> Description of problem:
>
> Post FFU[OSP13 to 16.1] neither able to spawn new instances nor able to restart old instances.
>
> 1. For new instance getting this error in the nova-conductor log which is in sync with something observed in nova-compute too[1]
>
> 2. Galera & rabbit are in good condition.
>
> ~~~
> Full List of Resources:
>   * Container bundle set: galera-bundle [cluster.common.tag/openstack-mariadb:pcmklatest]:
>     * galera-bundle-0 (ocf::heartbeat:galera): Master overcloud-controller-0
>     * galera-bundle-1 (ocf::heartbeat:galera): Master overcloud-controller-1
>     * galera-bundle-2 (ocf::heartbeat:galera): Master overcloud-controller-2
>   * Container bundle set: rabbitmq-bundle [cluster.common.tag/openstack-rabbitmq:pcmklatest]:
>     * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Started overcloud-controller-0
>     * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Stopped overcloud-controller-1
>     * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Started overcloud-controller-2
> ~~~
>
> 3. All nova containers are in healthy condition.
> 4. nova-compute service is UP.
> 5. MySQL related errors observed but didn't re-appear post containers restart/
>
> ~~~
> 2020-10-20 11:38:41.465 24 ERROR oslo_db.sqlalchemy.engines     result.read()
> 2020-10-20 11:38:41.465 24 ERROR oslo_db.sqlalchemy.engines   File "/usr/lib/python3.6/site-packages/pymysql/connections.py", line 1349, in read
> 2020-10-20 11:38:41.465 24 ERROR oslo_db.sqlalchemy.engines     first_packet = self.connection._read_packet()
> 2020-10-20 11:38:41.465 24 ERROR oslo_db.sqlalchemy.engines   File "/usr/lib/python3.6/site-packages/pymysql/connections.py", line 991, in _read_packet
> 2020-10-20 11:38:41.465 24 ERROR oslo_db.sqlalchemy.engines     packet_header = self._read_bytes(4)
> 2020-10-20 11:38:41.465 24 ERROR oslo_db.sqlalchemy.engines   File "/usr/lib/python3.6/site-packages/pymysql/connections.py", line 1037, in _read_bytes
> 2020-10-20 11:38:41.465 24 ERROR oslo_db.sqlalchemy.engines     CR.CR_SERVER_LOST, "Lost connection to MySQL server during query")
> 2020-10-20 11:38:41.465 24 ERROR oslo_db.sqlalchemy.engines oslo_db.exception.DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query')
> 2020-10-20 11:38:41.465 24 ERROR oslo_db.sqlalchemy.engines [SQL: SELECT 1]
> 2020-10-20 11:38:41.465 24 ERROR oslo_db.sqlalchemy.engines (Background on this error at: http://sqlalche.me/e/e3q8)
> 2020-10-20 11:38:41.465 24 ERROR oslo_db.sqlalchemy.engines
> ~~~

As you have mentioned, this error ^ is indicating a loss of connection to the database.

> [1]
> ~~~
>
> ...
>
> nova-compute
>
>
> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager nova.exception.ResourceProviderUpdateFailed: Failed to update resource provider via URL /resource_providers/5812a7ab-f8e7-4c57-aba8-f283fedfd04b/inventories: {"errors": [{"status": 400, "title": "Bad Request", "detail": "The server could not comply with the request since it is either malformed or otherwise incorrect.\n\n JSON does not validate: 0 is less than the minimum of 1 Failed validating 'minimum' in schema['properties']['inventories']['patternProperties']['^[A-Z0-9_]+$']['properties']['total']: {'maximum': 2147483647, 'minimum': 1, 'type': 'integer'} On instance['inventories']['DISK_GB']['total']: 0 ", "code": "placement.undefined_code", "request_id": "req-7fc88053-8e43-4c18-9810-73ad05c3a993"}]}

This error ResourceProviderUpdateFailed ^ is saying that an attempt to update inventories on resource provider (compute node) 5812a7ab-f8e7-4c57-aba8-f283fedfd04b failed because of an attempt to set 'total' of resource 'DISK_GB' equal to 0. The request schema requires a minimum value of 1.

> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager
> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager During handling of the above exception, another exception occurred:
> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager
> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager Traceback (most recent call last):
> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 8740, in _update_available_resource_for_node
> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager     startup=startup)
> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/nova/compute/resource_tracker.py", line 887, in update_available_resource
> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager     self._update_available_resource(context, resources, startup=startup)
> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", line 328, in inner
> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager     return f(*args, **kwargs)
> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/nova/compute/resource_tracker.py", line 972, in _update_available_resource
> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager     self._update(context, cn, startup=startup)
> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/nova/compute/resource_tracker.py", line 1237, in _update
> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager     self._update_to_placement(context, compute_node, startup)
> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/retrying.py", line 68, in wrapped_f
> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager     return Retrying(*dargs, **dkw).call(f, *args, **kw)
> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/retrying.py", line 223, in call
> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager     return attempt.get(self._wrap_exception)
> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/retrying.py", line 261, in get
> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager     six.reraise(self.value[0], self.value[1], self.value[2])
> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager     raise value
> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/retrying.py", line 217, in call
> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager     attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/nova/compute/resource_tracker.py", line 1215, in _update_to_placement
> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager     allocations=allocs)
> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/nova/scheduler/client/report.py", line 1421, in update_from_provider_tree
> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager     self.set_traits_for_provider(context, pd.uuid, pd.traits)
> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager   File "/usr/lib64/python3.6/contextlib.py", line 99, in __exit__
> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager     self.gen.throw(type, value, traceback)
> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/nova/scheduler/client/report.py", line 1341, in catch_all
> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager     raise exception.ResourceProviderSyncFailed()
> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager nova.exception.ResourceProviderSyncFailed: Failed to synchronize the placement service with resource provider information supplied by the compute host.
> 2020-10-20 11:32:45.484 7 ERROR nova.compute.manager
> ~~~

And while handling ResourceProviderUpdateFailed, the cache is cleared and then ResourceProviderSyncFailed is raised.

The bug is that the libvirt driver is reporting 'total' for 'DISK_GB' as 0, which the placement API does not allow. This bug also exists on the master branch. Instead, for every resource class we need to check whether there is > 0 available and only report it if that is the case. Otherwise, we should not include it in the PUT request, which results in that resource inventory being removed from placement (as PUT does a full replace [2]); this is the way placement is intended to work. We are already doing this for other resource classes like 'VCPU' [3].

[1] https://github.com/openstack/nova/blob/17a233cf77f811f278dfb4b64dd0b0d76ad4b719/nova/scheduler/client/report.py#L1322-L1341
[2] https://docs.openstack.org/api-ref/placement/?expanded=#update-resource-provider-inventory
[3] https://github.com/openstack/nova/blob/6a5e158756edc6c01bb23c26dcfc4c3a3df703dc/nova/virt/libvirt/driver.py#L7718
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1.4 director bug fix advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:0817