Description of problem:
The current product documentation[1] removes the service record of a compute node that is removed from the cluster, but keeps the node's resource provider record.

[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/director_installation_and_usage/sect-Scaling_the_Overcloud#sect-Removing_Compute_Nodes

The remaining resource provider record causes a problem when we add a new node with the same hostname as the removed node. It conflicts with the resource provider record that the new compute node tries to register, and prevents the new compute node from reporting its status. Although the status of the new compute node is "up", we cannot assign any instances to it because of errors caused by this conflict.

How reproducible:
- Scale down the compute nodes, then add a new node with the same hostname as the removed compute node.

Steps to Reproduce:
1. Remove overcloud-compute-1 from the cluster, following the documentation.
2. Add a new node with the same hostname as the removed node, using HostnameMap. Example:
~~~
parameter_defaults:
  HostnameMap:
    overcloud-compute-2: overcloud-compute-0
~~~
3. Errors now appear in nova-conductor on the controller nodes and in nova-compute on the new compute node.
4. Disable the other compute node and create a new instance, which should result in "No valid host found".

Actual results:
- Users cannot assign instances to the new node; attempts fail with errors.

Expected results:
- The added node joins the cluster, and users can assign instances to it without any error.

Additional info:
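The leftover record described in this report could be spotted before a replacement node is added. The following is only a minimal sketch, not a documented procedure. It assumes the resource provider list and the hosts of the remaining nova-compute services have already been collected by some other means, and that provider names match service host names. The function name `find_orphaned_providers`, the placeholder UUIDs, and the sample data are hypothetical.

```python
def find_orphaned_providers(providers, compute_hosts):
    """Return resource providers whose name matches no nova-compute service host.

    providers: list of dicts with "uuid" and "name" keys (assumed shape).
    compute_hosts: iterable of host names that still have a nova-compute service.
    """
    known = set(compute_hosts)
    return [p for p in providers if p["name"] not in known]


# Hypothetical state after overcloud-compute-0 was removed per the documentation:
# its service record is gone, but its resource provider record is still present.
providers = [
    {"uuid": "hypothetical-uuid-0", "name": "overcloud-compute-0.localdomain"},
    {"uuid": "hypothetical-uuid-1", "name": "overcloud-compute-1.localdomain"},
]
compute_hosts = ["overcloud-compute-1.localdomain"]

orphans = find_orphaned_providers(providers, compute_hosts)
for p in orphans:
    print(p["uuid"], p["name"])
```

Any provider reported here is a candidate for the conflict described above if a new node later reuses its hostname.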
> 1. Remove overcloud-compute-1 from the cluster, according to the doc

Sorry, this should be overcloud-compute-0.

The following example shows the error from nova-compute running on the added compute node.

~~~
2019-03-01 12:00:01.000 1 ERROR nova.scheduler.client.report [req-27cc5d5c-c1bf-4cf3-a454-d43765307948 - - - - -] [req-6193ae48-8f14-4e9f-b7ac-91c6806e01c9] Failed to create resource provider record in placement API for UUID 9e1448f1-2c20-4b16-9144-6f685ac9e3e1. Got 409: {"errors": [{"status": 409, "request_id": "req-6193ae48-8f14-4e9f-b7ac-91c6806e01c9", "detail": "There was a conflict when trying to complete your request.\n\n Conflicting resource provider name: overcloud-compute-0.localdomain already exists. ", "title": "Conflict"}]}.
2019-03-01 12:00:02.000 1 ERROR nova.compute.manager [req-27cc5d5c-c1bf-4cf3-a454-d43765307948 - - - - -] Error updating resources for node overcloud-compute-0.localdomain.: ResourceProviderCreationFailed: Failed to create resource provider overcloud-compute-0.localdomain
~~~
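To check whether a node is hitting this conflict, the nova-compute log can be scanned for the 409 message shown above. The snippet below is a rough sketch under assumptions: the regular expression is derived only from the single log line in this report, so other releases may word the message differently, and the helper name `find_provider_conflicts` is hypothetical.

```python
import re

# Pattern derived from the log line quoted in this report (an assumption:
# other versions may format the message differently).
CONFLICT_RE = re.compile(
    r"Failed to create resource provider record in placement API "
    r"for UUID (?P<uuid>[0-9a-f-]{36})\. Got 409:.*?"
    r"Conflicting resource provider name: (?P<name>\S+) already exists"
)


def find_provider_conflicts(lines):
    """Yield (uuid_being_registered, conflicting_provider_name) per matching line."""
    for line in lines:
        match = CONFLICT_RE.search(line)
        if match:
            yield match.group("uuid"), match.group("name")


# Sample input: the nova.scheduler.client.report line from this report.
sample = (
    r'2019-03-01 12:00:01.000 1 ERROR nova.scheduler.client.report '
    r'[req-27cc5d5c-c1bf-4cf3-a454-d43765307948 - - - - -] '
    r'[req-6193ae48-8f14-4e9f-b7ac-91c6806e01c9] Failed to create resource '
    r'provider record in placement API for UUID '
    r'9e1448f1-2c20-4b16-9144-6f685ac9e3e1. Got 409: {"errors": [{"status": 409, '
    r'"request_id": "req-6193ae48-8f14-4e9f-b7ac-91c6806e01c9", "detail": '
    r'"There was a conflict when trying to complete your request.\n\n '
    r'Conflicting resource provider name: overcloud-compute-0.localdomain '
    r'already exists. ", "title": "Conflict"}]}.'
)

conflicts = list(find_provider_conflicts([sample]))
for uuid, name in conflicts:
    print(f"provider name {name} already exists; new UUID {uuid} was rejected")
```

The extracted name identifies the stale resource provider record, and the UUID is the one the new compute node tried to register.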
See bug 1591788 for details on the update made.

*** This bug has been marked as a duplicate of bug 1591788 ***