
Bug 1685766

Summary: Resource provider record for the removed compute node should be removed when scaling down compute nodes
Product: Red Hat OpenStack
Component: documentation
Version: 13.0 (Queens)
Status: CLOSED DUPLICATE
Severity: high
Priority: high
Reporter: Takashi Kajinami <tkajinam>
Assignee: Irina <igallagh>
QA Contact: RHOS Documentation Team <rhos-docs>
Docs Contact:
CC: dcadzow, igallagh, mzheng, pkesavar
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Type: Bug
Last Closed: 2019-07-01 10:17:25 UTC

Description Takashi Kajinami 2019-03-06 01:12:51 UTC
Description of problem:

In the current product documentation, we remove the service record for the compute node that we remove from the cluster[1], but we keep its resource provider record.

[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/director_installation_and_usage/sect-Scaling_the_Overcloud#sect-Removing_Compute_Nodes

This leftover resource provider record causes a problem when we add a new node with the same hostname as the removed node: the stale record conflicts with the resource provider record that the new compute node tries to register, so the new compute node cannot register itself in placement or report its resources.

Although the status of the new compute node is shown as "up", we cannot schedule any instances to it because of the errors caused by this conflict.
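
A minimal sketch of the cleanup step the documented procedure currently omits, assuming the removed node was overcloud-compute-0.localdomain and that the osc-placement CLI plugin is available; the service ID and provider UUID below are placeholders:

~~~
# Service record cleanup, which the scale-down procedure already documents:
openstack compute service list --service nova-compute
openstack compute service delete <service-id>

# The additional step this bug asks for: remove the stale resource provider
# that placement still holds for the removed node (requires osc-placement).
openstack resource provider list --name overcloud-compute-0.localdomain
openstack resource provider delete <provider-uuid>
~~~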


How reproducible:

- Scale down compute nodes, and add a new node with the same hostname as the removed compute node


Steps to Reproduce:
1. Remove overcloud-compute-1 from the cluster, according to the doc

2. Add a new node with the same hostname as the removed node, using HostnameMap

Example:
~~~
parameter_defaults:
  HostnameMap:
    overcloud-compute-2: overcloud-compute-0
~~~

3. Now you see errors in nova-conductor on the controller nodes and in nova-compute on the new compute node

4. Disable the other compute node and create a new instance, which should result in "No valid host found" (see the command sketch after these steps)
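
A rough sketch of how steps 3 and 4 can be verified from the CLI; the hostnames, flavor, image, and network names here are placeholders rather than values taken from this environment:

~~~
# Step 3: the new node's nova-compute service shows "up" even though its
# resource provider registration with placement keeps failing (see the
# nova-compute errors in comment 1).
openstack compute service list --service nova-compute

# Step 4: disable the remaining compute node so the scheduler can only pick
# the new one, then try to boot an instance.
openstack compute service set --disable overcloud-compute-1.localdomain nova-compute
openstack server create --flavor m1.small --image cirros --network private test-instance
openstack server show test-instance   # status ERROR, fault mentions "No valid host"
~~~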

Actual results:

- Users cannot assign any instances to the new node; instance creation fails with errors caused by the resource provider conflict

Expected results:

- The added node can join the cluster and users can assign instances to it without any errors

Additional info:

Comment 1 Takashi Kajinami 2019-03-06 01:24:09 UTC
> 1. Remove overcloud-compute-1 from the cluster, according to the doc
Sorry, this should be overcloud-compute-0


The following log excerpt shows the error from nova-compute running on the added compute node.

~~~
2019-03-01 12:00:01.000 1 ERROR nova.scheduler.client.report [req-27cc5d5c-c1bf-4cf3-a454-d43765307948 - - - - -] [req-6193ae48-8f14-4e9f-b7ac-91c6806e01c9] Failed to create resource provider record in placement API for UUID 9e1448f1-2c20-4b16-9144-6f685ac9e3e1. Got 409: {"errors": [{"status": 409, "request_id": "req-6193ae48-8f14-4e9f-b7ac-91c6806e01c9", "detail": "There was a conflict when trying to complete your request.\n\n Conflicting resource provider name: overcloud-compute-0.localdomain already exists.  ", "title": "Conflict"}]}.
2019-03-01 12:00:02.000 1 ERROR nova.compute.manager [req-27cc5d5c-c1bf-4cf3-a454-d43765307948 - - - - -] Error updating resources for node overcloud-compute-0.localdomain.: ResourceProviderCreationFailed: Failed to create resource provider overcloud-compute-0.localdomain
~~~
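
One way to confirm that this 409 comes from the stale record rather than from the new node itself is to look up the provider by the name in the error message. A hedged sketch, assuming the osc-placement plugin is installed; the UUID argument is a placeholder:

~~~
# The provider returned here carries the removed node's UUID, not the
# 9e1448f1-2c20-4b16-9144-6f685ac9e3e1 UUID that the new nova-compute
# is trying to register under the same name.
openstack resource provider list --name overcloud-compute-0.localdomain
openstack resource provider show <old-provider-uuid>
~~~

Deleting that stale provider, as sketched under the description above, lets the new nova-compute create its own provider record on its next resource update.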

Comment 4 Irina 2019-07-01 10:17:25 UTC
See bug 1591788 for details on the documentation update that was made.

*** This bug has been marked as a duplicate of bug 1591788 ***