Bug 1685766 - Resource provider record for the removed compute node should be removed when scaling down compute nodes
Status: CLOSED DUPLICATE of bug 1591788
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: documentation
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Irina
QA Contact: RHOS Documentation Team
 
Reported: 2019-03-06 01:12 UTC by Takashi Kajinami
Modified: 2020-01-21 20:10 UTC

Last Closed: 2019-07-01 10:17:25 UTC



Description Takashi Kajinami 2019-03-06 01:12:51 UTC
Description of problem:

In the current product documentation, we remove the service record for a compute node that is removed from the cluster[1], but we keep its resource provider record.

[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/director_installation_and_usage/sect-Scaling_the_Overcloud#sect-Removing_Compute_Nodes

The remaining resource provider record causes a problem when we add a new node with the same hostname as the removed node. The stale record conflicts with the new compute node's attempt to register its own resource provider record and to report its status.

Although the new compute node's status shows as "up", we cannot schedule any instances to it; requests fail with errors caused by the conflict described above.
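As an illustration (not a step from the current documentation), the conflict can be avoided by deleting the stale resource provider record after the node is removed. This is a hedged sketch assuming the osc-placement CLI plugin is installed and admin credentials are sourced; the hostname and UUID below are taken from the error log later in this report and serve only as examples:

```shell
# Look up the stale resource provider left behind by the removed node
# (hostname from this report; substitute your removed node's FQDN)
openstack resource provider list --name overcloud-compute-0.localdomain

# Delete the stale record by its UUID so a replacement node with the
# same hostname can register its own resource provider
openstack resource provider delete 9e1448f1-2c20-4b16-9144-6f685ac9e3e1
```

The delete fails if allocations still exist against the provider, so any leftover allocations for instances on the removed node would need to be cleaned up first.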


How reproducible:

- Always, when you scale down compute nodes and then add a new node with the same hostname as a removed compute node


Steps to Reproduce:
1. Remove overcloud-compute-1 from the cluster, according to the doc

2. Add a new node with the same hostname as the removed node, using HostnameMap

Example:
~~~
parameter_defaults:
  HostnameMap:
    overcloud-compute-2: overcloud-compute-0
~~~

3. Observe errors in nova-conductor on the controller nodes and in nova-compute on the new compute node

4. Disable the other compute node and create a new instance; scheduling fails with "No valid host found"
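The state after step 1 can be verified with the following illustrative checks (assuming admin credentials and the osc-placement plugin; the hostname is the one used in this report). The compute service record is gone, but the placement record remains:

```shell
# The removed node no longer appears in the compute service list...
openstack compute service list --service nova-compute

# ...but its resource provider record is still present in placement
openstack resource provider list --name overcloud-compute-0.localdomain
```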

Actual results:

- Users cannot schedule instances to the new node; attempts fail with the errors described above

Expected results:

- The added node joins the cluster, and users can schedule instances to it without any errors

Additional info:

Comment 1 Takashi Kajinami 2019-03-06 01:24:09 UTC
> 1. Remove overcloud-compute-1 from the cluster, according to the doc
Sorry, this should be overcloud-compute-0


The following log lines show the errors from nova-compute running on the newly added compute node.

~~~
2019-03-01 12:00:01.000 1 ERROR nova.scheduler.client.report [req-27cc5d5c-c1bf-4cf3-a454-d43765307948 - - - - -] [req-6193ae48-8f14-4e9f-b7ac-91c6806e01c9] Failed to create resource provider record in placement API for UUID 9e1448f1-2c20-4b16-9144-6f685ac9e3e1. Got 409: {"errors": [{"status": 409, "request_id": "req-6193ae48-8f14-4e9f-b7ac-91c6806e01c9", "detail": "There was a conflict when trying to complete your request.\n\n Conflicting resource provider name: overcloud-compute-0.localdomain already exists.  ", "title": "Conflict"}]}.
2019-03-01 12:00:02.000 1 ERROR nova.compute.manager [req-27cc5d5c-c1bf-4cf3-a454-d43765307948 - - - - -] Error updating resources for node overcloud-compute-0.localdomain.: ResourceProviderCreationFailed: Failed to create resource provider overcloud-compute-0.localdomain
~~~

Comment 4 Irina 2019-07-01 10:17:25 UTC
See bug 1591788 for details on the documentation update that was made.

*** This bug has been marked as a duplicate of bug 1591788 ***

