Bug 1693838

Summary: Scaling overcloud does not remove entries of Compute Services list or network agent list
Product: Red Hat OpenStack
Reporter: coldford <coldford>
Component: openstack-tripleo-heat-templates
Assignee: Emilien Macchi <emacchi>
Status: CLOSED ERRATA
QA Contact: Cédric Jeanneret <cjeanner>
Severity: medium
Priority: medium
Version: 16.0 (Train)
CC: aschultz, cjeanner, dcadzow, dpeacock, dvd, emacchi, jcoufal, kecarter, lyarwood, markmc, mburns, mcornea, mschuppe, pweeks, scohen
Target Milestone: rc
Keywords: Triaged, ZStream
Target Release: 16.0 (Train on RHEL 8.1)
Hardware: All
OS: Linux
Fixed In Version: openstack-tripleo-heat-templates-11.3.1-0.20191211120233.5ca908c.el8ost
Doc Type: If docs needed, set a value
Last Closed: 2020-02-06 14:39:53 UTC
Type: Bug
Bug Depends On: 1698021
Attachments: Scaledown log (flags: none)

Description coldford@redhat.com 2019-03-28 18:45:44 UTC
Description of problem:
As per https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/director_installation_and_usage/sect-scaling_the_overcloud#sect-Removing_Compute_Nodes

When compute nodes are removed, it is necessary to manually clean up the service entries. Our client is requesting that this be automated.

Version-Release number of selected component (if applicable):
OSP 10

How reproducible:
Every time


Steps to Reproduce:
1. Remove computes
2. Stale service records remain and have to be cleaned up manually

Actual results:
- Service entries require manual clean up.

Expected results:
- No manual clean up.
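
The manual cleanup this report asks to automate can be sketched as follows. This is only an illustration, not the change that was eventually merged: it assumes the rows come from `openstack compute service list -f json` or `openstack network agent list -f json` and that each row carries `ID` and `Host` keys. The function name and the sample rows are hypothetical.

```python
def cleanup_commands(rows, removed_hosts, resource):
    """Return CLI commands that delete records left behind by removed hosts.

    rows: parsed JSON rows; each is assumed to have "ID" and "Host" keys.
    removed_hosts: hostnames of the nodes that were scaled down.
    resource: "compute service" or "network agent".
    """
    removed = set(removed_hosts)
    return [
        f"openstack {resource} delete {row['ID']}"
        for row in rows
        # Exact hostname comparison; no prefix or substring matching.
        if row["Host"] in removed
    ]


# Hypothetical sample rows, for illustration only.
services = [
    {"ID": 5, "Binary": "nova-compute", "Host": "compute-1.localdomain"},
    {"ID": 6, "Binary": "nova-compute", "Host": "compute-10.localdomain"},
]

for command in cleanup_commands(services, ["compute-1.localdomain"], "compute service"):
    print(command)
```

The helper only builds the command strings, so they can be reviewed before anything is deleted.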

Comment 5 Alex Schultz 2019-05-14 17:37:30 UTC
We now have scale-down tasks, and the code to run scale-down actions on nova-compute has landed: https://review.opendev.org/#/c/653893/

Comment 13 Cédric Jeanneret 2019-12-11 14:04:14 UTC
Moving back to ON_DEV:
- overcloud with 10 computes: tried to drop compute-1, but there's an issue because it seems to pick up compute-10 as well
- same overcloud, dropped compute-9: its services are still listed, "enabled" but "down"

The run log looks suspicious (see the upcoming attachment) and, in addition, the /var/lib/mistral/<stack-name> directory is removed at some point, which makes debugging complicated.
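
Leftover entries like the ones for compute-9 can be spotted by filtering the service list for rows that are still "enabled" but report "down". A minimal sketch, assuming rows shaped like the output of `openstack compute service list -f json` with `Host`, `Status` and `State` keys; the helper name and the sample data are hypothetical. Hosts are compared by exact name, which keeps compute-1 and compute-10 apart. Whether name matching is what actually caused the compute-10 issue is not established in this report.

```python
import json


def enabled_but_down(rows, host=None):
    """Return rows whose Status is "enabled" while their State is "down".

    If host is given, only rows for exactly that hostname are returned.
    """
    return [
        row
        for row in rows
        if row["Status"] == "enabled"
        and row["State"] == "down"
        and (host is None or row["Host"] == host)
    ]


# Hypothetical sample, shaped like `openstack compute service list -f json`.
sample = json.loads("""
[
  {"ID": 9,  "Binary": "nova-compute", "Host": "compute-9",  "Status": "enabled", "State": "down"},
  {"ID": 10, "Binary": "nova-compute", "Host": "compute-10", "Status": "enabled", "State": "up"},
  {"ID": 1,  "Binary": "nova-compute", "Host": "compute-1",  "Status": "enabled", "State": "down"}
]
""")

for row in enabled_but_down(sample, host="compute-1"):
    print(row["ID"], row["Host"])
```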

Comment 14 Cédric Jeanneret 2019-12-11 14:05:30 UTC
Created attachment 1643897 [details]
Scaledown log

This log was generated with the following command:
openstack overcloud node delete -y --stack overcloud-0 464817dd-e129-402a-93bf-dc2ed216471a 2>&1 | tee ~/scaledown-1.log

That's the only trace I can get for now since /var/lib/mistral/overcloud-0 is dropped :(.

Comment 15 Cédric Jeanneret 2019-12-11 14:38:40 UTC
Just created a new BZ for the ansible directory removal: https://bugzilla.redhat.com/show_bug.cgi?id=1782379

Comment 17 Cédric Jeanneret 2019-12-13 13:38:26 UTC
I just tried the new RPM, and it works fine!

Please note that only nova services are cleaned up right now, because the Network DFG didn't provide the relevant code for the auto-cleanup.

Verification method:
- deploy Director and Overcloud using the provided t-h-t package (and some dependencies). The Overcloud has 1 controller + 2 computes
- remove one of the computes, and check the output of `openstack compute service list` using the overcloudrc env
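
The check in the last step could be scripted along these lines. This is a sketch under assumptions, not the procedure QA actually ran: it assumes the overcloudrc environment is already sourced, that the `openstack` client is on the PATH, and that the JSON rows contain a `Host` key. Both function names are hypothetical.

```python
import json
import subprocess


def list_compute_services():
    """Run the CLI and return its parsed JSON rows (overcloudrc must be sourced)."""
    result = subprocess.run(
        ["openstack", "compute", "service", "list", "-f", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def leftover_services(rows, removed_host):
    """Return the rows that still reference the removed host (exact match)."""
    return [row for row in rows if row["Host"] == removed_host]
```

After the scale-down, `leftover_services(list_compute_services(), removed_host)` should return an empty list for the hostname of the deleted compute node. As noted above, this covers nova services only.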

Comment 19 errata-xmlrpc 2020-02-06 14:39:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:0283